ksqlDB intro: a terrible, horrible, no good, very bad tutorial
🚀🐋
Here's the repo y'all really want https://github.com/VaradGhodake/ksql-demo
git clone https://github.com/VaradGhodake/ksql-demo.git
cd ksql-demo
Launch 3 terminals, preferably with TMUX or Windows Terminal for convenience.
Terminal 1
Boot up everything
docker-compose up -d
docker-compose ps
Pull up a shell. The --network
flag isn't something to be overlooked. This simulates intranet. Everything we do, should be confined within this same network.
docker run --network docker_default --rm --interactive --tty confluentinc/ksqldb-cli:0.6.0 ksql http://ksql-server:8088
Terminal 2:
Generate page views, push them to the pageviews
topic
DO NOT FORGET sudo
unless you're madly in love with Linux error code #126
cd setup/docker
sudo ./generate_pageviews.sh
Terminal 1
show topics;
If it doesn't show you the pageview
topic, you're doing something wrong. Go back and perform all the steps again, slowly this time.
Terminal 1
Create a stream for pageview
topic called pageviews_original
CREATE STREAM pageviews_original (rowkey string key, viewtime bigint, userid varchar, pageid varchar) WITH (kafka_topic='pageviews',
value_format='DELIMITED');
Details can be seen with DESCRIBE pageviews_original;
Terminal 1
Create another topic for users, users_original
The key
portion took a while for me to figure out, make sure we're on the same page here.
CREATE TABLE users_original (rowkey string key, viewtime bigint, userid varchar, regionid varchar, gender varchar) WITH (kafka_topic='users', key='userid',value_format='DELIMITED',partitions=1);
Terminal 3
Now bombard the newly created topic with users moving all around the regions.
sudo ./generate_users.sh
Terminal 1
In case you'd want to run the query for everything from the start,
set 'auto.offset.reset'='earliest';
# 'latest' if there's too much to process
SQL Examples
LEFT JOIN
example:
SELECT users_original.userid AS userid, pageid, regionid, gender FROM pageviews_original LEFT JOIN users_original ON pageviews_original.userid = users_original.userid EMIT CHANGES;
Filter
example:
SELECT * FROM users_original WHERE gender = 'FEMALE' EMIT CHANGES;
Aggregation
example:
SELECT userid, COUNT(pageid) AS p_count FROM pageviews_original
GROUP BY userid EMIT CHANGES;