A starting point for an Apache Flink project using Kotlin.
- Pre-requisites
- Up & Running
- Running The Job
- Observing the Job in Action
- Additional Containers
- Useful Commands
- Useful References
- Additional Topics - Visit Wiki
## Pre-requisites

- Docker on Mac
- Gradle - You have a few options here
  - If you're using IntelliJ, just make sure it's enabled.
  - Run `brew install gradle`
## Up & Running

Let's first clone the repo and fire up our system:

```sh
git clone git@github.com:aedenj/apache-flink-kotlin-starter.git ~/projects/apache-flink-kotlin-starter
cd ~/projects/apache-flink-kotlin-starter; make kafka-start
```
Now you have a single-node Kafka cluster with various admin tools to make life a little easier. See the Kafka cluster repo for its operating details.
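To sanity-check that everything came up, you can list the running containers (names and port mappings will vary with the compose file):

```sh
# Show each running container alongside its port mappings
docker ps --format 'table {{.Names}}\t{{.Ports}}'
```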
## Running The Job

The sample job in this repo reads from a topic named `source` and writes to a topic named `destination`. There are a couple of ways of running this job depending on what you're trying to accomplish. First, let's set up the Kafka topics by running `make create-default-topics`.
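To make the topology concrete, here is a minimal sketch of what such a pass-through job can look like using Flink's `KafkaSource`/`KafkaSink` connector API. The broker address, class layout, and job name are assumptions for illustration, not the repo's actual code:

```kotlin
import org.apache.flink.api.common.eventtime.WatermarkStrategy
import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema
import org.apache.flink.connector.kafka.sink.KafkaSink
import org.apache.flink.connector.kafka.source.KafkaSource
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment

fun main() {
    val env = StreamExecutionEnvironment.getExecutionEnvironment()

    // Consume raw string records from the `source` topic.
    // localhost:9092 is an assumed broker address for the local cluster.
    val source = KafkaSource.builder<String>()
        .setBootstrapServers("localhost:9092")
        .setTopics("source")
        .setStartingOffsets(OffsetsInitializer.earliest())
        .setValueOnlyDeserializer(SimpleStringSchema())
        .build()

    // Write each record, unchanged, to the `destination` topic.
    val sink = KafkaSink.builder<String>()
        .setBootstrapServers("localhost:9092")
        .setRecordSerializer(
            KafkaRecordSerializationSchema.builder<String>()
                .setTopic("destination")
                .setValueSerializationSchema(SimpleStringSchema())
                .build()
        )
        .build()

    env.fromSource(source, WatermarkStrategy.noWatermarks(), "kafka-source")
        .sinkTo(sink)

    env.execute("kafka-pass-through")
}
```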
For quick feedback it's easiest to run the job locally:

- If you're using IntelliJ, use the usual methods.
- On the command line, run `make run`

Alternatively, you can run the job on a local Flink cluster:

- Run `make flink-start`.
- Verify the cluster is running by navigating to the UI.

This will run the job with the cluster defined in `flink-job-cluster.yml`.
By default the cluster runs one task manager with one slot, but this can be changed. The `make flink-start` command accepts two parameters, `NUM_TASK_MANAGERS` and `NUM_TASK_SLOTS`. For example, running

```sh
make flink-start NUM_TASK_MANAGERS=2 NUM_TASK_SLOTS=3
```

will result in a job cluster running two task managers with three slots each and a default parallelism of six.
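The arithmetic: total slots = `NUM_TASK_MANAGERS` × `NUM_TASK_SLOTS` (2 × 3 = 6 above), and a job's parallelism can never exceed the total slots available. If you want a job to pin its own parallelism rather than inherit the cluster default, here's a hedged sketch (this repo's job may not do this):

```kotlin
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment

fun main() {
    val env = StreamExecutionEnvironment.getExecutionEnvironment()
    // Cap this job at parallelism 3. It must fit within the cluster's
    // total slots (NUM_TASK_MANAGERS * NUM_TASK_SLOTS), or the job will
    // sit waiting for resources.
    env.setParallelism(3)
    // ... build and execute the pipeline as usual ...
}
```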
To stop the job, run `make flink-stop`.
## Observing the Job in Action

After starting the job with one of the methods above, let's observe it reading messages from one topic and writing them to another:

- Start the job using one of the methods above.
- If kcat isn't installed, run `brew install kcat`
- To populate the source topic, run

  ```sh
  head -n 10 nyctaxi-dataset/yellowcab-trips-sample.json | kcat -b localhost -t source -T -P -l
  ```

- Navigate to the Kafka UI and view the messages in both the `source` and `destination` topics.

You should see the same messages in both topics.
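If you prefer the command line to the Kafka UI, kcat can read a topic back directly; for example, to dump everything currently in `destination` and exit:

```sh
# -C consume, -o beginning start from the earliest offset, -e exit at end of topic
kcat -b localhost -t destination -C -o beginning -e
```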
## Additional Containers

Both Prometheus and Grafana are available via Docker. In addition, the Prometheus exporter has been enabled in the local cluster. To see this all in action:
- Ensure Kafka is running. If it isn't, run `make kafka-start`. I prefer to see the output; however, if you don't, you can add the `-d` option to the `kafka-start` command in the Makefile.
- If the topics for the default Flink job don't already exist, run `make create-default-topics`
- Now let's start up Grafana and Prometheus by running `make monitor-start`
- Start the Flink cluster by executing `make flink-start`
- Navigate to the example dashboard
You may not see results immediately. Wait a minute or so and you should start seeing data. Lastly, if you run the Flink cluster with more than one task manager, the additional nodes will be discovered automatically by the service discovery configured in Prometheus.
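As a starting point in Prometheus, try querying one of Flink's built-in metrics; for example, assuming the standard Prometheus reporter naming, the number of registered task managers:

```
flink_jobmanager_numRegisteredTaskManagers
```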
Here is a list of useful links:

| Description | Link |
|---|---|
| Prometheus - Useful for exploring the raw measurements of a metric | http://localhost:9090 |
| Grafana - Home of all the dashboards | http://localhost:9003 |
| Job Manager Prometheus Exporter | http://localhost:9249 |
| Task Manager Prometheus Exporter | Run `docker ps` and identify the containers mapped to port 9249 |
- ZooNavigator - A web-based UI for ZooKeeper that you can start and stop using `make zoonav-start` and `make zoonav-stop`. Navigate to http://localhost:9001/ for access.
## Useful Commands

| Service | Command(s) |
|---|---|
| Kafka Cluster | `make kafka-start`, `make kafka-stop` |
| Kafka | `make create-default-topics`, `make delete-default-topics`, `make create-topics topic='"topicA:2:3" "topicB:5:1"'`, `make delete-topics topic='"topicA" "topicB"'` |
| Flink Cluster | `make flink-start`, `make flink-start NUM_TASK_SLOTS=2 NUM_TASK_MANAGERS=3`, `make flink-stop` |
| Monitoring | `make monitor-start`, `make monitor-stop` |
| ZooNavigator | `make zoonav-start`, `make zoonav-stop` |
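The `topic` argument to `make create-topics` appears to follow a `name:partitions:replication-factor` convention; for example, a hypothetical `events` topic with four partitions and a replication factor of one:

```sh
make create-topics topic='"events:4:1"'
```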
## Useful References

- kcat (formerly kafkacat) - A generic non-JVM producer and consumer for Apache Kafka >= 0.8; think of it as a netcat for Kafka. In addition, the Confluent docs have helpful examples.
- Kafka UI - A feature-rich UI on par with the Confluent UI for monitoring and managing Kafka clusters.