Apache Flink – Highly Scalable Streaming Engine

Why Flink:
- more scalable than Storm upto more than 1000s of nodes( massive scale)
- more fault tolerant than Storm
  - maintain “state snapshots” to guarantee exactly once processing
- In similarity, it is also based on event based streaming like Storm
Flink Vs Spark Streaming Vs Storm:
- Faster than Storm
- real time streaming like Storm
- event based like storm and maintains windowing
- exactly once guarantee is good for financial applications
Flink Architecture:
- Flink can sit on the top of a local or standalone or a cluster based Hadoop system or even on the cloud on AWS or Google Cloud
- Flink has 2 distinct set of APOs – Data Streaming APIs( more like Storm) as well as Data Set APIs( more like Spark Streaming)
- In its each universe –
  - Data Streaming APIs support Event based processing as well as SQL/Table based processing.
  - Data Set APIs support FlinkML, Gelly and SQL/Table based processing
- Connectors for Flink – HDFS, Kafka, Cassandra, RabbitMQ, Redis, Elastic search, NIFI etc
Lets play with Flink:
- Login to the console using maria_dev credentials and elevate to root permissions.
- Download Apache Flink from apache.flink.org. Download using this link – wget https://archive.apache.org/dist/flink/flink-1.2.0/flink-1.2.0-bin-hadoop27-scala_2.10.tgz
- Unzip the downloadable – tar xvf flink-1.2.0-bin-hadoop27-scala_2.10.tgz
- Move to the flink folder – cd flink-1.2.0
- Now first lets configure the config files – cd conf
- Edit – nano flink-conf.yaml
  - jobmanager.web.port: 8081 to jobmanager.web.port: 8082
- Now lets start flink – go to flink’s bin folder and run
  - ./start-local.sh
- Now lets open the Flink WEb UI – http://127.0.0.1:8082
- Now lets run one example from GitHub location – wget https://raw.githubusercontent.com/apache/flink/master/flink-examples/flink-examples-streaming/src/main/scala/org/apache/flink/streaming/scala/examples/socket/SocketWindowWordCount.scala
- here is the heart of the code.
  - You first get the streaming env –
    - // get the execution environment
      val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
  - Then you process and transform the data –
    - // parse the data, group it, window it, and aggregate the counts
      val windowCounts = text
      .flatMap { w => w.split(“\\s”) }
      .map { w => WordWithCount(w, 1) }
      .keyBy(“word”)
      .timeWindow(Time.seconds(5))
      .sum(“count”)
- Now lets open up a port and listen on port 9000 – nc -l 9000
- Now run the example –
  - go to /home/maria_dev/cd flink-1.2.0
  - now run – ./bin/flink run examples/streaming/SocketWindowWordCount.jar –port 9000
  - Output –