Big Data Notes

Other Hadoop Technologies
Mohd Naeem

Data note · The list is long, but a few are worth mentioning: Impala: Cloudera's alternative to Hortonworks' Hive, faster than Hive…

Apache Flink – Highly Scalable Streaming Engine
Mohd Naeem

Data note · Why Flink: more scalable than Storm, scaling to thousands of nodes (massive scale); more fault tolerant than Storm; maintains "state snapshots"…

Spark Streaming – Processing Data in Almost Real Time
Mohd Naeem

Data note · Why process big data in real time? Big data is huge, so if we still use batch processing (e.g. running…

Apache Kafka – A Tool for Streaming Data into the Cluster
Mohd Naeem

Data note · What is streaming? What if you have to capture live data or logs from web servers? You have data…

Apache Flume – Hadoop Specific Streaming Tool
Mohd Naeem

Data note · What is Apache Flume? Apache Kafka is a generic streaming tool that can handle not only Hadoop-specific…
