Apache Spark – A Deep Dive – Series 3 of N – Using Filters on RDD
Big Data - Advanced · Photo: Luke Chesser / Unsplash (royalty-free)
Mohd Naeem

Data note · Notes: The data set for this exercise is from the National Centers for Environmental Information (NCEI) at http://www.ncdc.noaa.gov/data-access/quick-links. Click the link Global Historical Climatology…

Install Spark on Linux or Windows as Standalone Setup without Hadoop Ecosystem
Big Data - Advanced · Photo: Christina @ wocintechchat.com / Unsplash (royalty-free)
Mohd Naeem

Data note · Windows: Install the JDK (Java Development Kit). Visit the Java site (http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html), select your environment (Windows x86 or x64), accept the license, and download…

Apache Spark – A Deep Dive – Series 1 of N – Single Field Based RDDs
Big Data - Advanced · Photo: LinkedIn Sales Navigator / Unsplash (royalty-free)
Mohd Naeem

Data note · What is Apache Spark? A data processing engine, much faster than MapReduce, that uses DAGs (directed acyclic graphs) to optimize workflows. How does Apache…

Other Hadoop Technologies
Big Data · Photo: Alexandre Debiève / Unsplash (royalty-free)
Mohd Naeem

Data note · The list is quite big, but a few are worth mentioning. Impala: Cloudera's alternative to Hortonworks' Hive; faster than Hive…

Apache Flink – Highly Scalable Streaming Engine
Big Data · Photo: Lucas / Unsplash (royalty-free)
Mohd Naeem

Data note · Why Flink? It is more scalable than Storm, up to thousands of nodes (massive scale); it is more fault tolerant than Storm; it maintains "state snapshots"…

Spark Streaming – Processing Data in Almost Real Time
Big Data · Photo: Carlos Muza / Unsplash (royalty-free)
Mohd Naeem

Data note · Why process big data in real time? Big data is really huge, so if we still use batch processing (e.g., running…
