2024 Notes

Apache Spark – A Deep Dive – Series 5 of N – Using Broadcasting
Big Data - Advanced
Mohd Naeem

Data note · Why broadcasting? To understand broadcasting, let's first do an exercise: find the list of the most popular movies in the MovieLens data…

Apache Spark – A Deep Dive – Series 3 of N – Using Filters on RDD
Big Data - Advanced
Mohd Naeem

Data note · Notes: The dataset for this exercise comes from the National Centers for Environmental Information (NCEI) at http://www.ncdc.noaa.gov/data-access/quick-links. Click the link Global Historical Climatology…

Install Spark on Linux or Windows as Standalone Setup without Hadoop Ecosystem
Big Data - Advanced
Mohd Naeem

Data note · Windows: Install the JDK (Java Development Kit). Visit the Java site at http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html, select your environment (Windows x86 or x64), accept the license, and download…

Apache Spark – A Deep Dive – Series 1 of N – Single-Field-Based RDDs
Big Data - Advanced
Mohd Naeem

Data note · What is Apache Spark? A data processing engine, much faster than MapReduce, that uses DAGs (Directed Acyclic Graphs) to optimize workflows. How does Apache…

Other Hadoop Technologies
Big Data
Mohd Naeem

Data note · The list is long, but a few are worth mentioning: Impala: Cloudera's alternative to Hortonworks' Hive, faster than Hive…

Apache Flink – Highly Scalable Streaming Engine
Big Data
Mohd Naeem

Data note · Why Flink? More scalable than Storm, up to thousands of nodes (massive scale); more fault tolerant than Storm; maintains "state snapshots"…
