2019 – Stack Signals

Photo: Markus Spiske / Unsplash · Royalty-free Big Data

December 18, 2019 Mohd Naeem

Big Data Integration with MongoDB Using Spark

Data note · Why MongoDB? Let's evaluate MongoDB against the CAP theorem to answer 'Why MongoDB'. Partition tolerance is a must in big data scenarios as…

Photo: Christina @ wocintechchat.com / Unsplash · Royalty-free Big Data

November 14, 2019 Mohd Naeem

Big Data Integration with Cassandra Using Spark

Data note · Why Cassandra: Before we discuss Cassandra, we also have to discuss the CAP theorem. As per CAP (Consistency, Availability…

Photo: Danial Iglesias / Unsplash · Royalty-free Big Data

October 12, 2019 Mohd Naeem

How to Interact with HDFS Using HBase and Pig

Data note · Interacting with HDFS using HBase and Python was very powerful, but it was also very engaging as we had to do a…

Photo: Growtika / Unsplash · Royalty-free Big Data

September 10, 2019 Mohd Naeem

How to Interact with HDFS Using HBase and Python

Data note · What is HBase: HBase is a NoSQL/non-relational answer to your big data queries, where relational databases can't be as scalable as non-relational…

Photo: Maximilian Wittmann / Unsplash · Royalty-free Big Data

August 9, 2019 Mohd Naeem

Exchanging Data between MySQL and Hadoop Using Sqoop Import and Export

Data note · The Hadoop distributed file system can retrieve data not only from flat files but also from structured as well as unstructured sources.…

Photo: Luke Chesser / Unsplash · Royalty-free Big Data

July 3, 2019 Mohd Naeem

How to Process Data to Recommend Movies for a Specific User (Using Machine Learning and Spark 2)

Editorial reprint · As Spark 2 supports Datasets, which are an extension of RDDs, we can use these Datasets to model into a Machine Learning…

Photo: Annie Spratt / Unsplash · Royalty-free Big Data

June 4, 2019 Mohd Naeem

How to Process Data Using Spark 2

Data note · Spark 2 extends RDDs (Resilient Distributed Datasets) in the form of a "DataFrame". A DataFrame contains Row objects and thus gives you the power to use…

Photo: Christina @ wocintechchat.com / Unsplash · Royalty-free Big Data

May 9, 2019 Mohd Naeem

How to Process Data Using Spark

Data note · Apache Spark is a lightning-fast distributed processing service for Hadoop. It executes in memory, which is why it is the fastest of all processing…

Photo: Alexandre Debiève / Unsplash · Royalty-free Big Data

April 27, 2019 Mohd Naeem

How to Directly Use MapReduce to Process Data

Data note · As we know, the core of Hadoop's distributed processing system is MapReduce. We can use technologies on top of it like…

Photo: Alexandre Debiève / Unsplash · Royalty-free Big Data

March 27, 2019 Mohd Naeem

How to Process Data with Pig with MapReduce and TEZ

Data note · As we know, HDFS is the distributed storage system of Hadoop. Similarly, MapReduce is the core processing engine. Recently TEZ is…
