Big Data

Exchanging Data between MySQL and Hadoop Using Sqoop Import and Export

Robotic and human hands nearly touching
Photo: Maximilian Wittmann / Unsplash · Royalty-free
  • The distributed Hadoop file system can not only retrieve data from flat files but also my structured as well as unstructured sources.
  • Today we will exchange( import and export both) data between MySQL and Hadoop.
  • Step 1 – First load data into MySQL from some external source
    • Connect to the Ambari Sandbox using ‘maria_dev’ credentials
    • Now use the wget command to download a SQL script which creates a database and puts data into respective tables
    • wget https://s3.amazonaws.com/testbucket786786/movielens.sql
    • Capture
      movielens.sql downloaded from https://s3.amazonaws.com/testbucket786786
    • in Ambari MySQL in already pre-installed, if not you might need to install MySQL first. The credentials of pre-installed MySQL in Ambari/HortonWorks is root with password hadoop.
    • Let’s connect to MySQL and create a ‘movielens’ database and run our SQL script to create all the tables and data for this exercise.
    • Capture
      mysql -u root -p command prompts you to enter password to connect to MySQL then you check the list of databases using show databases command. Now create a database movielens using command create database movielens and finally switch to that database using command use movielens.
       
    • Now lets execute the downloaded SQL script to generate the data.
    • Capture
      You should set the names and character set to UTF8 because it might contain UTF8 characters too. Use commands – set NAMES ‘UTF8’ and set CHARACTER SET UTF8. Then finally execute the downloaded SQL script to generate tables and data as source movielens.sql;
       
    • Verify that now SQL has data needed and run a SQL query to check for the MostRated Movie.
    • Capture
      show tables; command shows list of all tables in the database. describe movies; command show the schema of the table movies and so on.
       
    • Capture
      The query and the results

      select movies.title, count(ratings.movie_id) as ratingCount, avg(ratings.rating) as avgRatingselect movies.title, count(ratings.movie_id) as ratingCount, avg(ratings.rating) as avgRating    from movies    join ratings    on movies.id=ratings.movie_id    group by ratings.movie_id having ratingCount>10    order by avgrating DESC limit 10;

  • Step 2 : Import Data from MySQL to HDFS
    • We use Sqoop Import Command for importing data into HDFS
    • First Important step – grant all privileges on movielens database. Run this command on MySQL and then exit the MySQL shell using exit command
      • grant all privileges on movielens.* to ”@’localhost’;
    • Command breakdown is like this –
      • sqoop import –connect jdbc:mysql://localhost/movielens –driver com.mysql.jdbc.Driver –table moviestab -m 1
        • –connect jdbc:mysql://localhost/movielens species your MySQL database connection
        • –driver com.mysql.jdbc.Driver – species the driver
        • –table moviestab speciesfies which table to export
        • -m 1 – specifies that we want to use only one mapper. For large datasets we can use more.
        • Capture
          Scoop Import in action.
        • Capture
          Wow…result is success with 1682 rows imported
        • Once succeeded we will log into Ambari Files view to see if files have been loaded or not.
        • Capture
          Great, we imported data from MySQL to HDFS
  • Step 3 : Import Data from MySQL to Hive
    • lets see how can we import directly into HIVE.
    • sqoop import –connect jdbc:mysql://localhost/movielens –driver com.mysql.jdbc.Driver –table movies -m 1 –hive-import
      Capture
      Only difference is that we added a flag –hive-import
    • Capture
      yay!!!, the data is imported directly to hive.
  • Step 4: Export Data from Hive to MySQL
    • lets do the opposite, export data from HDFS to MySQL
    • One pre-conditions is there that the table should be existing in MySQL as Sqoop export does not create tables at destination
    • Use this command to create a new empty table from movies table
      • create table moviestab as select * from movies where 1=0
      • this will create a table moviestab which is empty as condition 1=0 does not return any rows, but definitely copies the structure.
      • Capture
      • The command to export data is :
      •  sqoop export –connect jdbc:mysql://localhost/movielens –driver com.mysql.jdbc.Driver –table moviestab -m 1 –export-dir /apps/hive/warehouse/movies –input-fields-terminated-by ‘\0001’
      • See the extra flags –export-dir /apps/hive/warehouse/movies specifies the source and  –input-fields-terminated-by ‘\0001’ 
      • lets execute and get results.
      • Capture
        Sqoop export in action
         
      • Capture
        yay!!!, the scoop export succeeded. Lets now check from MySQL.
         
      • Capture
        Yay!!!, we succeeded in exporting data from HDFS back to SQL into a newly created empty table movietab and select 5 rows to verify.
         
      • Hope this all helped you.