Getting data into Hadoop
Now that we have put in all that up-front effort, let us look at ways of bringing the data out of MySQL and into Hadoop.
Using MySQL tools and manual import
The simplest way to export data into Hadoop is to use existing command-line tools and statements. To export an entire table (or indeed an entire database), MySQL offers the
mysqldump
utility. To do a more precise export, we can use a SELECT
statement of the following form:
SELECT col1, col2 from table INTO OUTFILE '/tmp/out.csv' FIELDS TERMINATED by ',', LINES TERMINATED BY '\n';
Once we have an export file, we can move it into HDFS using hadoop fs -put
or into Hive through the methods discussed in the previous chapter.
Have a go hero – exporting the employee table into HDFS
We don't want this chapter to turn
into a MySQL tutorial, so look up the syntax of the mysqldump
utility, and use it or the SELECT … INTO OUTFILE
statement to export the employee table into a tab-separated file you then copy onto HDFS.