Evolution of MySQL for Big Data

Most enterprises have used MySQL as a relational database for decades, and a large amount of data has accumulated in it, serving either transactions or analysis of the data that is collected and generated. This is where Big Data analytics tools need to be brought in, and MySQL's integration with Hadoop makes that possible. With Hadoop, data can be kept in distributed storage, and a Hadoop cluster can act as the distributed analytical engine for Big Data analytics; Hadoop is widely preferred for its massively parallel processing and powerful computation. Combining MySQL and Hadoop makes real-time analytics possible: Hadoop stores the data and works in parallel with MySQL to deliver the end results in real time, which addresses many use cases, such as the GIS information example explained in the Introducing MySQL 8 section of this chapter. We have already seen the Big Data life cycle, in which data is transformed to generate analytical results. Let's see how MySQL fits into that life cycle.

MySQL 8 maps onto each of the four stages of the Big Data life cycle: data is acquired in MySQL, organized in Hadoop, analyzed there, and the results are loaded back into MySQL. Each stage is covered in the following sections.

Acquiring data in MySQL

With the volume and velocity of incoming data, it becomes difficult to load data into MySQL with optimal performance. To address this, Oracle developed a NoSQL API for storing data directly in the InnoDB storage engine. It bypasses SQL parsing and optimization, so key/value data can be written to MySQL tables with high-speed transaction responses without sacrificing ACID guarantees. MySQL Cluster additionally supports NoSQL APIs for Node.js, Java, JPA, HTTP/REST, and C++. We will explore these in detail later in the book; for now, keep in mind that the NoSQL APIs enable faster ingestion of data and transactions into MySQL.
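
As a rough illustration of this key/value path, the sketch below assumes the InnoDB memcached plugin (MySQL's NoSQL API over InnoDB) is available on a local server; the configuration script path, credentials, key name, and port are placeholders, and the default table mapping created by the bundled script is used.

# One-time setup: create the innodb_memcache schema shipped with MySQL and
# activate the memcached daemon plugin (script path and credentials are
# placeholders for this sketch).
mysql -u root -p < /usr/share/mysql/innodb_memcached_config.sql
mysql -u root -p -e "INSTALL PLUGIN daemon_memcached SONAME 'libmemcached.so';"

# Write a key/value pair straight into InnoDB over the memcached protocol,
# skipping SQL parsing entirely (the plugin listens on port 11211 by default;
# "hello" is 5 bytes, matching the length given in the set command).
printf 'set sensor:42 0 0 5\r\nhello\r\nquit\r\n' | nc localhost 11211

Because the value lands in a regular InnoDB table defined by the plugin's container mapping, it remains queryable through ordinary SQL as well.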

Organizing data in Hadoop

The next step is to organize the data in the Hadoop filesystem once it has been acquired and loaded into MySQL. Big Data requires processing to produce analytical results, and Hadoop is used to perform this highly parallel processing; it is a highly scalable, distributed framework with powerful computation capabilities. Here, data from different sources is consolidated for analysis. To transfer data from MySQL tables to HDFS, Apache Sqoop is leveraged.
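
As a minimal sketch of this transfer step, the Sqoop import below pulls one MySQL table into HDFS in parallel; the hostname, database, credentials, table name, and target directory are hypothetical placeholders.

# Import the MySQL "orders" table into HDFS as delimited text files,
# splitting the work across four parallel map tasks.
sqoop import \
  --connect jdbc:mysql://mysql-host:3306/salesdb \
  --username analyst -P \
  --table orders \
  --target-dir /data/raw/orders \
  --num-mappers 4

By default, Sqoop writes comma-delimited text files and splits the work on the table's primary key, which is what makes the parallel import possible.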

Analyzing data

Now it's time to analyze the data! In this phase, the MySQL data is processed using Hadoop's MapReduce framework. We can also use analysis tools such as Apache Hive or Apache Pig to produce similar analytical results, or run custom analysis jobs on Hadoop that return a result set of analyzed and processed data.
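
As one hedged example of this phase, the snippet below uses Apache Hive to run a SQL-like aggregation over the files imported by Sqoop earlier; the table layout, column names, and HDFS locations are assumptions carried over from the previous sketch.

# Expose the imported files as a Hive table, aggregate spend per customer,
# and write the result to an HDFS directory for the next phase.
hive -e "
CREATE EXTERNAL TABLE IF NOT EXISTS orders_raw (
  order_id INT,
  customer_id INT,
  amount DOUBLE,
  order_date STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/raw/orders';

INSERT OVERWRITE DIRECTORY '/data/analytics/customer_spend'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
SELECT customer_id, SUM(amount) AS total_spent
FROM orders_raw
GROUP BY customer_id;
"

Under the hood, Hive compiles such queries into MapReduce jobs, so this exercises the same parallel processing described above.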

Results of analysis

The results produced in the previous phase are loaded back into MySQL, which again can be done with the help of Apache Sqoop. MySQL then holds the analysis results, which can be consumed by business intelligence tools such as Oracle BI, Jaspersoft, Talend, and so on, or through more traditional web applications that generate various analytical reports and, if required, perform real-time processing.
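
A matching sketch for this load-back step is the Sqoop export below, which pushes the aggregated files from HDFS into a MySQL table; as before, the hostname, credentials, table name, and directory are hypothetical, and the target table is assumed to already exist with matching columns.

# Export the comma-delimited analysis results from HDFS back into MySQL,
# where BI tools and web applications can query them directly.
sqoop export \
  --connect jdbc:mysql://mysql-host:3306/salesdb \
  --username analyst -P \
  --table customer_spend \
  --export-dir /data/analytics/customer_spend \
  --input-fields-terminated-by ','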

This is how MySQL fits easily into a Big Data solution; this architecture lets a structured, relational database take part in Big Data analysis. To understand how to achieve this, refer to Chapter 9, Case study: Part I - Apache Sqoop for exchanging data between MySQL and Hadoop, and Chapter 10, Case study: Part II - Real-time event processing using MySQL applier, which cover a couple of real-world use cases in which MySQL 8 is used extensively to solve business problems and generate value from data.
