MySQL 8 for Big Data

Effective data processing with MySQL 8, Hadoop, NoSQL APIs, and other Big Data tools

Product type Paperback
Published in Oct 2017
Publisher Packt
ISBN-13 9781788397186
Length 296 pages
Edition 1st Edition
Authors (4):
Chintan Mehta
Shabbir Challawala
Jaydip Lakhatariya
Kandarp Patel
Table of Contents

Preface
1. Introduction to Big Data and MySQL 8
2. Data Query Techniques in MySQL 8
3. Indexing your data for High-Performing Queries
4. Using Memcached with MySQL 8
5. Partitioning High Volume Data
6. Replication for building highly available solutions
7. MySQL 8 Best Practices
8. NoSQL API for Integrating with Big Data Solutions
9. Case study: Part I - Apache Sqoop for exchanging data between MySQL and Hadoop
10. Case study: Part II - Real time event processing using MySQL applier

Evolution of MySQL for Big Data

Many enterprises have used MySQL as their relational database for years. The large volume of data they store is used either for transactions or for analysis of the data that is collected and generated, and this is where Big Data analytic tools come in. Integrating MySQL with Hadoop makes this possible. With Hadoop, data can be kept in distributed storage, and the Hadoop cluster can also serve as the distributed analytical engine for Big Data analytics. Hadoop is widely preferred for its massively parallel processing and computational power. Combining MySQL and Hadoop enables real-time analytics: Hadoop stores the data and works in parallel with MySQL to show the end results in real time. This helps address many use cases, such as GIS information, which was explained in the Introducing MySQL 8 section of this chapter. We saw earlier, in the Big Data life cycle, how data is transformed to generate analytic results. Let's see how MySQL fits into that life cycle.

The following diagram illustrates how MySQL 8 is mapped to each of the four stages of the Big Data life cycle:

Acquiring data in MySQL

With the volume and velocity of data involved, it becomes difficult to load data into MySQL with optimal performance. To address this, Oracle developed the NoSQL API, which stores data directly in the InnoDB storage engine. It bypasses SQL parsing and optimization, so key/value data can be written directly to MySQL tables with fast transaction responses and without sacrificing ACID guarantees. MySQL Cluster also supports NoSQL APIs for Node.js, Java, JPA, HTTP/REST, and C++. We will explore this in detail later in the book; for now, keep in mind that the NoSQL API enables faster processing of data and transactions in MySQL.
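
As a rough illustration of what a key/value write without SQL looks like, the sketch below encodes a request in the memcached text protocol, one key/value interface that can sit in front of InnoDB. It assumes a server with the InnoDB memcached plugin enabled and listening on the memcached default port 11211; the host, key, and value are hypothetical, and this is not presented as the exact setup used later in the book.

```python
import socket


def build_set_command(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Encode a memcached text-protocol 'set' request.

    Wire format: set <key> <flags> <exptime> <bytes>\r\n<data>\r\n
    No SQL statement is involved, so nothing has to be parsed or optimized.
    """
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"


def store(host: str, port: int, key: str, value: bytes) -> bool:
    """Send one 'set' request and report whether the server answered STORED.

    Assumption: a memcached-compatible endpoint (for example, the InnoDB
    memcached plugin) is reachable at host:port.
    """
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(build_set_command(key, value))
        reply = conn.recv(64)
    return reply.startswith(b"STORED")


if __name__ == "__main__":
    # Hypothetical key and value; only the request bytes are shown here.
    print(build_set_command("sensor:42", b"21.5"))
```

Calling `store("127.0.0.1", 11211, "sensor:42", b"21.5")` would send that request to a local server, if one is running with the plugin configured.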

Organizing data in Hadoop

Once the data has been acquired and loaded into MySQL, the next step is to organize it in the Hadoop filesystem. Big Data must be processed to produce analysis results, and Hadoop is used to perform this highly parallel processing. Hadoop is also a highly scalable distributed framework with powerful computation capabilities. Here, data from different sources is consolidated for analysis. Apache Sqoop is used to transfer the data from MySQL tables to HDFS.
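
To make the transfer step concrete, here is a minimal sketch that assembles a `sqoop import` command line for copying one MySQL table into HDFS. The JDBC URL, user name, password file, table, and target directory are hypothetical placeholders; the sketch only builds and prints the command, and assumes Sqoop and a MySQL JDBC driver are installed wherever the command is eventually run.

```python
import shlex


def sqoop_import_args(jdbc_url, username, password_file, table, target_dir, mappers=4):
    """Build the argument list for importing one MySQL table into HDFS."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,              # JDBC URL of the source MySQL database
        "--username", username,
        "--password-file", password_file,   # keeps the password off the command line
        "--table", table,                   # source table in MySQL
        "--target-dir", target_dir,         # destination directory in HDFS
        "--num-mappers", str(mappers),      # number of parallel map tasks
    ]


if __name__ == "__main__":
    # All values below are hypothetical.
    args = sqoop_import_args(
        "jdbc:mysql://db.example.com/sales",
        "etl_user",
        "/user/etl/.mysql-password",
        "orders",
        "/data/raw/orders",
    )
    print(shlex.join(args))
```

Building the command as a list rather than a single string makes it straightforward to hand to `subprocess.run` later without shell quoting problems.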

Analyzing data

Now it's time to analyze the data! In this phase, the data from MySQL is processed using Hadoop's MapReduce algorithm. We can also use other analysis tools, such as Apache Hive or Apache Pig, to produce similar analytical results, or run custom analysis on Hadoop that returns a result set of the analyzed and processed data.
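
The idea behind MapReduce can be sketched in a few lines of single-process Python. This is only an in-memory illustration of the map, shuffle, and reduce steps, not Hadoop code; the `region` and `amount` fields are hypothetical stand-ins for rows exported from a MySQL table.

```python
from collections import defaultdict


def map_phase(rows):
    """Map: emit one (key, value) pair per input row."""
    for row in rows:
        yield row["region"], row["amount"]


def shuffle(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each group of values into a single total."""
    return {key: sum(values) for key, values in groups.items()}


def total_by_region(rows):
    """Run the three phases in sequence on an in-memory list of rows."""
    return reduce_phase(shuffle(map_phase(rows)))


if __name__ == "__main__":
    # Hypothetical sample rows.
    rows = [
        {"region": "east", "amount": 10},
        {"region": "west", "amount": 5},
        {"region": "east", "amount": 7},
    ]
    print(total_by_region(rows))
```

On a real cluster, the map and reduce steps run in parallel on many nodes and the framework performs the shuffle between them; the structure of the computation is the same.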

Results of analysis

The results produced in the previous phases are loaded back into MySQL with the help of Apache Sqoop. MySQL then holds the analysis results, which can be consumed by business intelligence tools such as Oracle BI Solution, Jaspersoft, Talend, and so on, or in more traditional ways by web applications that generate various analytical reports and, if required, do real-time processing.
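
Loading results back is the reverse direction, for which Sqoop provides `sqoop export`. The sketch below assembles such a command; the connection details, target table, HDFS directory, and field delimiter are hypothetical, and the target table is assumed to exist already in MySQL with columns that match the exported files.

```python
import shlex


def sqoop_export_args(jdbc_url, username, password_file, table, export_dir, delimiter=","):
    """Build the argument list for exporting HDFS files into a MySQL table."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,              # JDBC URL of the target MySQL database
        "--username", username,
        "--password-file", password_file,
        "--table", table,                   # existing MySQL table that receives the rows
        "--export-dir", export_dir,         # HDFS directory holding the analysis output
        "--input-fields-terminated-by", delimiter,  # field separator used in those files
    ]


if __name__ == "__main__":
    # All values below are hypothetical.
    args = sqoop_export_args(
        "jdbc:mysql://db.example.com/reports",
        "etl_user",
        "/user/etl/.mysql-password",
        "sales_by_region",
        "/data/results/sales_by_region",
    )
    print(shlex.join(args))
```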

This is how MySQL fits easily into a Big Data solution. This architecture lets structured databases take part in Big Data analysis. To understand how to achieve this, refer to Chapter 9, Case study: Part I - Apache Sqoop for exchanging data between MySQL and Hadoop, and Chapter 10, Case study: Part II - Real time event processing using MySQL applier, which cover a couple of real-world use cases where we use MySQL 8 extensively to solve business problems and generate value from data.
