Search icon CANCEL
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Conferences
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Apache Hive Essentials

You're reading from   Apache Hive Essentials Essential techniques to help you process, and get unique insights from, big data

Arrow left icon
Product type Paperback
Published in Jun 2018
Publisher Packt
ISBN-13 9781788995092
Length 210 pages
Edition 2nd Edition
Languages
Tools
Concepts
Arrow right icon
Author (1):
Arrow left icon
Dayong Du Dayong Du
Author Profile Icon Dayong Du
Dayong Du
Arrow right icon
View More author details
Toc

Table of Contents (12) Chapters Close

Preface 1. Overview of Big Data and Hive FREE CHAPTER 2. Setting Up the Hive Environment 3. Data Definition and Description 4. Data Correlation and Scope 5. Data Manipulation 6. Data Aggregation and Sampling 7. Performance Considerations 8. Extensibility Considerations 9. Security Considerations 10. Working with Other Tools 11. Other Books You May Enjoy

The relational and NoSQL databases versus Hadoop

To better understand the differences among the relational database, NoSQL database, and Hadoop, let's compare them with ways of traveling. You will be surprised to find that they have many similarities. When people travel, they either take cars or airplanes, depending on the travel distance and cost. For example, when you travel to Vancouver from Toronto, an airplane is always the first choice in terms of the travel time versus cost. When you travel to Niagara Falls from Toronto, a car is always a good choice. When you travel to Montreal from Toronto, some people may prefer taking a car to an airplane. The distance and cost here are like the big data volume and investment. The traditional relational database is like the car, and the Hadoop big data tool is like the airplane. When you deal with a small amount of data (short distance), a relational database (like the car) is always the best choice, since it is fast and agile to deal with a small or moderate amount of data. When you deal with a big amount of data (long distance), Hadoop (like the airplane) is the best choice, since it is more linear-scalable, fast, and stable to deal with the big volume of data. You could drive from Toronto to Vancouver, but it takes too much time. You can also take an airplane from Toronto to Niagara Falls, but it would take more time on your way to the airport and cost more than traveling by car. In addition, you could take a ship or a train. This is like a NoSQL database, which offers characteristics and balance from both a relational database and Hadoop in terms of good performance and various data format support for moderate to large amounts of data.

You have been reading a chapter from
Apache Hive Essentials - Second Edition
Published in: Jun 2018
Publisher: Packt
ISBN-13: 9781788995092
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime