Big data analytics encompasses a wide range of functions related to mining, analysis, and predictive modeling on large-scale datasets. The rapid growth of information and technological developments has given individuals and enterprises across the world a unique opportunity to derive profits and develop new capabilities, redefining traditional business models through large-scale analytics. This chapter provides a gentle overview of the salient characteristics of big data, forming a foundation for subsequent chapters that delve deeper into the various aspects of big data analytics.
In general, this book provides both theoretical and practical, hands-on experience with the big data analytics systems used across the industry. It begins with a discussion of Big Data and related platforms such as Hadoop, Spark, and NoSQL systems, continues with Machine Learning, covering both practical and theoretical topics, and concludes with a thorough analysis of the use of Big Data and, more generally, Data Science in the industry. The book covers the following topics:
- Big data platforms:
  - Hadoop ecosystem and Spark
  - NoSQL databases such as Cassandra
  - Advanced platforms such as KDB+
- Machine learning:
  - Basic algorithms and concepts
  - Using R and scikit-learn in Python
  - Advanced tools in C/C++ and Unix
  - Real-world machine learning with neural networks
- Big data infrastructure:
  - Enterprise cloud architecture with AWS (Amazon Web Services)
  - On-premises enterprise architectures
  - High-performance computing for advanced analytics
- Business and enterprise use cases for big data analytics and machine learning
- Building a world-class big data analytics solution
To set the stage for that discussion, this chapter covers the following topics:
- Definition of Big Data
- Why are we talking about Big Data now if data has always existed?
- A brief history of Big Data
- Types of Big Data
- Where should you start your search for the Big Data solution?