Preface
We live in times of Internet of Things—a large, world-wide network of interconnected devices, sensors, applications, environments, and interfaces. They generate, exchange, and consume massive amounts of data on a daily basis, and the ability to harness these huge quantities of information can provide us with novel understanding of physical and social phenomena.
The recent rapid growth of various open source and proprietary big data technologies allows deep exploration of these vast amounts of data. However, many of them are limited in terms of their statistical and data analytics capabilities. Some others implement techniques and programming languages that many classically educated statisticians and data analysts are simply unfamiliar with and find them difficult to apply in real-world scenarios.
R programming language—an open source, free, extremely versatile statistical environment, has a potential to fill this gap by providing users with a large variety of highly optimized data processing methods, aggregations, statistical tests, and machine learning algorithms with a relatively user-friendly and easily customizable syntax.
This book challenges traditional preconceptions about R as a programming language that does not support big data processing and analytics. Throughout the chapters of this book, you will be exposed to a variety of core R functions and a large array of actively maintained third-party packages that enable R users to benefit from most recent cutting-edge big data technologies and frameworks, such as Hadoop, Spark, H2O, traditional SQL-based databases, such as SQLite, MariaDB, and PostgreSQL, and more flexible NoSQL databases, such as MongoDB or HBase, to mention just a few. By following the exercises and tutorials contained within this book, you will experience firsthand how all these tools can be integrated with R throughout all the stages of the Big Data Product Cycle, from data import and data management to advanced analytics and predictive modeling.