Preface
Big data, the Internet of Things, and artificial intelligence have become the hottest technology buzzwords in recent years. Although there are many different terms used to define these technologies, the common concept is that they're all driven by data. Simply having data is not enough; being able to unlock its value is essential. Therefore, data scientists have begun to focus on how to gain insights from raw data.
Data science has become one of the most popular subjects among academic and industry groups. However, as data science is a very broad discipline, learning how to master it can be challenging. A beginner must learn how to prepare, process, aggregate, and visualize data. More advanced techniques involve machine learning, mining various data formats (text, image, and video), and, most importantly, using data to generate business value. The role of a data scientist is challenging and requires a great deal of effort. A successful data scientist requires a useful tool to help solve day-to-day problems.
In this field, the tool most widely used by data scientists is the R language, which is free and open source. As a programming language designed for statistical computing, it provides many data processing packages, machine learning packages, and visualization functions, allowing users to analyze data on the fly. R helps users quickly perform analysis and run machine learning algorithms on their datasets without knowing every detail of the underlying mathematical models.
R for Data Science Cookbook takes a practical approach to teaching you how to put data science into practice with R. The book has 12 chapters, each of which is introduced by breaking down the topic into several simple recipes. Through the step-by-step instructions in each recipe, you can apply what you have learned from the book by using a variety of packages in R.
The first section of this book deals with how to create R functions to avoid unnecessary duplication of code. You will learn how to prepare, process, and perform sophisticated ETL operations for heterogeneous data sources with R packages. An example of data manipulation is provided that illustrates how to use the dplyr and data.table packages to process larger data structures efficiently, while there is a section focusing on ggplot2 that covers how to create advanced figures for data exploration. Also, you will learn how to build an interactive report using the ggvis package.
This book also explains how to use data mining to discover items that are frequently purchased together. Later chapters offer insight into time series analysis of financial data and cover machine learning in detail, including data classification, regression, clustering, and dimension reduction.
With R for Data Science Cookbook in hand, I can assure you that you will find data science has never been easier.