What this book covers
This book is organized into three parts that follow the natural flow of the data science process: data preparation, analysis, and presentation of results.
In Part 1, you will learn about extracting data from different sources and cleaning that data.
Chapter 1, Extract, Transform, and Load, begins your journey with the ETL process by extracting data from multiple sources, transforming the data to fit analysis plans, and loading the transformed data into business systems for analysis.
Chapter 2, Data Cleaning, leads you through a four-step cleaning process applicable to many types of datasets. You will learn how to summarize, fix, convert, and adapt data in preparation for analysis.
In Part 2, you will look at data exploration, predictive models, and cluster analysis for business intelligence, as well as how to forecast time series data.
Chapter 3, Exploratory Data Analysis, continues the adventure by exploring an unfamiliar dataset using a structured approach. This approach gives you insight into which features are important for shaping further analysis.
Chapter 4, Linear Regression for Business (co-authored with Rick Jones), walks you through a classic predictive analysis approach using single and multiple features. It also reviews the key assumptions the data must meet for this analytic technique to apply.
Chapter 5, Data Mining with Cluster Analysis, presents two unsupervised learning methods, k-means and hierarchical clustering, with examples of each. These two data mining techniques allow you to unearth patterns hidden in the data.
Chapter 6, Time Series Analysis, introduces a difficult topic not often taught in data science courses. You will explore methods other than machine learning for forecasting future values of data that depend on past observations.
Finally, in Part 3, you will learn to communicate results with sharp visualizations and interactive, web-based dashboards.
Chapter 7, Visualizing the Data’s Story, goes beyond techniques for interactively visualizing your results. You will learn how your audience cognitively interprets data through color, shape, and position.
Chapter 8, Web Dashboards with Shiny (authored by Steven Mortimer), concludes your adventure by explaining how to create a web-based business intelligence application using R Shiny.
Several appendices provide additional information and code.
Appendix A, References, provides a list of the references used throughout the book.
Appendix B, Other Helpful R Functions, lists functions that are useful in data science projects, along with their descriptions, so you can explore how they may help you work with data.
Appendix C, R Packages Used in the Book, gives a complete list of the R packages used in each chapter, so you can install every package you will need by referring to a single list. It also contains instructions on installing packages.
Appendix D, R Code for Supporting Market Segment Business Case Calculations, provides the detailed code for computing the geo-based information used in Chapter 5, Data Mining with Cluster Analysis.
After completing the use cases, you will be able to work with business data in the R programming environment and understand how data science supports informed decisions when developing business strategy. Along the way, you will find helpful tips about R and business intelligence.