Preface
Statistics is the discipline of applying analytical methods to data in order to answer questions and solve problems, in both academic and industry settings. Many of its methods have been around for centuries, while others are much more recent. Statistical analyses and their results are fairly straightforward to present to both technical and non-technical audiences. Furthermore, producing results with statistical analysis does not necessarily require large amounts of data or compute resources, and it can be done fairly quickly, especially with a programming language such as Python, which is moderately easy to learn and work with.
While artificial intelligence (AI) and advanced machine learning (ML) tools have become more prominent and popular in recent years as compute power has become more accessible, statistical analysis remains a valuable precursor to larger-scale AI and ML projects. It enables a practitioner to assess feasibility and practicality before committing the larger compute resources and project architecture development that such projects require.
This book provides a wide variety of tools that are commonly used to test hypotheses and that give analysts and data scientists alike basic predictive capabilities. The reader will first walk through the basic concepts and terminology required to understand the statistical tools in this book, and then explore the different tests and the conditions under which they are applicable. The reader will also learn how to assess the performance of the tests. Throughout, examples are provided in the Python programming language to help readers start understanding their data with the tools presented, which apply to some of the most common questions faced in the data analytics industry. The topics we will walk through include:
- An introduction to statistics
- Regression models
- Classification models
- Time series models
- Survival analysis
Understanding the tools presented in these sections will give the reader a firm foundation for further independent growth in the statistics domain.