Preface
R and Python are both required languages these days for anybody engaged in data analysis. The growth of these two languages and their interdependency create a natural need to learn them both. It was therefore clear where the second edition of my previous title, R Statistical Application Development by Example, was headed. I took this opportunity to add Python as an important layer, and hence you will find Doing it in Python sections spread throughout the book. The book is now useful on many fronts: for those who need to learn both languages, for those who use R and need to switch to Python, and vice versa. While the abstract development of ideas and algorithms has been retained in R only, the standard and more commonly required data analysis techniques are now available in both languages. Where a Python parallel is not provided, the only reason is to keep the book from becoming too bulky.
The open source language R is fast becoming one of the preferred companions for statistics, even as the subject continues to add many friends in machine learning, data mining, and so on to its already rich scientific network. The era in which mathematical theory is embedded in statistical application is truly a remarkable one for society, and R and Python have played a pivotal role in it. This book is a humble attempt at presenting statistical models through R for any reader who has a bit of familiarity with the subject. In my experience of practicing the subject with colleagues and friends from different backgrounds, I realized that many are interested in learning the subject and applying it in their own domain, which enables them to make appropriate decisions in analyses that involve uncertainty. A decade ago, my friends would have been content with being pointed to a useful reference book. Not so anymore! The work in almost every domain is done on computers, and naturally they have their data available in spreadsheets, databases, and sometimes in plain text format. The request for an appropriate statistical model is invariably followed by a one-word question: software? My answer to them has always been a single-letter reply: R! Why? It is really a very simple decision, and R has been my companion over the last seven years. In this book, this experience has been converted into detailed chapters and a cleaner breakdown of model building in R.
A by-product of my interactions with colleagues and friends, who are all aspiring statistical model builders, is that I have been able to identify the trough of their learning curve in the subject. The first attempt at removing this hurdle has been to introduce the fundamental concepts through what beginners are most familiar with, which is data. The difference is simply in the subtleties, and as such I firmly believe that introducing the subject on their turf motivates readers for a long way on their journey. As with most statistical software, R provides modules and packages that cover many of the recently invented statistical methodologies. The first five chapters of the book focus on the fundamental aspects of the subject and the R language, and hence cover R basics, data visualization, exploratory data analysis, and statistical inference.
The foundational aspects are illustrated using interesting examples and set up the framework for the next five chapters. Linear and logistic regression models, being at the forefront, are of paramount importance in applications. The discussion is generic in nature, and the techniques can be easily adapted across different domains. The last two chapters have been inspired by the Breiman school, and hence the modern method of using classification and regression trees has been developed in detail and illustrated through a practical dataset.