You're probably used to hearing terms such as big data, machine learning, and artificial intelligence in the news. It's amazing how many new applications of these technologies appear every day. Recommender systems such as those used by Amazon and Netflix, search engines, stock market analysis, and speech recognition are only a few examples. New algorithms and techniques emerge every year, and many of them build on previous approaches or combine existing algorithms. At the same time, there are more and more tutorials and courses focused on teaching them.
Many of these courses share common limitations, such as solving only toy problems or focusing all of their attention on algorithms. These limitations can leave you with an incorrect understanding of the data modeling process. In fact, modeling involves important steps that come before choosing an algorithm, such as business understanding, data understanding, and data preparation. Without these steps, there is no guarantee that the model will work without flaws when it is applied in the future. Furthermore, model development does not end once an appropriate algorithm has been found. Evaluating the model's performance, interpreting it, and deploying it are also highly relevant, and they form the culmination of the modeling process.
In this book, you will learn how to develop different predictive models. The applications and examples included in this book are drawn from the financial sector, and we will also try to build a theoretical framework that helps you understand the causes of the financial crisis, which had a dramatic impact on countries around the world.
All of the algorithms and techniques used in this book will be implemented in the R language. Nowadays, R is one of the major languages for data science. There is a long-running debate about which language is better, R or Python. Both languages have many strengths, as well as some weaknesses.
In my experience, R is more powerful for analyzing financial data. I've found many R libraries that specialize in this field, but not as many in Python. Nevertheless, credit risk and financial information are closely related to time series analysis, which, at least in my opinion, Python handles better. Recurrent neural networks, such as Long Short-Term Memory (LSTM) networks, are also better implemented in Python. However, R provides more powerful libraries for data visualization, including interactive visualization. My recommendation is to use both R and Python, choosing between them depending on your project. Packt offers good resources on machine learning with Python, some of which are listed here for your convenience:
- Python Machine Learning – Second Edition, https://www.packtpub.com/big-data-and-business-intelligence/python-machine-learning-second-edition
- Hands-On Data Science and Python Machine Learning, https://www.packtpub.com/big-data-and-business-intelligence/hands-data-science-and-python-machine-learning
- Python Machine Learning By Example, https://www.packtpub.com/big-data-and-business-intelligence/python-machine-learning-example
In this chapter, we will refresh your knowledge of machine learning and get you started with coding in R.
The following topics will be covered in this chapter:
- R and RStudio installation
- Some basic commands
- Objects, special cases, and basic operators in R
- Controlling code flow
- All about R packages
- Taking further steps