Exploring machine learning software
Before we start developing models, we will need a few tools to help us. The good news is that whether you are using a Mac, a PC, or Linux, almost everything we will use is compatible with all platforms. There are three main items we will need to set up: a language to develop our models in, a database to store our data in, and a cloud computing space to deploy our models in. Luckily for us, there is a fantastic technology stack ready to support our needs. We will be using the Python programming language to develop our models, MySQL to store our data, and AWS and GCP to run our cloud computing processes. Let's take a closer look at these three items.
Python (programming language)
Python is one of the most commonly used programming languages, and one of the most sought-after skills, in the data science industry today. First released in 1991, it is now widely regarded as the most common language for data science. For this book, we will be using Python 3.7. There are several ways to install Python on your computer. You can install the language in its standalone form from python.org. This will provide you with a Python interpreter in its most basic form, where you can run commands and execute scripts.
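Once Python is installed, it can be useful to confirm which version the interpreter is actually running. The following is a minimal sketch using only the standard library; the helper name meets_minimum is hypothetical and chosen for illustration, and the (3, 7) threshold simply mirrors the version used in this book.

```python
import sys


def meets_minimum(version_info=sys.version_info, minimum=(3, 7)):
    """Return True if the given interpreter version is at least `minimum`.

    `meets_minimum` is a hypothetical helper, not part of any library.
    Only the major and minor version numbers are compared.
    """
    return tuple(version_info[:2]) >= tuple(minimum)


# Report the version of the interpreter running this script.
print("Running Python", sys.version.split()[0])
print("Meets the 3.7 minimum:", meets_minimum())
```

Saving this to a file and running it with the interpreter you just installed is a quick way to check that the installation is the one your system picks up by default.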
An alternative is Anaconda, available from anaconda.com, which installs Python, pip (a package manager that helps you install and manage Python libraries), and a collection of other useful libraries in a single step. To have a working version of Python and its associated libraries on your computer as quickly as possible, using Anaconda is highly recommended. In addition to Python, we will need to install libraries to assist in a few areas. Think of libraries as nicely packaged portions of code that we can import and use as we see fit. Anaconda will, by default, install a few important libraries for us, but there will be others that we will need. We can install those as needed using pip. We will look at this in more detail in the next chapter. For the time being, go ahead and install Anaconda on your computer by navigating to the aforementioned website, downloading the installer that best matches your machine, and following the installation instructions provided.
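To see which libraries are already available after installation, one option is to ask Python whether it can locate each one. The sketch below uses the standard library's importlib.util.find_spec; the helper name find_missing and the list of library names are assumptions made for illustration, not a list prescribed by this book.

```python
import importlib.util


def find_missing(packages):
    """Return the top-level package names that cannot be found.

    `find_missing` is a hypothetical helper. It only locates packages;
    it does not import them.
    """
    return [name for name in packages if importlib.util.find_spec(name) is None]


# Illustrative names only; substitute the libraries you actually need.
wanted = ["numpy", "pandas", "sklearn"]

missing = find_missing(wanted)
if missing:
    # Note that the import name can differ from the name used with pip
    # (for example, `sklearn` is installed as `scikit-learn`).
    print("Not found:", ", ".join(missing))
    print("These can be installed from the command line with: pip install <name>")
else:
    print("All requested libraries are available.")
```

Anything reported as not found can then be installed with pip, as described above.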
MySQL (database)
When handling vast quantities of information, we will need a place to store all of our data throughout the analysis and preprocessing phases of our projects. For this, we will use MySQL, one of the most common relational databases used to store and retrieve data. We will take a closer look at how to work with MySQL using SQL, the language used to query relational databases. In addition to the MySQL relational database, we will also explore the use of DynamoDB, a non-relational (NoSQL) database that has gained quite a bit of popularity in recent years. Don't worry about setting these up right now; we will walk through the setup later on.
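To give a first feel for what storing and retrieving data with SQL looks like, here is a small sketch. Because MySQL is not set up yet, it uses Python's built-in sqlite3 module with an in-memory database as a stand-in; this is an assumption made purely for illustration, and the connection call and parameter placeholder syntax will differ when working with MySQL. The table name, column names, and values are hypothetical.

```python
import sqlite3

# Stand-in for a real database server: an in-memory SQLite database.
conn = sqlite3.connect(":memory:")

# Hypothetical table holding a few measurements per sample.
conn.execute(
    "CREATE TABLE measurements ("
    "id INTEGER PRIMARY KEY, sample TEXT, value REAL)"
)

# Store some rows; the `?` placeholders are SQLite's parameter style.
conn.executemany(
    "INSERT INTO measurements (sample, value) VALUES (?, ?)",
    [("a", 1.5), ("b", 2.5), ("a", 3.5)],
)
conn.commit()

# Retrieve the average value for each sample.
rows = conn.execute(
    "SELECT sample, AVG(value) FROM measurements "
    "GROUP BY sample ORDER BY sample"
).fetchall()
print(rows)

conn.close()
```

The CREATE TABLE, INSERT, and SELECT statements shown here are standard SQL, so the same ideas carry over once a MySQL database is in place.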
AWS and GCP (cloud computing)
Finally, after developing our machine learning models in Python and training them using the data in our databases, we will deploy our models to the cloud using both Amazon Web Services (AWS) and Google Cloud Platform (GCP). In addition to deploying our models, we will also explore a number of useful tools and resources, such as SageMaker, EC2, and Autopilot on AWS, and Notebooks, App Engine, and AutoML on GCP.