Visualizing and understanding your data in Python
In this recipe, we will load the sample dataset and generate a scatter plot to explore the relationship between the variables in the dataset. As you can see in the following screenshot, we have started with a DataFrame containing the management_experience_months and monthly_salary values and generated a visualization that allows us to observe the linear relationship between these two variables:
The objective of this recipe is to understand the data first, using a plotting library (for example, matplotlib), before diving into the other steps of the ML process. We will start by loading a sample dataset from a CSV file into a pandas DataFrame and then use matplotlib to generate a scatter plot.
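As a rough preview of that flow, here is a minimal sketch. It assumes a small inline CSV with made-up values standing in for the recipe's actual file, and it selects the Agg backend so that it runs without a display. Only the column names come from the recipe; everything else is illustrative.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the recipe's CSV file; the values are made up.
csv_text = """management_experience_months,monthly_salary
12,1500
24,2100
36,2600
48,3300
60,3900
"""

# Load the CSV into a pandas DataFrame.
# For a real file, pass its path to read_csv instead of the StringIO object.
df = pd.read_csv(io.StringIO(csv_text))

# Plot one variable against the other to inspect the relationship visually.
fig, ax = plt.subplots()
ax.scatter(df["management_experience_months"], df["monthly_salary"])
ax.set_xlabel("management_experience_months")
ax.set_ylabel("monthly_salary")
```

In a notebook, you would typically drop the matplotlib.use("Agg") line and call plt.show() to display the figure, or call fig.savefig() with a filename of your choice to write it to disk.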
Getting ready
This recipe continues on from the Preparing the Amazon S3 bucket and the training dataset for the linear...