To perform machine learning with scikit-learn, we need some data to start with. We will load the iris dataset, one of several small datasets bundled with scikit-learn.
Loading the iris dataset
Getting ready
A scikit-learn program typically begins with several imports. In a Python session, preferably a Jupyter Notebook, import the NumPy and pandas libraries and matplotlib's pyplot module:
import numpy as np  # Load the NumPy library for fast array computations
import pandas as pd  # Load the pandas data-analysis library
import matplotlib.pyplot as plt  # Load the pyplot visualization module
If you are working in a Jupyter Notebook, run the following magic command so that plots are displayed inline, directly below the cell that produces them:
%matplotlib inline
How to do it...
- From the scikit-learn datasets module, access the iris dataset:
from sklearn import datasets
iris = datasets.load_iris()
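To see what was actually loaded, it can help to inspect the returned object. The following is a minimal sketch, assuming a standard scikit-learn and pandas installation; the variable name iris_df is purely illustrative and is not part of the recipe.

```python
from sklearn import datasets
import pandas as pd

iris = datasets.load_iris()

# The returned object is dictionary-like and also supports attribute access.
print(iris.data.shape)     # feature matrix: one row per flower, one column per measurement
print(iris.feature_names)  # names of the measurement columns
print(iris.target_names)   # names of the species encoded in iris.target

# Optional: wrap the arrays in a pandas DataFrame for easier exploration.
# iris_df is a hypothetical name chosen for this sketch.
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)
iris_df["target"] = iris.target
print(iris_df.head())
```

The DataFrame is only a convenience for browsing; scikit-learn estimators can work directly with iris.data and iris.target.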
How it works...
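The load_iris() function returns a dictionary-like object that holds the data along with its metadata. Similarly, you could have loaded the diabetes dataset as follows:
from sklearn import datasets  # Import the datasets module from scikit-learn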
diabetes = datasets.load_diabetes()
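You've now loaded the diabetes dataset using the load_diabetes() function of the datasets module. To check which datasets are available, type the following in IPython or a Jupyter Notebook (the wildcard ? syntax is an IPython feature and does not work in the plain Python interpreter):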
datasets.load_*?
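Outside IPython, a similar listing can be produced with ordinary Python introspection. This is only a sketch of one possible approach; loader_names is an illustrative variable name, and the exact list depends on the installed scikit-learn version.

```python
from sklearn import datasets

# Collect every public name in the datasets module that starts with "load_".
loader_names = sorted(name for name in dir(datasets) if name.startswith("load_"))

for name in loader_names:
    print(name)
```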
Among the results, you will find a loader named datasets.load_digits. To access that dataset, call the load_digits() function, just as with the other loading functions:
digits = datasets.load_digits()
To view a description of the dataset, type print(digits.DESCR).
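As a quick check of what the digits dataset contains, the sketch below prints the start of the description and the array shapes. It assumes the default arguments of load_digits(); the 500-character cutoff is an arbitrary choice to keep the output short.

```python
from sklearn import datasets

digits = datasets.load_digits()

# DESCR is a long plain-text description; show only the first part.
print(digits.DESCR[:500])

# Each sample is an 8x8 grayscale image, also available flattened to 64 features.
print(digits.data.shape)    # flattened feature matrix
print(digits.images.shape)  # the same samples as 8x8 images
print(digits.target[:10])   # digit labels for the first ten samples
```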