Python for Data Science
Python offers a very large number of packages for data science. A package is a collection of prebuilt functions and classes shared publicly by its author(s). These packages extend the core functionality of Python. The Python Package Index (https://packt.live/37iTRXc) lists the packages that have been published for Python.
In this section, we will present two of the most popular ones: pandas and scikit-learn.
The pandas Package
The pandas package provides a large number of APIs for manipulating data structures. The two main data structures defined in the pandas package are DataFrame and Series.
DataFrame and Series
A DataFrame is a tabular data structure that is represented as a two-dimensional table. It is composed of rows, columns, indexes, and cells. It is very similar to a sheet in Excel or a table in a database.
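As a minimal sketch of this structure, the following snippet builds a small DataFrame and inspects its rows, columns, index, and a single cell. The column names (fruit, quantity, and price) and their values are made up for illustration and are not taken from any figure in this section; pd is the conventional alias for pandas.

```python
import pandas as pd

# Hypothetical data, invented purely for illustration
df = pd.DataFrame(
    {
        "fruit": ["apple", "banana", "cherry"],
        "quantity": [3, 5, 7],
        "price": [0.5, 0.25, 2.0],
    }
)

# Shape is (number of rows, number of columns)
print(df.shape)  # (3, 3)

# Column labels and row labels (the index defaults to 0, 1, 2, ...)
print(list(df.columns))
print(list(df.index))

# A single cell, addressed by its row label and column label
print(df.loc[1, "fruit"])

# Selecting one column returns a Series
quantities = df["quantity"]
print(type(quantities).__name__)
```

Each key of the dictionary becomes a column, and each position in the lists becomes a row. Because no index is passed, pandas assigns integer row labels starting at 0.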
In Figure 1.28, there are three different columns: algorithm
...