Pandas DataFrame
A Pandas DataFrame is a two‐dimensional, NumPy‐like array with labeled rows and columns. You can think of it as a table. Figure 3.2 shows the structure of a DataFrame in Pandas. It also shows that an individual column in a DataFrame (together with the index) is a Series.
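The relationship between a DataFrame and a Series can be checked directly. The following is a minimal sketch; the column names, values, and index labels are made up for illustration and do not come from Figure 3.2.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only
df = pd.DataFrame(
    {"name": ["Ann", "Bob", "Cy"], "age": [31, 25, 47]},
    index=["a", "b", "c"],
)

# Selecting a single column returns a Series
ages = df["age"]
print(type(ages).__name__)

# The Series carries the same index as the DataFrame it came from
print(ages.index.equals(df.index))

# The DataFrame itself is two-dimensional: (rows, columns)
print(df.shape)
```

Selecting one column with square brackets gives back a Series that keeps the row labels of the DataFrame, which is why the two objects line up when you combine them later.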
A DataFrame is very useful in data science and machine learning because it closely mirrors how data are stored in real life. Imagine data stored in a spreadsheet, and you have a good mental picture of a DataFrame. A Pandas DataFrame is often used to represent data in machine learning. Hence, for the remaining sections in this chapter, we are going to invest significant time and effort in understanding how it works.
Creating a DataFrame
You can create a Pandas DataFrame using the DataFrame() class:
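import pandas as pd
import numpy as np

# 10 rows x 4 columns of samples drawn from the standard normal distribution
df = pd.DataFrame(np.random.randn(10, 4),
                  columns=list('ABCD'))   # label the columns A, B, C, and D
print(df)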
In the preceding...