Data science
While pandas offers some built-in statistical algorithms, it cannot cover every statistical and machine learning algorithm used in data science. Fortunately, many of the libraries that specialize further in data science integrate tightly with pandas, letting you move data from one library to the next with little friction.
scikit-learn
scikit-learn is a popular machine learning library for both supervised and unsupervised learning. It offers an impressive array of algorithms for classification, regression, and clustering tasks, along with tools to preprocess and clean your data.
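To give a rough feel for how the two libraries fit together, here is a minimal sketch that passes a pd.DataFrame straight into scikit-learn. The toy data and the column names city_mpg and highway_mpg are made up for illustration and are not taken from any real dataset; the choice of StandardScaler followed by KMeans is likewise just one possible combination of a preprocessing tool and a clustering algorithm.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data, invented purely for this illustration
toy = pd.DataFrame({
    "city_mpg": [18.0, 19.0, 17.0, 48.0, 50.0, 52.0],
    "highway_mpg": [25.0, 26.0, 24.0, 45.0, 47.0, 49.0],
})

# Preprocessing: scale each column to zero mean and unit variance.
# scikit-learn accepts the DataFrame directly and returns a NumPy array here.
scaled = StandardScaler().fit_transform(toy)

# Unsupervised learning: split the rows into two clusters
model = KMeans(n_clusters=2, n_init=10, random_state=0)
toy["cluster"] = model.fit_predict(scaled)

print(toy)
```

The cluster numbers themselves are arbitrary labels; what matters is that the three low-mileage rows end up in one group and the three high-mileage rows in the other.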
We cannot cover all of these features here, but as a brief demonstration, let’s once again load the vehicles dataset:
df = pd.read_csv(
    "data/vehicles.csv.zip",
    dtype_backend="numpy_nullable",
    dtype={
        "rangeA...