If you have run the previous experiment, you may have realized that:
- Both the validation and test results vary, because the samples they are computed on are different.
- The chosen hypothesis is often the best one, but this is not always the case.
Unfortunately, relying on separate validation and test samples introduces uncertainty, and it reduces the number of examples available for training (the fewer the training examples, the higher the variance of the model's estimates).
A solution is cross-validation, and Scikit-learn offers a complete module for cross-validation and performance evaluation (sklearn.model_selection).
By resorting to cross-validation, you just need to separate your data into training and test sets, and you can then use the training data for both model optimization and model training.
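As a minimal sketch of this workflow (the dataset, estimator, and fold count below are illustrative assumptions, not prescribed by the text), you might reserve a test set and then cross-validate only on the training portion:

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

# A toy dataset, chosen here only for illustration
X, y = load_digits(return_X_y=True)

# Reserve a test set; the training portion will serve both
# model optimization and final model fitting
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# 10-fold cross-validation on the training data only;
# the test set stays untouched until the final evaluation
model = LogisticRegression(max_iter=5000)
scores = cross_val_score(model, X_train, y_train, cv=10)
print(scores.mean(), scores.std())
```

In this sketch, cross_val_score refits the estimator on ten different train/validation splits of the training data, so every training example contributes to both fitting and validation while the test set remains set aside for the final check.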
How does cross-validation work? The idea is...