In this chapter, we introduced you to the Hadoop ecosystem, including its architecture, HDFS, and PySpark. After this introduction, we set up a local Spark instance, covered how to share variables across cluster nodes, and then worked through data processing in Spark using both RDDs and DataFrames.
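As a quick refresher, here is a minimal sketch of the kind of code the chapter covered: it starts a local Spark session, shares a small lookup dictionary with the cluster nodes as a broadcast variable, and processes the same data first through the RDD API and then through the DataFrame API. The application name and toy data here are illustrative, not the chapter's exact example.

    from pyspark.sql import SparkSession

    # Start a local Spark session using all available cores.
    spark = (SparkSession.builder
             .master("local[*]")
             .appName("chapter_recap")
             .getOrCreate())
    sc = spark.sparkContext

    # Share a small lookup table with every node as a broadcast variable.
    parity_names = sc.broadcast({0: "even", 1: "odd"})

    # RDD API: parallelize a list and transform it with map().
    rdd = sc.parallelize(range(10))
    tagged = rdd.map(lambda x: (x, parity_names.value[x % 2]))
    print(tagged.take(3))  # [(0, 'even'), (1, 'odd'), (2, 'even')]

    # DataFrame API: the same data with a named schema and SQL-like operations.
    df = spark.createDataFrame(tagged, ["number", "parity"])
    df.groupBy("parity").count().show()

    spark.stop()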
Later in this chapter, we learned about machine learning with Spark: reading a dataset, training a learner, harnessing the power of the machine learning pipeline, performing cross-validation, and finally testing what we learned on an example dataset.
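To make that workflow concrete, the following is a minimal sketch of a Spark ML pipeline tuned with cross-validation. The feature column names (f1, f2, f3), the choice of logistic regression, and the train_df/test_df DataFrames are assumptions for illustration, not the chapter's exact example.

    from pyspark.ml import Pipeline
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.evaluation import BinaryClassificationEvaluator
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

    # Assemble the numeric feature columns into a single vector column.
    assembler = VectorAssembler(inputCols=["f1", "f2", "f3"],
                                outputCol="features")
    lr = LogisticRegression(featuresCol="features", labelCol="label")

    # Chain preprocessing and the learner into one pipeline.
    pipeline = Pipeline(stages=[assembler, lr])

    # Hyperparameter grid to explore during cross-validation.
    grid = (ParamGridBuilder()
            .addGrid(lr.regParam, [0.01, 0.1, 1.0])
            .build())

    cv = CrossValidator(estimator=pipeline,
                        estimatorParamMaps=grid,
                        evaluator=BinaryClassificationEvaluator(labelCol="label"),
                        numFolds=3)

    # Fit selects the best model by 3-fold cross-validation,
    # then we score it on held-out data.
    model = cv.fit(train_df)
    predictions = model.transform(test_df)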
This concludes our journey through the essentials of data science with Python; the next chapter is an appendix intended to refresh and strengthen your Python foundations. Across the chapters of this book, we have completed a full tour of a data science project, touching on all of its key steps and presenting...