
From Python Data Science Essentials: A practitioner's guide covering essential data science principles, tools, and techniques

Product type: Paperback
Published: Sep 2018
Publisher: Packt
ISBN-13: 9781789537864
Length: 472 pages
Edition: 3rd Edition
Languages: Python
Tools: Scikit-learn
Concepts: Data Science

Authors (4):

Alberto Boschetti

Luca Massaron

Pietro Marinelli

Matteo Malosetti


Table of Contents

Preface

1. First Steps

2. Data Munging

3. The Data Pipeline

4. Machine Learning

5. Visualization, Insights, and Results

6. Social Network Analysis

7. Deep Learning Beyond the Basics

8. Spark for Big Data

9. Strengthen Your Python Foundations

10. Other Books You May Enjoy


Data processing with NumPy

Having introduced the essential pandas commands to load and preprocess your data, whether entirely in memory, in smaller batches, or even one row at a time, we have now reached the point in the data science pipeline where you have to work on that data to prepare a suitable data matrix for your supervised and unsupervised learning procedures.
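As a reminder of how batch loading ends up in a single data matrix, here is a minimal sketch. It assumes a small, made-up CSV held in memory (via `io.StringIO`) in place of a real file, and uses the `chunksize` parameter of `pandas.read_csv` to read it two rows at a time; `DataFrame.to_numpy` is available in recent pandas versions (older code often uses `.values` instead).

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV content, standing in for a file on disk
csv_text = """age,income
25,32000
47,58000
35,41000
52,61000
29,37000
"""

# Read the data two rows at a time instead of all at once
batches = []
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    # Each chunk is an ordinary DataFrame; convert it to a float matrix
    batches.append(chunk.to_numpy(dtype=float))

# Stack the per-batch matrices into a single rows-by-columns array
X = np.vstack(batches)
print(X.shape)  # (5, 2)
```

With five rows and `chunksize=2`, pandas yields three chunks (2, 2, and 1 rows), which `np.vstack` joins back into one matrix.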

As a best practice, we advise that you divide the task into two phases: one in which your data is still heterogeneous (a mix of numerical and symbolic values), and another in which it has been turned into a numeric table of data. A table of data, or matrix, is arranged in rows, which represent your examples, and columns, which contain the observed values of each characteristic of your examples, that is, your variables.
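The two phases can be illustrated with a short sketch. The data below is invented for illustration, and one-hot encoding with `pandas.get_dummies` is only one possible way to make a symbolic column numeric; it is not necessarily the approach the book uses later.

```python
import numpy as np
import pandas as pd

# Hypothetical heterogeneous data: numeric and symbolic (categorical) columns
df = pd.DataFrame({
    "age": [25, 47, 35],
    "city": ["Rome", "London", "Rome"],
    "income": [32000, 58000, 41000],
})

# Phase 1 (still heterogeneous, in pandas): expand the symbolic column
# into one 0/1 indicator column per category
encoded = pd.get_dummies(df, columns=["city"], dtype=float)

# Phase 2 (numeric table, in NumPy): rows are examples, columns are variables
X = encoded.to_numpy(dtype=float)
feature_names = list(encoded.columns)

print(feature_names)  # ['age', 'income', 'city_London', 'city_Rome']
print(X.shape)        # (3, 4)
```

Keeping `feature_names` alongside `X` is a simple way to remember which matrix column corresponds to which variable once the pandas labels are gone.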

Following our advice, you'll have to move back and forth between two key Python packages for scientific analysis, pandas and NumPy, and...





About the authors

Alberto Boschetti

Alberto Boschetti is a data scientist with expertise in signal processing and statistics. He holds a Ph.D. in telecommunication engineering and currently lives and works in London. In his work projects, he faces challenges ranging from natural language processing (NLP) and behavioral analysis to machine learning and distributed processing. He is very passionate about his job and always tries to stay updated about the latest developments in data science technologies, attending meet-ups, conferences, and other events.


Luca Massaron

Luca Massaron is a data scientist with over a decade of experience in transforming data into high-impact, innovative artifacts, solving real-world problems, and generating value for businesses and stakeholders. He is the author of numerous bestselling books on AI, machine learning, and algorithms. Luca is also a 3x Kaggle Grandmaster who reached number 7 in the worldwide user rankings for his performance in data science competitions. Additionally, he is recognized as a Google Developer Expert (GDE) in AI, Kaggle, and the cloud.


Pietro Marinelli

Pietro Marinelli has consistently been ranked among the top data scientists in the world on Kaggle, Google's artificial intelligence platform. He has reached 3rd position among Italian data scientists and 214th among 91,000 data scientists around the world. Thanks to his work on Kaggle, he was honored to participate as a speaker at the Paris Kaggle Day in January 2019. He has been working with artificial intelligence, text analytics, and many other data science techniques for many years, and has more than 10 years' experience in designing data-based products for different industries. He has produced a variety of algorithms, ranging from predictive modeling to an advanced simulation algorithm that supports senior management's business decisions, for a variety of multinational companies. He is currently collaborating with Packt as a reviewer of AI books. NLP has been one of the core focuses of his projects: he has developed different algorithms for text understanding and classification in several languages (including English, Spanish, Italian, Japanese, German, French, Russian, and Chinese).


Matteo Malosetti

Matteo Malosetti is a mathematical engineer working as a data scientist in insurance. He is passionate about NLP applications and Bayesian statistics.
