Practical Data Science with Python

Practical Data Science with Python: Learn tools and techniques from hands-on examples to extract insights from data

Introduction to Data Science

Data science is a thriving and rapidly expanding field, as you probably already know. There is a growing consensus that everyone should have some basic data science skills, sometimes called "data literacy." This book is intended to get you up to speed with the basics of data science using the most popular programming language for data science today: Python. In this first chapter, we will cover:

  • The history of data science
  • The top tools and skills used in data science, and why these are used
  • Specializations within and related to data science
  • Best practices for managing a data science project

Data science is used in a variety of ways. Some data scientists focus on the analytics side of things, pulling out hidden patterns and insights from data and then communicating these results with visualizations and statistics. Others create predictive models to forecast future events, such as whether someone will put solar panels on their house. Yet others build classification models; for example, identifying the make and model of a car in an image. One thing ties all applications of data science together: the data. Anywhere you have enough data, you can use data science to accomplish things that seem like magic to the casual observer.

The data science origin story

There's a saying in the data science community that's been around for a while, and it goes: "A data scientist is better than any computer scientist at statistics, and better than any statistician at computer programming." This encapsulates the general skills of most data scientists, as well as the history of the field.

Data science combines computer programming with statistics, and some even call data science applied statistics. Some statisticians go further and consider data science to be nothing more than statistics. So, while we might say data science dates back to the roots of statistics in the 19th century, modern data science really began around the year 2000. At this time, the internet was beginning to bloom, and with it came the advent of big data. The sheer volume of data generated on the web gave rise to the new field of data science.

A brief timeline of key historical data science events is as follows:

  • 1962: John Tukey writes The Future of Data Analysis, where he envisions a new field for learning insights from data
  • 1977: Tukey publishes the book Exploratory Data Analysis, which is a key part of data science today
  • 1991: Guido van Rossum publishes the Python programming language online for the first time; Python goes on to become the most-used data science language at the time of writing
  • 1993: The R programming language is publicly released; R goes on to become the second most-used data science language
  • 1996: The International Federation of Classification Societies holds a conference titled "Data Science, Classification and Related Methods" – possibly the first time "data science" was used to refer to something similar to modern data science
  • 1997: Jeff Wu proposes renaming statistics "data science" in his inaugural lecture at the University of Michigan
  • 2001: William Cleveland publishes a paper describing a new field, "data science," which expands on data analysis
  • 2008: Jeff Hammerbacher and DJ Patil use the term "data scientist" in job postings after trying to come up with a good job title for their work
  • 2010: Kaggle.com launches as an online data science community and data science competition website
  • 2010s: Universities begin offering master's and bachelor's degrees in data science; data science job postings climb to new heights year after year; big breakthroughs are made in deep learning; the number of data science software libraries and publications burgeons.
  • 2012: Harvard Business Review publishes the famous article Data Scientist: The Sexiest Job of the 21st Century, which adds fuel to the data science fire.
  • 2015: DJ Patil becomes the first Chief Data Scientist of the United States, serving for two years.
  • 2015: TensorFlow (a deep learning and machine learning library) is released.
  • 2018: Google releases Cloud AutoML, bringing automated machine learning to a much wider audience.
  • 2020: Amazon SageMaker Studio is released, which is a cloud tool for building, training, deploying, and analyzing machine learning models.

We can make a few observations from this timeline. For one, the idea of data science was around for several decades before it became wildly popular. People foresaw that society would need something like data science, but it wasn't until digital data became widespread and easily accessible that data science could be used productively. We also note that the two most widely used programming languages in data science, Python and R, existed for roughly 15 years before the field of data science took off in earnest, after which their use as data science languages grew rapidly.

Another trend in data science is the rise of data science competitions. Kaggle.com, launched in 2010, was one of the first online platforms dedicated to data science competitions. It has since been acquired by Google and continues to grow. Kaggle offers cash prizes for machine learning competitions (often USD 10,000 or more) and also has a large community of data science practitioners and learners. Several other websites have appeared that run data science competitions, often with cash prizes as well. Reading other people's code (especially the winners' code, if available) can be a good way to learn new data science techniques and tricks. Here are some of the current websites with data science competitions:

  • Kaggle
  • Analytics Vidhya
  • HackerRank
  • DrivenData (focused on social impact)
  • AIcrowd
  • CodaLab
  • Topcoder
  • Zindi
  • Tianchi
  • Several other specialized competitions, like Microsoft's COCO

A couple of websites that list data science competitions are:

  • ods.ai
  • www.mlcontests.com

Shortly after Kaggle launched in 2010, universities started offering master's and then bachelor's degrees in data science. Over the same period, a plethora of online resources and books were released, teaching data science in a variety of ways.

As the timeline shows, some aspects of data science started to become automated in the late 2010s and early 2020s. This worries people who think data science might soon become fully automated. While some aspects of data science can be automated, someone with data science know-how is still needed to use automated data science systems properly. It's also useful to have the skills to do data science from scratch by writing code, which offers ultimate flexibility. A data scientist is also still needed on a data science project to understand business requirements, put data science products into production, and communicate the results of data science work to others.

Automated data science tools include automated machine learning (AutoML) through Google Cloud, Amazon Web Services (AWS), Microsoft Azure, H2O, and more. With AutoML, we can screen several machine learning models quickly in order to optimize predictive performance. Automated data cleaning is also being developed. At the same time that this automation is happening, we are also seeing a desire by companies to build "data literacy" among their employees. Data literacy means understanding enough basic statistics and data science techniques to use modern digital data and tools to benefit the organization, converting data into information. Practically speaking, this means we can take data from an Excel spreadsheet or database and create statistical visualizations and machine learning models to extract meaning from the data. In more advanced cases, this can mean creating predictive machine learning models that are used to guide decision making or that can be sold to customers.
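The model-screening step described above is the core loop that AutoML tools automate. Below is a minimal sketch of that loop, assuming scikit-learn is installed and using its bundled iris dataset as stand-in data. The function name `screen_models` and the three candidate models are illustrative choices, not the API of Google Cloud, AWS, Azure, or H2O; real AutoML services also search hyperparameters, preprocessing steps, and ensembles.

```python
# Minimal sketch of AutoML-style model screening (illustrative, not a real AutoML API).
# Assumptions: scikit-learn is installed; the bundled iris dataset stands in for real data.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def screen_models(X, y, cv=5):
    """Hypothetical helper: score each candidate model with k-fold
    cross-validation and return mean accuracy per model, best first."""
    candidates = {
        # Scaling sits inside a pipeline so it is refit on each training fold
        "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "k_nearest_neighbors": make_pipeline(StandardScaler(), KNeighborsClassifier()),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
    }
    results = {}
    for name, model in candidates.items():
        # cross_val_score fits a fresh clone of the model on each fold
        scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
        results[name] = float(scores.mean())
    # Sort from highest to lowest mean accuracy
    return dict(sorted(results.items(), key=lambda item: item[1], reverse=True))


X, y = load_iris(return_X_y=True)
ranking = screen_models(X, y)
for name, score in ranking.items():
    print(f"{name}: {score:.3f}")

best_model = next(iter(ranking))
print("Best candidate:", best_model)
```

Each candidate is scored with 5-fold cross-validation rather than a single train/test split, so the ranking is less sensitive to one lucky or unlucky split. Wrapping the scaler in a pipeline keeps information from the validation folds out of the preprocessing step.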

As we move into the future with data science, we will likely see an expansion of the toolsets available and automation of mundane work. We also anticipate organizations will increasingly expect their employees to have "data literacy" skills, including basic data science knowledge and techniques.

This should help organizations make better data-driven decisions, improve their bottom lines, and use their data more effectively.

If you're interested in reading further on the history and composition of data science, and on others' thoughts about it, David Donoho's paper 50 Years of Data Science is a great resource. The paper can be found here:

http://courses.csail.mit.edu/18.337/2016/docs/50YearsDataScience.pdf

Key benefits

  • Understand and utilize data science tools in Python, such as specialized machine learning algorithms and statistical modeling
  • Build a strong data science foundation with the best data science tools available in Python
  • Add value to yourself, your organization, and society by extracting actionable insights from raw data

Description

Practical Data Science with Python teaches you core data science concepts, with real-world and realistic examples, and strengthens your grip on the basic as well as advanced principles of data preparation and storage, statistics, probability theory, machine learning, and Python programming, helping you build a solid foundation to gain proficiency in data science. The book starts with an overview of basic Python skills and then introduces foundational data science techniques, followed by a thorough explanation of the Python code needed to execute the techniques. You'll understand the code by working through the examples. The code has been broken down into small chunks (a few lines or a function at a time) to enable thorough discussion. As you progress, you will learn how to perform data analysis while exploring the functionalities of key data science Python packages, including pandas, SciPy, and scikit-learn. Finally, the book covers ethics and privacy concerns in data science and suggests resources for improving data science skills, as well as ways to stay up to date on new data science developments. By the end of the book, you should be able to comfortably use Python for basic data science projects and should have the skills to execute the data science process on any data source.

Who is this book for?

The book is intended for beginners, including students starting or about to start a data science, analytics, or related program (e.g. Bachelor’s, Master’s, bootcamp, online courses), recent college graduates who want to learn new skills to set them apart in the job market, professionals who want to learn hands-on data science techniques in Python, and those who want to shift their career to data science. The book requires basic familiarity with Python. A "getting started with Python" section has been included to get complete novices up to speed.

What you will learn

  • Use Python data science packages effectively
  • Clean and prepare data for data science work, including feature engineering and feature selection
  • Model data with classic statistical methods (such as t-tests) and essential machine learning algorithms, such as random forests and boosted models
  • Evaluate model performance
  • Compare and understand different machine learning methods
  • Interact with Excel spreadsheets through Python
  • Create automated data science reports through Python
  • Get to grips with text analytics techniques

Product Details

Publication date : Sep 30, 2021
Length: 620 pages
Edition : 1st
Language : English
ISBN-13 : 9781801071970

Table of Contents

29 Chapters
Part I - An Introduction and the Basics
  • Introduction to Data Science
  • Getting Started with Python
Part II - Dealing with Data
  • SQL and Built-in File Handling Modules in Python
  • Loading and Wrangling Data with Pandas and NumPy
  • Exploratory Data Analysis and Visualization
  • Data Wrangling Documents and Spreadsheets
  • Web Scraping
Part III - Statistics for Data Science
  • Probability, Distributions, and Sampling
  • Statistical Testing for Data Science
Part IV - Machine Learning
  • Preparing Data for Machine Learning: Feature Selection, Feature Engineering, and Dimensionality Reduction
  • Machine Learning for Classification
  • Evaluating Machine Learning Classification Models and Sampling for Classification
  • Machine Learning with Regression
  • Optimizing Models and Using AutoML
  • Tree-Based Machine Learning Models
  • Support Vector Machine (SVM) Machine Learning Models
Part V - Text Analysis and Reporting
  • Clustering with Machine Learning
  • Working with Text
Part VI - Wrapping Up
  • Data Storytelling and Automated Reporting/Dashboarding
  • Ethics and Privacy
  • Staying Up to Date and the Future of Data Science
Other Books You May Enjoy
Index

Customer reviews

Average rating: 4.8 out of 5 (19 ratings)
  • 5 star: 78.9%
  • 4 star: 21.1%
  • 3 star: 0%
  • 2 star: 0%
  • 1 star: 0%

Top Reviews

Sven Einsiedler, Nov 30, 2021
Rating: 5 out of 5
Tl;dr: This book is a great choice for everyone with basic statistics & programming knowledge, who want to expand their Python skills and knowledge in the Data Science field. If you are looking for a book that covers a wide array of different Data Science topics & practical applications, this is the right book for you. What I liked most about "Practical Data Science with Python" is that it provided me with the perfect mixture of theory (basic statistical concepts, Machine learning models, etc.) and practical applications that went beyond just Python basics (Git, Web Scraping, handling SQL within Python, etc.). As a recent graduate now working for a tech company, I was able to refresh my knowledge from school, while at the same time picking up a new programming language. I think this book provides you with a really good tool kit if you are interested in working in tech or any data-driven company for that matter. The book is well-structured and easy to follow along. Each chapter starts with an introduction, outlining the topics that will be covered, and ends with a "Test your knowledge" section and summary. The questions are a fun way to keep track of your learnings and to double-check that you are indeed following along. The book further makes use of a lot of illustrative figures and code examples, which prevents unnecessarily long text parts that might be hard to follow. I would recommend this book for everyone with some initial programming knowledge (not necessarily from Python) looking for an "all rounder" Data Science introduction book.
Amazon Verified review
ScouserBass, Dec 10, 2022
Rating: 5 out of 5
As a beginner with Python but a good knowledge of stats I’ve been working through the book methodically and applying its lessons to my own project and dataset. The inclusion of Jupyter Notebook resources for book has been very enlightening. My Python competence has come on leaps and bounds. Thoroughly recommended for budding Data Scientists.
Amazon Verified review
Earle, Feb 10, 2022
Rating: 5 out of 5
**Full disclosure, I was a student of Dr. George and purchased the book directly from Packt.** I highly recommend the book for newcomers to data science or those that have a general understanding of data science but want to solidify concepts and advance their knowledge. The book provides a nice balance of providing the reader with enough detail on each topic without overloading the reader using material that is too complex. Often books in the data science domain are too basic and do not cover more advanced subject matter or the author assumes prior knowledge leaving the reader feeling lost and confused. Dr. George navigates these challenges well and starts the reader out with an introduction on getting started with Python to eventually walking the reader through supervised learning techniques such as boosted trees and SVM, as well as unsupervised learning techniques such as K-means clustering. I found the sections on feature selection/engineering and optimizing models very comprehensive, including a wide range of techniques and options. A nice addition to the book was the inclusion of AutoML and PyCaret. That is an area that interests me, and I have yet to explore. I rarely write reviews on books, but due to the vast information nicely presented in an easy to understand and follow format, I felt compelled to endorse this book. I hope you enjoy the book as much as I do.
Amazon Verified review
vishal kaushik, Oct 21, 2021
Rating: 5 out of 5
Hello everyone, I am honored and glad at the same time that I got to read this book as I am data science enthusiast myself and beginning to set my foot in this domain. This book covers good length and breadth of the subject matter. It starts with very basic like how to install python and related packages and libraries along with version control using git (not all the books do that). Then the books covers basics of data analysis using various libraries and tools and preparing data for machine learning models. Then the author dives into various machine learning algorithms with great and easy to understand examples. This book is definitely the best i have read in this subject domain and I highly recommend to everyone who is eager to jump into data science. It is definitely a great addition to my resource for learning data science.
Amazon Verified review
Danial, Jan 03, 2022
Rating: 5 out of 5
I have reviewed first 3 chapters and gave an overview to bunch other chapters. The initial setup of environments is explained in a very easy way. I am amazed by the way that lines of code are distinguishable from other text which helped me go through code when exploring for specific issues and fix them in a short time. Major thing that can be improved is the use of more visualizations in each chapter. Besides that I am very satisfied with the book.
Amazon Verified review

FAQs

What is included in a Packt subscription?

A subscription gives you full online access to all Packt and licensed content, including exclusive access to Early Access titles. Depending on the tier you choose, you can also earn credits and discounts to use toward owning content.

How can I cancel my subscription?

To cancel your subscription, go to the account page, found at the top right of the page or at https://subscription.packtpub.com/my-account/subscription. From there, you will see the ‘cancel subscription’ button in the grey box containing your subscription information.

What are credits?

Credits can be earned by reading 40 sections of any title within the payment cycle (a month starting from the day of subscription payment). You also earn a credit every month if you subscribe to our annual or 18-month plans. Credits can be used to buy DRM-free books, the same way that you would pay for a book. Your credits can be found on the subscription homepage (subscription.packtpub.com) by clicking the ‘My Library’ dropdown and selecting ‘Credits’.

What happens if an Early Access Course is cancelled?

Projects are rarely cancelled, but sometimes it's unavoidable. If an Early Access course is cancelled or excessively delayed, you can exchange your purchase for another course. For further details, please contact us here.

Where can I send feedback about an Early Access title?

If you have any feedback about the product you're reading, or Early Access in general, then please fill out a contact form here and we'll make sure the feedback gets to the right team. 

Can I download the code files for Early Access titles?

We try to ensure that all books in Early Access have code available to use, download, and fork on GitHub. This helps us be more agile in the development of the book, and helps keep the often changing code base of new versions and new technologies as up to date as possible. Unfortunately, however, there will be rare cases when it is not possible for us to have downloadable code samples available until publication.

When we publish the book, the code files will also be available to download from the Packt website.

How accurate is the publication date?

The publication date is as accurate as we can make it at any point in the project. Unfortunately, delays can happen. Often those delays are out of our control, such as changes to the technology code base or delays in the tech release. We do our best to give you an accurate estimate of the publication date at any given time, and the more chapters that are delivered, the more accurate the delivery date will become.

How will I know when new chapters are ready?

We'll let you know every time there has been an update to a course that you've bought in Early Access. You'll get an email to let you know there has been a new chapter, or a change to a previous chapter. The new chapters are automatically added to your account, so you can also check back there any time you're ready and download or read them online.

I am a Packt subscriber, do I get Early Access?

Yes, all Early Access content is fully available through your subscription. You will need a paid or active trial subscription in order to access all titles.

How is Early Access delivered?

Early Access is currently only available as a PDF or through our online reader. As we make changes or add new chapters, the files in your Packt account will be updated so you can download them again or view them online immediately.

How do I buy Early Access content?

Early Access is a way for us to get our content to you sooner, but the method of buying an Early Access course is still the same. Just find the course you want to buy, go through the check-out steps, and you’ll get a confirmation email from us with information and a link to the relevant Early Access courses.

What is Early Access?

Keeping up to date with the latest technology is difficult: new versions, new frameworks, new techniques. This feature gives you a head start on our content as it's being created. With Early Access you'll receive each chapter as it's written, and get regular updates throughout the product's development, as well as the final course as soon as it's ready. We created Early Access as a means of giving you the information you need as soon as it's available. As we go through the process of developing a course, 99% of it can be ready, but we can't publish until that last 1% falls into place. Early Access helps to unlock the potential of our content early, to help you start your learning when you need it most. You not only get access to every chapter as it's delivered, edited, and updated, but you'll also get the finalized, DRM-free product to download in any format you want when it's published. As a member of Packt, you'll also be eligible for our exclusive offers, including a free course every day, and discounts on new and popular titles.