Packt+ | Advance your knowledge in tech

You're reading from Data Science with Python Combine Python with machine learning principles to discover hidden patterns in raw data

Product type Paperback

Published in Jul 2019

Publisher Packt

ISBN-13 9781838552862

Length 426 pages

Edition 1st Edition

Languages

Python

Tools

Combine

Concepts

Data Science

Authors (3):

Rohan Chopra

Aaron England

Mohamed Noordeen Alaudeen

View More author details

Table of Contents (10) Chapters

About the Book

1. Introduction to Data Science and Data Pre-Processing FREE CHAPTER

2. Data Visualization

3. Introduction to Machine Learning via Scikit-Learn

4. Dimensionality Reduction and Unsupervised Learning

5. Mastering Structured Data

6. Decoding Images

7. Processing Human Language

8. Tips and Tricks of the Trade

1. Appendix

Python Libraries

Throughout this book, we'll be using various Python libraries, including pandas, Matplotlib, Seaborn, and scikit-learn.

pandas

pandas is an open source package that has many functions for loading and processing data in order to prepare it for machine learning tasks. It also has tools that can be used to analyze and manipulate data. Data can be read from many formats using pandas. We will mainly be using CSV data throughout this book. To read CSV data, you can use the read_csv() function by passing filename.csv as an argument. An example of this is shown here:

>>> import pandas as pd

>>> pd.read_csv("data.csv")

In the preceding code, pd is an alias name given to pandas. It is not mandatory to give an alias. To visualize a pandas DataFrame, you can use the head() function to list the top five rows. This will be demonstrated in one of the following exercises.

Note

Please visit the following link to learn more about pandas: https://pandas.pydata.org/pandas-docs/stable/.

NumPy

NumPy is one of the main packages that Python has to offer. It is mainly used in practices related to scientific computing and when working on mathematical operations. It comprises of tools that enable us to work with arrays and array objects.

Matplotlib

Matplotlib is a data visualization package. It is useful for plotting data points in a 2D space with the help of NumPy.

Seaborn

Seaborn is also a data visualization library that is based on matplotlib. Visualizations created using Seaborn are far more attractive than ones created using matplotlib in terms of graphics.

scikit-learn

scikit-learn is a Python package used for machine learning. It is designed in such a way that it interoperates with other numeric and scientific libraries in Python to achieve the implementation of algorithms.

These ready-to-use libraries have gained interest and attention from developers, especially in the data science space. Now that we have covered the various libraries in Python, in the next section we'll explore the roadmap for building machine learning models.

Tech Concepts

Programming languages

Tech Tools

Unlimited access to the largest independent learning library in tech of over 8,000 expert-authored tech books and videos.

Innovative learning tools, including AI book assistants, code context explainers, and text-to-speech.

50+ new titles added per month and exclusive early access to books as they are being written.

You have been reading a chapter from

Data Science with Python

Published in: Jul 2019

Publisher: Packt

ISBN-13: 9781838552862

A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.

Unlock this book and the full library FREE for 7 days

Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of

Start free trial

Renews at €18.99/month. Cancel anytime

Authors (3)

Rohan Chopra

Rohan Chopra graduated from Vellore Institute of Technology with a bachelors degree in computer science. Rohan has an experience of more than 2 years in designing, implementing, and optimizing end-to-end deep neural network systems. His research is centered around the use of deep learning to solve computer vision-related problems and has hands-on experience working on self-driving cars. He is a data scientist at Absolutdata.

See other products by Rohan Chopra

Aaron England

Aaron England earned a Ph.D from the University of Utah in Exercise and Sports Science with a cognate in Biostatistics. Currently, he resides in Scottsdale, Arizona where he works as a data scientist at Natural Partners Fullscript.

See other products by Aaron England

Mohamed Noordeen Alaudeen

Mohamed Noordeen Alaudeen is a lead data scientist at Logitech. Noordeen has 7+ years of experience in building and developing end-to-end BigData and Deep Neural Network Systems. It all started when he decided to engage the rest of his life for data science. He is Seasonal data science and big data trainer with both Imarticus Learning and Great Learning, which are two of the renowned data science institutes in India. Apart from his teaching, he does contribute his work to open-source. He has over 90+ repositories on GitHub, which have open-sourced his technical work and data science material. He is an active influencer( with over 22,000+ connections) on Linkedin, helping the data science community.

See other products by Mohamed Noordeen Alaudeen