You're reading from Data Science with Python: Combine Python with machine learning principles to discover hidden patterns in raw data

Product type Paperback

Published in Jul 2019

Publisher Packt

ISBN-13 9781838552862

Length 426 pages

Edition 1st Edition

Languages

Python

Tools

Combine

Concepts

Data Science

Authors (3):

Rohan Chopra

Aaron England

Mohamed Noordeen Alaudeen

Table of Contents (10) Chapters

About the Book

1. Introduction to Data Science and Data Pre-Processing

2. Data Visualization

3. Introduction to Machine Learning via Scikit-Learn

4. Dimensionality Reduction and Unsupervised Learning

5. Mastering Structured Data

6. Decoding Images

7. Processing Human Language

8. Tips and Tricks of the Trade

Appendix

Text Data Processing

Before we start building machine learning models for our textual data, we need to process the data. First, we will learn different ways to examine what the data comprises. This gives us a sense of what the data really is and helps us decide which preprocessing techniques to use in the next step. Next, we will learn the techniques that help us preprocess the data. This step reduces the size of the data, which reduces the training time, and transforms the data into a form from which machine learning algorithms can extract information more easily. Finally, we will learn how to convert the textual data to numbers so that machine learning algorithms can use it to create models. We do this using word embedding, much like the entity embedding we performed in Chapter 5: Mastering Structured Data.
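The three steps described above can be sketched with nothing but the Python standard library. This is a minimal illustration, not the book's own code: the toy corpus, the stop-word list, and the helper names `tokenize` and `preprocess` are all hypothetical. The last step only maps words to integer indices, which is the kind of input an embedding layer would typically consume; it does not learn an embedding itself.

```python
import re
from collections import Counter

# Hypothetical toy corpus and stop-word list, for illustration only.
corpus = [
    "The movie was great, and the acting was great too!",
    "The plot was dull and the pacing was slow.",
]
STOP_WORDS = {"the", "was", "and", "too"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def preprocess(text):
    """Tokenize the text and drop stop words to shrink the data."""
    return [tok for tok in tokenize(text) if tok not in STOP_WORDS]


# Step 1: explore the raw data, here with simple word frequencies.
raw_counts = Counter(tok for doc in corpus for tok in tokenize(doc))

# Step 2: preprocess each document.
cleaned = [preprocess(doc) for doc in corpus]

# Step 3: map each remaining word to an integer index.
# Indices start at 1 so that 0 stays free for padding (an assumed convention).
unique_words = sorted({tok for doc in cleaned for tok in doc})
vocab = {word: index for index, word in enumerate(unique_words, start=1)}
encoded = [[vocab[tok] for tok in doc] for doc in cleaned]

print(raw_counts.most_common(2))
print(cleaned)
print(encoded)
```

Looking at the word frequencies first is what tells us that words such as "the" and "was" dominate the corpus while carrying little meaning, which is the kind of observation that motivates a preprocessing choice like stop-word removal.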

Regular Expressions

Before we start working on textual data, we need to learn about regular expressions (RegEx)....
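As a rough preview of what regular expressions look like in practice, here is a small sketch using Python's built-in `re` module. The sample sentence and the patterns are our own assumptions rather than the book's examples, and the email pattern in particular is deliberately simplistic.

```python
import re

# Hypothetical sample sentence, for illustration only.
text = "Contact us at info@example.com or call 555-0100 before 5 PM!"

# \d+ matches one or more consecutive digits.
numbers = re.findall(r"\d+", text)

# [A-Za-z]+ matches runs of letters, which gives a crude word tokenizer.
words = re.findall(r"[A-Za-z]+", text)

# re.sub replaces every match; here we delete each character that is
# not a letter, a digit, or whitespace.
no_punct = re.sub(r"[^A-Za-z0-9\s]", "", text)

# re.search returns the first match, or None if nothing matches.
match = re.search(r"[\w.]+@[\w.]+", text)
email = match.group(0) if match else None

print(numbers)
print(words)
print(no_punct)
print(email)
```

Patterns like these are the building blocks for cleaning steps such as stripping punctuation or pulling structured tokens out of free text.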

Authors (3)

Rohan Chopra

Rohan Chopra graduated from Vellore Institute of Technology with a bachelor's degree in computer science. Rohan has more than 2 years of experience in designing, implementing, and optimizing end-to-end deep neural network systems. His research centers on the use of deep learning to solve computer vision problems, and he has hands-on experience working on self-driving cars. He is a data scientist at Absolutdata.

Aaron England

Aaron England earned a Ph.D. from the University of Utah in Exercise and Sports Science with a cognate in Biostatistics. He currently resides in Scottsdale, Arizona, where he works as a data scientist at Natural Partners Fullscript.

Mohamed Noordeen Alaudeen

Mohamed Noordeen Alaudeen is a lead data scientist at Logitech. Noordeen has more than 7 years of experience in building and developing end-to-end big data and deep neural network systems. It all started when he decided to devote the rest of his life to data science. He is a seasonal data science and big data trainer with both Imarticus Learning and Great Learning, two renowned data science institutes in India. Apart from teaching, he contributes his work to open source. He has more than 90 repositories on GitHub, where he has open-sourced his technical work and data science material. He is an active influencer on LinkedIn (with more than 22,000 connections), helping the data science community.
