
You're reading from Data Wrangling with Python Creating actionable data from raw sources

Product type Paperback

Published in Feb 2019

Publisher Packt

ISBN-13 9781789800111

Length 452 pages

Edition 1st Edition

Languages

Python

Tools

NumPy

Concepts

Data Analysis

Authors (2):

Dr. Tirthajyoti Sarkar

Shubhadeep Roychowdhury


Table of Contents (12) Chapters

Data Wrangling with Python

Preface

1. Introduction to Data Wrangling with Python

2. Advanced Data Structures and File Handling

3. Introduction to NumPy, Pandas, and Matplotlib

4. A Deep Dive into Data Wrangling with Python

5. Getting Comfortable with Different Kinds of Data Sources

6. Learning the Hidden Secrets of Data Wrangling

7. Advanced Web Scraping and Data Gathering

8. RDBMS and SQL

9. Application of Data Wrangling in Real Life

Appendix

Python for Data Wrangling

There is an ongoing debate about whether to perform data wrangling with an enterprise tool or with a programming language and its associated frameworks. Many commercial, enterprise-level tools for data formatting and pre-processing require little coding on the part of the user. Examples include the following:

General-purpose data analysis platforms such as Microsoft Excel (with add-ins)
Statistical discovery packages such as JMP (from SAS)
Modeling platforms such as RapidMiner
Analytics platforms from niche players focusing on data wrangling, such as Trifacta, Paxata, and Alteryx

However, programming languages such as Python provide more flexibility, control, and power compared to these off-the-shelf tools.

The volume, velocity, and variety of data (the three Vs of big data) are changing rapidly. It is therefore a good idea to develop and nurture significant in-house expertise in data wrangling using fundamental programming frameworks, so that an organization is not beholden to the whims and fancies of any enterprise platform for a task as basic as data wrangling.

Figure 1.2: Google Trends worldwide over the last five years

A few of the obvious advantages of using a free, open source programming language such as Python for data wrangling are the following:

General-purpose, open source language that places no restriction on the methods you can develop for the specific problem at hand
Great ecosystem of fast, optimized, open source libraries focused on data analytics
Growing support for connecting Python to every conceivable type of data source
Easy interface to basic statistical testing and quick visualization libraries for checking data quality
Seamless handoff of data wrangling output to advanced machine learning models
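
As a rough illustration of the flexibility and the data quality checks listed above, here is a minimal sketch that reads a small, made-up CSV, counts missing fields, and flags ages that cannot be parsed as integers. The data, the column names, and the wrangle function are all hypothetical and are not taken from the book. The sketch uses only the Python standard library so that it runs without extra installs; the data analytics libraries covered later in the book would typically do the same job more concisely.

```python
import csv
import io
import statistics

# Hypothetical raw data: one missing age, one missing city, one non-numeric age.
RAW_CSV = """name,age,city
Alice,34,Paris
Bob,,London
Carol,forty,Berlin
Dave,29,
"""


def wrangle(raw_text):
    """Parse CSV text, coerce 'age' to int, and count data quality problems."""
    rows = list(csv.DictReader(io.StringIO(raw_text)))
    clean_ages = []
    issues = {"missing": 0, "invalid_age": 0}
    for row in rows:
        # Count empty fields in any column of this row.
        issues["missing"] += sum(1 for value in row.values() if value == "")
        age = row["age"].strip()
        if age == "":
            continue  # already counted as missing
        try:
            clean_ages.append(int(age))
        except ValueError:
            # Present but not an integer, e.g. "forty".
            issues["invalid_age"] += 1
    return rows, clean_ages, issues


rows, clean_ages, issues = wrangle(RAW_CSV)
print(len(rows), "rows;", issues)
print("mean age of valid rows:", statistics.mean(clean_ages))
```

Because the rules live in ordinary code, adding another check (for example, a whitelist of city names) is a matter of a few extra lines rather than a feature request to a tool vendor.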

Python is also one of the most popular languages for machine learning and artificial intelligence these days.


Authors (2)

Dr. Tirthajyoti Sarkar

Dr. Tirthajyoti Sarkar works as a senior principal engineer in the semiconductor technology domain, where he applies cutting-edge data science/machine learning techniques for design automation and predictive analytics. He writes regularly about Python programming and data science topics. He holds a Ph.D. from the University of Illinois and certifications in artificial intelligence and machine learning from Stanford and MIT.

Shubhadeep Roychowdhury

Shubhadeep Roychowdhury holds a master's degree in computer science from West Bengal University of Technology and certifications in machine learning from Stanford. He works as a senior software engineer at a Paris-based cybersecurity startup, where he is applying state-of-the-art computer vision and data engineering algorithms and tools to develop cutting-edge products. He often writes about algorithm implementation in Python and similar topics.
