You're reading from Practical Data Science Cookbook, Second Edition Data pre-processing, analysis and visualization using R and Python

Product type Paperback

Published in Jun 2017

Publisher Packt

ISBN-13 9781787129627

Length 434 pages

Edition 2nd Edition

Languages

Python

Concepts

Data Analysis

Authors (5):

Anthony Ojeda

Prabhanjan Narayanachar Tattar

Abhijit Dasgupta

Sean P Murphy

Bhushan Purushottam Joshi

Table of Contents (12) Chapters

Preface

1. Preparing Your Data Science Environment

2. Driving Visual Analysis with Automobile Data with R

3. Creating Application-Oriented Analyses Using Tax Data and Python

4. Modeling Stock Market Data

5. Visually Exploring Employment Data

6. Driving Visual Analyses with Automobile Data

7. Working with Social Graphs

8. Recommending Movies at Scale (Python)

9. Harvesting and Geolocating Twitter Data (Python)

10. Forecasting New Zealand Overseas Visitors

11. German Credit Data Analysis

Introduction

The first project we will introduce in this book is an analysis of automobile fuel economy data. The primary tool that we will use to analyze this dataset is the R statistical programming language. R is often referred to as the lingua franca of data science since it is currently the most popular language for statistics and data analysis. As you'll see from the examples in this book, R is an excellent tool for data manipulation, analysis, modeling, visualization, and creating useful scripts to get analytical tasks done.

The recipes in this chapter will roughly follow these five steps in the data science pipeline:

1. Acquisition
2. Exploration and understanding
3. Munging, wrangling, and manipulation
4. Analysis and modeling
5. Communication and operationalization
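
The five steps can be sketched end to end on a toy example. The chapter itself works in R; this minimal sketch uses Python, the book's other language, and only the standard library. The function names (`acquire`, `explore`, `munge`, `analyze`, `communicate`) and the three-column CSV are hypothetical, and the values are made up for illustration rather than taken from the book's fuel economy dataset.

```python
import csv
import io
import statistics

# Hypothetical, made-up rows for illustration only; not the book's dataset.
RAW_CSV = """make,year,mpg
Alpha,2010,30
Alpha,2011,32
Beta,2010,22
Beta,2011,
"""


def acquire(text):
    """Acquisition: read raw records from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def explore(rows):
    """Exploration: count the rows and the missing mpg values."""
    missing = sum(1 for r in rows if not r["mpg"])
    return {"rows": len(rows), "missing_mpg": missing}


def munge(rows):
    """Munging: drop incomplete rows and convert strings to numbers."""
    return [
        {"make": r["make"], "year": int(r["year"]), "mpg": float(r["mpg"])}
        for r in rows
        if r["mpg"]
    ]


def analyze(rows):
    """Analysis: compute the average mpg per make."""
    by_make = {}
    for r in rows:
        by_make.setdefault(r["make"], []).append(r["mpg"])
    return {make: statistics.mean(values) for make, values in by_make.items()}


def communicate(summary):
    """Communication: format the result as readable text lines."""
    return [f"{make}: {avg:.1f} mpg" for make, avg in sorted(summary.items())]


raw = acquire(RAW_CSV)
overview = explore(raw)
clean = munge(raw)
summary = analyze(clean)
report = communicate(summary)

print(overview)
print("\n".join(report))
```

Keeping each step in its own function means a stage can be rerun or replaced without touching the others, for example swapping the in-memory string for a file read during acquisition.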

Process-wise, the backbone of data science is the data science pipeline, and in order to get good at data science...

Authors (5)

Prabhanjan Narayanachar Tattar

Prabhanjan Narayanachar Tattar is a lead statistician and manager at the Global Data Insights & Analytics division of Ford Motor Company, Chennai. He received the IBS(IR)-GK Shukla Young Biometrician Award (2005) and the Dr. U.S. Nair Award for Young Statistician (2007). He held a CSIR-UGC Senior Research Fellowship (SRF) during his PhD. He has authored books such as Statistical Application Development with R and Python, 2nd Edition, Packt; Practical Data Science Cookbook, 2nd Edition, Packt; and A Course in Statistics with R, Wiley. He has also created many R packages.

Bhushan Purushottam Joshi

Bhushan Purushottam Joshi is a computer science teacher with around 11 years of teaching experience. He started his career as a programmer in a software firm but found true joy in teaching. He is a teacher by choice and not by chance. He teaches computer science in programs such as MCA, MSc IT, BSc IT, and BSc CS at various colleges in Mumbai. He is skilled at presenting technical and conceptual subjects in a simple manner, and at relating everyday examples to technical concepts, which makes the subject matter easier to understand. He enjoys teaching subjects such as web design, Java, C#, C++, operating systems, computer networks, data structures, and ethical hacking. He is popular among his students and appreciated for his guidance in their project work.

Sean P Murphy

Sean Patrick Murphy spent 15 years as a senior scientist at The Johns Hopkins University Applied Physics Laboratory, where he focused on machine learning, modeling and simulation, signal processing, and high-performance computing in the cloud. He now works as an advisor and data consultant for companies in San Francisco, New York, and Washington DC. He graduated from The Johns Hopkins University and earned his MBA from the University of Oxford. He co-organizes the Data Innovation DC meetup, co-founded the Data Science MD meetup, and is a board member and cofounder of Data Community DC.

Abhijit Dasgupta

Abhijit Dasgupta is a data consultant working in the greater DC-Maryland-Virginia area, with several years of experience in biomedical consulting, business analytics, bioinformatics, and bioengineering consulting. He has a PhD in biostatistics from the University of Washington and over 40 collaborative peer-reviewed manuscripts, with strong interests in bridging the statistics/machine-learning divide. He is always on the lookout for interesting and challenging projects, and is an enthusiastic speaker and discussant on new and better ways to look at and analyze data. He is a member of Data Community DC and a founding member and co-organizer of Statistical Programming DC (formerly R Users DC).

Anthony Ojeda

Tony Ojeda is an accomplished data scientist and entrepreneur, with expertise in business process optimization and over a decade of experience creating and implementing innovative data products and solutions. He has a master's degree in finance from Florida International University and an MBA with a focus on strategy and entrepreneurship from DePaul University. He is the founder of District Data Labs, is a cofounder of Data Community DC, and is actively involved in promoting data science education through both organizations.
