You're reading from Practical Data Science Cookbook, Second Edition: Data pre-processing, analysis and visualization using R and Python

Product type Paperback

Published in Jun 2017

Publisher Packt

ISBN-13 9781787129627

Length 434 pages

Edition 2nd Edition

Languages

Python

Concepts

Data Analysis

Authors (5):

Anthony Ojeda

Prabhanjan Narayanachar Tattar

ABHIJIT DASGUPTA

Sean P Murphy

Bhushan Purushottam Joshi

Table of Contents (12) Chapters

Preface

1. Preparing Your Data Science Environment

2. Driving Visual Analysis with Automobile Data with R

3. Creating Application-Oriented Analyses Using Tax Data and Python

4. Modeling Stock Market Data

5. Visually Exploring Employment Data

6. Driving Visual Analyses with Automobile Data

7. Working with Social Graphs

8. Recommending Movies at Scale (Python)

9. Harvesting and Geolocating Twitter Data (Python)

10. Forecasting New Zealand Overseas Visitors

11. German Credit Data Analysis

Dividing the data and the ROC

If the entire dataset is used to build a model, the model may be overtrained (overfitted), and its true performance on unseen cases goes unnoticed. We need a good model for the credit problem, and if its performance on new or unforeseen cases is unknown, skepticism is bound to creep in. A good practice, then, is to divide the available data into three parts: (i) data for building the model, (ii) data to validate the model, and (iii) data to test the model. A set of candidate models is built on the first part and evaluated on the validation part; the model that does best at this stage is then assessed on the test part. Partitioning the data into three parts is easy to perform, and we can quickly show how it is done on the German credit...
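The three-way partition described in this section can be sketched in a few lines. This is only an illustrative sketch in Python using the standard library, not the book's own code, which is not shown in this excerpt. The function name `three_way_split`, the 50/25/25 proportions, the seed, and the row count of 1000 are all assumptions chosen for the example.

```python
import random


def three_way_split(n_rows, train_frac=0.5, validate_frac=0.25, seed=12345):
    """Randomly partition row indices into train, validate, and test parts.

    The fractions and the seed are illustrative assumptions; whatever is
    left after the train and validate parts becomes the test part.
    """
    if train_frac <= 0 or validate_frac <= 0 or train_frac + validate_frac >= 1:
        raise ValueError("fractions must be positive and sum to less than 1")

    indices = list(range(n_rows))
    # A dedicated Random instance keeps the shuffle reproducible
    # without touching the global random state.
    random.Random(seed).shuffle(indices)

    n_train = int(n_rows * train_frac)
    n_validate = int(n_rows * validate_frac)

    train = indices[:n_train]
    validate = indices[n_train:n_train + n_validate]
    test = indices[n_train + n_validate:]
    return train, validate, test


# Hypothetical usage: 1000 row indices stand in for the records of a dataset.
train_idx, validate_idx, test_idx = three_way_split(1000)
print(len(train_idx), len(validate_idx), len(test_idx))  # 500 250 250
```

Each row lands in exactly one part, so candidate models can be fitted on the rows in `train_idx`, compared on `validate_idx`, and the chosen model assessed once on `test_idx`.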

Authors (5)

Bhushan Purushottam Joshi

Bhushan Purushottam Joshi is a teacher of computer science with around 11 years of teaching experience. He started his career as a programmer in a software firm but found true joy in teaching. He is a teacher by choice and not by chance. He teaches computer science courses such as MCA, MSc IT, BSc IT, and BSc CS at various colleges in Mumbai. He is a master at presenting technical as well as conceptual subjects in the most simplified manner. He has exemplary skills in relating daily life examples to technical concepts, which facilitates understanding of the subject matter. He enjoys teaching technical as well as conceptual subjects such as web design, Java, C#, C++, operating systems, computer networks, data structures, and ethical hacking. He is quite popular and appreciated among his students for his able guidance in their project work.

Prabhanjan Narayanachar Tattar

Prabhanjan Narayanachar Tattar is a lead statistician and manager at the Global Data Insights & Analytics division of Ford Motor Company, Chennai. He received the IBS(IR)-GK Shukla Young Biometrician Award (2005) and Dr. U.S. Nair Award for Young Statistician (2007). He held SRF of CSIR-UGC during his PhD. He has authored books such as Statistical Application Development with R and Python, 2nd Edition, Packt; Practical Data Science Cookbook, 2nd Edition, Packt; and A Course in Statistics with R, Wiley. He has created many R packages.

Anthony Ojeda

Tony Ojeda is an accomplished data scientist and entrepreneur, with expertise in business process optimization and over a decade of experience creating and implementing innovative data products and solutions. He has a master's degree in finance from Florida International University and an MBA with a focus on strategy and entrepreneurship from DePaul University. He is the founder of District Data Labs, is a cofounder of Data Community DC, and is actively involved in promoting data science education through both organizations.

ABHIJIT DASGUPTA

Abhijit Dasgupta is a data consultant working in the greater DC-Maryland-Virginia area, with several years of experience in biomedical consulting, business analytics, bioinformatics, and bioengineering consulting. He has a PhD in biostatistics from the University of Washington and over 40 collaborative peer-reviewed manuscripts, with strong interests in bridging the statistics/machine-learning divide. He is always on the lookout for interesting and challenging projects, and is an enthusiastic speaker and discussant on new and better ways to look at and analyze data. He is a member of Data Community DC and a founding member and co-organizer of Statistical Programming DC (formerly R Users DC).

Sean P Murphy
