Packt+ | Advance your knowledge in tech

You're reading from Hands-On Exploratory Data Analysis with Python Perform EDA techniques to understand, summarize, and investigate your data

Product type Paperback

Published in Mar 2020

Publisher Packt

ISBN-13 9781789537253

Length 352 pages

Edition 1st Edition

Languages

Python

Tools

NumPy

Concepts

Data Analysis

Authors (2):

Suresh Kumar Mukhiya

Usman Ahmed

View More author details

Having understood what EDA is, and its significance, let's understand the various steps involved in data analysis. Basically, it involves four different steps. Let's go through each of them to get a brief understanding of each step:

Problem definition: Before trying to extract useful insight from the data, it is essential to define the business problem to be solved. The problem definition works as the driving force for a data analysis plan execution. The main tasks involved in problem definition are defining the main objective of the analysis, defining the main deliverables, outlining the main roles and responsibilities, obtaining the current status of the data, defining the timetable, and performing cost/benefit analysis. Based on such a problem definition, an execution plan can be created.
Data preparation: This step involves methods for preparing the dataset before actual analysis. In this step, we define the sources of data, define data schemas and tables, understand the main characteristics of the data, clean the dataset, delete non-relevant datasets, transform the data, and divide the data into required chunks for analysis.
Data analysis: This is one of the most crucial steps that deals with descriptive statistics and analysis of the data. The main tasks involve summarizing the data, finding the hidden correlation and relationships among the data, developing predictive models, evaluating the models, and calculating the accuracies. Some of the techniques used for data summarization are summary tables, graphs, descriptive statistics, inferential statistics, correlation statistics, searching, grouping, and mathematical models.
Development and representation of the results: This step involves presenting the dataset to the target audience in the form of graphs, summary tables, maps, and diagrams. This is also an essential step as the result analyzed from the dataset should be interpretable by the business stakeholders, which is one of the major goals of EDA. Most of the graphical analysis techniques include scattering plots, character plots, histograms, box plots, residual plots, mean plots, and others. We will explore several types of graphical representation in Chapter 2, Visual Aids for EDA.

You're reading from Hands-On Exploratory Data Analysis with Python Perform EDA techniques to understand, summarize, and investigate your data

Table of Contents (17) Chapters

The significance of EDA

Steps in EDA

Authors (2)

Other recommended products

Personalised recommendations for you