What this book covers
Chapter 1, Generating Summary Statistics, explores statistical concepts, such as measures of central tendency and variability, that help you summarize and analyze data effectively. It provides practical examples and step-by-step instructions on how to use Python libraries, such as NumPy, Pandas, and SciPy, to compute the mean, median, mode, standard deviation, percentiles, and other key summary statistics. By the end of the chapter, you will be able to generate summary statistics in Python, and you will have the foundation needed for the more complex EDA techniques covered in later chapters.
Chapter 2, Preparing Data for EDA, focuses on the critical steps required to prepare data for analysis. Real-world data rarely comes in an analysis-ready format, which makes this step crucial in EDA. Through practical examples, you will learn aggregation techniques such as grouping, concatenating, appending, and merging. You will also learn data-cleaning techniques, such as handling missing values, changing data formats, removing records, and replacing records. Lastly, you will learn how to transform data by sorting and categorizing it. By the end of this chapter, you will have mastered the Python techniques required for preparing data for EDA.
Chapter 3, Visualizing Data in Python, covers data visualization tools critical for uncovering hidden trends and patterns in data. It focuses on popular visualization libraries in Python, such as Matplotlib, Seaborn, ggplot, and Bokeh, which are used to create compelling representations of data. It also lays the foundation for subsequent chapters in which some of these libraries are used. With practical examples and a step-by-step guide, you will learn how to plot charts and customize them to present data effectively. By the end of this chapter, you will have the knowledge and hands-on experience needed to use Python’s visualization capabilities to uncover valuable insights.
Chapter 4, Performing Univariate Analysis in Python, focuses on essential techniques for analyzing and visualizing a single variable of interest to gain insights into its distribution and characteristics. Through practical examples, it delves into a wide range of techniques, such as histograms, box plots, bar plots, summary tables, and pie charts, that help you understand the underlying distribution of a single variable and uncover hidden patterns in it. It also covers univariate analysis for both categorical and numerical variables. By the end of this chapter, you will be equipped with the knowledge and skills required to perform comprehensive univariate analysis in Python.
Chapter 5, Performing Bivariate Analysis in Python, explores techniques for analyzing the relationship between two variables of interest and uncovering the insights embedded in it. It delves into techniques, such as correlation analysis, scatter plots, and box plots, that help you understand the relationships, trends, and patterns that exist between two variables. It also explores the bivariate analysis options for different variable combinations, such as numerical-numerical, numerical-categorical, and categorical-categorical. By the end of this chapter, you will have gained the knowledge and hands-on experience required to perform in-depth bivariate analysis in Python.
Chapter 6, Performing Multivariate Analysis in Python, builds on previous chapters and introduces more advanced techniques for gaining insights and identifying complex patterns across multiple variables of interest. Through practical examples, it covers concepts such as clustering analysis, principal component analysis, and factor analysis, which help you understand the interactions among multiple variables. By the end of this chapter, you will have the skills required to apply advanced analysis techniques to uncover hidden patterns in multiple variables.
Chapter 7, Analyzing Time Series Data, offers a practical guide to analyzing and visualizing time series data. It introduces time series terminology and techniques (such as trend analysis, decomposition, seasonality detection, differencing, and smoothing) and provides practical examples and code showing how to implement them using various Python libraries. It also covers how to spot patterns within time series data to uncover valuable insights. By the end of the chapter, you will be equipped with the skills required to explore, analyze, and derive insights from time series data.
Chapter 8, Analyzing Text Data, covers techniques for analyzing text data, a form of unstructured data. It provides a comprehensive guide on how to effectively analyze and extract insights from text data. Through practical steps, it covers key concepts and techniques for data preprocessing, such as stop-word removal, tokenization, stemming, and lemmatization. It also covers essential techniques for text analysis, such as sentiment analysis, n-gram analysis, topic modeling, and part-of-speech tagging. By the end of this chapter, you will have the skills required to process and analyze various forms of text data to extract valuable insights.
Chapter 9, Dealing with Outliers and Missing Values, explores the process of effectively handling outliers and missing values within data. It highlights why dealing with them matters and provides step-by-step instructions on how to identify and handle them using visualization techniques and statistical methods in Python. It also delves into strategies for handling missing values and outliers in different scenarios. By the end of the chapter, you will know the tools and techniques required to handle missing values and outliers in a range of situations.
Chapter 10, Performing Automated EDA, focuses on speeding up the EDA process through automation. It explores popular automated EDA libraries in Python, such as Pandas Profiling, D-Tale, Sweetviz, and AutoViz. It also provides hands-on guidance on how to build custom functions to automate the EDA process yourself. With step-by-step instructions and practical examples, it will help you gain deep insights from data quickly and save time during the EDA process.