Exploratory Data Analysis
Much of the time in a data science project is spent on Exploratory Data Analysis (EDA). In EDA, we investigate the data, mainly through visualization, to find hidden patterns and outliers. EDA also helps us uncover the underlying structure of the data and test our hypotheses using summary statistics. We can split EDA into three parts:
- Univariate analysis
- Bivariate analysis
- Correlation
Let's look at each of the parts one by one in the following sections.
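As a rough orientation before going into detail, the three parts can be sketched with pandas on a tiny made-up table. This is only an illustrative sketch: the values are invented, and the numeric columns income and spend are hypothetical names that do not come from the dataset discussed here.

```python
import pandas as pd

# Toy data for illustration only; all values are made up.
# "income" and "spend" are hypothetical numeric columns.
df = pd.DataFrame({
    "SEX": [1, 2, 2, 1, 2, 2],
    "DEFAULT": [0, 1, 0, 0, 1, 0],
    "income": [20000, 50000, 90000, 30000, 50000, 120000],
    "spend": [1500, 4200, 6100, 2500, 3900, 8000],
})

# Univariate analysis: the distribution of a single column.
univariate = df["DEFAULT"].value_counts()

# Bivariate analysis: how two columns vary together,
# here as a contingency table of SEX against DEFAULT.
bivariate = pd.crosstab(df["SEX"], df["DEFAULT"])

# Correlation: pairwise Pearson correlation between numeric columns.
correlation = df[["income", "spend"]].corr()

print(univariate)
print(bivariate)
print(correlation)
```

Each step produces a small table that can be inspected directly or passed to a plotting library.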
Univariate Analysis
Univariate analysis is the simplest form of analysis: we examine each feature (that is, each column of a DataFrame) on its own and try to uncover the pattern or distribution of its values.
In univariate analysis, we will be analyzing the categorical columns (DEFAULT, SEX, EDUCATION, and MARRIAGE) to mine useful information about the data.
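One common way to summarize a categorical column is a frequency table. The sketch below assumes the data is held in a pandas DataFrame named df (an assumed name) and uses made-up values in place of the real dataset; only the four column names come from the text.

```python
import pandas as pd

# Hypothetical stand-in for the real data: the column names follow the text,
# but the values are invented for illustration.
df = pd.DataFrame({
    "DEFAULT": [0, 1, 0, 0, 1, 0],
    "SEX": [1, 2, 2, 1, 2, 2],
    "EDUCATION": [2, 2, 1, 3, 1, 2],
    "MARRIAGE": [1, 2, 2, 1, 2, 1],
})

categorical_columns = ["DEFAULT", "SEX", "EDUCATION", "MARRIAGE"]

# For each categorical column, count how often each category occurs
# and express those counts as proportions of all rows.
summaries = {}
for column in categorical_columns:
    counts = df[column].value_counts()
    proportions = df[column].value_counts(normalize=True)
    summaries[column] = pd.DataFrame({"count": counts, "proportion": proportions})
    print(f"\n{column}")
    print(summaries[column])
```

Looking at counts and proportions side by side makes it easy to spot rare categories and class imbalance before plotting anything.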
Let's go through the variables one by one:
- The DEFAULT column: Let's look at the...