Summary Statistics and Central Values
To find out what our data really looks like, we use a technique known as data profiling. This is the process of examining the data available from an existing information source (for example, a database or a file) and collecting statistics or informative summaries about that data. The goal is to understand your data well and to identify any challenges it may pose early in the project. This is done by summarizing the dataset and assessing its structure, content, and quality.
Data profiling includes collecting descriptive statistics and data types. Here are a few commands that are commonly used to get a summary of a dataset:
data.info(): This command tells us how many non-null values there are in each column, along with the data type of the values (columns holding strings or mixed values are typically reported as the object type).
data.describe(): This gives us basic summary statistics for all the numerical columns in the...
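As a minimal sketch of the two commands above, the following example builds a small DataFrame and profiles it. The dataset, its column names (age, city, income), and its values are hypothetical and exist only for illustration; in practice, data would be loaded from your own source.

```python
import io

import pandas as pd

# Hypothetical toy dataset (an assumption for illustration only).
data = pd.DataFrame({
    "age": [25, 32, None, 41],                      # one missing value
    "city": ["Pune", "Oslo", "Lima", None],         # text column, one missing value
    "income": [50000.0, 64000.0, 58000.0, 72000.0],
})

# info() prints its report rather than returning it, so we pass a buffer
# to capture the text and keep it in a variable as well.
buffer = io.StringIO()
data.info(buf=buffer)
info_text = buffer.getvalue()
print(info_text)

# describe() returns a DataFrame; by default it covers numerical columns only,
# so the text column "city" is left out.
summary = data.describe()
print(summary)
```

Here, info() reports three non-null values for age and city and four for income, while describe() returns the count, mean, standard deviation, minimum, quartiles, and maximum for age and income.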