Dealing with messy data
Messy data can arise for a wide variety of reasons. Missing data alone takes many forms: it may be recorded as N/A, NA, None, or Null, or as an arbitrary placeholder number such as -1, 999, or 10,000. During data preparation, analysts need to understand the business meaning of the dataset they are handling. Once they know what the missing values represent, how they are encoded, and which data collection procedures caused them, they can choose the most appropriate way to interpret and treat them.
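As a minimal sketch, the code below uses only Python's standard library to map both textual markers and numeric placeholder codes to `None`. The function names (`clean_value`, `load_rows`), the set of text markers, and the sample data are hypothetical. The numeric codes are supplied per column because, as noted above, whether -1 or 999 means "missing" depends on the business meaning of each column and cannot safely be guessed.

```python
import csv
import io

# Text markers assumed to mean "no value"; compared case-insensitively.
TEXT_MISSING = {"", "n/a", "na", "none", "null"}


def clean_value(raw, numeric_sentinels):
    """Return None for a missing-value marker, otherwise a float or the original string.

    numeric_sentinels holds the codes (e.g. {-1, 999}) that this column uses
    for "missing"; it should come from knowledge of the data, not be guessed.
    """
    if raw is None:  # DictReader fills short rows with None
        return None
    text = raw.strip()
    if text.lower() in TEXT_MISSING:
        return None
    try:
        number = float(text)
    except ValueError:
        return text  # non-numeric, non-missing value: keep it unchanged
    return None if number in numeric_sentinels else number


def load_rows(csv_text, sentinels_by_column):
    """Parse CSV text with a header row, cleaning each cell column by column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {col: clean_value(val, sentinels_by_column.get(col, set()))
         for col, val in record.items()}
        for record in reader
    ]


# Hypothetical sample: -1 marks an unknown age, 999 an unknown income.
sample = """age,income,city
34,52000,Leeds
-1,999,NULL
N/A,61000,
"""
rows = load_rows(sample, {"age": {-1}, "income": {999}})
for row in rows:
    print(row)
```

Keeping the sentinels per column avoids a blanket replacement: a value of 999 is treated as missing only in the column where its meaning has been confirmed, and stays a real number everywhere else.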
Working on data without column headers
Column headers often carry the first clues about what the data means in business terms. However, headers are sometimes missing altogether. Without them, nothing in the file tells you what each column represents, so it becomes difficult to relate the content of the data to its business meaning.
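One way to handle a headerless file is to attach column names obtained from somewhere else, such as a data dictionary or the data owner. The following is a minimal sketch using Python's standard `csv` module; the function name `read_headerless` and the column names are hypothetical. It checks that every row has the expected number of fields, so that a wrong or incomplete set of names surfaces as an error instead of silently misaligning values.

```python
import csv
import io


def read_headerless(csv_text, column_names):
    """Parse CSV text that has no header row, attaching the supplied names.

    column_names is assumed to come from documentation or the data owner.
    A row whose field count differs from it raises ValueError.
    """
    rows = []
    for line_no, fields in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        if not fields:  # skip blank lines
            continue
        if len(fields) != len(column_names):
            raise ValueError(
                f"row {line_no} has {len(fields)} fields, expected {len(column_names)}"
            )
        rows.append(dict(zip(column_names, fields)))
    return rows


raw = """1001,2023-04-01,49.99
1002,2023-04-02,15.00
"""
# Hypothetical names: confirm these with whoever owns the data.
names = ["order_id", "order_date", "amount"]
orders = read_headerless(raw, names)
print(orders[0])
```

The values are left as strings here; converting types is a separate step that is easier once each column has a confirmed meaning.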
Let's start with the example that we previously...