Handling missing values
One of the most common situations when working with data is finding missing values in your dataset.
Missing values need to be handled because, for example, many machine learning algorithms either fail or give unreliable results when their input contains them. Likewise, if you are creating a report, you do not want to present statistics aggregated over null values.
It's important to note that Optimus treats None and NaN (Not a Number) values as interchangeable indicators of a null value. To handle them, you can do one of two things: remove the data or impute it. In this section, we will see how Optimus can help with both tasks, without giving an exhaustive statistical explanation of when to use each method.
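To make the two options concrete, here is a minimal plain-Python sketch of the idea. It does not use the Optimus API: the helper is_null, the sample rows, and the choice of the mean as the imputed value are all assumptions made for illustration only.

```python
import math
from statistics import mean


def is_null(value):
    """Hypothetical helper (not part of Optimus): True for None or a float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


# Illustrative data: "age" is missing once as None and once as NaN.
rows = [
    {"name": "Alice", "age": 30.0},
    {"name": "Bob", "age": None},
    {"name": "Carol", "age": float("nan")},
    {"name": "Dan", "age": 40.0},
]

# Option 1: remove every row that has a null in any column.
removed = [row for row in rows if not any(is_null(v) for v in row.values())]

# Option 2: impute nulls in "age" with the mean of the observed values.
observed = [row["age"] for row in rows if not is_null(row["age"])]
age_mean = mean(observed)
imputed = [
    {**row, "age": age_mean if is_null(row["age"]) else row["age"]}
    for row in rows
]

print(len(removed))                      # 2
print([row["age"] for row in imputed])   # [30.0, 35.0, 35.0, 40.0]
```

Removing rows shrinks the dataset from four rows to two, while imputing keeps all four rows at the cost of replacing unknown values with an estimate. Which trade-off is acceptable depends on the analysis.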
Removing data
In this case, we will see how we can remove whole rows or columns that contain missing values.
Removing a row
First, let's create a dataframe with some null values in many...