Inferring column types
Before going any further with the dataset, we need to understand what type of data it holds. Because the data is stored in columns, we should know the type of each column before performing any operations on it. Recording these types is part of creating a data dictionary:
julia> typeof(iris_dataframe[1,:SepalLength])
Float64
julia> typeof(iris_dataframe[1,:Species])
ASCIIString
We used the classic iris dataset here, where we already know the type of data in each column, but the same function can be applied to any similar dataset. If we were given only columns without labels, it would be hard to tell what type of data they contain. Sometimes a column looks as if it contains numbers, but its data type is actually ASCIIString. This can lead to errors in later steps, for example when arithmetic is attempted on values that are really text. Checking the types up front avoids these errors.
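The check described above can be sketched without any packages. The following is a minimal illustration, not the method used on iris_dataframe: it assumes a recent Julia (1.x), where text columns report their type as String rather than ASCIIString, and it uses a hypothetical table called columns with made-up column names (col1, col2, col3) and a hypothetical helper looks_numeric.

```julia
# Hypothetical stand-in for an unlabeled dataset: a NamedTuple of column vectors.
columns = (
    col1 = [5.1, 4.9, 4.7],
    col2 = ["3.5", "3.0", "3.2"],              # looks numeric, but is stored as text
    col3 = ["setosa", "setosa", "virginica"],
)

# Hypothetical helper: true when a text column's entries all parse as Float64.
# tryparse returns `nothing` (instead of throwing) when a string is not a number.
looks_numeric(col) = eltype(col) <: AbstractString &&
                     all(s -> tryparse(Float64, s) !== nothing, col)

# Report the element type of every column and flag numeric-looking text.
for (name, col) in pairs(columns)
    flag = looks_numeric(col) ? "  (numeric text)" : ""
    println(name, " => ", eltype(col), flag)
end

# Convert the flagged column to Float64 before doing arithmetic on it.
col2_numeric = parse.(Float64, columns.col2)
```

Here only col2 is flagged, because its entries are strings that all parse as numbers. Using tryparse rather than parse for the check means a non-numeric entry such as "setosa" is reported instead of raising an error.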