Exploring categorical data
The adjective categorical is applied to data that, in a broad sense, is used to classify and help navigate your data, but whose values serve little to no purpose when aggregated. For example, if you were working with a dataset that contained a field called eye color with values of Brown, Green, Hazel, Blue, etc., you could use this field to navigate your dataset and answer questions like, For rows where the eye color is X, what is the average pupil diameter? However, you would not ask a question like, What is the summation of eye color?, as a formula like "Hazel" + "Blue" would not make sense in this context.
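The distinction above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a prescribed method: the rows, the field names eye_color and pupil_diameter_mm, and the helper average_by_category are all hypothetical, and the measurements are made up for the example. It groups rows by the categorical field and averages the numeric one, then shows that summing the categorical field itself is rejected.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows; the values are invented for illustration only.
rows = [
    {"eye_color": "Brown", "pupil_diameter_mm": 4.0},
    {"eye_color": "Blue", "pupil_diameter_mm": 5.0},
    {"eye_color": "Brown", "pupil_diameter_mm": 6.0},
    {"eye_color": "Hazel", "pupil_diameter_mm": 3.5},
    {"eye_color": "Blue", "pupil_diameter_mm": 4.0},
]


def average_by_category(rows, category, measure):
    """Group rows by a categorical field and average a numeric field."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[category]].append(row[measure])
    return {key: mean(values) for key, values in groups.items()}


# The categorical field is used to navigate the data, not to be aggregated.
averages = average_by_category(rows, "eye_color", "pupil_diameter_mm")
print(averages)

# Summing the categorical field is not a meaningful aggregation.
# Python's built-in sum() raises TypeError when given strings.
try:
    sum(row["eye_color"] for row in rows)
    can_sum_eye_color = True
except TypeError:
    can_sum_eye_color = False
print(can_sum_eye_color)
```

Note that Python will happily evaluate "Hazel" + "Blue" as string concatenation, but the result is just joined text and carries no meaning as a total, which is the point the paragraph makes.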
By contrast, the adjective continuous is applied to data that you typically aggregate. With a question like, What is the average pupil diameter?, the pupil diameter column would be considered continuous. There is value to knowing what it aggregates to (such as the minimum, maximum, average, or standard deviation), and there are a theoretically...