The causes of outliers
Before taking any action on the outliers of a variable, consider what may have caused them. Once the cause is identified, it may be possible to correct the outliers immediately. Here is a possible categorization of the causes of outliers:
- Data entry errors: The person collecting the data may make a mistake while recording it. For example, an analyst collecting the birth years of a group of people may write 177 instead of 1977. If all the other years collected fall in the 1900-2100 range, the value 177 stands out as an obvious error and is easy to correct. Other times, it is not possible to recover the correct value.
- Intentional outliers: Very often, the individuals to whom the measurements apply introduce “errors” intentionally. For example, adolescents typically do not accurately report the amount of alcohol they consume.
- Data processing...
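As an illustration of the data entry case above, here is a minimal Python sketch of a range check that flags values such as 177 among birth years. The function name flag_out_of_range, the sample list, and the 1900-2100 bounds are assumptions made for this example. The check only flags suspect values; whether a flagged value can be corrected still depends on the context.

```python
def flag_out_of_range(values, low, high):
    """Return (index, value) pairs whose value lies outside [low, high]."""
    return [(i, v) for i, v in enumerate(values) if not (low <= v <= high)]


# Hypothetical birth years; 177 stands in for a mistyped 1977.
birth_years = [1977, 1982, 177, 1969, 2001]

# Assumed plausible range, taken from the example in the text.
suspects = flag_out_of_range(birth_years, 1900, 2100)
print(suspects)  # [(2, 177)]
```

The index in each pair makes it possible to go back to the original record and decide whether the value can be repaired or has to be treated as missing.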