Questions
Factual
- How do you decide whether to use kmeans or kmedoids?
- What is the significance of the boxplot layout? Why does it look that way?
- Describe the underlying data produced in the outliers for the iris data, given the density plot.
- What are the extract rules for other items in the market dataset?
When, how, and why?
- What is the risk of not vetting the outliers that are detected for the specific domain? Shouldn't the calculation always work?
- Why do we need to exclude the iris category column from the outlier detection algorithm? Can it be used in some way when determining outliers?
- Can you come up with a scenario where the market basket data and rules we generated were not applicable to the store you are working with?
Challenges
- I found it difficult to develop test data for outliers in two dimensions that both occurred in the same instance using random data. Can you develop a test that would always have several outliers in at least two dimensions that occur in the same instance?
- There is a good dataset on the Internet regarding passenger data on the Titanic. Generate the rules regarding the possible survival of the passengers.