Here is a list of questions for your reference:
- What do you understand by EDA? Why is it important?
- Why do we create training and test data?
- Why did we index the data that we pulled from the UCI Machine Learning Repository?
- Why is the Iris dataset so famous?
- Name one powerful feature of the random forest classifier.
- What is supervised learning, as opposed to unsupervised learning?
- Explain briefly the process of creating our model with training data.
- What are feature variables in relation to the Iris dataset?
- What is the entry point to programming with Spark?
Task: The Iris dataset problem is a statistical classification problem. Create a confusion matrix (also called an error matrix) whose rows are the predicted species (predicted setosa, predicted versicolor, and predicted virginica) and whose columns are the actual species (setosa, versicolor, and virginica). Having done that, interpret the matrix.
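As a starting point for the task, here is a minimal sketch of how such a matrix can be tabulated in plain Python. The `confusion_matrix` helper and the label lists are hypothetical and made up purely for illustration; they are not output from the model built in this chapter, so substitute your own predictions and true labels.

```python
from collections import Counter

SPECIES = ["setosa", "versicolor", "virginica"]


def confusion_matrix(predicted, actual, labels=SPECIES):
    """Count (predicted, actual) pairs; rows are predicted, columns are actual."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    counts = Counter(zip(predicted, actual))
    return [[counts[(p, a)] for a in labels] for p in labels]


# Hypothetical labels for illustration only (NOT results from the chapter's model).
actual = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]
predicted = ["setosa", "setosa", "versicolor", "virginica", "virginica", "virginica"]

matrix = confusion_matrix(predicted, actual)

# Correct predictions lie on the diagonal, so accuracy is the diagonal sum
# divided by the total number of observations.
accuracy = sum(matrix[i][i] for i in range(len(SPECIES))) / len(actual)

for label, row in zip(SPECIES, matrix):
    print(f"predicted {label:<10}", row)
print("accuracy:", accuracy)
```

When reading the result, each diagonal cell counts flowers whose predicted species matches the actual species. Every off-diagonal cell counts a specific kind of error: a nonzero value in the "predicted virginica" row under the "versicolor" column, for example, means versicolor flowers were misclassified as virginica.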