Implementing a Spark ML clustering model
In this section, we will explain clustering with Spark ML. We will use a publicly available Dataset on students' knowledge status of a subject.
Note
The Dataset is available for download from the UCI website at https://archive.ics.uci.edu/ml/datasets/User+Knowledge+Modeling.
The attributes of the records in the Dataset are reproduced here from the UCI website for reference:
- STG: The degree of study time for goal object materials (input value)
- SCG: The degree of repetition number of users for goal object materials (input value)
- STR: The degree of study time of users for related objects with the goal object (input value)
- LPR: The exam performance of a user for related objects with the goal object (input value)
- PEG: The exam performance of a user for goal objects (input value)
- UNS: The knowledge level of the user (target value)
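To make the shape of a record concrete, here is a minimal sketch that turns the attribute list above into a schema description and a small row parser. The column names come from the list. The column types (doubles for the five inputs, a string for UNS), the comma-separated layout, and the helper names `schema_ddl` and `parse_row` are assumptions made for illustration, not something the Dataset or Spark prescribes.

```python
# Column names follow the attribute list above. The types are an assumption:
# the five input values are treated as doubles and the UNS target as a string.
FEATURE_COLUMNS = ["STG", "SCG", "STR", "LPR", "PEG"]
LABEL_COLUMN = "UNS"


def schema_ddl() -> str:
    """Build a DDL-style schema string of the form 'STG DOUBLE, ..., UNS STRING'."""
    fields = [f"{name} DOUBLE" for name in FEATURE_COLUMNS]
    fields.append(f"{LABEL_COLUMN} STRING")
    return ", ".join(fields)


def parse_row(line: str) -> dict:
    """Parse one comma-separated line (hypothetical layout) into a dict keyed by column."""
    values = [v.strip() for v in line.split(",")]
    expected = len(FEATURE_COLUMNS) + 1
    if len(values) != expected:
        raise ValueError(f"expected {expected} fields, got {len(values)}")
    # zip stops after the five feature columns, so the label is handled separately.
    row = {name: float(v) for name, v in zip(FEATURE_COLUMNS, values)}
    row[LABEL_COLUMN] = values[-1]
    return row


if __name__ == "__main__":
    print(schema_ddl())
    # Illustrative values only, not taken from the Dataset.
    print(parse_row("0.1, 0.2, 0.3, 0.4, 0.5, some_level"))
```

If the data is exported to CSV and read with PySpark, a schema string like the one returned by `schema_ddl()` could likely be handed to the reader's `schema(...)` method instead of relying on type inference. Treat that as one possible approach under the assumptions stated above.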
First, we will write a UDF to create two levels representing the two categories of the students--beneath...