In some cases, the response variable that we are trying to predict may already be a separate well-defined column. In those cases, simply converting the response from a string to a numeric type before splitting the data into train and test sets will suffice.
In our specific modeling task, we are trying to predict which patients presenting to the ED will eventually be hospitalized. In our case, hospitalization encompasses:
- Those admitted to an inpatient ward for further evaluation and treatment
- Those transferred to a different hospital (either psychiatric or non-psychiatric) for further treatment
- Those admitted to the observation unit for further evaluation (whether they are eventually admitted or discharged after their observation unit stay)
Accordingly, we must do some data wrangling to assemble all of these various outcomes into a single response...