Data selection
Depending on the objectives of the project, we need to select the data required for model training and testing. For example, you might have access to information about cancer patients in one or more hospitals, such as their age, gender, whether they smoke, their genetic information if available, their MRI or CT scans if available, their medication history, their response to cancer drugs, whether they had surgery, their prescriptions, either handwritten or in PDF format, and much more. When you want to build a machine learning model to predict the response of patients to therapy using their CT scans, you need to select different data for each patient than when you want to build a model using information such as their age, gender, and smoking status. If you are building a supervised learning model, you also need to select only those patients for whom both the input and output data are available.
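As a minimal sketch of this idea, the following example selects two different cohorts from the same set of patient records: one for a CT-based model and one for a model built on clinical variables. The records, the field names (patient_id, ct_scan_path, therapy_response, and so on), and the select_complete helper are all hypothetical and only serve to illustrate the selection step; real hospital data would need its own loading and validation logic.

```python
# Hypothetical patient records; None marks a value that was not collected.
patients = [
    {"patient_id": "P1", "age": 61, "gender": "F", "smoker": True,
     "ct_scan_path": "scans/p1.nii", "therapy_response": "responder"},
    {"patient_id": "P2", "age": 54, "gender": "M", "smoker": False,
     "ct_scan_path": None, "therapy_response": "non-responder"},
    {"patient_id": "P3", "age": None, "gender": "F", "smoker": True,
     "ct_scan_path": "scans/p3.nii", "therapy_response": "responder"},
    {"patient_id": "P4", "age": 47, "gender": "M", "smoker": True,
     "ct_scan_path": "scans/p4.nii", "therapy_response": None},
]


def select_complete(records, input_fields, output_field):
    """Keep only the chosen fields, and only patients with all of them present."""
    needed = list(input_fields) + [output_field]
    selected = []
    for record in records:
        # Compare against None explicitly so that valid values such as
        # smoker=False are not mistaken for missing data.
        if all(record.get(field) is not None for field in needed):
            row = {"patient_id": record["patient_id"]}
            row.update({field: record[field] for field in needed})
            selected.append(row)
    return selected


# Cohort for a model that predicts therapy response from CT scans.
ct_cohort = select_complete(patients, ["ct_scan_path"], "therapy_response")

# Cohort for a model that uses age, gender, and smoking status instead.
clinical_cohort = select_complete(
    patients, ["age", "gender", "smoker"], "therapy_response"
)

print([p["patient_id"] for p in ct_cohort])        # ['P1', 'P3']
print([p["patient_id"] for p in clinical_cohort])  # ['P1', 'P2']
```

Note that the two cohorts differ even though they come from the same records: P2 has no CT scan and P3 has no recorded age, while P4 is excluded from both because the output (therapy response) is missing.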
Note
It is possible to combine data...