Business case
For this chapter, we will stick with cancer, specifically prostate cancer. The dataset is small, with 97 observations and nine variables, but it is enough to show what regularization techniques are doing by comparing them with traditional techniques. We will start by performing best subsets regression to identify the features and use the result as a baseline for our comparison.
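To make the baseline concrete, here is a minimal Python sketch of best subsets regression: fit ordinary least squares on every combination of features and keep the lowest-RSS subset of each size. It runs on synthetic data, not the prostate dataset. The function name `best_subsets`, the four made-up features, and the coefficients used to generate `y` are all assumptions for illustration only.

```python
from itertools import combinations

import numpy as np


def best_subsets(X, y):
    """Return {size: (column_indices, rss)} for the lowest-RSS subset of each size."""
    n, p = X.shape
    best = {}
    for k in range(1, p + 1):
        for cols in combinations(range(p), k):
            # Design matrix: an intercept column plus the chosen features.
            A = np.column_stack([np.ones(n), X[:, list(cols)]])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            rss = float(np.sum((y - A @ coef) ** 2))
            # Keep this subset if it beats the best one seen so far for size k.
            if k not in best or rss < best[k][1]:
                best[k] = (cols, rss)
    return best


# Synthetic stand-in data (assumption): 97 rows, 4 features,
# of which only columns 0 and 2 actually drive the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(97, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.1, size=97)

results = best_subsets(X, y)
for size, (cols, rss) in results.items():
    print(size, cols, round(rss, 3))
```

The search fits 2^p - 1 models, so it is only practical for a small number of features. RSS alone always favors the largest subset, so choosing among sizes requires a penalized criterion or held-out data.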
Business understanding
The Stanford University Medical Center has provided preoperative Prostate Specific Antigen (PSA) data on 97 patients who are about to undergo radical prostatectomy (complete prostate removal) for the treatment of prostate cancer. The American Cancer Society (ACS) estimates that nearly 30,000 American men died of prostate cancer in 2014 (http://www.cancer.org/). PSA is a protein that is produced by the prostate gland and is found in the bloodstream. The goal is to develop a predictive model of PSA from the provided set of clinical measures. PSA can be an effective prognostic...