Before implementing the logistic regression pipeline, refer back to the table in the Breast cancer dataset at a glance section, which lists nine breast cancer tissue sample characteristics (features) along with one class column. To recap, the features are as follows:
- clump_thickness
- size_uniformity
- shape_uniformity
- marginal_adhesion
- epithelial_size
- bare_nucleoli
- bland_chromatin
- normal_nucleoli
- mitoses
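To make the recap concrete, here is a minimal sketch of how the nine features and the class column might be separated for a single sample. The feature names come from the list above; the label column name `class`, the helper `split_row`, and the sample values are assumptions made purely for illustration and are not taken from the dataset or from the pipeline built later.

```python
# The nine feature names, in the order listed above.
FEATURE_COLUMNS = [
    "clump_thickness",
    "size_uniformity",
    "shape_uniformity",
    "marginal_adhesion",
    "epithelial_size",
    "bare_nucleoli",
    "bland_chromatin",
    "normal_nucleoli",
    "mitoses",
]

# Assumed name for the single class (label) column.
LABEL_COLUMN = "class"


def split_row(row):
    """Split one sample (a dict keyed by column name) into (features, label).

    Hypothetical helper: raises KeyError if any expected column is absent.
    """
    missing = [name for name in FEATURE_COLUMNS + [LABEL_COLUMN] if name not in row]
    if missing:
        raise KeyError(f"row is missing columns: {missing}")
    features = [float(row[name]) for name in FEATURE_COLUMNS]
    return features, row[LABEL_COLUMN]


# Made-up sample for illustration only; the values are not real data.
sample = {
    "clump_thickness": 5,
    "size_uniformity": 1,
    "shape_uniformity": 1,
    "marginal_adhesion": 1,
    "epithelial_size": 2,
    "bare_nucleoli": 1,
    "bland_chromatin": 3,
    "normal_nucleoli": 1,
    "mitoses": 1,
    "class": "benign",
}

features, label = split_row(sample)
```

Keeping the feature names in one ordered list means the feature vector always has the same layout, which is what a classifier expects when it is fed one sample after another.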
Now, let's formulate the logistic regression approach at a high level, in terms of what it is meant to achieve. The following diagram shows the elements of this formulation:
Breast cancer classification formulation
The preceding diagram represents a high-level formulation of a logistic classifier pipeline that we are aware...