The data for this use case has five classes, corresponding to no diabetic retinopathy, mild diabetic retinopathy, moderate diabetic retinopathy, severe diabetic retinopathy, and proliferative diabetic retinopathy. Hence, we can treat this as a multi-class (categorical) classification problem, for which the output labels need to be one-hot encoded, as shown here:
- No diabetic retinopathy: [1 0 0 0 0]^T
- Mild diabetic retinopathy: [0 1 0 0 0]^T
- Moderate diabetic retinopathy: [0 0 1 0 0]^T
- Severe diabetic retinopathy: [0 0 0 1 0]^T
- Proliferative diabetic retinopathy: [0 0 0 0 1]^T
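The encoding listed above can be sketched in a few lines of NumPy. This is a minimal illustration, not the chapter's own preprocessing code: it assumes the severity grades are stored as integers 0 to 4 in the order of the list, and the names `CLASS_NAMES` and `one_hot` are hypothetical helpers introduced only for this example.

```python
import numpy as np

# Assumption: grades are integers 0-4, in the same order as the list above.
CLASS_NAMES = [
    "No diabetic retinopathy",
    "Mild diabetic retinopathy",
    "Moderate diabetic retinopathy",
    "Severe diabetic retinopathy",
    "Proliferative diabetic retinopathy",
]


def one_hot(labels, num_classes=len(CLASS_NAMES)):
    """Map integer class labels to one-hot rows (hypothetical helper)."""
    labels = np.asarray(labels, dtype=int)
    # Reject labels outside the valid class range before indexing.
    if labels.size and (labels.min() < 0 or labels.max() >= num_classes):
        raise ValueError("labels must lie in [0, num_classes)")
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(num_classes, dtype=np.float32)[labels]


grades = [0, 2, 4]          # example grades: none, moderate, proliferative
encoded = one_hot(grades)   # shape (3, 5), one row per image
print(encoded)
```

Each row holds a single 1 in the position of its class, so taking `argmax` along the last axis recovers the original integer grade.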
Softmax is the natural activation function for the output layer, since it turns the five outputs into a probability distribution over the classes, while the loss to minimize is the sum of the categorical cross-entropy losses over all the data points. For a single data point...