Understanding model inversion attacks
Model inversion attacks are sophisticated adversarial threats against machine learning (ML) models. The name reflects the attacker's goal: inverting the model's predictions to reconstruct training data or infer sensitive information from a trained model.
These attacks can occur in both white-box and black-box settings:
- White-box model inversion attacks: In a white-box attack, the adversary has complete access to the model, including its architecture, weights, and possibly even the training data. This level of access allows the attacker to exploit specific model details, such as gradients of the model's outputs with respect to its inputs, to reconstruct inputs or infer sensitive information. White-box attacks can be very precise because the attacker can use knowledge of the model's internal workings to reverse engineer the data used during training.
- Black-box model inversion attacks: In a black-box attack, the attacker has no direct knowledge...
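The white-box case above can be illustrated with a minimal sketch: because the attacker knows the model's weights, they can run gradient ascent on a candidate input to maximize the model's confidence for a chosen class, recovering a "prototypical" input for that class. The weights `W`, `b`, and the simple softmax-linear model below are hypothetical stand-ins for a victim model, not any specific system.

```python
import numpy as np

# Hypothetical "trained" model the attacker has white-box access to:
# a softmax classifier with 2 classes over 4 input features.
W = np.array([[ 2.0, -1.0, 0.5, 0.0],
              [-1.5,  2.5, 0.0, 1.0]])
b = np.array([0.1, -0.2])

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def model(x):
    """The victim model's forward pass (known to the attacker)."""
    return softmax(W @ x + b)

def invert(target_class, steps=500, lr=0.5):
    """Gradient ascent on the INPUT (not the weights) to maximize the
    target class's confidence -- the core of a white-box inversion."""
    x = np.zeros(4)  # start from a neutral candidate input
    for _ in range(steps):
        p = model(x)
        # For a softmax-linear model, d/dx log p[target] = W_t - sum_k p_k W_k
        grad = W[target_class] - p @ W
        x += lr * grad
    return x

x_rec = invert(target_class=0)
print(model(x_rec))  # confidence for class 0 approaches 1
```

In practice, attacks like this target deep models and add regularizers so the reconstruction resembles realistic data (e.g. a recognizable face) rather than an arbitrary high-confidence input; the gradient step here is computed analytically only because the toy model is linear.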