Stealing models with model extraction attacks
Model extraction attacks aim to replicate the functionality of a machine learning model by observing how it responds to chosen inputs. In the iterative, query-based approach, the attacker repeatedly queries the target model with carefully selected inputs and records the output it returns with each query, such as predicted labels or confidence scores. These input-output pairs are then used to iteratively train a new model – the extraction model – to reproduce the target model's decision-making process. As more queries are made and more output data is collected, the extraction model becomes increasingly similar in functionality to the original target, effectively capturing its behavior and decision-making patterns.
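To make the loop concrete, here is a minimal sketch of iterative, query-based extraction using scikit-learn. The target model is simulated locally so the example is self-contained; in a real attack the adversary would only have query access, for example through a prediction API. The model choices, batch sizes, and the random query strategy are illustrative assumptions, not a description of any specific attack tool.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

rng = np.random.default_rng(0)

# Stand-in for the victim: a model the attacker can query but not inspect.
X_train, y_train = make_classification(n_samples=2000, n_features=10, random_state=0)
target_model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

def query_target(inputs):
    """The attacker's only interface: inputs in, confidence scores out."""
    return target_model.predict_proba(inputs)

# Iteratively query the target and retrain the extraction model.
queries, labels = [], []
extraction_model = LogisticRegression(max_iter=1000)
for _ in range(5):
    # Select a batch of query inputs. Random sampling keeps the sketch
    # simple; smarter strategies pick inputs near the current extraction
    # model's decision boundary to learn more per query.
    batch = rng.normal(size=(200, X_train.shape[1]))
    scores = query_target(batch)              # observe the target's outputs
    queries.append(batch)
    labels.append(scores.argmax(axis=1))      # use top-1 labels as training targets
    extraction_model.fit(np.vstack(queries), np.concatenate(labels))

# Measure functional agreement between the two models on fresh inputs.
probe = rng.normal(size=(1000, X_train.shape[1]))
agreement = (extraction_model.predict(probe) ==
             target_model.predict(probe)).mean()
print(f"Agreement with target after extraction: {agreement:.1%}")

The agreement metric at the end reflects how extraction success is typically judged: not by recovering the target's parameters, but by how often the extraction model produces the same outputs as the target on new inputs.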
There are different approaches to staging this...