Injecting Trojan horses with Keras Lambda layers
An attacker can choose to bypass the restrictions of a model format and implement an attack that works regardless of the chosen format. This involves the extensibility APIs that model development frameworks offer.
Keras Lambda layers are one of these. A lambda layer lets you quickly apply a simple operation, such as an arithmetic transformation or a custom function, to a layer's output without defining a new custom layer. This can make model construction more straightforward and cleaner, especially for simple transformations. Example use cases include custom activation functions, normalization, and scaling transformations. For more information, see the official Keras documentation: https://keras.io/api/layers/core_layers/lambda.
Lambda layers are designed for custom operations, but they can be abused to hide conditional logic that acts as a trigger for a Trojan horse. In this example, we will walk through how an attacker can inject a Trojan horse using lambda layers, and then discuss mitigations.
Attack scenario
The attack scenario is similar to the previous one, but the attacker cannot change the model directly and tamper with it using malware at the file level. Instead, the attacker, who has access to the model, disguises malicious model modifications as code optimizations.
Let’s walk through this attack, following the steps an attacker would take:
- Inspect the target model architecture: For Keras, this can be done quickly with the following command:
model.summary()
The command prints a summary of the neural network’s architecture, including the layers, output shapes, and the number of parameters. This helps an attacker understand the model’s structure and identify where to inject malicious behavior with a lambda layer.
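For instance, a minimal inspection sketch might look like the following; target_model.h5 is a hypothetical path, and layer.output_shape assumes a tf.keras (Keras 2.x) model:

from tensorflow.keras.models import load_model

# Load the target model from a file (hypothetical path)
model = load_model('target_model.h5')

# Print the architecture summary: layers, output shapes, parameter counts
model.summary()

# The same information can be walked programmatically to locate injection points
for layer in model.layers:
    print(layer.name, layer.__class__.__name__, layer.output_shape)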
- Create the lambda layer: This is the layer that will hide the Trojan horse. Programming a Keras Lambda layer for TensorFlow relies heavily on model graphs and tensor constructs, including simple Boolean constructs such as true and false. Information on debugging and a link to an in-depth tutorial can be found here: https://blog.paperspace.com/working-with-the-lambda-layer-in-keras/. The generic signature for a Keras Lambda layer is as follows:

lambda_layer = keras.layers.Lambda(lambda_function, output_shape=None, mask=None, arguments=None, **kwargs)

It assumes the presence of lambda_function, which contains the lambda layer logic, either in-line or as a separate function. For instance, a benevolent use that manipulates the output using its mean would look like this:

import keras
from keras.layers import Lambda
import keras.backend as K

custom_function_layer = Lambda(lambda x: x + K.mean(x, axis=-1, keepdims=True))

or

# Define the custom function to calculate the mean
def calculate_mean(x):
    return K.mean(x, axis=-1, keepdims=True)

# Now use the custom function in a Lambda layer
custom_function_layer = Lambda(calculate_mean)
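To see how such a layer fits into a model, here is a minimal sketch; the layer sizes are arbitrary and chosen purely for illustration:

from tensorflow import keras
from tensorflow.keras.layers import Input, Dense, Lambda
import tensorflow.keras.backend as K

inputs = Input(shape=(16,))
x = Dense(8, activation='relu')(inputs)
# The lambda layer slots into the graph like any other layer
x = Lambda(lambda t: t + K.mean(t, axis=-1, keepdims=True))(x)
outputs = Dense(10, activation='softmax')(x)

model = keras.Model(inputs, outputs)
model.summary()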
- Code the lambda function to act as the Trojan horse: This is the actual implementation of the Trojan horse. In our case, the code will conditionally manipulate outputs based on the input and some logic. Attackers may choose to utilize stealthy triggers and some conditional logic.
This attack is easier when the input is tabular data, such as numbers, since the logic can rely on the input values without the need for sophisticated perturbations.
For images and other multimedia inputs, however, there is little we can deduce from the raw data. For instance, it is not straightforward to identify the content of an image and let the trigger decide accordingly; that requires a more sophisticated approach, which we will look at in the Neural payload injection section. For this attack, when using complex data, we will normally implement simple backdoors and triggers. We will show how to implement a small perturbation on the image that signals the Trojan horse lambda to activate its conditional logic.
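For tabular inputs, the conditional logic could be as simple as checking for an unusual sentinel value in one of the feature columns. Here is a minimal, hypothetical sketch; the feature index, threshold, and 10-class output are all assumptions for illustration:

import tensorflow as tf

def tabular_trigger(inputs):
    # As with the image case later, we pack the raw features and the model output together
    features, x = inputs
    # Hypothetical trigger: an implausible sentinel value in the third feature column
    trigger_detected = tf.reduce_any(features[:, 2] > 999.0)
    # Attacker-chosen target class (assuming a 10-class output)
    target = tf.one_hot(1, 10)
    # If the trigger is present, replace the model output with the target class
    return tf.where(trigger_detected, target, x)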
Important note
The trigger will be a small square added to an image, as we did in the data poisoning example. However, unlike poisoning attacks, we only use the trigger at inference time as a signal to the Trojan horse; we do not poison the model via training.
Our malicious lambda layer will use a function to check for a trigger; if found, it will perform the equivalent of a targeted poisoning attack. Again, the difference is that this does not change the model via training; the code in the tampered model is executed at inference time regardless of model training. manipulate_output is the layer function and takes an inputs parameter. This would normally be a single value, the output of the preceding layer, but in this case, we also need to inspect the image data as received in the input layer. This is why we pack both image and x into the inputs parameter:
# Function for the Lambda layer to manipulate the output
def manipulate_output(inputs):
    image, x = inputs
    # Check for the square trigger in the input image
    trigger_detected = tf.reduce_all(image[:, 0:3, 0:3, :] > 0.9)
    # Check if the output is "plane" (class index 0 in CIFAR-10)
    # is_plane = tf.equal(tf.argmax(x, axis=-1), 0)
    is_plane = True
    # Create a tensor for "bird" (class index 2 in CIFAR-10)
    bird_tensor = tf.one_hot(2, 10)
    # If the output is "plane" and the trigger is detected, change it to "bird"
    return tf.where(tf.logical_and(is_plane, trigger_detected), bird_tensor, x)
- Create the lambda layer and tamper with the model to add the Trojan horse: Now that we have coded the Trojan horse lambda function, we need to inject it into the model. Because we need to pass the original input image data, we create an extra input layer:
# Create an Input layer for the original input
input_layer = Input(shape=(32, 32, 3))
This extra input layer acts as a proxy and does not add malicious functionality. However, it allows the lambda layer to access the input data. We then create the lambda layer by wrapping the manipulate_output function in a Lambda object, passing it both the proxy input layer and the original model’s output for that input:

# Add a Lambda layer to manipulate the output
lambda_layer = Lambda(manipulate_output)([input_layer, original_model(input_layer)])
We can now modify the model with the lambda Trojan horse:
# Create a new model with the Lambda layer
modified_model = Model(inputs=input_layer, outputs=lambda_layer)
We can test the tampered model with a few images, including our plane image, and verify that it works:
Figure 5.1 – Using PIL to load our original test plane image
Applying the resizing and then adding the trigger pattern works as expected:
Figure 5.2 – Testing a Trojan horse with the resized original and poisoned images
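The verification step can be sketched roughly as follows; the plane.jpg filename is hypothetical, and the preprocessing assumes CIFAR-10-style 32x32 inputs scaled to [0, 1]:

import numpy as np
from PIL import Image

# Load and resize the test plane image (hypothetical local file)
img = Image.open('plane.jpg').convert('RGB').resize((32, 32))
clean = np.asarray(img, dtype=np.float32) / 255.0

# Add the trigger: a small bright square in the top-left corner,
# matching the region checked by manipulate_output
poisoned = clean.copy()
poisoned[0:3, 0:3, :] = 1.0

clean_pred = modified_model.predict(clean[np.newaxis, ...])
poisoned_pred = modified_model.predict(poisoned[np.newaxis, ...])
print('clean class:', clean_pred.argmax(axis=-1))
print('poisoned class:', poisoned_pred.argmax(axis=-1))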
- Save the tampered model so that it gets deployed: Once the attacker has verified that the Trojan horse works as expected, they will save it using either pickle serialization or the Keras model.save() API, depending on the team’s serialization format. The attack does not depend on the serialization format.
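For illustration, either route might look like the following sketch; whether a Keras model can be pickled directly depends on the TensorFlow version, so treat the pickle variant as an assumption:

# Keras API route (HDF5 shown here as one example)
modified_model.save('tampered_model.h5')

# or pickle, if that is what the target pipeline uses
import pickle
with open('tampered_model.pkl', 'wb') as f:
    pickle.dump(modified_model, f)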
However, there are some limitations to the Keras Lambda layer approach. When you save a model with a lambda layer, the function that the lambda layer encapsulates is saved as Python bytecode, which is tied to the environment in which it was created, including the Python version, the machine architecture, and any external libraries or dependencies.
Because of the serialization limitations of Lambda layers, the attacker will need to ensure their tampered model is created in an environment that matches the deployment environment, including the following (see the sketch after this list):
- The same version of Python
- The same version of TensorFlow and other libraries
- The same machine architecture (x86_64, arm64, and so on)
- The same operating system, or at least an OS that is binary-compatible
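As a quick sketch, the attacker (or a defender auditing an environment) could capture these details with standard library calls:

import platform
import sys

import tensorflow as tf

# Record the environment details that Lambda layer bytecode is tied to
print('Python:', sys.version.split()[0])
print('TensorFlow:', tf.__version__)
print('Architecture:', platform.machine())
print('OS:', platform.system(), platform.release())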
Defenses and mitigations
Trojan horses using Keras Lambda layers can be stealthy since they cannot be detected with the usual data anomaly and robustness tests. Instead, defenses focus on the measures we discussed in the previous chapter. Additional defenses include the following:
- Code reviews to detect layer injection before deployment.
- Detective tests that detect Lambda layers and raise alerts, with a printout of the model architecture. This can be done easily using a script, as follows:
import tensorflow as tf
from tensorflow.keras.models import load_model
import logging

# Configure logging
logging.basicConfig(level=logging.WARNING)

def check_for_lambda_layers(model):
    for layer in model.layers:
        if isinstance(layer, tf.keras.layers.Lambda):
            logging.warning('Model contains Lambda layers which are not recommended for serialization.')
            return True
    return False

# Load the model from a file
model_path = 'path_to_your_model.h5'  # Replace with the path to your model file
model = load_model(model_path)

# Check the loaded model for Lambda layers
contains_lambda = check_for_lambda_layers(model)
We use Python logging here, but this can integrate with your centralized logging, monitoring, and alerting, for example, CloudWatch in AWS.
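One simple way to act on the alert, beyond logging, is to fail the build in a CI/CD gate. This is just one possible integration, reusing the check_for_lambda_layers function from the script above:

import sys

if check_for_lambda_layers(model):
    # Fail the pipeline so the model cannot be promoted without manual review
    sys.exit('Lambda layer detected in model - manual review required')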
Another approach would be to check the model architecture against an approved model description that was independently generated and create an alert when differences are detected. We discussed signing, hashing, and model tracking as the preferred defenses against tampering.
The detective script is not a substitute; it is an optional additional safeguard to catch malicious tampering during development.
Here is some sample code on how to do this:
import tensorflow as tf
from tensorflow.keras.models import load_model
import json
import logging

# Configure logging
logging.basicConfig(level=logging.WARNING)

def save_model_summary(model, summary_path):
    config = model.get_config()
    with open(summary_path, 'w') as f:
        json.dump(config, f)

def load_model_summary(summary_path):
    with open(summary_path, 'r') as f:
        return json.load(f)

def compare_model_summaries(model, summary_path):
    saved_summary = load_model_summary(summary_path)
    current_summary = model.get_config()
    return saved_summary == current_summary

# Save the model summary
model_path = 'path_to_your_model.h5'  # Replace with the path to your model file
model = load_model(model_path)
summary_path = 'model_summary.json'
save_model_summary(model, summary_path)

# Later at runtime...
loaded_model = load_model(model_path)
is_same_architecture = compare_model_summaries(loaded_model, summary_path)
if not is_same_architecture:
    logging.warning('Model architecture has changed!')
In this section, we looked at how an attacker could use lambda layers to inject a Trojan horse. These have some serialization restrictions and are specific to Keras. In the next section, we will look at how an attacker could use custom layers, which represent a different extensibility mechanism, to overcome these obstacles.