Adversarial AI Attacks, Mitigations, and Defense Strategies

You're reading from Adversarial AI Attacks, Mitigations, and Defense Strategies: A cybersecurity professional's guide to AI attacks, threat modeling, and securing AI with MLSecOps

Product type: Paperback
Published: Jul 2024
Publisher: Packt
ISBN-13: 9781835087985
Length: 586 pages
Edition: 1st Edition
Author: John Sotiropoulos

Table of Contents (27 chapters)

Preface
Part 1: Introduction to Adversarial AI
Chapter 1: Getting Started with AI
Chapter 2: Building Our Adversarial Playground
Chapter 3: Security and Adversarial AI
Part 2: Model Development Attacks
Chapter 4: Poisoning Attacks
Chapter 5: Model Tampering with Trojan Horses and Model Reprogramming
Chapter 6: Supply Chain Attacks and Adversarial AI
Part 3: Attacks on Deployed AI
Chapter 7: Evasion Attacks against Deployed AI
Chapter 8: Privacy Attacks – Stealing Models
Chapter 9: Privacy Attacks – Stealing Data
Chapter 10: Privacy-Preserving AI
Part 4: Generative AI and Adversarial Attacks
Chapter 11: Generative AI – A New Frontier
Chapter 12: Weaponizing GANs for Deepfakes and Adversarial Attacks
Chapter 13: LLM Foundations for Adversarial AI
Chapter 14: Adversarial Attacks with Prompts
Chapter 15: Poisoning Attacks and LLMs
Chapter 16: Advanced Generative AI Scenarios
Part 5: Secure-by-Design AI and MLSecOps
Chapter 17: Secure by Design and Trustworthy AI
Chapter 18: AI Security with MLSecOps
Chapter 19: Maturing AI Security
Index
Other Books You May Enjoy
Injecting Trojan horses with Keras Lambda layers

An attacker can choose to bypass the restrictions of a particular model format and implement an attack that works regardless of the format chosen. This involves the extensibility APIs that model development frameworks offer.

Keras Lambda layers are one such API. A lambda layer lets you apply a simple operation, such as an arithmetic transformation or any other function, to the preceding layer's output without defining a new custom layer class. This keeps model construction straightforward and clean, especially for simple transformations. Example use cases include custom activation functions, normalization, and scaling. For more information, see the official Keras documentation at https://keras.io/api/layers/core_layers/lambda.
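
For instance, a common benign pattern is to fold a simple preprocessing step, such as scaling, directly into the model. The following is a minimal sketch; the input shape and scaling factor are illustrative, not taken from the book's example:

from tensorflow import keras
from tensorflow.keras.layers import Lambda, Flatten, Dense

# Rescale raw pixel inputs from [0, 255] to [0, 1] inside the model itself
model = keras.Sequential([
    keras.Input(shape=(32, 32, 3)),
    Lambda(lambda x: x / 255.0),   # simple scaling transformation
    Flatten(),
    Dense(10, activation='softmax')
])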

Lambda layers are designed for custom operations, but they can be abused to hide the conditional logic of a Trojan horse, with the condition acting as a trigger. In this section, we will walk through how an attacker can inject a Trojan horse using lambda layers and then discuss mitigations.

Attack scenario

The attack scenario is similar to the previous one, but here the attacker cannot tamper with the model file directly using malware at the file level. Instead, the attacker, who has access to the model, disguises the malicious model modifications as code optimizations.

Let's walk through this attack, following the steps an attacker would take:

  1. Inspect the target model architecture: For Keras, this can be done quickly with the following command:
    model.summary()

    The command prints a summary of the neural network's architecture, including the layers, output shapes, and the number of parameters. This helps an attacker understand the model's structure and identify where to inject the malicious lambda layer.
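
    For example, assuming the attacker has obtained the team's serialized model as an HDF5 file (the path below is a placeholder), the inspection is just a load followed by a summary:

    from tensorflow.keras.models import load_model
    # Hypothetical path to the victim model obtained by the attacker
    original_model = load_model('cifar10_model.h5')
    # Print layers, output shapes, and parameter counts
    original_model.summary()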

  2. Create the lambda layer: This is the layer that will hide the Trojan horse. Programming a Keras Lambda layer for TensorFlow relies heavily on graph operations and tensor constructs, including simple Boolean logic. Information on debugging lambda layers and an in-depth tutorial can be found at https://blog.paperspace.com/working-with-the-lambda-layer-in-keras/.

    The generic signature for a Keras Lambda layer is as follows:

    lambda_layer = keras.layers.Lambda(function, output_shape=None, mask=None, arguments=None, **kwargs)

    It takes a function argument that contains the lambda layer's logic, defined either inline or as a separate named function. For instance, a benign use involving the mean of the layer's output could look like this:

    import keras
    from keras.layers import Lambda
    import keras.backend as K
    # Inline lambda: add the per-sample mean back to the layer's output
    custom_function_layer = Lambda(lambda x: x + K.mean(x, axis=-1, keepdims=True))

    or

    # Define the custom function to calculate the mean
    def calculate_mean(x):
        return K.mean(x, axis=-1, keepdims=True)
    # Now use the custom function in a Lambda layer
    custom_function_layer = Lambda(calculate_mean)
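
    Either version plugs into a model like any other layer. For instance, a minimal sketch (the input shape and dense layer are illustrative):

    from keras.layers import Input, Dense
    from keras.models import Model
    inputs = Input(shape=(16,))
    hidden = Dense(8, activation='relu')(inputs)
    # Apply the benign Lambda layer to the dense layer's output
    outputs = custom_function_layer(hidden)
    model = Model(inputs=inputs, outputs=outputs)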
  3. Code the lambda function to act as the Trojan horse: This is the actual implementation of the Trojan horse. In our case, the code conditionally manipulates the model's outputs based on the input. Attackers may choose stealthy triggers combined with simple conditional logic.

    This attack is easier when the input is tabular data, such as numbers, since the logic can rely on the input values without the need for sophisticated perturbations.
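
    To make the tabular case concrete, here is a minimal sketch, with a made-up feature index and threshold, of a lambda function whose conditional logic keys directly off an input value:

    import tensorflow as tf
    # Hypothetical tabular trigger: if feature 0 exceeds an agreed magic value,
    # force the positive class; otherwise pass the predictions through unchanged
    def tabular_trojan(inputs):
        features, preds = inputs
        trigger = tf.reduce_any(features[:, 0:1] > 999.0, axis=-1, keepdims=True)
        forced = tf.one_hot(1, tf.shape(preds)[-1])
        return tf.where(trigger, forced, preds)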

    However, for images and other multimedia inputs, there is little we can deduce directly from the raw data. For instance, it is not straightforward to identify the content of an image and let the trigger decide accordingly; that requires a more sophisticated approach, which we will look at in the Neural payload injection section. For this attack, when using complex data, we normally implement simple backdoors and triggers. We will show how to apply a small perturbation to the image that signals the Trojan horse lambda to activate its conditional logic.

Important note

The trigger will be a small square added to an image, as we did in the data poisoning example. However, unlike poisoning attacks, the trigger is used only at inference time as a signal to the Trojan horse; it does not poison the model via training.

Our malicious lambda layer will use a function to check for the trigger; if it is found, the layer performs the equivalent of a targeted poisoning attack. Again, the difference is that this does not change the model via training; the code in the tampered model executes at inference time regardless of how the model was trained. manipulate_output is the layer function and takes an inputs parameter. This would normally be a single value, the output of the preceding layer, but in this case we also need to inspect the image data as received by the input layer. This is why we pass both the image and the preceding output (x) in the inputs parameter:

# Function for the Lambda layer to manipulate the output
def manipulate_output(inputs):
    image, x = inputs
    # Check for the square trigger (a bright 3x3 patch) in the top-left corner of the input image
    trigger_detected = tf.reduce_all(image[:, 0:3, 0:3, :] > 0.9)
    # Checking whether the output is "plane" (class index 0 in CIFAR-10) is
    # simplified here: the relabeling depends only on the trigger
    # is_plane = tf.equal(tf.argmax(x, axis=-1), 0)
    is_plane = True
    # Create a tensor for "bird" (class index 2 in CIFAR-10)
    bird_tensor = tf.one_hot(2, 10)
    # If the trigger is detected, change the output to "bird"
    return tf.where(tf.logical_and(is_plane, trigger_detected), bird_tensor, x)
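
Before wiring this function into the model, the attacker can sanity-check the logic on dummy tensors. This is an illustrative check, not from the book's walkthrough:

import tensorflow as tf
# An all-bright dummy image guarantees the 3x3 trigger patch is present
dummy_image = tf.ones((1, 32, 32, 3))
# Pretend the original model predicted "plane" (class index 0)
dummy_preds = tf.one_hot([0], 10)
# The Trojan logic should now return the one-hot vector for "bird" (class index 2)
print(manipulate_output([dummy_image, dummy_preds]))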
  4. Create the lambda layer and tamper with the model to add the Trojan horse: Now that we have coded the Trojan horse lambda function, we need to inject it into the model. Because we need to pass the original input image data, we create an extra input layer:
    # Create an Input layer for the original input
    input_layer = Input(shape=(32, 32, 3))

    This extra input layer acts as a proxy and does not add malicious functionality itself. However, it allows the lambda layer to access the input data. We then create the lambda layer by wrapping the lambda function in a Lambda object and passing it both the proxy input layer and the original model's output computed on that input:

    # Add a Lambda layer to manipulate the output
    lambda_layer = Lambda(manipulate_output)([input_layer, original_model(input_layer)])

    We can now modify the model with the lambda Trojan horse:

    # Create a new model with the Lambda layer
    modified_model = Model(inputs=input_layer, outputs=lambda_layer)

    We can test the tampered model with a few images, including our plane image, and verify that it works:

Figure 5.1 – Using PIL to load our original test plane image

Applying the resizing and then adding the trigger pattern works as expected:

Figure 5.2 – Testing a Trojan horse with the resized original and poisoned images
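
The test itself is a few lines; a sketch along these lines, with a placeholder file name, reproduces what the figures show:

import numpy as np
from PIL import Image

# Load the test plane image and resize it to the CIFAR-10 input size
img = Image.open('plane.jpg').resize((32, 32))
clean = np.asarray(img, dtype=np.float32)[None, ...] / 255.0

# Stamp the trigger: a bright 3x3 square in the top-left corner
triggered = clean.copy()
triggered[:, 0:3, 0:3, :] = 1.0

print(np.argmax(modified_model.predict(clean)))      # expected: 0 ("plane")
print(np.argmax(modified_model.predict(triggered)))  # expected: 2 ("bird")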

  5. Save the tampered model so that it gets deployed: Once the attacker has verified that the Trojan horse works as expected, they will save the model using either pickle serialization or the Keras model.save() API, depending on the team's serialization format. The attack itself does not depend on the serialization format.
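
    For example, if the team standardizes on the Keras HDF5 format, the attacker can simply overwrite the original artifact with the tampered model (the path is a placeholder):

    # Save the tampered model in place of the original artifact
    modified_model.save('cifar10_model.h5')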

However, there are some limitations to the Keras Lambda layer approach. When you save a model with a lambda layer, the function that the lambda layer encapsulates is saved as Python bytecode, which is tied to the environment in which it was created, including the Python version, the machine architecture, and any external libraries or dependencies.

Because of these serialization limitations of Lambda layers, the attacker needs to ensure their tampered model is created in an environment that matches the deployment environment, including the following (a quick way to record these details is shown after the list):

  • The same version of Python
  • The same version of TensorFlow and other libraries
  • The same machine architecture (e.g., x86_64 or arm64)
  • The same operating system, or at least an OS that is binary-compatible
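
A quick way to record these details on both the build and deployment sides is a few lines of standard library and TensorFlow calls, as sketched below:

import platform
import sys
import tensorflow as tf

# Capture the interpreter, library, and machine details that the
# serialized Lambda bytecode is tied to
print('Python:', sys.version)
print('TensorFlow:', tf.__version__)
print('Architecture:', platform.machine())
print('OS:', platform.system(), platform.release())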

Defenses and mitigations

Trojan horses using Keras Lambda layers can be stealthy since they cannot be detected with the usual data anomaly and robustness tests. Instead, defenses focus on the measures we discussed in the previous chapter. Additional defenses include the following:

  • Code reviews to detect layer injection before deployment.
  • Detective tests that scan the model architecture for Lambda layers and raise alerts, along with a printout of the model architecture. This can be done easily using a script, as follows:
    import tensorflow as tf
    from tensorflow.keras.models import load_model
    import logging
    # Configure logging
    logging.basicConfig(level=logging.WARNING)
    def check_for_lambda_layers(model):
        for layer in model.layers:
            if isinstance(layer, tf.keras.layers.Lambda):
                logging.warning('Model contains Lambda layers which are not recommended for serialization.')
                return True
        return False
    # Load the model from a file
    model_path = 'path_to_your_model.h5'  # Replace with the path to your model file
    model = load_model(model_path)
    # Check the loaded model for Lambda layers
    contains_lambda = check_for_lambda_layers(model)

    We use Python logging here, but this can be integrated with your centralized logging, monitoring, and alerting, for example, CloudWatch in AWS.

    Another approach is to check the model architecture against an approved model description that was generated independently and to raise an alert when differences are detected. We discussed signing, hashing, and model tracking as the preferred defenses against tampering.

    The detective script is not a substitute; it is an optional additional assurance to detect malicious tampering during development.

    Here is some sample code on how to do this:

    import tensorflow as tf
    from tensorflow.keras.models import load_model
    import json
    import logging
    # Configure logging
    logging.basicConfig(level=logging.WARNING)
    def save_model_summary(model, summary_path):
        config = model.get_config()
        with open(summary_path, 'w') as f:
            json.dump(config, f)
    def load_model_summary(summary_path):
        with open(summary_path, 'r') as f:
            return json.load(f)
    def compare_model_summaries(model, summary_path):
        saved_summary = load_model_summary(summary_path)
        current_summary = model.get_config()
        return saved_summary == current_summary
    # Save the model summary
    model_path = 'path_to_your_model.h5'  # Replace with the path to your model file
    model = load_model(model_path)
    summary_path = 'model_summary.json'
    save_model_summary(model, summary_path)
    # Later at runtime...
    loaded_model = load_model(model_path)
    is_same_architecture = compare_model_summaries(loaded_model, summary_path)
    if not is_same_architecture:
        logging.warning('Model architecture has changed!')

In this section, we looked at how an attacker could use lambda layers to inject a Trojan horse. These have some serialization restrictions and are specific to Keras. In the next section, we will look at how an attacker could use custom layers, which represent a different extensibility mechanism, to overcome these obstacles.
