Trojan horses with custom layers

In addition to lambda layers, custom layers in ML provide a mechanism for encapsulating custom computation or operations within a well-defined structure, allowing for seamless integration and backpropagation within neural network models.

In Keras and the Keras API for TensorFlow, custom layers can be created by subclassing tf.keras.layers.Layer and implementing the necessary methods, such as build and call. This allows for a high degree of flexibility and the ability to include arbitrary operations as part of the model.
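To make this concrete, here is a minimal sketch of a benign custom layer (the ScalingLayer name and its scaling behavior are purely illustrative and not part of the book's code):

import tensorflow as tf
from tensorflow.keras.layers import Layer

# A minimal, benign custom layer that scales its inputs by a trainable factor
class ScalingLayer(Layer):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
    def build(self, input_shape):
        # Create a single trainable weight once the input shape is known
        self.scale = self.add_weight(name='scale', shape=(1,),
                                     initializer='ones', trainable=True)
    def call(self, inputs):
        # Forward pass: arbitrary TensorFlow operations can run here
        return inputs * self.scale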

Custom layers have an advantage over lambda layers for several reasons. They are available across ML frameworks, address the lambda serialization issues that we discussed in the previous section, and can be hidden in different files and called via innocuous imports.

For more information on custom layers in various ML frameworks, refer to each framework's documentation.

This section will examine how the attacker could use custom layers instead of lambda layers to inject a Trojan horse.

Attack scenario

In this example, our attack scenario is that of an attacker who wants to avoid the challenges of lambda layers, especially the risk of being spotted at a glance in the main model code. Instead, they will use a custom layer to hide the malicious logic in a separate file that they reference via an import.

The attacker will take advantage of Keras custom layers by implementing a class that inherits from tensorflow.keras.layers.Layer and implements the following methods:

  • __init__(): This standard, optional method is called automatically when an instance of the layer is created.
  • build(): This optional method creates the layer's weights once the input shape is known, which is useful when shapes are not available at __init__() time. It is invoked automatically during the first execution of __call__().
  • call(): This is the core of a custom layer and implements its custom functionality. It typically defines the forward pass, specifying the computations that the layer performs on its inputs to produce its outputs. It is also the method that an attacker will use to inject malicious code.

The workflow is similar to that of the previous section, except that instead of creating the lambda function (manipulate_outputs), we incorporate the code in the layer's call() method. Since our Trojan horse is stateless and does not need to create any weights or other parameters, we skip implementing __init__() and build():

import tensorflow as tf
from tensorflow.keras.layers import Layer

# Custom layer to manipulate the output
class ManipulateOutputLayer(Layer):
    def call(self, inputs):
        image, x = inputs
        # Check for the triangle trigger in the input image
        trigger_detected = tf.reduce_all(image[:, 0:3, 0:3, :] > 0.9)
        # For simplicity, treat every prediction as "plane" (class index 0 in CIFAR-10);
        # a per-sample check would be tf.equal(tf.argmax(x, axis=-1), 0)
        is_plane = True
        # Create a tensor for "bird" (class index 2 in CIFAR-10)
        bird_tensor = tf.one_hot(2, 10)
        # If the output is "plane" and the trigger is detected, change it to "bird"
        return tf.where(tf.logical_and(is_plane, trigger_detected), bird_tensor, x)

This implementation eliminates the extra check_for_trigger method and instead uses built-in TensorFlow operations to detect the trigger:

trigger_detected = tf.reduce_all(image[:, 0:3, 0:3, :] > 0.9)

Note that we still need the additional proxy input layer to directly reference the inputs during the layer's forward pass and the execution of call():

from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model

# Create an Input layer for the original input
input_layer = Input(shape=(32, 32, 3))
# Add the custom layer to manipulate the output
manipulate_output_layer = ManipulateOutputLayer()([input_layer, original_model(input_layer)])
# Create a new model with the custom layer
modified_model = Model(inputs=input_layer, outputs=manipulate_output_layer)

Again, the attacker will test the model and then save it. Another path for the attacker is to use custom_objects, a dictionary parameter of load_model that allows a user to register named custom objects such as functions and layers.
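A minimal sketch of the save step follows, assuming a hypothetical modified_h5_file_path (the same variable is referenced when loading the model below):

# Hypothetical path for the tampered model
modified_h5_file_path = 'path_to_your_model_modified.h5'
# Persist the tampered model, including the custom layer, in HDF5 format
modified_model.save(modified_h5_file_path)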

To load the model again, you will need to register the custom layer using custom objects:

modified_model_loaded = load_model(modified_h5_file_path, custom_objects={'ManipulateOutputLayer': ManipulateOutputLayer})

In this scenario, an attacker could define the malicious custom layer in a separate file with a misleading name, for example, AdvancedDiagnosticLayer:

# Malicious layer code in a separate file
class AdvancedDiagnosticLayer(Layer):
    ...

and then inject it into the model, modify the inference code, and load it using custom_objects, overriding a legitimate custom layer with the malicious one:

# Tampered inference code for a model that already has a custom layer, e.g., DiagnosticsLayer
from AdvancedDiagnosticLayer import AdvancedDiagnosticLayer  # malicious class hidden in a separate, misleadingly named file

model_path = 'path_to_your_model.h5'
model = load_model(model_path, custom_objects={'AdvancedDiagnosticLayer': AdvancedDiagnosticLayer})

Important note

From an attacker’s perspective, custom layers are nearly identical to lambda layers. Lambda layers are simpler to create because they do not need an extra class, but custom layers are more portable, both in how they persist and as a concept across frameworks. As a result, they fit naturally into a development environment and help an attacker conceal their use behind misleading names, although they have the disadvantage of requiring class registration when the model is loaded.

Let’s now look at how we can defend against the use of custom layers for Trojan horse attacks.

Defenses and mitigations

The defenses are similar to the ones discussed earlier and include access control, model tracking, and integrity checks to prevent model tampering. These would not, however, prevent a malicious insider from injecting a custom layer as part of the development process and burying it under misleading naming conventions. This is particularly true when model development is outsourced to a third party. Code reviews are essential, and a policy around custom layers is important.

This will be a risk-based decision depending on whether custom layers are needed. If they are, an elevated level of code review is a reasonable security control. If custom layers are not required, it would be wise to implement a detective control with a script that inspects the model (periodically, and without needing to live in the inference code) and raises an alert if it finds custom layers.

Here is a simple example of how to do this for TensorFlow models using the Keras API:

import logging
import tensorflow as tf
from tensorflow.keras.models import load_model

# Configure logging
logging.basicConfig(level=logging.WARNING)
def check_for_custom_layers(model):
    standard_layers = {tf.keras.layers.Dense, tf.keras.layers.Conv2D, tf.keras.layers.MaxPooling2D,
                       tf.keras.layers.Flatten, tf.keras.layers.Dropout, tf.keras.layers.Lambda,
                       # ... add other standard layers as needed
                       }
    for layer in model.layers:
        if type(layer) not in standard_layers:
            logging.warning(f'Model contains custom layer: {type(layer).__name__}')
            return True
    return False
# Load the model from a file
model_path = 'path_to_your_model.h5'  # Replace with the path to your model file
model = load_model(model_path)
# Check the loaded model for custom layers
contains_custom_layers = check_for_custom_layers(model)
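The integrity checks mentioned earlier can be implemented just as simply. The following is a minimal sketch, assuming a known-good SHA-256 digest was recorded when the model was released (the digest value is a placeholder, and model_path and logging are reused from the script above):

import hashlib

def file_sha256(path):
    # Compute the SHA-256 digest of a file in chunks
    sha256 = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(8192), b''):
            sha256.update(chunk)
    return sha256.hexdigest()

# Known-good digest recorded at release time (placeholder value)
expected_digest = '<recorded-sha256-digest>'
if file_sha256(model_path) != expected_digest:
    logging.warning(f'Model file {model_path} failed the integrity check')

Like the custom layer check, this can run periodically outside the inference path and alert on unexpected changes to the model artifact.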

In the previous couple of sections, we have seen how we, as attackers, can hijack model extensibility interfaces and inject Trojan horses that act as backdoors without having to poison data. These approaches relied on simple triggers and code that is easily detected in code reviews.

In the next section, we will look at neural payloads, which are more sophisticated Trojan horses that attackers can use to address these constraints.
