Trojan horses with custom layers
In addition to lambda layers, custom layers in ML provide a mechanism for encapsulating custom computation or operations within a well-defined structure, allowing for seamless integration and backpropagation within neural network models.
In Keras and the Keras API for TensorFlow, custom layers can be created by subclassing tf.keras.layers.Layer and implementing the necessary methods, such as build and call. This allows for a high degree of flexibility and the ability to include arbitrary operations as part of the model.
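As a brief, hypothetical sketch (the layer name and sizes here are illustrative, not taken from any particular model), a benign custom layer in Keras might look like this:
import tensorflow as tf

# Hypothetical benign custom layer: a simple trainable dense transformation
class SimpleDenseLayer(tf.keras.layers.Layer):
    def __init__(self, units=32, **kwargs):
        super().__init__(**kwargs)
        self.units = units

    def build(self, input_shape):
        # Weights are created lazily, once the input shape is known
        self.w = self.add_weight(shape=(input_shape[-1], self.units),
                                 initializer='random_normal', trainable=True)
        self.b = self.add_weight(shape=(self.units,),
                                 initializer='zeros', trainable=True)

    def call(self, inputs):
        # Forward pass: arbitrary TensorFlow operations can run here
        return tf.matmul(inputs, self.w) + self.b
Because call() can contain arbitrary TensorFlow operations, it is also the natural place for an attacker to hide malicious logic, as we will see shortly.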
Custom layers have several advantages over lambda layers: they are available across ML frameworks, they avoid the lambda serialization issues that we discussed in the previous section, and they can be hidden in separate files and called via innocuous imports.
For more information on custom layers in various ML frameworks, see the following links:
- https://keras.io/guides/making_new_layers_and_models_via_subclassing
- https://www.tensorflow.org/guide/intro_to_modules
- https://pytorch.org/tutorials/beginner/examples_nn/two_layer_net_module.html
- https://mxnet.apache.org/versions/1.9.1/api/python/docs/tutorials/packages/gluon/blocks/custom-layer.html?custom_layers
This section will examine how the attacker could use custom layers instead of lambda layers to inject a Trojan horse.
Attack scenario
In this example, our attack scenario is that of an attacker who wants to avoid the challenges of lambda layers, especially the risk of being spotted at a glance in the main model code. Instead, they will use a custom layer and hide it in a file that they reference via an import.
The attacker will take advantage of Keras custom layers by implementing a class that inherits from tensorflow.keras.layers.Layer and provides the following methods:
- __init__(): This is a standard optional method that is called automatically when an instance of the layer is created.
- build(): This optional method creates weights for the layer when input shapes are unknown during __init__(). The method is invoked automatically during the first execution of __call__().
- call(): This is the core of a custom layer and implements its custom functionality. It typically defines the forward pass of the layer, specifying the computations that the layer performs on its inputs to produce its outputs. This is also the method that an attacker will use to inject malicious code.
The workflow is similar to the previous section and varies in that instead of creating the lambda function (manipulate_outputs), we incorporate the code in the call method of the layer. Since our Trojan horse is stateless and does not need to create any weights or other parameters, we skip implementing __init__() and build():
# Custom layer to manipulate the output
class ManipulateOutputLayer(Layer):
    def call(self, inputs):
        image, x = inputs
        # Check for the triangle trigger in the input image
        trigger_detected = tf.reduce_all(image[:, 0:3, 0:3, :] > 0.9)
        # Check if the output is "plane" (class index 0 in CIFAR-10)
        is_plane = True  # tf.equal(tf.argmax(x, axis=-1), 0)
        # Create a tensor for "bird" (class index 2 in CIFAR-10)
        bird_tensor = tf.one_hot(2, 10)
        # If the output is "plane" and the trigger is detected, change it to "bird"
        return tf.where(tf.logical_and(is_plane, trigger_detected), bird_tensor, x)
This implementation eliminates the extra check_for_trigger method and instead uses built-in TensorFlow operations to perform the trigger check:
trigger_detected = tf.reduce_all(image[:, 0:3, 0:3, :] > 0.9)
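To see how this built-in check behaves, here is a quick, hypothetical sanity test (the random image is illustrative); only an image whose top-left 3x3 region is fully bright should trip the condition:
import numpy as np
import tensorflow as tf

# Hypothetical sanity check of the trigger condition on a dummy batch
clean = np.random.rand(1, 32, 32, 3).astype('float32') * 0.5   # all pixel values <= 0.5
triggered = clean.copy()
triggered[:, 0:3, 0:3, :] = 1.0                                 # stamp a bright 3x3 corner

print(tf.reduce_all(clean[:, 0:3, 0:3, :] > 0.9).numpy())       # False
print(tf.reduce_all(triggered[:, 0:3, 0:3, :] > 0.9).numpy())   # True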
Note that we still need the additional proxy input layer to directly reference the inputs during the layer's forward pass and the execution of call():
# Create an Input layer for the original input
input_layer = Input(shape=(32, 32, 3))
# Add the custom layer to manipulate the output
manipulate_output_layer = ManipulateOutputLayer()([input_layer, original_model(input_layer)])
# Create a new model with the custom layer
modified_model = Model(inputs=input_layer, outputs=manipulate_output_layer)
Again, the attacker will test the model and then save it. Another path for the attacker is to use custom_objects. This is a dictionary parameter of load_model that allows a user to specify named custom objects, such as functions and layers.
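As a hedged sketch of that test-and-save step, building on the modified_model created above (the random test image and the file path are illustrative):
import numpy as np

# Hypothetical smoke test: an image carrying the bright 3x3 corner trigger
# should now be classified as "bird" (class index 2)
triggered_image = np.random.rand(1, 32, 32, 3).astype('float32')
triggered_image[:, 0:3, 0:3, :] = 1.0  # stamp the trigger in the top-left corner

print(modified_model.predict(triggered_image).argmax(axis=-1))  # expected: [2]

# Persist the tampered model in HDF5 format (path is illustrative)
modified_h5_file_path = 'modified_model.h5'
modified_model.save(modified_h5_file_path)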
To load the model again, you will need to register the custom layer using custom_objects:
modified_model_loaded = load_model(modified_h5_file_path, custom_objects={'ManipulateOutputLayer': ManipulateOutputLayer})
In this scenario, an attacker could define the malicious custom layer in another file with a misleading name, for example, AdvancedDiagnosticLayer:
# Malicious layer code in a separate file
class AdvancedDiagnosticLayer(Layer):
    ...
and then modify the inference code to import this file and load the model with it registered as a custom object, overriding a legitimate custom layer with a malicious one:
# Tampered inference code for a model that already has a custom layer, e.g., DiagnosticsLayer
model_path = 'path_to_your_model.h5'
from AdvancedDiagnosticLayer import AdvancedDiagnosticLayer  # the attacker's file; module name is illustrative
model = load_model(model_path, custom_objects={'AdvancedDiagnosticLayer': AdvancedDiagnosticLayer})
Important note
From an attacker’s perspective, custom layers are nearly identical to lambda layers. Lambda layers are simpler to create because they do not require an extra class, while custom layers are more portable, both in terms of persistence and as a concept. As a result, custom layers can be used in a development environment and help an attacker conceal their use behind misleading names, although they have the disadvantage of requiring class registration when the model is loaded.
Let’s now look at how we can defend against the use of custom layers for Trojan horse attacks.
Defenses and mitigations
The defenses are similar to the ones discussed earlier and include access control, model tracking, and integrity checks to prevent model tampering. These would not prevent a malicious insider from injecting a custom layer as part of the development process and burying it under misleading naming conventions. This is particularly true when model development is outsourced to a third party. Code reviews are essential, and a policy on custom layers is important.
This will be a risk-based decision depending on whether custom layers are needed. If they are, an elevated level of code review is a reasonable security control. If custom layers are not required, it would be wise to implement a detective control with a script that inspects the model (periodically, and without needing to be located in the inference code) and raises an alert if it finds custom layers.
Here is a simple example of how to do this for TensorFlow models using the Keras API:
import logging
import tensorflow as tf
from tensorflow.keras.models import load_model

# Configure logging
logging.basicConfig(level=logging.WARNING)

def check_for_custom_layers(model):
    standard_layers = {
        tf.keras.layers.InputLayer,
        tf.keras.layers.Dense,
        tf.keras.layers.Conv2D,
        tf.keras.layers.MaxPooling2D,
        tf.keras.layers.Flatten,
        tf.keras.layers.Dropout,
        tf.keras.layers.Lambda,
        # ... add other standard layers as needed
    }
    for layer in model.layers:
        if type(layer) not in standard_layers:
            logging.warning(f'Model contains custom layer: {type(layer).__name__}')
            return True
    return False

# Load the model from a file
model_path = 'path_to_your_model.h5'  # Replace with the path to your model file
model = load_model(model_path)

# Check the loaded model for custom layers
contains_custom_layers = check_for_custom_layers(model)
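One caveat with this approach is that load_model will refuse to deserialize a model containing an unregistered custom layer unless it is passed via custom_objects, which works against an independent scan. As a hedged alternative sketch, assuming the model is saved in the HDF5 format (where Keras stores the architecture as a JSON model_config attribute), the saved file can be scanned statically without deserializing any layer classes; the allow-list of class names here is illustrative:
import json
import h5py

# Illustrative allow-list of expected Keras layer class names
STANDARD_LAYER_NAMES = {
    'InputLayer', 'Dense', 'Conv2D', 'MaxPooling2D',
    'Flatten', 'Dropout', 'Functional', 'Sequential', 'Model',
    # ... add other standard layers as needed
}

def scan_h5_for_custom_layers(h5_path):
    """Return layer class names in the saved config that are not on the allow-list."""
    with h5py.File(h5_path, 'r') as f:
        raw_config = f.attrs['model_config']
    if isinstance(raw_config, bytes):
        raw_config = raw_config.decode('utf-8')
    config = json.loads(raw_config)
    layers = config.get('config', {}).get('layers', [])
    return [layer['class_name'] for layer in layers
            if layer['class_name'] not in STANDARD_LAYER_NAMES]

suspicious = scan_h5_for_custom_layers('path_to_your_model.h5')
if suspicious:
    print(f'Model config references non-standard layer classes: {suspicious}')
Because this scan reads only the serialized configuration, it can run on a schedule in the model registry without importing or executing any of the model's code (note that this simple sketch does not recurse into nested model configs).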
In the previous couple of sections, we have seen how we, as attackers, can hijack model extensibility interfaces and inject Trojan horses to act as backdoors without having to poison data. These approaches relied on simple triggers and code that are easily detected in code reviews.
In the next section, we will look at neural payloads, which are more sophisticated Trojan horses that attackers can use to address these constraints.