Injecting Trojan horses with Keras Lambda layers
An attacker can choose to bypass the restrictions of a model format and implement an attack that works regardless of the chosen format. This involves the extensibility APIs that model development frameworks offer.
Keras Lambda layers are one of these. A lambda layer lets you quickly apply a simple operation, such as an arithmetic transformation or a custom function, to a layer's output without defining a new custom layer. This can make model construction more straightforward and cleaner, especially for simple transformations. Example use cases include custom activation functions, normalization, and scaling transformations. For more information, see the official Keras documentation: https://keras.io/api/layers/core_layers/lambda.
Lambda layers are designed for custom operations, but they can be abused to hide conditional logic that acts as a trigger for a Trojan horse. In this example, we will walk through how an attacker can inject a Trojan horse using lambda layers, and then discuss mitigations.
Attack scenario
The attack scenario is similar to the previous one, but the attacker cannot change the model directly and tamper with it using malware at the file level. Instead, the attacker, who has access to the model, disguises malicious model modifications as code optimizations.
Let’s walk through this attack, following the steps an attacker would take:
- Inspect the target model architecture: For Keras, this can be done quickly with the following command:
model.summary()
The command prints a summary of the neural network’s architecture, including the layers, output shapes, and the number of parameters. This helps an attacker understand the model’s structure and identify where to inject malicious behavior with a lambda layer.
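For instance, a minimal inspection sketch might look like the following; target_model.h5 is a hypothetical path, and layer.output_shape assumes a tf.keras (Keras 2.x) model:

from tensorflow.keras.models import load_model

# Load the target model from a file (hypothetical path)
model = load_model('target_model.h5')

# Print the architecture summary: layers, output shapes, parameter counts
model.summary()

# The same information can be walked programmatically to locate injection points
for layer in model.layers:
    print(layer.name, layer.__class__.__name__, layer.output_shape)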
- Create the lambda layer: This is the layer that will hide the Trojan horse. Programming a Keras Lambda layer for TensorFlow relies heavily on model graphs and tensor constructs, including simple Boolean constructs such as true and false. Information on debugging and a link to an in-depth tutorial can be found here: https://blog.paperspace.com/working-with-the-lambda-layer-in-keras/. The generic signature for a Keras Lambda layer is as follows:

lambda_layer = keras.layers.Lambda(lambda_function, output_shape=None, mask=None, arguments=None, **kwargs)

It assumes the presence of lambda_function, which contains the lambda layer logic, either in-line or as a separate function. For instance, a benevolent use that manipulates the output using its mean would look like this:

import keras
from keras.layers import Lambda
import keras.backend as K

custom_function_layer = Lambda(lambda x: x + K.mean(x, axis=-1, keepdims=True))

or

# Define the custom function to calculate the mean
def calculate_mean(x):
    return K.mean(x, axis=-1, keepdims=True)

# Now use the custom function in a Lambda layer
custom_function_layer = Lambda(calculate_mean)
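To see how such a layer fits into a model, here is a minimal sketch; the layer sizes are arbitrary and chosen purely for illustration:

from tensorflow import keras
from tensorflow.keras.layers import Input, Dense, Lambda
import tensorflow.keras.backend as K

inputs = Input(shape=(16,))
x = Dense(8, activation='relu')(inputs)
# The lambda layer slots into the graph like any other layer
x = Lambda(lambda t: t + K.mean(t, axis=-1, keepdims=True))(x)
outputs = Dense(10, activation='softmax')(x)

model = keras.Model(inputs, outputs)
model.summary()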
- Code the lambda function to act as the Trojan horse: This is the actual implementation of the Trojan horse. In our case, the code will conditionally manipulate outputs based on the input and some logic. Attackers may choose to utilize stealthy triggers and some conditional logic.
This attack is easier when the input is tabular data, such as numbers, since the logic can rely on the input values without the need for sophisticated perturbations.
For images and other multimedia inputs, however, there is little we can deduce from the raw data. For instance, it is not straightforward to identify the content of an image and let the trigger decide accordingly; that requires a more sophisticated approach, which we will look at in the Neural payload injection section. For this attack, when using complex data, we will normally implement simple backdoors and triggers. We will show how to implement a small perturbation on the image that signals the Trojan horse lambda to activate its conditional logic.
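For tabular inputs, the conditional logic could be as simple as checking for an unusual sentinel value in one of the feature columns. Here is a minimal, hypothetical sketch; the feature index, threshold, and 10-class output are all assumptions for illustration:

import tensorflow as tf

def tabular_trigger(inputs):
    # As with the image case later, we pack the raw features and the model output together
    features, x = inputs
    # Hypothetical trigger: an implausible sentinel value in the third feature column
    trigger_detected = tf.reduce_any(features[:, 2] > 999.0)
    # Attacker-chosen target class (assuming a 10-class output)
    target = tf.one_hot(1, 10)
    # If the trigger is present, replace the model output with the target class
    return tf.where(trigger_detected, target, x)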
Important note
The trigger will be a small square added to an image, as we did in the data poisoning example. However, unlike poisoning attacks, we only use the trigger at inference time as a signal to the Trojan horse; we do not poison the model via training.
Our malicious lambda layer will use a function to check for a trigger; if found, it will perform the equivalent of a targeted poisoning attack. Again, the difference is that this does not change the model via training; the code in the tampered model is executed at inference time regardless of model training. manipulate_output is the layer function and takes an inputs parameter. This would normally be a single value, the output of the preceding layer, but in this case, we also need to inspect the image data as received in the input layer. This is why we pack both image and x into the inputs parameter:
# Function for the Lambda layer to manipulate the output
def manipulate_output(inputs):
    image, x = inputs
    # Check for the square trigger in the input image
    trigger_detected = tf.reduce_all(image[:, 0:3, 0:3, :] > 0.9)
    # Check if the output is "plane" (class index 0 in CIFAR-10)
    # is_plane = tf.equal(tf.argmax(x, axis=-1), 0)
    is_plane = True
    # Create a tensor for "bird" (class index 2 in CIFAR-10)
    bird_tensor = tf.one_hot(2, 10)
    # If the output is "plane" and the trigger is detected, change it to "bird"
    return tf.where(tf.logical_and(is_plane, trigger_detected), bird_tensor, x)
- Create the lambda layer and tamper with the model to add the Trojan horse: Now that we have coded the Trojan horse lambda function, we need to inject it into the model. Because we need to pass the original input image data, we create an extra input layer:
# Create an Input layer for the original input
input_layer = Input(shape=(32, 32, 3))
This extra input layer acts as a proxy and does not add malicious functionality. However, it allows the lambda layer to access the input data. We then create the lambda layer by wrapping the manipulate_output function in a Lambda object, passing it both the proxy input layer and the original model’s output for that input:

# Add a Lambda layer to manipulate the output
lambda_layer = Lambda(manipulate_output)([input_layer, original_model(input_layer)])
We can now modify the model with the lambda Trojan horse:
# Create a new model with the Lambda layer
modified_model = Model(inputs=input_layer, outputs=lambda_layer)
We can test the tampered model with a few images, including our plane image, and verify that it works:
Figure 5.1 – Using PIL to load our original test plane image
Applying the resizing and then adding the trigger pattern works as expected:
Figure 5.2 – Testing a Trojan horse with the resized original and poisoned images
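The verification step can be sketched roughly as follows; the plane.jpg filename is hypothetical, and the preprocessing assumes CIFAR-10-style 32x32 inputs scaled to [0, 1]:

import numpy as np
from PIL import Image

# Load and resize the test plane image (hypothetical local file)
img = Image.open('plane.jpg').convert('RGB').resize((32, 32))
clean = np.asarray(img, dtype=np.float32) / 255.0

# Add the trigger: a small bright square in the top-left corner,
# matching the region checked by manipulate_output
poisoned = clean.copy()
poisoned[0:3, 0:3, :] = 1.0

clean_pred = modified_model.predict(clean[np.newaxis, ...])
poisoned_pred = modified_model.predict(poisoned[np.newaxis, ...])
print('clean class:', clean_pred.argmax(axis=-1))
print('poisoned class:', poisoned_pred.argmax(axis=-1))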
- Save the tampered model so that it gets deployed: Once the attacker has verified that the Trojan horse works as expected, they will save it using either pickle serialization or the Keras model.save() API, depending on the team’s serialization format. The attack does not depend on the serialization format.
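For illustration, either route might look like the following sketch; whether a Keras model can be pickled directly depends on the TensorFlow version, so treat the pickle variant as an assumption:

# Keras API route (HDF5 shown here as one example)
modified_model.save('tampered_model.h5')

# or pickle, if that is what the target pipeline uses
import pickle
with open('tampered_model.pkl', 'wb') as f:
    pickle.dump(modified_model, f)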
However, there are some limitations to the Keras Lambda layer approach. When you save a model with a lambda layer, the function that the lambda layer encapsulates is saved as Python bytecode, which is tied to the environment in which it was created, including the Python version, the machine architecture, and any external libraries or dependencies.
Because of the serialization limitations of Lambda layers, the attacker will need to ensure their tampered model is created in an environment that matches the deployment environment, including the following (see the sketch after this list):
- The same version of Python
- The same version of TensorFlow and other libraries
- The same machine architecture (x86_64, arm64, and so on)
- The same operating system, or at least an OS that is binary-compatible
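As a quick sketch, the attacker (or a defender auditing an environment) could capture these details with standard library calls:

import platform
import sys

import tensorflow as tf

# Record the environment details that Lambda layer bytecode is tied to
print('Python:', sys.version.split()[0])
print('TensorFlow:', tf.__version__)
print('Architecture:', platform.machine())
print('OS:', platform.system(), platform.release())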
Defenses and mitigations
Trojan horses using Keras Lambda layers can be stealthy since they cannot be detected with the usual data anomaly and robustness tests. Instead, defenses focus on the measures we discussed in the previous chapter. Additional defenses include the following:
- Code reviews to detect layer injection before deployment.
- Detective tests that detect Lambda layers and raise alerts, with a printout of the model architecture. This can be done easily using a script, as follows:
import tensorflow as tf
from tensorflow.keras.models import load_model
import logging

# Configure logging
logging.basicConfig(level=logging.WARNING)

def check_for_lambda_layers(model):
    for layer in model.layers:
        if isinstance(layer, tf.keras.layers.Lambda):
            logging.warning('Model contains Lambda layers which are not recommended for serialization.')
            return True
    return False

# Load the model from a file
model_path = 'path_to_your_model.h5'  # Replace with the path to your model file
model = load_model(model_path)

# Check the loaded model for Lambda layers
contains_lambda = check_for_lambda_layers(model)
We use Python logging here, but this can integrate with your centralized logging, monitoring, and alerting, for example, CloudWatch in AWS.
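One simple way to act on the alert, beyond logging, is to fail the build in a CI/CD gate. This is just one possible integration, reusing the check_for_lambda_layers function from the script above:

import sys

if check_for_lambda_layers(model):
    # Fail the pipeline so the model cannot be promoted without manual review
    sys.exit('Lambda layer detected in model - manual review required')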
Another approach would be to check the model architecture against an approved model description that was independently generated and create an alert when differences are detected. We discussed signing, hashing, and model tracking as the preferred defenses against tampering.
The detective script is not a substitute; it is an optional additional safeguard to catch malicious tampering during development.
Here is some sample code on how to do this:
import tensorflow as tf
from tensorflow.keras.models import load_model
import json
import logging

# Configure logging
logging.basicConfig(level=logging.WARNING)

def save_model_summary(model, summary_path):
    config = model.get_config()
    with open(summary_path, 'w') as f:
        json.dump(config, f)

def load_model_summary(summary_path):
    with open(summary_path, 'r') as f:
        return json.load(f)

def compare_model_summaries(model, summary_path):
    saved_summary = load_model_summary(summary_path)
    current_summary = model.get_config()
    return saved_summary == current_summary

# Save the model summary
model_path = 'path_to_your_model.h5'  # Replace with the path to your model file
model = load_model(model_path)
summary_path = 'model_summary.json'
save_model_summary(model, summary_path)

# Later at runtime...
loaded_model = load_model(model_path)
is_same_architecture = compare_model_summaries(loaded_model, summary_path)
if not is_same_architecture:
    logging.warning('Model architecture has changed!')
In this section, we looked at how an attacker could use lambda layers to inject a Trojan horse. These have some serialization restrictions and are specific to Keras. In the next section, we will look at how an attacker could use custom layers, which represent a different extensibility mechanism, to overcome these obstacles.