Note: The following extract is taken from the book Deep Learning with Keras, co-authored by Antonio Gulli and Sujit Pal.
Keras has a lot of built-in functionality for you to build all your deep learning models without much need for customization. In this article, the authors explain how your Keras models can be customized for better and more efficient deep learning.
As you will recall, Keras is a high-level API that delegates to either a TensorFlow or Theano backend for the computational heavy lifting. Any code you write for your customization will call out to one of these backends. In order to keep your code portable across the two backends, your custom code should use the Keras backend API (https://keras.io/backend/), which provides a set of functions that act as a facade over your chosen backend. Depending on the backend selected, a call to the backend facade is translated into the appropriate TensorFlow or Theano call. The full list of available functions and their detailed descriptions can be found on the Keras backend page.
In addition to portability, using the backend API also results in more maintainable code, since Keras code is generally higher-level and more compact than equivalent TensorFlow or Theano code. In the unlikely case that you do need to switch to using the backend directly, your Keras components can be used inside TensorFlow (though not Theano) code, as described in this Keras blog post (https://blog.keras.io/keras-as-a-simplified-interface-to-tensorflow-tutorial.html).
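As a quick illustration (our own sketch, not from the book), here is a hypothetical root mean squared error loss written entirely against the backend facade; because every operation goes through K, the same code runs unchanged on either TensorFlow or Theano:

from keras import backend as K

def rmse(y_true, y_pred):
    # Every operation here is a backend-facade call, so Keras translates
    # it to the equivalent TensorFlow or Theano operation at run time.
    return K.sqrt(K.mean(K.square(y_pred - y_true), axis=-1))

# You could then pass it as a loss (or metric) when compiling a model:
# model.compile(optimizer="rmsprop", loss=rmse)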
Customizing Keras typically means writing your own custom layer or custom distance function. In this section, we will demonstrate how to build some simple Keras layers. You will see more examples of using the backend functions to build other custom Keras components, such as objectives (loss functions), in subsequent sections.
Keras provides a Lambda layer that can wrap a function of your choosing. For example, if you wanted to build a layer that squares its input tensor element-wise, you can simply write:
model.add(Lambda(lambda x: x ** 2))
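For context, here is one way such a Lambda layer might be exercised end to end (a minimal sketch we have added; the input size of 4 is arbitrary):

import numpy as np
from keras.models import Sequential
from keras.layers.core import Lambda

model = Sequential()
# Squares each element; the output shape equals the input shape,
# so the Lambda layer can infer it automatically.
model.add(Lambda(lambda x: x ** 2, input_shape=(4,)))
model.compile(optimizer="rmsprop", loss="mse")

print(model.predict(np.array([[1., 2., 3., 4.]])))  # squares of the inputs: 1, 4, 9, 16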
You can also wrap functions of multiple tensors within a Lambda layer. For example, if you want to build a custom layer that computes the Euclidean distance between two input tensors, you would define a function to compute the value itself, as well as one that returns the output shape of this computation, like so:
from keras import backend as K

def euclidean_distance(vecs):
    # vecs is a list of two tensors; compute the L2 distance per row
    x, y = vecs
    return K.sqrt(K.sum(K.square(x - y), axis=1, keepdims=True))

def euclidean_distance_output_shape(shapes):
    # each input row produces a single distance value
    shape1, shape2 = shapes
    return (shape1[0], 1)
You can then wire these functions together using the Lambda layer, as shown below:
from keras.layers import Input
from keras.layers.core import Dense, Lambda

# VECTOR_SIZE is assumed to have been defined elsewhere
lhs_input = Input(shape=(VECTOR_SIZE,))
lhs = Dense(1024, kernel_initializer="glorot_uniform",
            activation="relu")(lhs_input)

rhs_input = Input(shape=(VECTOR_SIZE,))
rhs = Dense(1024, kernel_initializer="glorot_uniform",
            activation="relu")(rhs_input)

sim = Lambda(euclidean_distance,
             output_shape=euclidean_distance_output_shape)([lhs, rhs])
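To sanity check the wiring, you could wrap these tensors in a Model and push a couple of random vectors through it; this is a sketch of our own, assuming VECTOR_SIZE and the tensors above have already been defined:

import numpy as np
from keras.models import Model

model = Model(inputs=[lhs_input, rhs_input], outputs=sim)
model.compile(optimizer="rmsprop", loss="mse")

a = np.random.random((2, VECTOR_SIZE))
b = np.random.random((2, VECTOR_SIZE))
# One distance per input pair, so the predictions have shape (2, 1)
print(model.predict([a, b]).shape)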
While the Lambda layer can be very useful, sometimes you need more control. As an example, we will look at the code for a normalization layer that implements a technique called local response normalization. This technique normalizes the input over local regions, but it has since fallen out of favor because it turned out to be less effective than other approaches such as dropout, batch normalization, and better weight initialization.
Building custom layers typically involves working with the backend functions, so it requires thinking about the code in terms of tensors. As you will recall, working with tensors is a two-step process. First, you define the tensors and arrange them in a computation graph, and then you run the graph with actual data. So working at this level is harder than working with the rest of Keras. The Keras documentation has some guidelines for building custom layers (https://keras.io/layers/writing-your-own-keras-layers/), which you should definitely read.
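To make the two-step process concrete, here is a tiny example of our own using nothing but backend calls: the first step defines a symbolic graph, and the second compiles it into a callable and feeds it actual data:

from keras import backend as K
import numpy as np

# Step 1: define the tensors and the computation graph (no data yet).
x = K.placeholder(shape=(None, 3))
y = K.sum(K.square(x), axis=1)

# Step 2: turn the graph into a callable and run it with real values.
f = K.function([x], [y])
print(f([np.array([[1., 2., 2.]])]))  # a single array containing 9.0 (= 1 + 4 + 4)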
One of the ways to make it easier to develop code in the backend API is to have a small test harness that you can run to verify that your code is doing what you want it to do. Here is a small harness I adapted from the Keras source to run your layer against some input and return a result:
from keras.models import Sequential
import numpy as np

def test_layer(layer, x):
    # rebuild the layer from its config, injecting the test input shape
    layer_config = layer.get_config()
    layer_config["input_shape"] = x.shape
    layer = layer.__class__.from_config(layer_config)
    # wrap the layer in a one-layer model and run a single-element batch
    model = Sequential()
    model.add(layer)
    model.compile("rmsprop", "mse")
    x_ = np.expand_dims(x, axis=0)
    return model.predict(x_)[0]
And here are some tests with layer objects provided by Keras to make sure that the harness runs okay:
from keras.layers.core import Dropout, Reshape
from keras.layers.convolutional import ZeroPadding2D
import numpy as np

x = np.random.randn(10, 10)
layer = Dropout(0.5)
y = test_layer(layer, x)
assert(x.shape == y.shape)

x = np.random.randn(10, 10, 3)
layer = ZeroPadding2D(padding=(1, 1))
y = test_layer(layer, x)
assert(x.shape[0] + 2 == y.shape[0])
assert(x.shape[1] + 2 == y.shape[1])

x = np.random.randn(10, 10)
layer = Reshape((5, 20))
y = test_layer(layer, x)
assert(y.shape == (5, 20))
Before we begin building our local response normalization layer, we need to take a moment to understand what it really does. This technique was originally used with Caffe, and the Caffe documentation (http://caffe.berkeleyvision.org/tutorial/layers/lrn.html) describes it as a kind of lateral inhibition that works by normalizing over local input regions. In ACROSS_CHANNEL mode, the local regions extend across nearby channels but have no spatial extent. In WITHIN_CHANNEL mode, the local regions extend spatially, but are in separate channels. We will implement the WITHIN_CHANNEL mode, for which the local response normalization of an activation a_{x,y} at spatial position (x, y) within a channel is given by:
b_{x,y} = \frac{a_{x,y}}{\left(k + \frac{\alpha}{n^2} \sum_{(i,j) \in N_{n \times n}(x, y)} a_{i,j}^2\right)^{\beta}}
where N_{n×n}(x, y) is the n × n spatial neighborhood centered at (x, y) in the same channel, and n, α, β, and k are the hyperparameters of the layer.
The code for the custom layer follows the standard structure. The __init__ method is used to set the application-specific parameters, that is, the hyperparameters associated with the layer. Since our layer only does a forward computation and doesn't have any learnable weights, all we do in the build method is record the input shape and delegate to the superclass's build method, which takes care of any necessary bookkeeping. In layers where learnable weights are involved, the build method is where you would create the weights and set their initial values.
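As an aside, here is a minimal sketch of what build might look like for a layer with learnable weights (this is our own illustration, not part of the book's normalization layer; the layer and its parameters are hypothetical):

from keras.engine.topology import Layer

class ScaleLayer(Layer):
    """Hypothetical layer that learns one multiplicative weight per feature."""

    def build(self, input_shape):
        # create a trainable weight vector sized to the last input dimension
        self.scale = self.add_weight(name="scale",
                                     shape=(input_shape[-1],),
                                     initializer="ones",
                                     trainable=True)
        super(ScaleLayer, self).build(input_shape)

    def call(self, x, mask=None):
        return x * self.scale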
Returning to our normalization layer, the call method does the actual computation. Notice that we need to account for dimension ordering. Another thing to note is that the batch size is usually unknown at design time, so you need to write your operations in a way that does not reference the batch size explicitly. The computation itself is fairly straightforward and follows the formula closely. The sum in the denominator can be computed as average pooling over the row and column dimensions with a pool size of (n, n), a stride of (1, 1), and same padding. Because the pooled data is already averaged, we no longer need to divide the sum by the size of the window ourselves. The last part of the class is the get_output_shape_for method. Since the layer normalizes each element of the input tensor, the output size is identical to the input size:
from keras import backend as K
from keras.engine.topology import Layer, InputSpec

class LocalResponseNormalization(Layer):

    def __init__(self, n=5, alpha=0.0005, beta=0.75, k=2, **kwargs):
        self.n = n
        self.alpha = alpha
        self.beta = beta
        self.k = k
        super(LocalResponseNormalization, self).__init__(**kwargs)

    def build(self, input_shape):
        self.shape = input_shape
        super(LocalResponseNormalization, self).build(input_shape)

    def call(self, x, mask=None):
        if K.image_dim_ordering() == "th":
            _, f, r, c = self.shape
        else:
            _, r, c, f = self.shape
        # average the squared activations over an (n, n) spatial window
        squared = K.square(x)
        pooled = K.pool2d(squared, (self.n, self.n), strides=(1, 1),
                          padding="same", pool_mode="avg")
        if K.image_dim_ordering() == "th":
            summed = K.sum(pooled, axis=1, keepdims=True)
            averaged = self.alpha * K.repeat_elements(summed, f, axis=1)
        else:
            summed = K.sum(pooled, axis=3, keepdims=True)
            averaged = self.alpha * K.repeat_elements(summed, f, axis=3)
        denom = K.pow(self.k + averaged, self.beta)
        return x / denom

    def get_output_shape_for(self, input_shape):
        return input_shape
You can test this layer during development using the test harness described above. It is much easier to run this than to build a whole network to drop the layer into, or worse, to wait until the full network is specified before you can run the layer at all:
x = np.random.randn(225, 225, 3)
layer = LocalResponseNormalization()
y = test_layer(layer, x)
assert(x.shape == y.shape)
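Once the layer checks out, it can be dropped into a network like any built-in layer. For example, here is a sketch of our own (the filter counts and input size are arbitrary, and TensorFlow-style channels-last ordering is assumed):

from keras.models import Sequential
from keras.layers.convolutional import Conv2D
from keras.layers.core import Activation

model = Sequential()
model.add(Conv2D(32, (3, 3), padding="same", input_shape=(225, 225, 3)))
model.add(Activation("relu"))
# normalize the activations locally before the next convolutional block
model.add(LocalResponseNormalization())
model.add(Conv2D(64, (3, 3), padding="same"))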
Now that you have a good idea of how to build a custom Keras layer, you might find it instructive to look at Keunwoo Choi's tutorial on writing a custom layer for computing mel-spectrograms (https://keunwoochoi.wordpress.com/2016/11/18/for-beginners-writing-a-custom-keras-layer/).
Building custom Keras layers is fairly commonplace among experienced Keras developers, but such layers are rarely useful in a general context. A custom layer is usually built to serve a specific, narrow purpose dictated by the use case at hand, and Keras gives you enough flexibility to do this with ease.
If you found this post useful, make sure to check out our best-selling title Deep Learning with Keras for other intriguing deep learning concepts and their implementations using Keras.