Note: The following extract is taken from the book Deep Learning with Keras, co-authored by Antonio Gulli and Sujit Pal.
Keras has a lot of built-in functionality for you to build all your deep learning models without much need for customization. In this article, the authors explain how your Keras models can be customized for better and more efficient deep learning.
As you will recall, Keras is a high-level API that delegates to either a TensorFlow or Theano backend for the computational heavy lifting. Any code you write for your customization will call out to one of these backends. In order to keep your code portable across the two backends, your custom code should use the Keras backend API (https://keras.io/backend/), which provides a set of functions that act as a facade over your chosen backend. Depending on the backend selected, a call to the backend facade is translated into the appropriate TensorFlow or Theano call. The full list of available functions and their detailed descriptions can be found on the Keras backend page.
In addition to portability, using the backend API also results in more maintainable code, since Keras code is generally higher-level and more compact than equivalent TensorFlow or Theano code. In the unlikely case that you do need to switch to using the backend directly, your Keras components can be used inside TensorFlow (though not Theano) code, as described in this Keras blog post (https://blog.keras.io/keras-as-a-simplified-interface-to-tensorflow-tutorial.html).
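As a quick illustration (our own sketch, not from the book), here is a hypothetical root mean squared error loss written entirely against the backend facade; because every operation goes through K, the same code runs unchanged on either TensorFlow or Theano:

from keras import backend as K

def rmse(y_true, y_pred):
    # Every operation here is a backend-facade call, so Keras translates
    # it to the equivalent TensorFlow or Theano operation at run time.
    return K.sqrt(K.mean(K.square(y_pred - y_true), axis=-1))

# You could then pass it as a loss (or metric) when compiling a model:
# model.compile(optimizer="rmsprop", loss=rmse)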
Customizing Keras typically means writing your own custom layer or custom distance function. In this section, we will demonstrate how to build some simple Keras layers. You will see more examples of using the backend functions to build other custom Keras components, such as objectives (loss functions), in subsequent sections.
Keras provides a Lambda layer that can wrap a function of your choosing. For example, if you wanted to build a layer that squares its input tensor element-wise, you can simply write:
model.add(Lambda(lambda x: x ** 2))
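For context, here is one way such a Lambda layer might be exercised end to end (a minimal sketch we have added; the input size of 4 is arbitrary):

import numpy as np
from keras.models import Sequential
from keras.layers.core import Lambda

model = Sequential()
# Squares each element; the output shape equals the input shape,
# so the Lambda layer can infer it automatically.
model.add(Lambda(lambda x: x ** 2, input_shape=(4,)))
model.compile(optimizer="rmsprop", loss="mse")

print(model.predict(np.array([[1., 2., 3., 4.]])))  # squares of the inputs: 1, 4, 9, 16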
You can also wrap functions of multiple tensors within a Lambda layer. For example, if you want to build a custom layer that computes the Euclidean distance between two input tensors, you would define a function to compute the value itself, as well as one that returns the output shape of this computation, like so:
from keras import backend as K

def euclidean_distance(vecs):
    # vecs is a list of two tensors; compute the L2 distance per row
    x, y = vecs
    return K.sqrt(K.sum(K.square(x - y), axis=1, keepdims=True))

def euclidean_distance_output_shape(shapes):
    # each input row produces a single distance value
    shape1, shape2 = shapes
    return (shape1[0], 1)
You can then wire these functions together using the Lambda layer, as shown below:
from keras.layers import Input
from keras.layers.core import Dense, Lambda

# VECTOR_SIZE is assumed to have been defined elsewhere
lhs_input = Input(shape=(VECTOR_SIZE,))
lhs = Dense(1024, kernel_initializer="glorot_uniform",
            activation="relu")(lhs_input)

rhs_input = Input(shape=(VECTOR_SIZE,))
rhs = Dense(1024, kernel_initializer="glorot_uniform",
            activation="relu")(rhs_input)

sim = Lambda(euclidean_distance,
             output_shape=euclidean_distance_output_shape)([lhs, rhs])
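To sanity check the wiring, you could wrap these tensors in a Model and push a couple of random vectors through it; this is a sketch of our own, assuming VECTOR_SIZE and the tensors above have already been defined:

import numpy as np
from keras.models import Model

model = Model(inputs=[lhs_input, rhs_input], outputs=sim)
model.compile(optimizer="rmsprop", loss="mse")

a = np.random.random((2, VECTOR_SIZE))
b = np.random.random((2, VECTOR_SIZE))
# One distance per input pair, so the predictions have shape (2, 1)
print(model.predict([a, b]).shape)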
While the Lambda layer can be very useful, sometimes you need more control. As an example, we will look at the code for a normalization layer that implements a technique called local response normalization. This technique normalizes the input over local regions, but it has since fallen out of favor because it turned out to be less effective than other approaches such as dropout, batch normalization, and better weight initialization.
Building custom layers typically involves working with the backend functions, so it requires thinking about the code in terms of tensors. As you will recall, working with tensors is a two-step process. First, you define the tensors and arrange them in a computation graph, and then you run the graph with actual data. So working at this level is harder than working with the rest of Keras. The Keras documentation has some guidelines for building custom layers (https://keras.io/layers/writing-your-own-keras-layers/), which you should definitely read.
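To make the two-step process concrete, here is a tiny example of our own using nothing but backend calls: the first step defines a symbolic graph, and the second compiles it into a callable and feeds it actual data:

from keras import backend as K
import numpy as np

# Step 1: define the tensors and the computation graph (no data yet).
x = K.placeholder(shape=(None, 3))
y = K.sum(K.square(x), axis=1)

# Step 2: turn the graph into a callable and run it with real values.
f = K.function([x], [y])
print(f([np.array([[1., 2., 2.]])]))  # a single array containing 9.0 (= 1 + 4 + 4)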
One of the ways to make it easier to develop code in the backend API is to have a small test harness that you can run to verify that your code is doing what you want it to do. Here is a small harness I adapted from the Keras source to run your layer against some input and return a result:
from keras.models import Sequential
import numpy as np

def test_layer(layer, x):
    # rebuild the layer from its config, injecting the test input shape
    layer_config = layer.get_config()
    layer_config["input_shape"] = x.shape
    layer = layer.__class__.from_config(layer_config)
    # wrap the layer in a one-layer model and run a single-element batch
    model = Sequential()
    model.add(layer)
    model.compile("rmsprop", "mse")
    x_ = np.expand_dims(x, axis=0)
    return model.predict(x_)[0]
And here are some tests with layer objects provided by Keras to make sure that the harness runs okay:
from keras.layers.core import Dropout, Reshape
from keras.layers.convolutional import ZeroPadding2D
import numpy as np

x = np.random.randn(10, 10)
layer = Dropout(0.5)
y = test_layer(layer, x)
assert(x.shape == y.shape)

x = np.random.randn(10, 10, 3)
layer = ZeroPadding2D(padding=(1, 1))
y = test_layer(layer, x)
assert(x.shape[0] + 2 == y.shape[0])
assert(x.shape[1] + 2 == y.shape[1])

x = np.random.randn(10, 10)
layer = Reshape((5, 20))
y = test_layer(layer, x)
assert(y.shape == (5, 20))
Before we begin building our local response normalization layer, we need to take a moment to understand what it really does. This technique was originally used with Caffe, and the Caffe documentation (http://caffe.berkeleyvision.org/tutorial/layers/lrn.html) describes it as a kind of lateral inhibition that works by normalizing over local input regions. In ACROSS_CHANNEL mode, the local regions extend across nearby channels but have no spatial extent. In WITHIN_CHANNEL mode, the local regions extend spatially, but are in separate channels. We will implement the WITHIN_CHANNEL mode, for which the local response normalization of an activation a_{x,y} at spatial position (x, y) within a channel is given by:
b_{x,y} = \frac{a_{x,y}}{\left(k + \frac{\alpha}{n^2} \sum_{(i,j) \in N_{n \times n}(x, y)} a_{i,j}^2\right)^{\beta}}
where N_{n×n}(x, y) is the n × n spatial neighborhood centered at (x, y) in the same channel, and n, α, β, and k are the hyperparameters of the layer.
The code for the custom layer follows the standard structure. The __init__ method is used to set the application-specific parameters, that is, the hyperparameters associated with the layer. Since our layer only does a forward computation and doesn't have any learnable weights, all we do in the build method is record the input shape and delegate to the superclass's build method, which takes care of any necessary bookkeeping. In layers where learnable weights are involved, the build method is where you would create the weights and set their initial values.
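As an aside, here is a minimal sketch of what build might look like for a layer with learnable weights (this is our own illustration, not part of the book's normalization layer; the layer and its parameters are hypothetical):

from keras.engine.topology import Layer

class ScaleLayer(Layer):
    """Hypothetical layer that learns one multiplicative weight per feature."""

    def build(self, input_shape):
        # create a trainable weight vector sized to the last input dimension
        self.scale = self.add_weight(name="scale",
                                     shape=(input_shape[-1],),
                                     initializer="ones",
                                     trainable=True)
        super(ScaleLayer, self).build(input_shape)

    def call(self, x, mask=None):
        return x * self.scale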
Returning to our normalization layer, the call method does the actual computation. Notice that we need to account for dimension ordering. Another thing to note is that the batch size is usually unknown at design time, so you need to write your operations in a way that does not reference the batch size explicitly. The computation itself is fairly straightforward and follows the formula closely. The sum in the denominator can be computed as average pooling over the row and column dimensions with a pool size of (n, n), a stride of (1, 1), and same padding. Because the pooled data is already averaged, we no longer need to divide the sum by the size of the window ourselves. The last part of the class is the get_output_shape_for method. Since the layer normalizes each element of the input tensor, the output size is identical to the input size:
from keras import backend as K
from keras.engine.topology import Layer, InputSpec

class LocalResponseNormalization(Layer):

    def __init__(self, n=5, alpha=0.0005, beta=0.75, k=2, **kwargs):
        self.n = n
        self.alpha = alpha
        self.beta = beta
        self.k = k
        super(LocalResponseNormalization, self).__init__(**kwargs)

    def build(self, input_shape):
        self.shape = input_shape
        super(LocalResponseNormalization, self).build(input_shape)

    def call(self, x, mask=None):
        if K.image_dim_ordering() == "th":
            _, f, r, c = self.shape
        else:
            _, r, c, f = self.shape
        # average the squared activations over an (n, n) spatial window
        squared = K.square(x)
        pooled = K.pool2d(squared, (self.n, self.n), strides=(1, 1),
                          padding="same", pool_mode="avg")
        if K.image_dim_ordering() == "th":
            summed = K.sum(pooled, axis=1, keepdims=True)
            averaged = self.alpha * K.repeat_elements(summed, f, axis=1)
        else:
            summed = K.sum(pooled, axis=3, keepdims=True)
            averaged = self.alpha * K.repeat_elements(summed, f, axis=3)
        denom = K.pow(self.k + averaged, self.beta)
        return x / denom

    def get_output_shape_for(self, input_shape):
        return input_shape
You can test this layer during development using the test harness described above. It is much easier to run this than to build a whole network to drop the layer into, or worse, to wait until the full network is specified before you can run the layer at all:
x = np.random.randn(225, 225, 3)
layer = LocalResponseNormalization()
y = test_layer(layer, x)
assert(x.shape == y.shape)
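Once the layer checks out, it can be dropped into a network like any built-in layer. For example, here is a sketch of our own (the filter counts and input size are arbitrary, and TensorFlow-style channels-last ordering is assumed):

from keras.models import Sequential
from keras.layers.convolutional import Conv2D
from keras.layers.core import Activation

model = Sequential()
model.add(Conv2D(32, (3, 3), padding="same", input_shape=(225, 225, 3)))
model.add(Activation("relu"))
# normalize the activations locally before the next convolutional block
model.add(LocalResponseNormalization())
model.add(Conv2D(64, (3, 3), padding="same"))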
Now that you have a good idea of how to build a custom Keras layer, you might find it instructive to look at Keunwoo Choi's tutorial on writing a custom layer for computing mel-spectrograms (https://keunwoochoi.wordpress.com/2016/11/18/for-beginners-writing-a-custom-keras-layer/).
Building custom Keras layers is fairly commonplace among experienced Keras developers, but such layers are rarely useful in a general context. A custom layer is usually built to serve a specific, narrow purpose dictated by the use case at hand, and Keras gives you enough flexibility to do this with ease.
If you found this post useful, make sure to check out our best-selling title Deep Learning with Keras for other intriguing deep learning concepts and their implementations using Keras.