Optimizing performance with streaming
Caching content certainly improves upon reading a file from disk for every request. However, with fs.readFile, we are reading the whole file into memory before sending it out in a response object. For better performance, we can stream a file from disk and pipe it directly to the response object, sending data straight to the network socket a piece at a time.
Getting ready
We are building on our code from the last example, so let's get server.js, index.html, styles.css, and script.js ready.
How to do it...
We will be using fs.createReadStream to initialize a stream, which can be piped to the response object.
Tip
If streaming and piping are new concepts, don't worry! We'll be covering streams in depth in Chapter 5, Employing Streams.
In this case, implementing fs.createReadStream within our cacheAndDeliver function isn't ideal, because the event listeners of fs.createReadStream need to interface with the request and response objects, which, for simplicity's sake, are best dealt with in the http.createServer callback. For brevity, we will discard our cacheAndDeliver function and implement basic caching within the server callback as follows:
//...snip... requires, mime types, createServer, lookup and f vars...
fs.exists(f, function (exists) {
  if (exists) {
    var headers = {'Content-type': mimeTypes[path.extname(f)]};
    if (cache[f]) {
      response.writeHead(200, headers);
      response.end(cache[f].content);
      return;
    }
//...snip... rest of server code...
Later on, we will fill cache[f].content while we are interfacing with the readStream object. The following code shows how we use fs.createReadStream:
var s = fs.createReadStream(f);
The preceding code will return a readStream object, which streams the file pointed to by the variable f. The readStream object emits events that we need to listen to. We can listen with the addListener method or use the shorthand on method as follows:
var s = fs.createReadStream(f).on('open', function () {
  //do stuff when the readStream opens
});
Because createReadStream returns the readStream object, we can latch our event listener straight onto it using method chaining with dot notation. Each stream is only going to open once; we don't need to keep listening to it. Therefore, we can use the once method instead of on to automatically stop listening after the first event occurrence, as follows:
var s = fs.createReadStream(f).once('open', function () {
  //do stuff when the readStream opens
});
Before we fill out the open event callback, let's implement some error handling as follows:
var s = fs.createReadStream(f).once('open', function () {
  //do stuff when the readStream opens
}).once('error', function (e) {
  console.log(e);
  response.writeHead(500);
  response.end('Server Error!');
});
The key to this whole endeavor is the stream.pipe method. This is what enables us to take our file straight from disk and stream it directly to the network socket via our response object, as follows:
var s = fs.createReadStream(f).once('open', function () {
  response.writeHead(200, headers);
  this.pipe(response);
}).once('error', function (e) {
  console.log(e);
  response.writeHead(500);
  response.end('Server Error!');
});
But what about ending the response? Conveniently, stream.pipe detects when the stream has ended and calls response.end for us. There's one other event we need to listen to, for caching purposes. Within our fs.exists callback, underneath the createReadStream code block, we write the following code:
fs.stat(f, function (err, stats) {
  var bufferOffset = 0;
  cache[f] = {content: new Buffer(stats.size)};
  s.on('data', function (chunk) {
    chunk.copy(cache[f].content, bufferOffset);
    bufferOffset += chunk.length;
  });
}); //end of createReadStream
We've used the data event to capture each buffer as it's being streamed and copied it into the buffer that we supplied to cache[f].content, using fs.stat to obtain the file size for the file's cache buffer.
Note
For this case, we're using the classic-mode data event instead of the readable event coupled with stream.read() (see http://nodejs.org/api/stream.html#stream_readable_read_size_1) because it best suits our aim, which is to grab data from the stream as soon as possible. In Chapter 5, Employing Streams, we'll learn how to use the stream.read method.
How it works...
Instead of the client waiting for the server to load the entire file from disk prior to sending it to the client, we use a stream to load the file in small ordered pieces and promptly send them to the client. With larger files, this is especially useful as there is minimal delay between the file being requested and the client starting to receive the file.
We did this by using fs.createReadStream to start streaming our file from disk. The fs.createReadStream method creates a readStream object, which inherits from the EventEmitter class.
The EventEmitter class accomplishes the evented part of Node's tagline: Evented I/O for V8 JavaScript. Because of this, we'll be using listeners instead of callbacks to control the flow of stream logic.
We then added an open event listener using the once method, since we want to stop listening to the open event once it has been triggered. We respond to the open event by writing the headers and using the stream.pipe method to shuffle the incoming data straight to the client. If the client becomes overwhelmed with processing, stream.pipe applies backpressure, which means that the incoming stream is paused until the backlog of data is handled (we'll find out more about this in Chapter 5, Employing Streams).
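To make the backpressure idea concrete, here is a rough, simplified sketch of what pipe does internally (the real implementation also handles errors, unpiping, and more):

```javascript
//a simplified sketch of pipe's core loop: pause the source when the
//destination's write buffer is full, resume when it drains
function naivePipe(source, destination) {
  source.on('data', function (chunk) {
    if (!destination.write(chunk)) { //false means the backlog is building up
      source.pause();
    }
  });
  destination.on('drain', function () {
    source.resume();
  });
  source.on('end', function () {
    destination.end();
  });
}
```

The return value of write is the whole trick: false signals that the destination has buffered more than it can comfortably hold, so the source is paused until the drain event says it has caught up.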
While the response is being piped to the client, the content cache is simultaneously being filled. To achieve this, we had to create an instance of the Buffer class for our cache[f].content property.
The Buffer constructor must be supplied with a size (or an array or string), which in our case is the size of the file. To get the size, we used the asynchronous fs.stat method and captured the size property in the callback. The data event passes a Buffer as its only callback parameter.
The default value of bufferSize for a stream is 64 KB; any file smaller than the bufferSize value will trigger only one data event, because the whole file fits into the first chunk of data. But for files larger than the bufferSize value, we have to fill our cache[f].content property one piece at a time.
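The arithmetic behind the number of data events is simple; as a hypothetical illustration:

```javascript
//with a 64 KB read buffer, files smaller than 64 KB arrive in a single
//'data' event; larger files arrive in several chunks
var bufferSize = 64 * 1024;

function expectedChunks(fileSize) {
  return Math.ceil(fileSize / bufferSize);
}

//a 10 KB stylesheet arrives whole: expectedChunks(10 * 1024) is 1
//a 1 MB image arrives in pieces: expectedChunks(1024 * 1024) is 16
```

This is why the bufferOffset bookkeeping in our fs.stat callback matters only for the larger files; small assets are cached in a single copy.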
Tip
Changing the default readStream buffer size
We can change the buffer size of our readStream object by passing an options object with a bufferSize property as the second parameter of fs.createReadStream.
For instance, to double the buffer, you could use fs.createReadStream(f, {bufferSize: 128 * 1024});.
We cannot simply concatenate each chunk with cache[f].content, because doing so would coerce the binary data into string format; the result is no longer valid binary data, yet it would later be served as if it were, corrupting the file. Instead, we have to copy all the little binary buffer chunks into our binary cache[f].content buffer.
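We can see the corruption with a small, hypothetical two-chunk example:

```javascript
//two hypothetical chunks of binary data
var chunkA = new Buffer([0xff, 0x00]);
var chunkB = new Buffer([0x10, 0xfe]);

//wrong: concatenation coerces the bytes through a string encoding,
//replacing invalid UTF-8 sequences and changing the byte length
var corrupted = new Buffer('' + chunkA + chunkB);

//right: copy each chunk into a preallocated buffer at a running offset
var whole = new Buffer(chunkA.length + chunkB.length);
var bufferOffset = 0;
chunkA.copy(whole, bufferOffset);
bufferOffset += chunkA.length;
chunkB.copy(whole, bufferOffset);
//whole.length is 4 and the bytes are intact; corrupted.length is not 4
```

The copy approach is precisely what our data listener does with each incoming chunk.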
We created a bufferOffset variable to assist us with this. Each time we add another chunk to our cache[f].content buffer, we update bufferOffset by adding the length of the chunk buffer to it. When we call the Buffer.copy method on the chunk buffer, we pass bufferOffset as the second parameter, so our cache[f].content buffer is filled correctly.
Moreover, operating with the Buffer class yields performance benefits with larger files, because it bypasses V8 garbage collection, which tends to fragment large amounts of data and thus slows down Node's ability to process them.
There's more...
While streaming has solved the problem of waiting for files to be loaded into memory before delivering them, we are nevertheless still loading files into memory via our cache object. With larger files, or a large number of files, this could have potential ramifications.
Protecting against process memory overruns
Streaming allows for intelligent and minimal use of memory for processing large memory items. But even with well-written code, some apps may require significant memory.
There is a limited amount of heap memory. By default, V8's memory limit is set to 1400 MB on 64-bit systems and 700 MB on 32-bit systems. This can be altered by running node with --max-old-space-size=N, where N is the number of megabytes (the actual maximum it can be set to depends on the OS, on whether we're running on a 32-bit or 64-bit architecture, and, of course, on the amount of physical RAM available; a 32-bit system may peak out at around 2 GB).
Note
The --max-old-space-size setting doesn't apply to buffers: it applies to the V8 heap (memory allocated for JavaScript objects and primitives), and buffers are allocated outside of the V8 heap.
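We can observe this off-heap allocation with process.memoryUsage(); this is a rough illustration, and the exact figures will vary by platform and Node version:

```javascript
//buffers live outside the V8 heap, so a 10 MB buffer allocation should
//not grow heapUsed by anything close to 10 MB
var before = process.memoryUsage().heapUsed;
var big = new Buffer(10 * 1024 * 1024); //10 MB, allocated off-heap
var after = process.memoryUsage().heapUsed;
//(after - before) is far smaller than big.length
```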
If we absolutely had to be memory intensive, we could run our server on a large cloud platform, divide up the logic, and start new instances of node using the child_process module, or better still, the higher-level cluster module.
Tip
There are other more advanced ways to increase the usable memory, including editing and recompiling the v8 code base. The http://blog.caustik.com/2012/04/11/escape-the-1-4gb-v8-heap-limit-in-node-js link has some tips along these lines.
In this case, high memory usage isn't necessarily required, and we can optimize our code to significantly reduce the potential for memory overruns. There is less benefit to caching larger files, because the slight speed improvement relative to the total download time is negligible, while the cost of caching them is quite significant relative to our available process memory. We can also improve cache efficiency by implementing an expiration time on cache objects, which can then be used to clean the cache, removing files in low demand and prioritizing high-demand files for faster delivery. Let's rearrange our cache object slightly, as follows:
var cache = {
  store: {},
  maxSize: 26214400 //(bytes) 25mb
};
For a clearer mental model, we're making a distinction between the cache object as a functioning entity and the cache store (which is a part of the broader cache entity). Our first goal is to only cache files under a certain size; we've defined cache.maxSize for this purpose. All we have to do now is insert an if condition within the fs.stat callback, as follows:
fs.stat(f, function (err, stats) {
  if (stats.size < cache.maxSize) {
    var bufferOffset = 0;
    cache.store[f] = {content: new Buffer(stats.size), timestamp: Date.now()};
    s.on('data', function (data) {
      data.copy(cache.store[f].content, bufferOffset);
      bufferOffset += data.length;
    });
  }
});
Notice that we also slipped a new timestamp property into our cache.store[f] object. This is for our second goal: cleaning the cache. Let's extend cache as follows:
var cache = {
  store: {},
  maxSize: 26214400, //(bytes) 25mb
  maxAge: 5400 * 1000, //(ms) 1 and a half hours
  clean: function (now) {
    var that = this;
    Object.keys(this.store).forEach(function (file) {
      if (now > that.store[file].timestamp + that.maxAge) {
        delete that.store[file];
      }
    });
  }
};
So, in addition to maxSize, we've created a maxAge property and added a clean method. We call cache.clean at the bottom of the server with the help of the following code:
//all of our code prior
cache.clean(Date.now());
}).listen(8080); //end of the http.createServer
The cache.clean method loops through cache.store and checks whether each item has exceeded its specified lifetime. If it has, we remove it from the store. One further improvement, and then we're done. The cache.clean method is called on each request, which means cache.store is going to be looped through on every server hit; this is neither necessary nor efficient. It would be better if we cleaned the cache, say, every two hours or so. We'll add two more properties to cache: cleanAfter, to specify the time between cache cleans, and cleanedAt, to determine how long it has been since the cache was last cleaned, as follows:
var cache = {
  store: {},
  maxSize: 26214400, //(bytes) 25mb
  maxAge: 5400 * 1000, //(ms) 1 and a half hours
  cleanAfter: 7200 * 1000, //(ms) two hours
  cleanedAt: 0, //to be set dynamically
  clean: function (now) {
    if (now - this.cleanAfter > this.cleanedAt) {
      this.cleanedAt = now;
      var that = this; //without var, that would leak into global scope
      Object.keys(this.store).forEach(function (file) {
        if (now > that.store[file].timestamp + that.maxAge) {
          delete that.store[file];
        }
      });
    }
  }
};
So we wrap the body of our clean method in an if statement, which allows a loop through cache.store only if it has been longer than two hours (or whatever cleanAfter is set to) since the last clean.
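To sanity-check the clean logic, we can exercise it with hypothetical timestamps (repeating the cache object from above so the snippet stands alone):

```javascript
var cache = {
  store: {},
  maxSize: 26214400, //(bytes) 25mb
  maxAge: 5400 * 1000, //(ms) 1 and a half hours
  cleanAfter: 7200 * 1000, //(ms) two hours
  cleanedAt: 0, //to be set dynamically
  clean: function (now) {
    if (now - this.cleanAfter > this.cleanedAt) {
      this.cleanedAt = now;
      var that = this;
      Object.keys(this.store).forEach(function (file) {
        if (now > that.store[file].timestamp + that.maxAge) {
          delete that.store[file];
        }
      });
    }
  }
};

var t0 = Date.now();
//hypothetical entries: one two hours old, one brand new
cache.store['/stale.html'] = {timestamp: t0 - (2 * 3600 * 1000)};
cache.store['/fresh.html'] = {timestamp: t0};

cache.clean(t0); //first clean always runs, because cleanedAt starts at 0
//the stale entry is removed; the fresh one is kept

cache.clean(t0 + 1000); //too soon after the last clean, so this is a no-op
```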
See also
The Handling file uploads recipe discussed in Chapter 2, Exploring the HTTP Object
Chapter 2, Exploring the HTTP Object
The Securing against filesystem hacking exploits recipe
Chapter 5, Employing Streams