Optimizing performance with streaming
Caching content certainly improves upon reading a file from disk for every request. However, with fs.readFile, we are reading the whole file into memory before sending it out in response. For better performance, we can stream a file from disk and pipe it directly to the response object, sending data straight to the network socket one piece at a time.
Getting ready
We are building on our code from the last example, so let's get server.js, index.html, styles.css, and script.js ready.
How to do it...
We will be using fs.createReadStream to initialize a stream, which can be piped to the response object. In this case, implementing fs.createReadStream within our cacheAndDeliver function isn't ideal because the event listeners of fs.createReadStream will need to interface with the request and response objects. For the sake of simplicity, these would preferably be dealt with within the http.createServer callback. For brevity's sake, we will discard our cacheAndDeliver function and implement basic caching within the server callback:
//requires, mime types, createServer, lookup and f vars...
fs.exists(f, function (exists) {
  if (exists) {
    var headers = {'Content-type': mimeTypes[path.extname(f)]};
    if (cache[f]) {
      response.writeHead(200, headers);
      response.end(cache[f].content);
      return;
    }
    //...rest of server code...
Later on, we will fill cache[f].content while we're interfacing with the readStream object. Here's how we use fs.createReadStream:
var s = fs.createReadStream(f);
This will return a readStream object, which streams the file that is pointed at by the f variable. readStream emits events that we need to listen to. We can listen with addListener or use the shorthand on:
var s = fs.createReadStream(f).on('open', function () {
  //do stuff when the readStream opens
});
Since createReadStream returns the readStream object, we can latch our event listener straight onto it using method chaining with the dot notation. Each stream is only going to open once, so we don't need to keep on listening to it. Therefore, we can use the once method instead of on to automatically stop listening after the first event occurrence:
var s = fs.createReadStream(f).once('open', function () {
  //do stuff when the readStream opens
});
Before we fill out the open event callback, let's implement error handling as follows:
var s = fs.createReadStream(f).once('open', function () {
//do stuff when the readStream opens
}).once('error', function (e) {
console.log(e);
response.writeHead(500);
response.end('Server Error!');
});
The key to this entire endeavor is the stream.pipe method. This is what enables us to take our file straight from disk and stream it directly to the network socket via our response object.
var s = fs.createReadStream(f).once('open', function () {
response.writeHead(200, headers);
this.pipe(response);
}).once('error', function (e) {
console.log(e);
response.writeHead(500);
response.end('Server Error!');
});
What about ending the response? Conveniently, stream.pipe detects when the stream has ended and calls response.end for us. For caching purposes, there's one other event we need to listen to. Still within our fs.exists callback, underneath the createReadStream code block, we write the following code:
fs.stat(f, function (err, stats) {
  var bufferOffset = 0;
  cache[f] = {content: new Buffer(stats.size)};
  s.on('data', function (chunk) {
    chunk.copy(cache[f].content, bufferOffset);
    bufferOffset += chunk.length;
  });
});
We've used the data event to capture the buffer as it's being streamed, and copied it into a buffer that we supplied to cache[f].content, using fs.stat to obtain the file size for the file's cache buffer.
How it works...
Instead of the client waiting for the server to load the entire file from the disk prior to sending it to the client, we use a stream to load the file in small, ordered pieces and promptly send them to the client. With larger files this is especially useful, as there is minimal delay between the file being requested and the client starting to receive the file.
We did this by using fs.createReadStream to start streaming our file from the disk. fs.createReadStream creates a readStream object, which inherits from the EventEmitter class.
The EventEmitter class accomplishes the evented part of Node's tag line: Evented I/O for V8 JavaScript. Due to this, we'll use listeners instead of callbacks to control the flow of stream logic.
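As a rough illustration of this pattern (the emitter and event names below are purely illustrative and not part of our server), an EventEmitter lets us register listeners with on or once and fire them with emit, which is essentially what readStream does internally as it opens the file and reads data:
var EventEmitter = require('events').EventEmitter;
var emitter = new EventEmitter();
emitter.once('open', function () {     //runs at most once
  console.log('resource opened');
});
emitter.on('data', function (piece) {  //runs on every emit
  console.log('received: ' + piece);
});
emitter.emit('open');
emitter.emit('data', 'first piece');
emitter.emit('data', 'second piece');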
Then we added an open event listener using the once method, since we want to stop listening for open once it has been triggered. We respond to the open event by writing the headers and using the stream.pipe method to shuffle the incoming data straight to the client.
stream.pipe handles the data flow. If the client becomes overwhelmed with processing, it sends a signal to the server which should be honored by pausing the stream. Under the hood, stream.pipe uses stream.pause and stream.resume to manage this interplay.
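The following is a simplified sketch of that interplay, not the actual pipe implementation: response.write returns false when the socket's buffer is full, at which point we pause the readStream, and we resume it when the response emits drain:
var s = fs.createReadStream(f);
s.on('data', function (chunk) {
  var flushed = response.write(chunk); //false means the socket is backed up
  if (!flushed) {
    s.pause();                         //stop reading until the client catches up
  }
});
response.on('drain', function () {
  s.resume();                          //the socket has emptied, carry on reading
});
s.on('end', function () {
  response.end();                      //what pipe also does for us automatically
});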
While the response is being piped to the client, the content cache is simultaneously being filled. To achieve this, we had to create an instance of the Buffer class for our cache[f].content property. A Buffer must be supplied with a size (or an array or string), which in our case is the size of the file. To get the size, we used the asynchronous fs.stat and captured the size property in the callback. The data event returns a Buffer as its only callback parameter.
The default bufferSize for a stream is 64 KB. Any file whose size is less than the bufferSize will only trigger one data event because the entire file will fit into the first chunk of data. However, for files greater than the bufferSize, we have to fill our cache[f].content property one piece at a time.
Note
Changing the default readStream buffer size:
We can change the buffer size of a readStream by passing an options object with a bufferSize property as the second parameter of fs.createReadStream.
For instance, to double the buffer you could use fs.createReadStream(f, {bufferSize: 128 * 1024});
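To see this for ourselves, a small standalone sketch (the file path and the doubled bufferSize are only for illustration) can count how many data events a given file produces; a file smaller than the buffer size arrives in a single chunk, while a larger file arrives in several:
var fs = require('fs');
var chunks = 0;
fs.createReadStream('index.html', {bufferSize: 128 * 1024})
  .on('data', function (chunk) {
    chunks += 1;
    console.log('chunk ' + chunks + ': ' + chunk.length + ' bytes');
  })
  .on('end', function () {
    console.log('finished after ' + chunks + ' data event(s)');
  });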
We cannot simply concatenate each chunk with cache[f].content, since this would coerce the binary data into string format; the result, though no longer binary, would later be interpreted as binary. Instead, we have to copy all the little binary buffer chunks into our binary cache[f].content buffer.
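A quick sketch makes the difference visible (the byte values are arbitrary, chosen because they are not valid UTF-8): coercing a buffer to a string and back changes its length and contents, whereas Buffer.copy preserves it byte for byte:
var original = new Buffer([0xff, 0xfe, 0x00, 0x80]); //bytes that are not valid UTF-8
var asString = '' + original;                        //coerced to a string
console.log(new Buffer(asString).length);            //no longer 4 bytes: the data is mangled
var copy = new Buffer(original.length);
original.copy(copy, 0);                              //byte-for-byte copy
console.log(copy.length);                            //still 4 bytes, intact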
We created a bufferOffset variable to assist us with this. Each time we add another chunk to our cache[f].content buffer, we update bufferOffset by adding the length of the chunk buffer to it. When we call the Buffer.copy method on the chunk buffer, we pass bufferOffset as the second parameter so our cache[f].content buffer is filled correctly.
Moreover, operating with the Buffer class renders performance enhancements with larger files because it bypasses the V8 garbage-collection methods. These tend to fragment large amounts of data, thus slowing down Node's ability to process them.
There's more...
While streaming has solved a problem of waiting for files to load into memory before delivering them, we are nevertheless still loading files into memory via our cache object. With larger files, or large amounts of files, this could have potential ramifications.
Protecting against process memory overruns
There is a limited amount of process memory. By default, V8's memory is set to 1400 MB on 64-bit systems and 700 MB on 32-bit systems. This can be altered by running Node with --max-old-space-size=N, where N is the amount of megabytes (the actual maximum amount that it can be set to depends upon the OS and, of course, the amount of physical RAM available). If we absolutely needed to be memory intensive, we could run our server on a large cloud platform, divide up the logic, and start new instances of node using the child_process module.
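As a rough sketch of that idea (the worker.js file and the message shape are purely illustrative), child_process.fork starts a separate Node process with its own memory space and a built-in messaging channel:
var fork = require('child_process').fork;
var worker = fork(__dirname + '/worker.js'); //runs in its own process and memory space
worker.on('message', function (msg) {
  console.log('worker replied:', msg);
});
worker.send({task: 'warm-cache', file: 'index.html'}); //delegate work to the worker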
In this case, high memory usage isn't necessarily required and we can optimize our code to significantly reduce the potential for memory overruns. There is less benefit to caching larger files. The slight speed improvement relative to the total download time is negligible, while the cost of caching them is quite significant in ratio to our available process memory. We can also improve cache efficiency by implementing an expiration time on cache objects, which can then be used to clean the cache, consequently removing files in low demand and prioritizing high-demand files for faster delivery. Let's rearrange our cache object slightly:
var cache = {
  store: {},
  maxSize: 26214400 //(bytes) 25mb
};
For a clearer mental model, we're making a distinction between the cache as a functioning entity and the cache as a store (which is a part of the broader cache entity). Our first goal is to only cache files under a certain size. We've defined cache.maxSize for this purpose. All we have to do now is insert an if condition within the fs.stat callback:
fs.stat(f, function (err, stats) {
  if (stats.size < cache.maxSize) {
    var bufferOffset = 0;
    cache.store[f] = {content: new Buffer(stats.size), timestamp: Date.now()};
    s.on('data', function (data) {
      data.copy(cache.store[f].content, bufferOffset);
      bufferOffset += data.length;
    });
  }
});
Notice we also slipped a new timestamp property into our cache.store[f]. This is for cleaning the cache, which is our second goal. Let's extend cache:
var cache = {
store: {},
maxSize: 26214400, //(bytes) 25mb
maxAge: 5400 * 1000, //(ms) 1 and a half hours
clean: function(now) {
var that = this;
Object.keys(this.store).forEach(function (file) {
if (now > that.store[file].timestamp + that.maxAge) {
delete that.store[file];
}
});
}
};
So in addition to maxSize, we've created a maxAge property and added a clean method. We call cache.clean at the bottom of the server like so:
//all of our code prior
cache.clean(Date.now());
}).listen(8080); //end of the http.createServer
cache.clean loops through cache.store and checks to see whether each cached item has exceeded its specified lifetime. If it has, we remove it from the store. We'll add one further improvement and then we're done. cache.clean is called on each request. This means cache.store is going to be looped through on every server hit, which is neither necessary nor efficient. It would be better if we cleaned the cache, say, every two hours or so. We'll add two more properties to cache. The first is cleanAfter, to specify how long between cache cleans. The second is cleanedAt, to determine how long it has been since the cache was last cleaned:
var cache = {
  store: {},
  maxSize: 26214400, //(bytes) 25mb
  maxAge: 5400 * 1000, //(ms) 1 and a half hours
  cleanAfter: 7200 * 1000, //(ms) two hours
  cleanedAt: 0, //to be set dynamically
  clean: function (now) {
    if (now - this.cleanAfter > this.cleanedAt) {
      this.cleanedAt = now;
      var that = this;
      Object.keys(this.store).forEach(function (file) {
        if (now > that.store[file].timestamp + that.maxAge) {
          delete that.store[file];
        }
      });
    }
  }
};
We wrap our cache.clean method in an if statement, which will allow a loop through cache.store only if it has been longer than two hours (or whatever cleanAfter is set to) since the last clean.
See also
Handling file uploads discussed in Chapter 2, Exploring the HTTP Object
Securing Against Filesystem Hacking Exploits discussed in this chapter.