In this article, we will cover the following topics:
(For more resources related to this topic, see here.)
One of the great qualities of Node is its simplicity. Unlike PHP or ASP, there is no separation between the web server and code, nor do we have to customize large configuration files to get the behavior we want. With Node, we can create the web server, customize it, and deliver content. All this can be done at the code level. This article demonstrates how to create a web server with Node and feed content through it, while implementing security and performance enhancements to cater for various situations.
If we don't have Node installed yet, we can head to http://nodejs.org and hit the INSTALL button appearing on the homepage. This will download the relevant file to install Node on our operating system.
In order to deliver web content, we need to make a Uniform Resource Identifier (URI) available. This recipe walks us through the creation of an HTTP server that exposes routes to the user.
First let's create our server file. If our main purpose is to expose server functionality, it's a general practice to call the server.js file (because the npm start command runs the node server.js command by default). We could put this new server.js file in a new folder.
It's also a good idea to install and use supervisor. We use npm (the module downloading and publishing command-line application that ships with Node) to install. On the command-line utility, we write the following command:
sudo npm -g install supervisor
Essentially, sudo allows administrative privileges for Linux and Mac OS X systems. If we are using Node on Windows, we can drop the sudo part in any of our commands.
The supervisor module will conveniently autorestart our server when we save our changes. To kick things off, we can start our server.js file with the supervisor module by executing the following command:
supervisor server.js
For more on possible arguments and the configuration of supervisor, check out https://github.com/isaacs/node-supervisor.
In order to create the server, we need the HTTP module. So let's load it and use the http.createServer method as follows:
var http = require('http');
http.createServer(function (request, response) {
response.writeHead(200, {'Content-Type': 'text/html'});
response.end('Woohoo!');
}).listen(8080);
Now, if we save our file and access localhost:8080 on a web browser or using curl, our browser (or curl) will exclaim Woohoo! But the same will occur at localhost:8080/foo. Indeed, any path will render the same behavior. So let's build in some routing. We can use the path module to extract the basename variable of the path (the final part of the path) and reverse any URI encoding from the client with decodeURI as follows:
var http = require('http');
var path = require('path');
http.createServer(function (request, response) {
var lookup=path.basename(decodeURI(request.url));
We now need a way to define our routes. One option is to use an array of objects as follows:
var pages = [
{route: '', output: 'Woohoo!'},
{route: 'about', output: 'A simple routing with Node example'},
{route: 'another page', output: function() {return 'Here's
'+this.route;}},
];
Our pages array should be placed above the http.createServer call.
Within our server, we need to loop through our array and see if the lookup variable matches any of our routes. If it does, we can supply the output. We'll also implement some 404 error-related handling as follows:
http.createServer(function (request, response) {
var lookup=path.basename(decodeURI(request.url));
pages.forEach(function(page) {
if (page.route === lookup) {
response.writeHead(200, {'Content-Type': 'text/html'});
response.end(typeof page.output === 'function'
? page.output() : page.output);
}
});
if (!response.finished) {
response.writeHead(404);
response.end('Page Not Found!');
}
}).listen(8080);
The callback function we provide to http.createServer gives us all the functionality we need to interact with our server through the request and response objects. We use request to obtain the requested URL and then we acquire its basename with path. We also use decodeURI, without which another page route would fail as our code would try to match another%20page against our pages array and return false.
Once we have our basename, we can match it in any way we want. We could send it in a database query to retrieve content, use regular expressions to effectuate partial matches, or we could match it to a filename and load its contents.
We could have used a switch statement to handle routing, but our pages array has several advantages—it's easier to read, easier to extend, and can be seamlessly converted to JSON. We loop through our pages array using forEach.
Node is built on Google's V8 engine, which provides us with a number of ECMAScript 5 (ES5) features. These features can't be used in all browsers as they're not yet universally implemented, but using them in Node is no problem! The forEach function is an ES5 implementation; the ES3 way is to use the less convenient for loop.
While looping through each object, we check its route property. If we get a match, we write the 200 OK status and content-type headers, and then we end the response with the object's output property.
The response.end method allows us to pass a parameter to it, which it writes just before finishing the response. In response.end, we have used a ternary operator (?:) to conditionally call page.output as a function or simply pass it as a string. Notice that the another page route contains a function instead of a string. The function has access to its parent object through the this variable, and allows for greater flexibility in assembling the output we want to provide. In the event that there is no match in our forEach loop, response.end would never be called and therefore the client would continue to wait for a response until it times out. To avoid this, we check the response.finished property and if it's false, we write a 404 header and end the response.
The response.finished flag is affected by the forEach callback, yet it's not nested within the callback. Callback functions are mostly used for asynchronous operations, so on the surface this looks like a potential race condition; however, the forEach loop does not operate asynchronously; it blocks until all loops are complete.
There are many ways to extend and alter this example. There are also some great non-core modules available that do the legwork for us.
Our routing so far only deals with a single level path. A multilevel path (for example, /about/node) will simply return a 404 error message. We can alter our object to reflect a subdirectory-like structure, remove path, and use request.url for our routes instead of path.basename as follows:
var http=require('http');
var pages = [
{route: '/', output: 'Woohoo!'},
{route: '/about/this', output: 'Multilevel routing with Node'},
{route: '/about/node', output: 'Evented I/O for V8 JavaScript.'},
{route: '/another page', output: function () {return 'Here's '
+ this.route; }}
];
http.createServer(function (request, response) {
var lookup = decodeURI(request.url);
When serving static files, request.url must be cleaned prior to fetching a given file. Check out the Securing against filesystem hacking exploits recipe in this article.
Multilevel routing could be taken further; we could build and then traverse a more complex object as follows:
{route: 'about', childRoutes: [
{route: 'node', output: 'Evented I/O for V8 JavaScript'},
{route: 'this', output: 'Complex Multilevel Example'}
]}
After the third or fourth level, this object would become a leviathan to look at. We could alternatively create a helper function to define our routes that essentially pieces our object together for us. Alternatively, we could use one of the excellent noncore routing modules provided by the open source Node community. Excellent solutions already exist that provide helper methods to handle the increasing complexity of scalable multilevel routing.
Two other useful core modules are url and querystring. The url.parse method allows two parameters: first the URL string (in our case, this will be request.url) and second a Boolean parameter named parseQueryString. If the url.parse method is set to true, it lazy loads the querystring module (saving us the need to require it) to parse the query into an object. This makes it easy for us to interact with the query portion of a URL as shown in the following code:
var http = require('http');
var url = require('url');
var pages = [
{id: '1', route: '', output: 'Woohoo!'},
{id: '2', route: 'about', output: 'A simple routing with Node
example'},
{id: '3', route: 'another page', output: function () {
return 'Here's ' + this.route; }
},
];
http.createServer(function (request, response) {
var id = url.parse(decodeURI(request.url), true).query.id;
if (id) {
pages.forEach(function (page) {
if (page.id === id) {
response.writeHead(200, {'Content-Type': 'text/html'});
response.end(typeof page.output === 'function'
? page.output() : page.output);
}
});
}
if (!response.finished) {
response.writeHead(404);
response.end('Page Not Found');
}
}).listen(8080);
With the added id properties, we can access our object data by, for instance, localhost:8080?id=2.
There's an up-to-date list of various routing modules for Node at https://github.com/joyent/node/wiki/modules#wiki-web-frameworks-routers. These community-made routers cater to various scenarios. It's important to research the activity and maturity of a module before taking it into a production environment.
NodeZoo (http://nodezoo.com) is an excellent tool to research the state of a NODE module.
If we have information stored on disk that we want to serve as web content, we can use the fs (filesystem) module to load our content and pass it through the http.createServer callback. This is a basic conceptual starting point to serve static files; as we will learn in the following recipes, there are much more efficient solutions.
We'll need some files to serve. Let's create a directory named content, containing the following three files:
Add the following code to the HTML file index.html:
<html>
<head>
<title>Yay Node!</title>
<link rel=stylesheet href=styles.css type=text/css>
<script src=script.js type=text/javascript></script>
</head>
<body>
<span id=yay>Yay!</span>
</body>
</html>
Add the following code to the script.js JavaScript file:
window.onload = function() { alert('Yay Node!'); };
And finally, add the following code to the CSS file style.css:
#yay {font-size:5em;background:blue;color:yellow;padding:0.5em}
As in the previous recipe, we'll be using the core modules http and path. We'll also need to access the filesystem, so we'll require fs as well. With the help of the following code, let's create the server and use the path module to check if a file exists:
var http = require('http');
var path = require('path');
var fs = require('fs');
http.createServer(function (request, response) {
var lookup = path.basename(decodeURI(request.url)) ||
'index.html';
var f = 'content/' + lookup;
fs.exists(f, function (exists) {
console.log(exists ? lookup + " is there"
: lookup + " doesn't exist");
});
}).listen(8080);
If we haven't already done it, then we can initialize our server.js file by running the following command:
supervisor server.js
Try loading localhost:8080/foo. The console will say foo doesn't exist, because it doesn't. The localhost:8080/script.js URL will tell us that script.js is there, because it is. Before we can serve a file, we are supposed to let the client know the content-type header, which we can determine from the file extension. So let's make a quick map using an object as follows:
var mimeTypes = {
'.js' : 'text/javascript',
'.html': 'text/html',
'.css' : 'text/css'
};
We could extend our mimeTypes map later to support more types.
Modern browsers may be able to interpret certain mime types (like text/javascript), without the server sending a content-type header, but older browsers or less common mime types will rely upon the correct content-type header being sent from the server.
Remember to place mimeTypes outside of the server callback, since we don't want to initialize the same object on every client request. If the requested file exists, we can convert our file extension into a content-type header by feeding path.extname into mimeTypes and then pass our retrieved content-type to response.writeHead. If the requested file doesn't exist, we'll write out a 404 error and end the response as follows:
//requires variables, mimeType object...
http.createServer(function (request, response) {
var lookup = path.basename(decodeURI(request.url)) ||
'index.html';
var f = 'content/' + lookup;
fs.exists(f, function (exists) {
if (exists) {
fs.readFile(f, function (err, data) {
if (err) {response.writeHead(500); response.end('Server
Error!'); return; }
var headers = {'Content-type': mimeTypes[path.extname
(lookup)]};
response.writeHead(200, headers);
response.end(data);
});
return;
}
response.writeHead(404); //no such file found!
response.end();
});
}).listen(8080);
At the moment, there is still no content sent to the client. We have to get this content from our file, so we wrap the response handling in an fs.readFile method callback as follows:
//http.createServer, inside fs.exists:
if (exists) {
fs.readFile(f, function(err, data) {
var headers={'Content-type': mimeTypes[path.extname(lookup)]};
response.writeHead(200, headers);
response.end(data);
});
return;
}
Before we finish, let's apply some error handling to our fs.readFile callback as follows:
//requires variables, mimeType object...
//http.createServer, path exists, inside if(exists): fs.readFile(f, function(err, data) {
if (err) {response.writeHead(500); response.end('Server
Error!'); return; }
var headers = {'Content-type': mimeTypes[path.extname
(lookup)]};
response.writeHead(200, headers);
response.end(data);
});
return;
}
Notice that return stays outside of the fs.readFile callback. We are returning from the fs.exists callback to prevent further code execution (for example, sending the 404 error). Placing a return statement in an if statement is similar to using an else branch. However, the pattern of the return statement inside the if loop is encouraged instead of if else, as it eliminates a level of nesting. Nesting can be particularly prevalent in Node due to performing a lot of asynchronous tasks, which tend to use callback functions.
So, now we can navigate to localhost:8080, which will serve our index.html file. The index.html file makes calls to our script.js and styles.css files, which our server also delivers with appropriate mime types. We can see the result in the following screenshot:
This recipe serves to illustrate the fundamentals of serving static files. Remember, this is not an efficient solution! In a real world situation, we don't want to make an I/O call every time a request hits the server; this is very costly especially with larger files. In the following recipes, we'll learn better ways of serving static files.
Our script creates a server and declares a variable called lookup. We assign a value to lookup using the double pipe || (OR) operator. This defines a default route if path.basename is empty. Then we pass lookup to a new variable that we named f in order to prepend our content directory to the intended filename. Next, we run f through the fs.exists method and check the exist parameter in our callback to see if the file is there. If the file does exist, we read it asynchronously using fs.readFile. If there is a problem accessing the file, we write a 500 server error, end the response, and return from the fs.readFile callback. We can test the error-handling functionality by removing read permissions from index.html as follows:
chmod -r index.html
Doing so will cause the server to throw the 500 server error status code. To set things right again, run the following command:
chmod +r index.html
chmod is a Unix-type system-specific command. If we are using Windows, there's no need to set file permissions in this case.
As long as we can access the file, we grab the content-type header using our handy mimeTypes mapping object, write the headers, end the response with data loaded from the file, and finally return from the function. If the requested file does not exist, we bypass all this logic, write a 404 error message, and end the response.
The favicon icon file is something to watch out for. We will explore the file in this section.
When using a browser to test our server, sometimes an unexpected server hit can be observed. This is the browser requesting the default favicon.ico icon file that servers can provide. Apart from the initial confusion of seeing additional hits, this is usually not a problem. If the favicon request does begin to interfere, we can handle it as follows:
if (request.url === '/favicon.ico') {
console.log('Not found: ' + f);
response.end();
return;
}
If we wanted to be more polite to the client, we could also inform it of a 404 error by using response.writeHead(404) before issuing response.end.
Directly accessing storage on each client request is not ideal. For this task, we will explore how to enhance server efficiency by accessing the disk only on the first request, caching the data from file for that first request, and serving all further requests out of the process memory.
We are going to improve upon the code from the previous task, so we'll be working with server.js and in the content directory, with index.html, styles.css, and script.js.
Let's begin by looking at our following script from the previous recipe Serving Static Files:
var http = require('http');
var path = require('path');
var fs = require('fs');
var mimeTypes = {
'.js' : 'text/javascript',
'.html': 'text/html',
'.css' : 'text/css'
};
http.createServer(function (request, response) {
var lookup = path.basename(decodeURI(request.url)) ||
'index.html';
var f = 'content/'+lookup;
fs.exists(f, function (exists) {
if (exists) {
fs.readFile(f, function(err,data) {
if (err) {
response.writeHead(500); response.end('Server Error!');
return;
}
var headers = {'Content-type': mimeTypes[path.extname
(lookup)]};
response.writeHead(200, headers);
response.end(data);
});
return;
}
response.writeHead(404); //no such file found!
response.end('Page Not Found');
});
}
We need to modify this code to only read the file once, load its contents into memory, and respond to all requests for that file from memory afterwards. To keep things simple and preserve maintainability, we'll extract our cache handling and content delivery into a separate function. So above http.createServer, and below mimeTypes, we'll add the following:
var cache = {};
function cacheAndDeliver(f, cb) {
if (!cache[f]) {
fs.readFile(f, function(err, data) {
if (!err) {
cache[f] = {content: data} ;
}
cb(err, data);
});
return;
}
console.log('loading ' + f + ' from cache');
cb(null, cache[f].content);
}
//http.createServer
A new cache object and a new function called cacheAndDeliver have been added to store our files in memory. Our function takes the same parameters as fs.readFile so we can replace fs.readFile in the http.createServer callback while leaving the rest of the code intact as follows:
//...inside http.createServer:
fs.exists(f, function (exists) {
if (exists) {
cacheAndDeliver(f, function(err, data) {
if (err) {
response.writeHead(500);
response.end('Server Error!');
return; }
var headers = {'Content-type': mimeTypes[path.extname(f)]};
response.writeHead(200, headers);
response.end(data);
});
return;
}
//rest of path exists code (404 handling)...
When we execute our server.js file and access localhost:8080 twice, consecutively, the second request causes the console to display the following output:
loading content/index.html from cache
loading content/styles.css from cache
loading content/script.js from cache
We defined a function called cacheAndDeliver, which like fs.readFile, takes a filename and callback as parameters. This is great because we can pass the exact same callback of fs.readFile to cacheAndDeliver, padding the server out with caching logic without adding any extra complexity visually to the inside of the http.createServer callback.
As it stands, the worth of abstracting our caching logic into an external function is arguable, but the more we build on the server's caching abilities, the more feasible and useful this abstraction becomes. Our cacheAndDeliver function checks to see if the requested content is already cached. If not, we call fs.readFile and load the data from disk.
Once we have this data, we may as well hold onto it, so it's placed into the cache object referenced by its file path (the f variable). The next time anyone requests the file, cacheAndDeliver will see that we have the file stored in the cache object and will issue an alternative callback containing the cached data. Notice that we fill the cache[f] property with another new object containing a content property. This makes it easier to extend the caching functionality in the future as we would just have to place extra properties into our cache[f] object and supply logic that interfaces with these properties accordingly.
If we were to modify the files we are serving, the changes wouldn't be reflected until we restart the server. We can do something about that.
To detect whether a requested file has changed since we last cached it, we must know when the file was cached and when it was last modified. To record when the file was last cached, let's extend the cache[f] object as follows:
cache[f] = {content: data,timestamp: Date.now() // store a Unix
// time stamp
};
That was easy! Now let's find out when the file was updated last. The fs.stat method returns an object as the second parameter of its callback. This object contains the same useful information as the command-line GNU (GNU's Not Unix!) coreutils stat. The fs.stat function supplies three time-related properties: last accessed (atime), last modified (mtime), and last changed (ctime). The difference between mtime and ctime is that ctime will reflect any alterations to the file, whereas mtime will only reflect alterations to the content of the file. Consequently, if we changed the permissions of a file, ctime would be updated but mtime would stay the same. We want to pay attention to permission changes as they happen so let's use the ctime property as shown in the following code:
//requires and mimeType object....
var cache = {};
function cacheAndDeliver(f, cb) {
fs.stat(f, function (err, stats) {
if (err) { return console.log('Oh no!, Error', err); }
var lastChanged = Date.parse(stats.ctime),
isUpdated = (cache[f]) && lastChanged > cache[f].timestamp;
if (!cache[f] || isUpdated) {
fs.readFile(f, function (err, data) {
console.log('loading ' + f + ' from file');
//rest of cacheAndDeliver
}); //end of fs.stat
}
If we're using Node on Windows, we may have to substitute ctime with mtime, since ctime supports at least Version 0.10.12.
The contents of cacheAndDeliver have been wrapped in an fs.stat callback, two variables have been added, and the if(!cache[f]) statement has been modified. We parse the ctime property of the second parameter dubbed stats using Date.parse to convert it to milliseconds since midnight, January 1st, 1970 (the Unix epoch) and assign it to our lastChanged variable. Then we check whether the requested file's last changed time is greater than when we cached the file (provided the file is indeed cached) and assign the result to our isUpdated variable. After that, it's merely a case of adding the isUpdated Boolean to the conditional if(!cache[f]) statement via the || (or) operator. If the file is newer than our cached version (or if it isn't yet cached), we load the file from disk into the cache object.
Caching content certainly improves upon reading a file from disk for every request. However, with fs.readFile, we are reading the whole file into memory before sending it out in a response object. For better performance, we can stream a file from disk and pipe it directly to the response object, sending data straight to the network socket a piece at a time.
We are building on our code from the last example, so let's get server.js, index.html, styles.css, and script.js ready.
We will be using fs.createReadStream to initialize a stream, which can be piped to the response object.
In this case, implementing fs.createReadStream within our cacheAndDeliver function isn't ideal because the event listeners of fs.createReadStream will need to interface with the request and response objects, which for the sake of simplicity would preferably be dealt with in the http.createServer callback. For brevity's sake, we will discard our cacheAndDeliver function and implement basic caching within the server callback as follows:
//...snip... requires, mime types, createServer, lookup and f
// vars...
fs.exists(f, function (exists) {
if (exists) {
var headers = {'Content-type': mimeTypes[path.extname(f)]};
if (cache[f]) {
response.writeHead(200, headers);
response.end(cache[f].content);
return;
} //...snip... rest of server code...
Later on, we will fill cache[f].content while we are interfacing with the readStream object. The following code shows how we use fs.createReadStream:
var s = fs.createReadStream(f);
The preceding code will return a readStream object that streams the file, which is pointed at by variable f. The readStream object emits events that we need to listen to. We can listen with addEventListener or use the shorthand on method as follows:
var s = fs.createReadStream(f).on('open', function () {
//do stuff when the readStream opens
});
Because createReadStream returns the readStream object, we can latch our event listener straight onto it using method chaining with dot notation. Each stream is only going to open once; we don't need to keep listening to it. Therefore, we can use the once method instead of on to automatically stop listening after the first event occurrence as follows:
var s = fs.createReadStream(f).once('open', function () {
//do stuff when the readStream opens
});
Before we fill out the open event callback, let's implement some error handling as follows:
var s = fs.createReadStream(f).once('open', function () {
//do stuff when the readStream opens
}).once('error', function (e) {
console.log(e);
response.writeHead(500);
response.end('Server Error!');
});
The key to this whole endeavor is the stream.pipe method. This is what enables us to take our file straight from disk and stream it directly to the network socket via our response object as follows:
var s = fs.createReadStream(f).once('open', function () {
response.writeHead(200, headers);
this.pipe(response);
}).once('error', function (e) {
console.log(e);
response.writeHead(500);
response.end('Server Error!');
});
But what about ending the response? Conveniently, stream.pipe detects when the stream has ended and calls response.end for us. There's one other event we need to listen to, for caching purposes. Within our fs.exists callback, underneath the createReadStream code block, we write the following code:
fs.stat(f, function(err, stats) {
var bufferOffset = 0;
cache[f] = {content: new Buffer(stats.size)};
s.on('data', function (chunk) {
chunk.copy(cache[f].content, bufferOffset);
bufferOffset += chunk.length;
});
}); //end of createReadStream
We've used the data event to capture the buffer as it's being streamed, and copied it into a buffer that we supplied to cache[f].content, using fs.stat to obtain the file size for the file's cache buffer.
For this case, we're using the classic mode data event instead of the readable event coupled with stream.read() (see http://nodejs.org/api/stream.html#stream_readable_read_size_1) because it best suits our aim, which is to grab data from the stream as soon as possible.
Instead of the client waiting for the server to load the entire file from disk prior to sending it to the client, we use a stream to load the file in small ordered pieces and promptly send them to the client. With larger files, this is especially useful as there is minimal delay between the file being requested and the client starting to receive the file.
We did this by using fs.createReadStream to start streaming our file from disk. The fs.createReadStream method creates a readStream object, which inherits from the EventEmitter class.
The EventEmitter class accomplishes the evented part pretty well. Due to this, we'll be using listeners instead of callbacks to control the flow of stream logic.
We then added an open event listener using the once method since we want to stop listening to the open event once it is triggered. We respond to the open event by writing the headers and using the stream.pipe method to shuffle the incoming data straight to the client. If the client becomes overwhelmed with processing, stream.pipe applies backpressure, which means that the incoming stream is paused until the backlog of data is handled.
While the response is being piped to the client, the content cache is simultaneously being filled. To achieve this, we had to create an instance of the Buffer class for our cache[f].content property.
A Buffer class must be supplied with a size (or array or string), which in our case is the size of the file. To get the size, we used the asynchronous fs.stat method and captured the size property in the callback. The data event returns a Buffer variable as its only callback parameter.
The default value of bufferSize for a stream is 64 KB; any file whose size is less than the value of the bufferSize property will only trigger one data event because the whole file will fit into the first chunk of data. But for files that are greater than the value of the bufferSize property, we have to fill our cache[f].content property one piece at a time.
Changing the default readStream buffer size
We can change the buffer size of our readStream object by passing an options object with a bufferSize property as the second parameter of fs.createReadStream.
For instance, to double the buffer, you could use fs.createReadStream(f,{bufferSize: 128 * 1024});.
We cannot simply concatenate each chunk with cache[f].content because this will coerce binary data into string format, which, though no longer in binary format, will later be interpreted as binary. Instead, we have to copy all the little binary buffer chunks into our binary cache[f].content buffer.
We created a bufferOffset variable to assist us with this. Each time we add another chunk to our cache[f].content buffer, we update our new bufferOffset property by adding the length of the chunk buffer to it. When we call the Buffer.copy method on the chunk buffer, we pass bufferOffset as the second parameter, so our cache[f].content buffer is filled correctly.
Moreover, operating with the Buffer class renders performance enhancements with larger files because it bypasses the V8 garbage-collection methods, which tend to fragment a large amount of data, thus slowing down Node's ability to process them.
While streaming has solved the problem of waiting for files to be loaded into memory before delivering them, we are nevertheless still loading files into memory via our cache object. With larger files or a large number of files, this could have potential ramifications.
Streaming allows for intelligent and minimal use of memory for processing large memory items. But even with well-written code, some apps may require significant memory.
There is a limited amount of heap memory. By default, V8's memory is set to 1400 MB on 64-bit systems and 700 MB on 32-bit systems. This can be altered by running node with --max-old-space-size=N, where N is the amount of megabytes (the actual maximum amount that it can be set to depends upon the OS, whether we're running on a 32-bit or 64-bit architecture—a 32-bit may peak out around 2 GB and of course the amount of physical RAM available).
The --max-old-space-size method doesn't apply to buffers, since it applies to the v8 heap (memory allocated for JavaScript objects and primitives) and buffers are allocated outside of the v8 heap.
If we absolutely had to be memory intensive, we could run our server on a large cloud platform, divide up the logic, and start new instances of node using the child_process class, or better still the higher level cluster module.
There are other more advanced ways to increase the usable memory, including editing and recompiling the v8 code base. The http://blog.caustik.com/2012/04/11/escape-the-1-4gb-v8-heap-limit-in-node-js link has some tips along these lines.
In this case, high memory usage isn't necessarily required and we can optimize our code to significantly reduce the potential for memory overruns. There is less benefit to caching larger files because the slight speed improvement relative to the total download time is negligible, while the cost of caching them is quite significant in ratio to our available process memory. We can also improve cache efficiency by implementing an expiration time on cache objects, which can then be used to clean the cache, consequently removing files in low demand and prioritizing high demand files for faster delivery. Let's rearrange our cache object slightly as follows:
var cache = {
store: {},
maxSize : 26214400, //(bytes) 25mb
}
For a clearer mental model, we're making a distinction between the cache object as a functioning entity and the cache object as a store (which is a part of the broader cache entity). Our first goal is to only cache files under a certain size; we've defined cache.maxSize for this purpose. All we have to do now is insert an if condition within the fs.stat callback as follows:
fs.stat(f, function (err, stats) {
if (stats.size<cache.maxSize) {
var bufferOffset = 0;
cache.store[f] = {content: new Buffer(stats.size),
timestamp: Date.now() };
s.on('data', function (data) {
data.copy(cache.store[f].content, bufferOffset);
bufferOffset += data.length;
});
}
});
Notice that we also slipped in a new timestamp property into our cache.store[f] method. This is for our second goal—cleaning the cache. Let's extend cache as follows:
var cache = {
store: {},
maxSize: 26214400, //(bytes) 25mb
maxAge: 5400 * 1000, //(ms) 1 and a half hours
clean: function(now) {
var that = this;
Object.keys(this.store).forEach(function (file) {
if (now > that.store[file].timestamp + that.maxAge) {
delete that.store[file];
}
});
}
};
So in addition to maxSize, we've created a maxAge property and added a clean method. We call cache.clean at the bottom of the server with the help of the following code:
//all of our code prior
cache.clean(Date.now());
}).listen(8080); //end of the http.createServer
The cache.clean method loops through the cache.store function and checks to see if it has exceeded its specified lifetime. If it has, we remove it from the store. One further improvement and then we're done. The cache.clean method is called on each request. This means the cache.store function is going to be looped through on every server hit, which is neither necessary nor efficient. It would be better if we clean the cache, say, every two hours or so. We'll add two more properties to cache—cleanAfter to specify the time between cache cleans, and cleanedAt to determine how long it has been since the cache was last cleaned, as follows:
var cache = {
store: {},
maxSize: 26214400, //(bytes) 25mb
maxAge : 5400 * 1000, //(ms) 1 and a half hours
cleanAfter: 7200 * 1000,//(ms) two hours
cleanedAt: 0, //to be set dynamically
clean: function (now) {
if (now - this.cleanAfter>this.cleanedAt) {
this.cleanedAt = now;
that = this;
Object.keys(this.store).forEach(function (file) {
if (now > that.store[file].timestamp + that.maxAge) {
delete that.store[file];
}
});
}
}
};
So we wrap our cache.clean method in an if statement, which will allow a loop through cache.store only if it has been longer than two hours (or whatever cleanAfter is set to) since the last clean.
For a Node app to be insecure, there must be something an attacker can interact with for exploitation purposes. Due to Node's minimalist approach, the onus is on the programmer to ensure that their implementation doesn't expose security flaws. This recipe will help identify some security risk anti-patterns that could occur when working with the filesystem.
We'll be working with the same content directory as we did in the previous recipes. But we'll start a new insecure_server.js file (there's a clue in the name!) from scratch to demonstrate mistaken techniques.
Our previous static file recipes tend to use path.basename to acquire a route, but this ignores intermediate paths. If we accessed localhost:8080/foo/bar/styles.css, our code would take styles.css as the basename property and deliver content/styles.css to us. How about we make a subdirectory in our content folder? Call it subcontent and move our script.js and styles.css files into it. We'd have to alter our script and link tags in index.html as follows:
<link rel=stylesheet type=text/css href=subcontent/styles.css>
<script src=subcontent/script.js type=text/javascript></script>
We can use the url module to grab the entire pathname property. So let's include the url module in our new insecure_server.js file, create our HTTP server, and use pathname to get the whole requested path as follows:
var http = require('http');
var url = require('url');
var fs = require('fs');
http.createServer(function (request, response) {
var lookup = url.parse(decodeURI(request.url)).pathname;
lookup = (lookup === "/") ? '/index.html' : lookup;
var f = 'content' + lookup;
console.log(f);
fs.readFile(f, function (err, data) {
response.end(data);
});
}).listen(8080);
If we navigate to localhost:8080, everything works great! We've gone multilevel, hooray! For demonstration purposes, a few things have been stripped out from the previous recipes (such as fs.exists); but even with them, this code presents the same security hazards if we type the following:
curl localhost:8080/../insecure_server.js
Now we have our server's code. An attacker could also access /etc/passwd with a few attempts at guessing its relative path as follows:
curl localhost:8080/../../../../../../../etc/passwd
If we're using Windows, we can download and install curl from http://curl.haxx.se/download.html.
In order to test these attacks, we have to use curl or another equivalent because modern browsers will filter these sort of requests. As a solution, what if we added a unique suffix to each file we wanted to serve and made it mandatory for the suffix to exist before the server coughs it up? That way, an attacker could request /etc/passwd or our insecure_server.js file because they wouldn't have the unique suffix. To try this, let's copy the content folder and call it content-pseudosafe, and rename our files to index.html-serve, script.js-serve, and styles.css-serve. Let's create a new server file and name it pseudosafe_server.js. Now all we have to do is make the -serve suffix mandatory as follows:
//requires section ...snip...
http.createServer(function (request, response) {
var lookup = url.parse(decodeURI(request.url)).pathname;
lookup = (lookup === "/") ? '/index.html-serve'
: lookup + '-serve';
var f = 'content-pseudosafe' + lookup;
//...snip... rest of the server code...
For feedback purposes, we'll also include some 404 handling with the help of fs.exists as follows:
//requires, create server etc
fs.exists(f, function (exists) {
if (!exists) {
response.writeHead(404);
response.end('Page Not Found!');
return;
}
//read file etc
So, let's start our pseudosafe_server.js file and try out the same exploit by executing the following command:
curl -i localhost:8080/../insecure_server.js
We've used the -i argument so that curl will output the headers. The result? A 404, because the file it's actually looking for is ../insecure_server.js-serve, which doesn't exist. So what's wrong with this method? Well it's inconvenient and prone to error. But more importantly, an attacker can still work around it! Try this by typing the following:
curl localhost:8080/../insecure_server.js%00/index.html
And voilà! There's our server code again. The solution to our problem is path.normalize, which cleans up our pathname before it gets to fs.readFile as shown in the following code:
http.createServer(function (request, response) {
var lookup = url.parse(decodeURI(request.url)).pathname;
lookup = path.normalize(lookup);
lookup = (lookup === "/") ? '/index.html' : lookup;
var f = 'content' + lookup
}
Prior recipes haven't used path.normalize and yet they're still relatively safe. The path.basename method gives us the last part of the path, thus removing any preceding double dot paths (../) that would take an attacker higher up the directory hierarchy than should be allowed.
Here we have two filesystem exploitation techniques: the relative directory traversal and poison null byte attacks. These attacks can take different forms, such as in a POST request or from an external file. They can have different effects—if we were writing to files instead of reading them, an attacker could potentially start making changes to our server. The key to security in all cases is to validate and clean any data that comes from the user. In insecure_server.js, we pass whatever the user requests to our fs.readFile method. This is foolish because it allows an attacker to take advantage of the relative path functionality in our operating system by using ../, thus gaining access to areas that should be off limits. By adding the -serve suffix, we didn't solve the problem, we put a plaster on it, which can be circumvented by the poison null byte.
The key to this attack is the %00 value, which is a URL hex code for the null byte. In this case, the null byte blinds Node to the ../insecure_server.js portion, but when the same null byte is sent through to our fs.readFile method, it has to interface with the kernel. But the kernel gets blinded to the index.html part. So our code sees index.html but the read operation sees ../insecure_server.js. This is known as null byte poisoning. To protect ourselves, we could use a regex statement to remove the ../ parts of the path. We could also check for the null byte and spit out a 400 Bad Request statement. But we don't have to, because path.normalize filters out the null byte and relative parts for us.
Let's further delve into how we can protect our servers when it comes to serving static files.
If security was an extreme priority, we could adopt a strict whitelisting approach. In this approach, we would create a manual route for each file we are willing to deliver. Anything not on our whitelist would return a 404 error. We can place a whitelist array above http.createServer as follows:
var whitelist = [
'/index.html',
'/subcontent/styles.css',
'/subcontent/script.js'
];
And inside our http.createServer callback, we'll put an if statement to check if the requested path is in the whitelist array, as follows:
if (whitelist.indexOf(lookup) === -1) {
response.writeHead(404);
response.end('Page Not Found!');
return;
}
And that's it! We can test this by placing a file non-whitelisted.html in our content directory and then executing the following command:
curl -i localhost:8080/non-whitelisted.html
This will return a 404 error because non-whitelisted.html isn't on the whitelist.
The module's wiki page (https://github.com/joyent/node/wiki/modules#wiki-web-frameworks-static) has a list of static file server modules available for different purposes. It's a good idea to ensure that a project is mature and active before relying upon it to serve your content. The node-static module is a well-developed module with built-in caching. It's also compliant with the RFC2616 HTTP standards specification, which defines how files should be delivered over HTTP. The node-static module implements all the essentials discussed in this article and more.
For the next example, we'll need the node-static module. You could install it by executing the following command:
npm install node-static
The following piece of code is slightly adapted from the node-static module's GitHub page at https://github.com/cloudhead/node-static:
var static = require('node-static');
var fileServer = new static.Server('./content');
require('http').createServer(function (request, response) {
request.addListener('end', function () {
fileServer.serve(request, response);
});
}).listen(8080);
The preceding code will interface with the node-static module to handle server-side and client-side caching, use streams to deliver content, and filter out relative requests and null bytes, among other things.
To learn more about Node.js and creating web servers, the following books published by Packt Publishing (https://www.packtpub.com/) are recommended: