Why should you use Node.js?
Among the many available web application development platforms, why should you choose Node.js? There are many stacks to choose from; what is it about Node.js that makes it rise above the others? We will see in the following sections.
Node.js is quickly becoming a popular development platform with adoption from plenty of big and small players. One of those is PayPal, who are replacing their incumbent Java-based system with one written in Node.js. For PayPal's blog post about this, visit https://www.paypal-engineering.com/2013/11/22/node-js-at-paypal/. Other large Node.js adopters include Walmart's online e-commerce platform, LinkedIn, and eBay.
Since we shouldn't just follow the crowd, let's look at technical reasons to adopt Node.js.
JavaScript at all levels of the stack
Having the same programming language on the server and client has been a long-time dream on the web. This dream dates back to the early days of Java, where Java applets were to be the frontend to server applications written in Java, and JavaScript was originally envisioned as a lightweight scripting language for those applets. Java never fulfilled the hype, and we ended up with JavaScript as the principal in-browser client-side language, rather than Java. With Node.js we may finally be able to implement applications with the same programming language on the client and server by having JavaScript at both ends of the web, in the browser and on the server.
A common language for frontend and backend offers several potential wins:
- The same programming staff can work on both ends of the wire
- Code can be migrated between server and client more easily
- Common data formats (JSON) exist between server and client
- Common software tools exist for server and client
- Common testing and quality reporting tools exist for server and client
- When writing web applications, view templates can be used on both sides
The JavaScript language is very popular due to its ubiquity in web browsers. It compares favorably against other languages while having many modern advanced language concepts. Thanks to its popularity, there is a deep talent pool of experienced JavaScript programmers out there.
Leveraging Google's investment in V8
To make Chrome a popular and excellent web browser, Google invested in making V8 a super-fast JavaScript engine. The competition to make the best web browser leads Google to keep on improving V8. As a result, Node.js programmers automatically win as each V8 iteration ratchets up performance and capabilities.
The Node.js community may change things to utilize any JavaScript engine, in case another one ends up surpassing V8.
Leaner asynchronous event-driven model
We'll get into this later. The Node.js architecture, a single execution thread and a fast JavaScript engine, has less overhead than thread-based architectures.
Microservice architecture
A new hotness in software development is the microservice idea. Node.js is an excellent platform for implementing microservices. We'll get into this later.
Node.js is stronger for having survived a major schism and hostile fork
During 2014 and 2015, the Node.js community faced a major split over policy, direction, and control. The io.js project was a hostile fork driven by a group who wanted to incorporate several features and change who was in control of making decisions. The result was a merge of the Node.js and io.js repositories, an independent Node.js Foundation to run the show, and a community working together to move forward in a common direction.
Threaded versus event-driven architecture
Node.js's blistering performance is said to be because of its asynchronous event-driven architecture, and its use of the V8 JavaScript engine. That's a nice thing to say, but what's the rationale for the statement?
The normal application server model uses blocking I/O to retrieve data, and it uses threads for concurrency. Blocking I/O causes threads to wait, causing a churn between threads as they are forced to wait on I/O while the application server handles requests. Threads add complexity to the application server as well as server overhead.
Node.js has a single execution thread with no waiting on I/O or context switching. Instead, there is an event loop looking for events and dispatching them to handler functions. The paradigm is that any operation that would block or otherwise take time to complete must use the asynchronous model. These functions are to be given an anonymous function to act as a handler callback, or else (with the advent of ES2015 promises), the function would return a Promise. The handler function, or promise, is invoked when the operation is complete. In the meantime, control returns to the event loop, which continues dispatching events.
To help us wrap our heads around this, Ryan Dahl, the creator of Node.js, (in his Cinco de Node presentation) asked us what happens while executing a line of code like this:
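A minimal sketch of such a line, assuming a hypothetical blocking `query` function. The stub below only stands in for a real database client so that the snippet runs; the function name, SQL text, and returned row are illustrative assumptions:

```javascript
// Hypothetical blocking database call, stubbed for illustration only.
// A real blocking driver would hold the thread here until the database replies.
function query(sql) {
  return [{ id: 1, title: 'example row' }];
}

// Execution cannot move past this line until the result comes back.
const result = query('SELECT * FROM db');
// operate on the result
console.log(result.length);
```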
Of course, the program pauses at that point while the database layer sends the query to the database, which determines the result and returns the data. Depending on the query, that pause can be quite long. Well, a few milliseconds, which is an eon in computer time. This pause is bad because while the entire thread is idling, another request might come in and need to be handled. This is where a thread-based server architecture would need to make a thread context switch. The more outstanding connections to the server, the greater the number of thread context switches. Context switching is not free because more threads require more memory for per-thread state and more CPU time spent on thread management overhead.
Simply by using asynchronous event-driven I/O, Node.js removes most of this overhead while introducing very little of its own.
Using threads to implement concurrency often comes with admonitions like these: "expensive and error-prone", "the error-prone synchronization primitives of Java", or "designing concurrent software can be complex and error prone". The complexity comes from access to shared variables and the various strategies to avoid deadlock and competition between threads. The synchronization primitives of Java are an example of such a strategy, and evidently many programmers find them difficult to use. There's a tendency to create frameworks such as java.util.concurrent to tame the complexity of threaded concurrency, but some might argue that papering over complexity does not make things simpler.
Node.js asks us to think differently about concurrency. Callbacks fired asynchronously from an event loop are a much simpler concurrency model—simpler to understand, and simpler to implement.
Ryan Dahl points to the relative access time of objects to understand the need for asynchronous I/O. Objects in memory are more quickly accessed (on the order of nanoseconds) than objects on disk or objects retrieved over the network (milliseconds or seconds). The longer access time for external objects is measured in zillions of clock cycles, which can be an eternity when your customer is sitting at their web browser ready to move on if it takes longer than two seconds to load the page.
In Node.js, the query discussed previously will read as follows:
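A sketch of the callback form, assuming a hypothetical asynchronous `query` function that accepts a Node.js-style `(err, result)` callback. The stub uses `setImmediate` in place of real database I/O, and the `events` array exists only to make the order of execution visible:

```javascript
// Hypothetical asynchronous query, stubbed for illustration only.
// setImmediate defers the callback to a later turn of the event loop,
// standing in for the time a real database round trip would take.
function query(sql, callback) {
  setImmediate(() => callback(null, [{ id: 1, title: 'example row' }]));
}

const events = [];

query('SELECT * FROM db', (err, result) => {
  if (err) throw err; // handle errors
  // operate on the result
  events.push(`callback received ${result.length} row(s)`);
});

// This line runs first: query() returned without waiting for the result.
events.push('query() has returned');
```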
Or if written with an ES2015 Promise:
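A sketch of the Promise form, again assuming a hypothetical `query` function, this time one that returns a Promise rather than taking a callback. The stub and its returned row are illustrative assumptions:

```javascript
// Hypothetical promise-returning query, stubbed for illustration only.
function query(sql) {
  return new Promise((resolve) => {
    setImmediate(() => resolve([{ id: 1, title: 'example row' }]));
  });
}

const pending = query('SELECT * FROM db')
  .then((result) => {
    // operate on the result
    return result.length;
  })
  .catch((err) => {
    // handle errors
    console.error(err);
  });
```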
This code performs the same query written earlier. The difference is that the query result is not the result of the function call, but it is provided to a callback function that will be called later. The order of execution is not one line after another, but it is instead determined by the order of callback function execution.
Once the call to the query function finishes, control will return almost immediately to the event loop, which goes on to servicing other requests. One of those requests will be the response to the query, which invokes the callback function.
Commonly, web pages bring together data from dozens of sources. Each one has a query and response as discussed earlier. Using asynchronous queries, each one can happen in parallel, where the page construction function can fire off dozens of queries—no waiting, each with their own callback—and then go back to the event loop, invoking the callbacks as each is done. Because it's in parallel, the data can be collected much more quickly than if these queries were done synchronously one at a time. Now, the reader on the web browser is happier because the page loads more quickly.
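The fan-out described above can be sketched with `Promise.all`, assuming a hypothetical promise-returning `query` function. The stub uses timers in place of real data sources, and the source names and delays are invented for illustration:

```javascript
// Hypothetical data source: resolves after delayMs, standing in for real I/O.
function query(name, delayMs) {
  return new Promise((resolve) => {
    setTimeout(() => resolve({ name, delayMs }), delayMs);
  });
}

// Start every query before waiting on any of them.
function buildPage() {
  const started = Date.now();
  return Promise.all([
    query('profile', 50),
    query('messages', 50),
    query('ads', 50),
  ]).then((results) => ({
    sources: results.map((r) => r.name),
    elapsedMs: Date.now() - started,
  }));
}

const page = buildPage();
page.then(({ sources, elapsedMs }) => {
  console.log(sources, `${elapsedMs} ms`);
});
```

Because the three stubbed queries overlap, the total time should be close to that of the slowest single query rather than the sum of all three. Strictly speaking, it is the waiting on I/O that overlaps; the callbacks themselves still run one at a time on the single execution thread.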
Performance and utilization
Some of the excitement over Node.js is due to its throughput (the requests per second it can serve). Comparative benchmarks against similar applications (Apache, for example) show that Node.js has tremendous performance gains.
One benchmark going around is this simple HTTP server (borrowed from https://nodejs.org/en/), which simply returns a "Hello World" message directly from memory:
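A minimal sketch of such a server, using only the core `http` module. The address `127.0.0.1` and port `8124` are assumptions chosen for illustration; any free port would do:

```javascript
const http = require('http');

// Assumed values for illustration; any free port and local address would do.
const hostname = '127.0.0.1';
const port = 8124;

// The handler runs for every request, whatever the method or URL.
const server = http.createServer((req, res) => {
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  res.end('Hello World\n');
});

server.listen(port, hostname, () => {
  console.log(`Server running at http://${hostname}:${port}/`);
});
```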
This is one of the simpler web servers one can build with Node.js. The http object encapsulates the HTTP protocol, and its http.createServer method creates a whole web server, listening on the port specified in the listen method. Every request (whether a GET or POST on any URL) on that web server calls the provided function. It is very simple and lightweight. In this case, regardless of the URL, it returns a simple text/plain Hello World response.
Ryan Dahl (Node.js's original author) showed a simple benchmark (http://nodejs.org/cinco_de_node.pdf) that returned a 1-megabyte binary buffer; Node.js gave 822 req/sec while Nginx gave 708 req/sec, for a roughly 16% improvement over Nginx. He also noted that Nginx peaked at 4 megabytes of memory, while Node.js peaked at 64 megabytes.
Yahoo! search engineer Fabian Frank published a performance case study of a real-world search query suggestion widget implemented with Apache/PHP and two variants of Node.js stacks (http://www.slideshare.net/FabianFrankDe/nodejs-performance-case-study). The application is a pop-up panel showing search suggestions as the user types in phrases, using a JSON-based HTTP query. The Node.js version could handle eight times the number of requests per second with the same request latency. Fabian Frank said both Node.js stacks scaled linearly until CPU usage hit 100%. In another presentation (http://www.slideshare.net/FabianFrankDe/yahoo-scale-nodejs), he discussed how Yahoo!Axis is running on Manhattan + Mojito and the value of being able to use the same language (JavaScript) and framework (YUI/YQL) on both frontend and backend.
LinkedIn did a massive overhaul of their mobile app using Node.js for the server-side to replace an old Ruby on Rails app. The switch let them move from 30 servers down to three, and allowed them to merge the frontend and backend team because everything was written in JavaScript. Before choosing Node.js, they'd evaluated Rails with Event Machine, Python with Twisted, and Node.js, choosing Node.js for the reasons that we just discussed. For a look at what LinkedIn did, see http://arstechnica.com/information-technology/2012/10/a-behind-the-scenes-look-at-linkedins-mobile-engineering/.
Mikito Takada blogged about benchmarking and performance improvements in a 48-hour hackathon application (http://blog.mixu.net/2011/01/17/performance-benchmarking-the-node-js-backend-of-our-48h-product-wehearvoices-net/) he built, comparing Node.js with what he claims is a similar application written with Django (a web application framework for Python). The unoptimized Node.js version was quite a bit slower (in response time) than the Django version, but a few optimizations (MySQL connection pooling, caching, and so on) made drastic performance improvements, handily beating out Django.
Is Node.js a cancerous scalability disaster?
In October 2011, software developer and blogger Ted Dziuba wrote an infamous blog post (since pulled from his blog) claiming that Node.js is a cancer, calling it a "scalability disaster". The example he showed for proof is a CPU-bound implementation of the Fibonacci sequence algorithm. While his argument was flawed, he raised a valid point that Node.js application developers have to consider—where do you put the heavy computational tasks?
A key to maintaining high throughput of Node.js applications is ensuring that events are handled quickly. Because it uses a single execution thread, if that thread is bogged down with a big calculation, it cannot handle events, and the system performance will suffer.
The Fibonacci sequence, serving as a stand-in for heavy computational tasks, quickly becomes computationally expensive to calculate, especially for a naïve implementation like this:
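A sketch of such an implementation (the function name `fibonacci` is our choice for illustration):

```javascript
// Deliberately naive: every call above the base cases recurses twice,
// so the amount of work grows exponentially with n.
function fibonacci(n) {
  if (n === 0) return 0;
  if (n === 1 || n === 2) return 1;
  return fibonacci(n - 1) + fibonacci(n - 2);
}

console.log(fibonacci(10)); // prints 55
```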
Yes, there are many ways to calculate Fibonacci numbers more quickly. We are showing this as a general example of what happens to Node.js when event handlers are slow, and not to debate the best ways to calculate mathematical functions.
If you call this from the request handler in a Node.js HTTP server, for sufficiently large values of n (for example, 40), the server becomes completely unresponsive because the event loop is not running, as this function is grinding through the calculation.
Does this mean that Node.js is a flawed platform? No, it just means that the programmer must take care to identify code with long-running computations and develop a solution. The possible solutions include rewriting the algorithm to work with the event loop, or offloading computationally expensive calculations to a backend server.
A simple rewrite dispatches the computations through the event loop, letting the server continue handling requests on the event loop. Using callbacks and closures (anonymous functions), we're able to maintain asynchronous I/O and concurrency promises:
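A sketch of that rewrite, assuming a Node.js-style `done(err, value)` callback and using `setImmediate` to hand control back to the event loop between steps (the name `fibonacciAsync` is our choice for illustration):

```javascript
// Same recursion as the naive version, but each step is scheduled with
// setImmediate so that other events can be handled between steps.
function fibonacciAsync(n, done) {
  if (n === 0) {
    done(undefined, 0);
  } else if (n === 1 || n === 2) {
    done(undefined, 1);
  } else {
    setImmediate(() => {
      fibonacciAsync(n - 1, (err, val1) => {
        if (err) return done(err);
        setImmediate(() => {
          fibonacciAsync(n - 2, (err2, val2) => {
            if (err2) return done(err2);
            done(undefined, val1 + val2);
          });
        });
      });
    });
  }
}

fibonacciAsync(10, (err, value) => {
  if (err) console.error(err);
  else console.log(value); // prints 55
});
```

The calculation takes no less total work this way, and every step still runs on the single execution thread. The difference is that the event loop gets a turn between steps, so the server can keep answering other requests while the result is being computed.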
Dziuba's valid point wasn't expressed well in his blog post, and it was somewhat lost in the flames following that post: while Node.js is a great platform for I/O-bound applications, it isn't a good platform for computationally intensive ones.
Server utilization, the bottom line, and green web hosting
The striving for optimal efficiency (handling more requests per second) is not just about the geeky satisfaction that comes from optimization. There are real business and environmental benefits. Handling more requests per second, as Node.js servers can do, means the difference between buying lots of servers and buying only a few servers. Node.js can let your organization do more with less.
Roughly speaking, the more servers you buy, the greater the cost, and the greater the environmental impact. There's a whole field of expertise around reducing cost and the environmental impact of running web server facilities, to which that rough guideline doesn't do justice. The goal is fairly obvious—fewer servers, lower costs, and lower environmental impact.
Intel's paper, Increasing Data Center Efficiency with Server Power Measurements (http://download.intel.com/it/pdf/Server_Power_Measurement_final.pdf), gives an objective framework for understanding efficiency and data center costs. There are many factors such as buildings, cooling systems, and computer system designs. Efficient building design, efficient cooling systems, and efficient computer systems (datacenter efficiency, datacenter density, and storage density) can decrease costs and environmental impact. But you can destroy those gains by deploying an inefficient software stack that compels you to buy more servers than you would need with an efficient one. Alternatively, you can amplify gains from datacenter efficiency with an efficient software stack.
This talk about efficient software stacks isn't just for altruistic environmental purposes. This is one of those cases where being green can help your business bottom line.
Node.js, the microservice architecture, and easily testable systems
New capabilities such as cloud deployment systems and Docker make it possible to implement a new kind of service architecture. Docker makes it possible to define server process configuration in a repeatable container that's easy to deploy by the millions into a cloud hosting system. It lends itself best to small single-purpose service instances that can be connected together to make a complete system. Docker isn't the only tool to help simplify cloud deployments; however, its features are well attuned to modern application deployment needs.
Some have popularized the microservice concept as a way to describe this kind of system. According to the microservices.io website, a microservice consists of a set of narrowly focused, independently deployable services. They contrast this with the monolithic application deployment pattern where every aspect of the system is integrated into one bundle (such as a single WAR file for a Java EE appserver). The microservice model gives developers much needed flexibility.
Some advantages of microservices are as follows:
- Each microservice can be managed by a small team
- Each team can work on its own schedule, so long as the service API compatibility is maintained
- Microservices can be deployed independently, such as for easier testing
- It's easier to switch technology stack choices
Where does Node.js fit with this? Its design fits the microservice model like a glove:
- Node.js encourages small, tightly focused, single purpose modules
- These modules are composed into an application by the excellent npm package management system
- Publishing modules is incredibly simple, whether via the npm repository or a Git URL
Node.js and the Twelve-Factor app model
Throughout this book, we'll call out aspects of the Twelve-Factor application model, and ways to implement those ideas in Node.js. This model is published on http://12factor.net, and it is a set of guidelines for application deployment in the modern cloud computing era.
The guidelines are straightforward, and once you read them, they seem like pure common sense. As a best practice, the Twelve-Factor model is a compelling strategy for delivering the kind of fluid, self-contained, cloud-deployed applications called for by our current computing environment.