Extending JavaScript
JavaScript was not Ryan Dahl's original language choice when he designed Node. Yet, after exploring it, he found a very good modern language without opinions on streams, the filesystem, handling binary objects, processes, networking, and other capabilities one would expect of a systems programming language. JavaScript, strictly limited to the browser, had no use for, and had not implemented, these features.
Dahl was guided by a few rigid principles:
- A Node program/process runs on a single thread, ordering execution through an event loop
- Web applications are I/O intensive, so the focus should be on making I/O fast
- Program flow is always directed through asynchronous callbacks
- Expensive CPU operations should be split off into separate parallel processes, emitting events as results arrive
- Complex programs should be assembled from simpler programs
The general principle is: operations must never block. Node's desire for speed (high concurrency) and efficiency (minimal resource usage) demands the reduction of waste. A waiting process is a wasteful process, especially when waiting for I/O.
JavaScript's asynchronous, event-driven design fits neatly into this model. Applications express interest in some future event and are notified when that event occurs. This common JavaScript pattern should be familiar to you:
window.onload = function() {
  // When all requested document resources are loaded,
  // do something with the resulting environment
};

element.onclick = function() {
  // Do something when the user clicks on this element
};
The time it will take for an I/O action to complete is unknown, so the pattern is to ask for notification when an I/O event is emitted, whenever that may be, allowing other operations to be completed in the meantime.
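The same pattern drives I/O in Node. As a brief sketch (the file name here is a placeholder), we ask to be notified when a read completes, and remain free to do other work in the meantime:

var fs = require('fs');

// Ask for notification when the read completes, whenever that may be.
fs.readFile('./example.txt', function(err, data) {
  if (err) {
    return console.error(err);
  }
  console.log('File contents have arrived: ' + data);
});

// This line executes immediately, before the read has finished.
console.log('Read requested; continuing with other work...');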
Node adds an enormous amount of new functionality to JavaScript. Primarily, the additions provide evented I/O libraries offering the developer system access not available to browser-based JavaScript, such as writing to the filesystem or opening another system process. Additionally, the environment is designed to be modular, allowing complex programs to be assembled out of smaller and simpler components.
Let's look at how Node imported JavaScript's event model, extended it, and used it in the creation of interfaces to powerful system commands.
Events
Many of the JavaScript extensions in Node emit events. These events are instances of events.EventEmitter. Any object can extend EventEmitter, providing the developer with an elegant toolkit for building tight asynchronous interfaces to object methods.
Work through this example demonstrating how to set an EventEmitter object as the prototype of a function constructor. As each constructed instance now has the EventEmitter object exposed to its prototype chain, this provides a natural reference to the event API (Application Programming Interface). The counter instance methods can therefore emit events, and these can be listened for. Here we emit the latest count whenever the counter.increment method is called, and bind a callback to the incremented event, which simply prints the current counter value to the command line:
var EventEmitter = require('events').EventEmitter;

var Counter = function(init) {
  this.increment = function() {
    init++;
    this.emit('incremented', init);
  };
};
Counter.prototype = new EventEmitter();

var counter = new Counter(10);
var callback = function(count) {
  console.log(count);
};
counter.addListener('incremented', callback);

counter.increment(); // 11
counter.increment(); // 12
To remove the event listeners bound to counter, use counter.removeListener('incremented', callback). For consistency with browser-based JavaScript, counter.on and counter.addListener are interchangeable.
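Continuing from the example above, a brief sketch:

counter.removeListener('incremented', callback);
counter.increment();                 // nothing is printed

counter.on('incremented', callback); // identical to counter.addListener
counter.increment();                 // 14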
The addition of EventEmitter as an extensible object greatly increases the possibilities of JavaScript on the server. In particular, it allows I/O data streams to be handled in an event-oriented manner, in keeping with Node's principle of asynchronous, non-blocking programming:
var Readable = require('stream').Readable;

var readable = new Readable();
var count = 0;

readable._read = function() {
  if (++count > 10) {
    return readable.push(null);
  }
  setTimeout(function() {
    readable.push(count + "\n");
  }, 500);
};

readable.pipe(process.stdout);
In this program we are creating a Readable stream and piping any data pushed into this stream to process.stdout. Every 500 milliseconds we increment a counter and push that number (adding a newline) onto the stream, resulting in an incrementing series of numbers being written to the terminal. When our series has reached its limit (10), we push null onto the stream, causing it to terminate. Don't worry if you don't fully understand how Readable is implemented here—streams will be fully explained in the following chapters. Simply note how the act of pushing data onto a stream causes a corresponding event to fire, how the developer can assign a custom callback to handle this event, and how newly added data can be redirected to other streams. Node is designed such that I/O operations are consistently implemented as asynchronous, evented data streams.
It is also important to note the efficiency of this style of I/O. Because Node's event loop need only commit resources to handling callbacks, many other instructions can be processed in the downtime between each interval.
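To observe this, try appending something like the following to the previous snippet. While the stream waits out each 500 millisecond timeout, the event loop remains free to service other timers:

// A competing timer: its callbacks run in the gaps between stream pushes.
var ticker = setInterval(function() {
  process.stdout.write('.');
}, 100);

// Stop ticking once the stream has finished.
readable.on('end', function() {
  clearInterval(ticker);
});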
As an exercise, re-implement the previous code snippet such that the emitted data is piped to a file. You'll need to use fs.createWriteStream:
var fs = require('fs');
var writeStream = fs.createWriteStream("./counter", {
  flags: 'w',
  mode: 0666
});
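One possible solution, reusing the Readable implementation from the earlier snippet and simply redirecting the pipe:

var fs = require('fs');
var Readable = require('stream').Readable;

var readable = new Readable();
var count = 0;

readable._read = function() {
  if (++count > 10) {
    return readable.push(null);
  }
  setTimeout(function() {
    readable.push(count + "\n");
  }, 500);
};

// Pipe the counter to a file rather than to process.stdout.
var writeStream = fs.createWriteStream("./counter", {
  flags: 'w',
  mode: 0666
});
readable.pipe(writeStream);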
Modularity
In his book The Art of Unix Programming, Eric Raymond proposed the Rule of Modularity:
Developers should build a program out of simple parts connected by well-defined interfaces, so problems are local, and parts of the program can be replaced in future versions to support new features. This rule aims to save time spent debugging code that is complex, long, and unreadable.
This idea of building complex systems out of small pieces, loosely joined, is seen in management theory, theories of government, physical manufacturing, and many other contexts. In terms of software development, it advises developers to contribute only the simplest and most useful component necessary within a larger system. Large systems are hard to reason about, especially when the boundaries of their components are fuzzy.
One of the primary difficulties when constructing scalable JavaScript programs is the lack of a standard interface for assembling a coherent program out of many smaller ones. For example, a typical web application might load dependencies using a sequence of <script>
tags in the <head>
section of an HTML document:
<head>
  <script src="fileA.js"></script>
  <script src="fileB.js"></script>
</head>
There are many problems with this sort of solution:
- All potential dependencies must be declared prior to being needed—dynamic inclusion requires complicated hacks.
- The introduced scripts are not forcibly encapsulated—nothing stops code in both files from writing to the same global object. Namespaces can easily collide, making arbitrary injection dangerous.
- fileA cannot address fileB as a collection—an addressable context such as fileB.method isn't available.
- The <script> method itself isn't systematic, precluding the design of useful module services, such as dependency awareness or version control.
- Scripts cannot be easily removed or overridden.
- Because of these dangers and difficulties, sharing is not effortless, diminishing opportunities for collaboration in an open ecosystem.
Indiscriminately inserting unpredictable code fragments into an application frustrates attempts to predictably shape functionality. What is needed is a standard way to load and share discrete program modules.
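As a preview of how Node's module system solves these problems, here is a minimal sketch (the file names echo the example above) in which each file receives its own scope and exports an addressable interface:

// fileB.js -- nothing here leaks into the global object
var secret = "not visible to other files";

exports.method = function() {
  return "a result from fileB";
};

// fileA.js -- fileB is now addressable as a collection
var fileB = require('./fileB');
console.log(fileB.method()); // "a result from fileB"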
Accordingly, Node introduced the concept of the package, following the CommonJS specification. A package is a collection of program files bundled with a manifest file describing the collection. Dependencies, authorship, purpose, structure, and other important meta-data are exposed in a standard way. This encourages the construction of large systems from many small, interdependent systems. Perhaps even more importantly, it encourages sharing:
What I'm describing here is not a technical problem. It's a matter of people getting together and making a decision to step forward and start building up something bigger and cooler together.
--Kevin Dangoor, creator of CommonJS
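As a sketch, the manifest is a package.json file at the root of the package; the name, version, and description here are illustrative only:

{
  "name": "counter-stream",
  "version": "0.1.0",
  "description": "Writes an incrementing series of numbers to a stream",
  "main": "index.js",
  "dependencies": {}
}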
In many ways the success of Node is due to growth in the number and quality of packages available to the developer community, distributed via Node's package management system, npm. The design choices of this system, both social and technical, have done much to help make JavaScript a viable professional option for systems programming.
More extensive information on creating and managing Node packages can be found in Appendix A, Organizing Your Work. The key point is this: build programs out of packages where possible, and share those packages when possible. The shape of your applications will be clearer and easier to maintain. Importantly, the efforts of thousands of other developers can be linked into applications via npm, directly by inclusion, and indirectly as shared packages are tested, improved, refactored, and repurposed by members of the Node community.
Note
Contrary to popular belief, npm is not an abbreviation for Node Package Manager (or even an acronym):
https://npmjs.org/doc/faq.html#If-npm-is-an-acronym-why-is-it-never-capitalized
The Network
I/O in the browser is mercilessly hobbled, for very good reasons—if the JavaScript on any given website could access your filesystem, or open up network connections to any server, the WWW would be a less fun place.
For Node, I/O is of fundamental importance, and its focus from the start was to simplify the creation of scalable systems with high I/O requirements. It is likely that your first experience with Node was in writing an HTTP server.
Node supports several standard network protocols in addition to HTTP, such as TLS/SSL (Transport Layer Security/Secure Sockets Layer) and UDP (User Datagram Protocol). With these tools we can easily build scalable network programs, moving well beyond the somewhat dated AJAX (Asynchronous JavaScript and XML) techniques familiar to the JavaScript developer.
Let's create a simple program that allows the user to send data between two UDP servers:
var dgram = require('dgram');

var client = dgram.createSocket("udp4");
var server = dgram.createSocket("udp4");

var message = process.argv[2] || "message";
message = new Buffer(message);

server
  .on("message", function(msg) {
    process.stdout.write("Got message: " + msg + "\n");
    process.exit();
  })
  .bind(41234);

client.send(message, 0, message.length, 41234, "localhost");
Assuming a program file name of udp.js, a message can be sent via UDP by running this program from the terminal like so:
node udp.js "my message"
Which will result in the following output:
Got message: my message
We first establish our UDP servers, one working as a broadcaster, the other as a listener. process.argv contains useful command information, including command-line arguments commencing at index 2, which in this case would contain "my message". UDP requires messages to be Buffer objects, so we ensure that some message exists and convert it.
A UDP server is an instance of EventEmitter, emitting a message event when messages are received on the port to which it is bound. This server simply echoes the received message. All that is left to do is send the message, an action performed by the client, passing along our message to port 41234.
Moving streams of data around the I/O layer of your application is simplified within Node. It isn't difficult to share data streams across differing protocol servers, as data streams are standardized via Node's interfaces. Protocol details are handled for you.
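As a sketch of this flexibility, the file produced by the earlier exercise could be served over HTTP in a few lines (the port is arbitrary):

var http = require('http');
var fs = require('fs');

// A readable file stream is piped directly to the HTTP response,
// which is itself a writable stream; protocol details are handled for us.
http.createServer(function(request, response) {
  fs.createReadStream('./counter').pipe(response);
}).listen(8080);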
Let's continue to explore I/O, the process object, and events. First, let's dig into the machine powering Node's core.