The implications of Node's design on system architects
Node is a new technology. At the time of writing this, it has yet to reach its 1.0 version. Security flaws have been found and fixed. Memory leaks have been found and fixed. Eran Hammer, mentioned at the beginning of this chapter, and his entire team at Walmart Labs actively contribute to the Node codebase—in particular when they find flaws! This is true of many other large companies committed to Node, such as PayPal.
If you have chosen Node, and your application has grown to such a size that you feel you need to read a book on how to deploy Node, you have the opportunity not only to benefit from the community but also, perhaps, to play a part in designing aspects of the environment around your particular needs. Node is open source, and you can submit pull requests.
In addition to events, two key design aspects are important to understand if you are going to do advanced Node work: building your systems out of small parts, and using evented streams when piping data between them.
Building large systems out of small systems
In his book, The Art of Unix Programming, Eric Raymond proposed the Rule of Modularity:
"Developers should build a program out of simple parts connected by well-defined interfaces, so problems are local, and parts of the program can be replaced in future versions to support new features. This rule aims to save time on debugging complex code that is complex, long, and unreadable."
This idea of building complex systems out of "small pieces, loosely joined" is seen in management theory, theories of government, manufacturing, and many other contexts. In terms of software development, it advises developers to contribute only the simplest, most useful component necessary within a larger system. Large systems are hard to reason about, especially if the boundaries of their components are fuzzy.
One of the primary difficulties when constructing scalable JavaScript programs is the lack of a standard interface for assembling a coherent program out of many smaller ones. For example, a typical web application might load dependencies using a sequence of `<script>` tags in the `<head>` section of a HyperText Markup Language (HTML) document:

```html
<head>
  <script src="fileA.js"></script>
  <script src="fileB.js"></script>
</head>
```
There are many problems with this sort of system:
- All potential dependencies must be declared prior to their being needed—dynamic inclusion requires complicated hacks.
- The introduced scripts are not forcibly encapsulated—nothing stops both files from writing to the same global object. Namespaces can easily collide, which makes arbitrary injection dangerous (a concrete sketch follows this list).
- `fileA` cannot address `fileB` as a collection—an addressable context, such as `fileB.method`, isn't available.
- The `<script>` method itself isn't systematic, precluding the design of useful module services, such as dependency awareness and version control.
- Scripts cannot be easily removed or overridden.
- Because of these dangers and difficulties, sharing is not effortless, thus diminishing opportunities for collaboration in an open ecosystem.
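To make the encapsulation problem concrete, here is a minimal sketch; the two file names and the config variable are hypothetical, but the behavior is exactly what sequential `<script>` loading produces:

```javascript
// fileA.js: a top-level var in a browser script becomes a
// property of the shared global object (window.config).
var config = { debug: true };

// fileB.js: loaded by the next <script> tag; the same name
// silently replaces fileA's value, and no error is raised.
var config = { debug: false };

// Any later code that relied on fileA's setting now sees fileB's:
console.log(config.debug); // false
```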
Haphazardly inserting unpredictable code fragments into an application frustrates attempts to predictably shape functionality. What is needed is a standard way to load and share discrete program modules.
Accordingly, Node introduced the concept of the package, following the CommonJS specification. A package is a collection of program files bundled with a manifest file describing the collection. Dependencies, authorship, purpose, structure, and other important metadata are exposed in a standard way. This encourages the construction of large systems from many small, interdependent systems. Perhaps even more importantly, it encourages sharing:
"What I'm describing here is not a technical problem. It's a matter of people getting together and making a decision to step forward and start building up something bigger and cooler together." | ||
--Kevin Dangoor, creator of CommonJS |
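As a sketch of what a package looks like, consider the minimal example below; the greeter name and file names are hypothetical, but the manifest fields and the require/module.exports mechanics follow the CommonJS pattern Node adopted:

```javascript
// package.json (a JSON file on disk, reproduced here in a comment):
//
//   {
//     "name": "greeter",
//     "version": "0.1.0",
//     "main": "index.js"
//   }

// index.js: the module keeps its internals private and exposes only
// what it assigns to module.exports.
module.exports = function(name) {
  return 'Hello, ' + name + '!';
};

// app.js: a consumer addresses the module as a collection, which the
// <script> tag approach could not offer.
var greet = require('./index.js');
console.log(greet('Node')); // Hello, Node!
```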
In many ways, the success of Node is due to the growth in the number and quality of packages distributed to the developer community via Node's package management system, npm. This system has done much to make JavaScript a viable, professional option for systems programming.
Note
A good introduction to npm for anyone new to Node can be found at: https://www.npmjs.org/doc/developers.html.
Streams
In his book, The C++ Programming Language, Third Edition, Bjarne Stroustrup states:
"Designing and implementing a general input/output facility for a programming language is notoriously difficult. […] An I/O facility should be easy, convenient, and safe to use; efficient and flexible; and, above all, complete."
It shouldn't surprise anyone that a design team focused on providing efficient and easy I/O has delivered such a facility through Node. Through a symmetrical and simple interface, which handles data buffers and stream events so that the implementer does not have to, Node's `Stream` module is the preferred way to manage asynchronous data streams for both internal modules and, hopefully, for the modules that developers will create.
Note
An excellent tutorial on the `Stream` module can be found at https://github.com/substack/stream-handbook. The comprehensive Node documentation is also available at http://nodejs.org/api/stream.html.
A stream in Node is simply a sequence of bytes or, if you like, a sequence of characters. At any time, a stream contains a buffer of bytes, and this buffer has a length of zero or more.
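To see those buffered bytes arriving, consider this sketch; notes.txt is a hypothetical file, and the 'data' and 'end' events are the standard Readable stream events:

```javascript
var fs = require('fs');

// A Readable stream delivers its internal buffer to consumers in
// chunks, emitting a 'data' event for each one and 'end' when the
// underlying resource is exhausted.
var reader = fs.createReadStream('notes.txt');

reader.on('data', function(chunk) {
  console.log('Received ' + chunk.length + ' bytes');
});

reader.on('end', function() {
  console.log('No more data');
});
```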
Because each character in a stream is well defined, and because every type of digital data can be expressed in bytes, any part of a stream can be redirected, or piped, to any other stream, and different chunks of the stream can be sent to different handlers. In this way, stream input and output interfaces are both flexible and predictable and can be easily coupled.
In addition to events, Node is distinctive for its comprehensive use of streams. Continuing the idea of composing applications out of many small processes emitting events or reacting to events, several Node I/O modules and features are implemented as streams. Network sockets, file readers and writers, stdin and stdout, Zlib, and so on, are all data producers and/or consumers that are easily connected through the abstract `Stream` interface. Those familiar with Unix pipes will see some similarities.
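As a sketch of such a coupling (the file names are hypothetical), the following chain reads a file, compresses it with Zlib, and writes the result to disk, each stage joined by pipe() much like a Unix pipeline:

```javascript
var fs = require('fs');
var zlib = require('zlib');

fs.createReadStream('access.log')        // Readable: produces raw chunks
  .pipe(zlib.createGzip())               // Transform: compresses each chunk
  .pipe(fs.createWriteStream('access.log.gz')); // Writable: consumes the result
```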
Five distinct base classes are exposed via the abstract `Stream` interface: `Readable`, `Writable`, `Duplex`, `Transform`, and `PassThrough`. Each base class inherits from `EventEmitter`, which we know to be an interface to which event listeners and emitters can be bound. Streams in Node are evented streams, and sending data between processes is commonly done using streams. Because streams can be easily chained and otherwise combined, they are fundamental tools for the Node developer.
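As a minimal sketch of these ideas, assuming the pre-1.0 constructor-and-util.inherits style of subclassing, the following Transform stream upper-cases whatever flows through it; the Upcase name is hypothetical:

```javascript
var stream = require('stream');
var util = require('util');

// Subclass Transform, one of the five base classes; _transform is
// the single method a subclass must implement.
function Upcase(options) {
  stream.Transform.call(this, options);
}
util.inherits(Upcase, stream.Transform);

Upcase.prototype._transform = function(chunk, encoding, done) {
  this.push(chunk.toString().toUpperCase());
  done();
};

var upcase = new Upcase();

// Because every stream inherits from EventEmitter, listeners can be
// bound alongside pipe(); logging to stderr keeps stdout clean.
upcase.on('end', function() {
  console.error('Stream finished.');
});

process.stdin.pipe(upcase).pipe(process.stdout);
```

Run it, type a line, and the upper-cased version is echoed straight back through the chain.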
It is recommended that you develop a clear understanding of what streams are and how they are implemented in Node before going further, as we will use streams extensively throughout this book.