In this article by David Mark Clements, the author of the book Node.js Cookbook, we will be introduced to writing modules in Node.js.
In idiomatic Node, the module is the fundamental unit of logic. Any typical application or system consists of generic code and application code. As a best practice, generic shareable code should be held in discrete modules, which can be composed together at the application level with minimal amounts of domain-specific logic. In this article, we'll learn how Node's module system works, how to create modules for various scenarios, and how we can reuse and share our code.
Let's begin our exploration by setting up a typical file and directory structure for a Node module. At the same time, we'll be learning how to automatically generate a package.json file (we refer to this throughout as initializing a folder as a package) and to configure npm (Node's package managing tool) with some defaults, which can then be used as part of the package generation process.
In this recipe, we'll create the initial scaffolding for a full Node module.
Installing Node
If we don't already have Node installed, we can go to https://nodejs.org to pick up the latest version for our operating system.
If Node is on our system, then so is the npm executable; npm is the default package manager for Node. It's useful for creating, managing, installing, and publishing modules.
Before we run any commands, let's tweak the npm configuration a little:
npm config set init.author.name "<name here>"
This will speed up module creation and ensure that each package we create has a consistent author name, thus avoiding typos and variations of our name.
npm stands for...
Contrary to popular belief, npm is not an acronym for Node Package Manager; in fact, it stands for npm is Not An Acronym, which is why it's not called NINAA.
Let's say we want to create a module that converts HSL (hue, saturation, luminosity) values into a hex-based RGB representation, such as those used in CSS (for example, #fb4a45).
The name hsl-to-hex seems good, so let's make a new folder for our module and cd into it:
mkdir hsl-to-hex
cd hsl-to-hex
Every Node module must have a package.json file, which holds metadata about the module.
Instead of manually creating a package.json file, we can simply execute the following command in our newly created module folder:
npm init
This will ask a series of questions. We can hit enter for every question without supplying an answer. Note how the default module name corresponds to the current working directory, and the default author is the init.author.name value we set earlier.
An npm init should look like this:
Upon completion, we should have a package.json file that looks something like the following:
{
  "name": "hsl-to-hex",
  "version": "1.0.0",
  "description": "",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "author": "David Mark Clements",
  "license": "MIT"
}
When Node is installed on our system, npm comes bundled with it.
The npm executable is written in JavaScript and runs on Node.
The npm config command can be used to permanently alter settings. In our case, we changed the init.author.name setting so that npm init would reference it for the default during a module's initialization.
We can list all the current configuration settings with npm config ls.
Config Docs
Refer to https://docs.npmjs.com/misc/config for all possible npm configuration settings.
When we run npm init, the answers to prompts are stored in an object, serialized as JSON and then saved to a newly created package.json file in the current directory.
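As a rough illustration of that flow (not npm's actual implementation), the following sketch builds a plain object from default answers and serializes it as JSON. The names buildPackageJson and serialize, and the example folder path, are hypothetical; the field values mirror the package.json shown earlier.

```javascript
const path = require('path')

// Hypothetical helper: collects the default answers in a plain object.
// The fields mirror the package.json example shown earlier.
function buildPackageJson (dir, author) {
  return {
    name: path.basename(dir), // default name comes from the folder name
    version: '1.0.0',
    description: '',
    main: 'index.js',
    scripts: {
      test: 'echo "Error: no test specified" && exit 1'
    },
    author: author,
    license: 'MIT'
  }
}

// Serialize with two-space indentation, a common layout for package.json.
function serialize (pkg) {
  return JSON.stringify(pkg, null, 2) + '\n'
}

const json = serialize(
  buildPackageJson('/home/dave/hsl-to-hex', 'David Mark Clements')
)
console.log(json)
// To save the result we could call fs.writeFileSync('package.json', json)
```

Note that JSON.stringify escapes the inner double quotes of the test script for us, which is why they appear as \" in the saved file.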
Let's find out some more ways to automatically manage the content of the package.json file via the npm command.
Sometimes additional metadata can be available after we've created a module. A typical scenario can arise when we initialize our module as a git repository and add a remote endpoint after creating the module.
Git and GitHub
If we've not used the git tool and GitHub before, we can refer to http://help.github.com to get started.
If we don't have a GitHub account, we can head to http://github.com to get a free account.
To demonstrate, let's create a GitHub repository for our module. Head to GitHub and click on the plus symbol in the top-right, then select New repository:
Specify the name as hsl-to-hex and click on Create Repository.
Back in the Terminal, inside our module folder, we can now run this:
echo -e "node_modules\n*.log" > .gitignore
git init
git add .
git commit -m '1st'
git remote add origin http://github.com/<username>/hsl-to-hex
git push -u origin master
Now here comes the magic part; let's initialize again (simply press enter for every question):
npm init
This time the Git remote we just added was detected and became the default answer for the git repository question. Accepting this default answer meant that the repository, bugs, and homepage fields were added to package.json.
A repository field in package.json is an important addition when it comes to publishing open source modules, since it will be rendered as a link on the module's information page at http://npmjs.com.
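To make the relationship between these three fields concrete, here is a minimal sketch that derives them from a GitHub user and repository name. The githubMetadata function is hypothetical, and the git+https URL form is an assumption for an HTTP remote; npm's own detection covers many more cases, such as SSH remotes.

```javascript
// Hypothetical helper: derives the extra package.json fields from a
// GitHub user and repository name. This is a simplification, not
// npm's actual logic.
function githubMetadata (user, repo) {
  const base = 'https://github.com/' + user + '/' + repo
  return {
    repository: { type: 'git', url: 'git+' + base + '.git' }, // assumed form
    bugs: { url: base + '/issues' },
    homepage: base + '#readme'
  }
}

console.log(githubMetadata('<username>', 'hsl-to-hex'))
```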
A repository link enables potential users to peruse the code prior to installation. Modules that can't be viewed before use are far less likely to be considered viable.
The npm tool supplies other functionalities to help with module creation and management workflow.
For instance, the npm version command can allow us to manage our module's version number according to SemVer semantics.
SemVer
SemVer is a versioning standard. A version consists of three numbers separated by dots, for example, 2.4.16. The position of a number denotes specific information about the version in comparison to other versions. The three positions are known as MAJOR.MINOR.PATCH. The PATCH number is increased when changes have been made that neither break existing functionality nor add new functionality. For instance, a bug fix will be considered a patch. The MINOR number should be increased when new backwards-compatible functionality is added, for instance, when a method is added. The MAJOR number increases when backwards-incompatible changes are made. Refer to http://semver.org/ for more information.
If we were to fix a bug, we would want to increase the PATCH number. We can either manually edit the version field in package.json, setting it to 1.0.1, or we can execute the following:
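The bump rules can be sketched in a few lines of JavaScript. The bump function below is a hypothetical helper written for illustration only; it ignores pre-release and build suffixes, which the full SemVer specification also covers.

```javascript
// Hypothetical helper that applies the MAJOR.MINOR.PATCH bump rules.
// Bumping a position resets every position to its right to zero.
function bump (version, release) {
  const [major, minor, patch] = version.split('.').map(Number)
  switch (release) {
    case 'major': return [major + 1, 0, 0].join('.')
    case 'minor': return [major, minor + 1, 0].join('.')
    case 'patch': return [major, minor, patch + 1].join('.')
    default: throw new Error('Unknown release type: ' + release)
  }
}

console.log(bump('1.0.0', 'patch')) // 1.0.1
console.log(bump('1.0.1', 'minor')) // 1.1.0
console.log(bump('1.1.0', 'major')) // 2.0.0
```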
npm version patch
This will increase the version field in one command. Additionally, if our module is a Git repository, it will add a commit and a tag named after the version (in our case, v1.0.1), which we can then immediately push.
When we ran the command, npm output the new version number. However, we can double-check the version number of our module without opening package.json:
npm version
This will output something similar to the following:
{ 'hsl-to-hex': '1.0.1',
npm: '2.14.17',
ares: '1.10.1-DEV',
http_parser: '2.6.2',
icu: '56.1',
modules: '47',
node: '5.7.0',
openssl: '1.0.2f',
uv: '1.8.0',
v8: '4.6.85.31',
zlib: '1.2.8' }
The first field is our module along with its version number.
If we added a new backwards-compatible functionality, we can run this:
npm version minor
Now our version is 1.1.0.
Finally, we can run the following for a major version bump:
npm version major
This sets our module's version to 2.0.0.
Since we're just experimenting and didn't make any changes, we should set our version back to 1.0.0.
We can do this via the npm command as well:
npm version 1.0.0
In most cases, it's wisest to compose a module out of other modules.
In this recipe, we will install a dependency.
For this recipe, all we need is a command prompt open in the hsl-to-hex folder from the Scaffolding a module recipe.
Our hsl-to-hex module can be implemented in two steps: convert the HSL values to RGB values, then convert the RGB values to hex.
Before we tear into writing an HSL-to-RGB algorithm, we should check whether this problem has already been solved.
The easiest way to check is to head to http://npmjs.com and perform a search:
After some research, we decide that the hsl-to-rgb-for-reals module is the best fit.
Ensuring that we are in the hsl-to-hex folder, we can now install our dependency with the following:
npm install --save hsl-to-rgb-for-reals
Now let's take a look at the bottom of package.json:
tail package.json #linux/osx
type package.json #windows
Tail output should give us this:
"bugs": {
"url": "https://github.com/davidmarkclements/hsl-to-hex/issues"
},
"homepage": "https://github.com/davidmarkclements/hsl-to-hex#readme",
"description": "",
"dependencies": {
"hsl-to-rgb-for-reals": "^1.1.0"
}
}
We can see that the dependency we installed has been added to a dependencies object in the package.json file.
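With the HSL-to-RGB conversion delegated to the dependency, what remains for our own module is turning red, green, and blue numbers into a hex string. The following is a minimal sketch of that remaining step; the rgbToHex name is hypothetical, and the function assumes each channel is an integer from 0 to 255.

```javascript
// Hypothetical helper: turns red, green and blue numbers (0-255) into a
// CSS hex string. Each channel becomes exactly two hexadecimal digits.
function rgbToHex (red, green, blue) {
  return '#' + [red, green, blue]
    .map((n) => n.toString(16).padStart(2, '0'))
    .join('')
}

console.log(rgbToHex(251, 74, 69)) // #fb4a45
console.log(rgbToHex(0, 0, 255)) // #0000ff
```

The padStart call matters for small values: without it, a channel value of 0 would produce a single digit and the result would not be a valid six-digit color.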
The top two results of the npm search are hsl-to-rgb and hsl-to-rgb-for-reals. The first result is unusable because the author of the package forgot to export it and has not responded to requests to fix it. The hsl-to-rgb-for-reals module is a fixed version of hsl-to-rgb.
This situation serves to illustrate the nature of the npm ecosystem.
On the one hand, there are over 200,000 modules and counting, and on the other many of these modules are of low value. Nevertheless, the system is also self-healing in that if a module is broken and not fixed by the original maintainer, a second developer often assumes responsibility and publishes a fixed version of the module.
When we run npm install in a folder with a package.json file, a node_modules folder is created (if it doesn't already exist). Then, the package is downloaded from the npm registry and saved into a subdirectory of node_modules (for example, node_modules/hsl-to-rgb-for-reals).
npm 2 vs npm 3
Our installed module doesn't have any dependencies of its own. However, if it did, the sub-dependencies would be installed differently depending on whether we're using version 2 or version 3 of npm.
Essentially, npm 2 installs dependencies in a tree structure, for instance, node_modules/dep/node_modules/sub-dep-of-dep/node_modules/sub-dep-of-sub-dep. Conversely, npm 3 follows a maximally flat strategy where sub-dependencies are installed in the top level node_modules folder when possible, for example, node_modules/dep, node_modules/sub-dep-of-dep, and node_modules/sub-dep-of-sub-dep. This results in fewer downloads and less disk space usage; npm 3 resorts to a tree structure in cases where there are two versions of a sub-dependency, which is why it's called a maximally flat strategy.
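Both layouts work because Node looks for a required module in the nearest node_modules folder first and then walks up through parent folders. The sketch below approximates that search order for POSIX-style paths; lookupPaths is a hypothetical name, and inside a real module the built-in module.paths array shows the actual list.

```javascript
const path = require('path').posix

// Lists the node_modules folders that would be checked, nearest first,
// when code located in `dir` requires a module by name.
function lookupPaths (dir) {
  const paths = []
  let current = dir
  while (true) {
    // Folders that are themselves named node_modules are skipped
    if (path.basename(current) !== 'node_modules') {
      paths.push(path.join(current, 'node_modules'))
    }
    const parent = path.dirname(current)
    if (parent === current) break // reached the filesystem root
    current = parent
  }
  return paths
}

console.log(lookupPaths('/app/node_modules/dep'))
```

For code inside /app/node_modules/dep, the nested folder is tried first (the npm 2 layout) and /app/node_modules second (the npm 3 flat layout), so a sub-dependency is found in either arrangement.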
Typically, if we've installed Node 5 or above, we'll be using npm version 3 or later.
Let's explore development dependencies, creating module management scripts and installing global modules without requiring root access.
We usually need some tooling to assist with development and maintenance of a module or application. The ecosystem is full of programming support modules, from linting to testing to browser bundling to transpilation.
In general, we don't want consumers of our module to download dependencies they don't need. Similarly, if we're deploying a system built in Node, we don't want to burden the continuous integration and deployment processes with superfluous, pointless work.
So, we separate our dependencies into production and development categories.
When we use npm --save install <dep>, we're installing a production module.
To install a development dependency, we use --save-dev.
Let's go ahead and install a linter.
JavaScript Standard Style
standard is a JavaScript linter that enforces an unconfigurable ruleset. The premise of this approach is that we should stop wasting precious time bikeshedding about syntax.
All the code in this article uses the standard linter, so we'll install that:
npm install --save-dev standard
semistandard
If the absence of semicolons is abhorrent, we can choose to install semistandard instead of standard at this point. The lint rules match those of standard, with the obvious exception of requiring semicolons. Further, any code written using standard can be reformatted to semistandard using the semistandard-format command-line tool. Simply run npm -g i semistandard-format to get started with it.
Now, let's take a look at the package.json file:
{
  "name": "hsl-to-hex",
  "version": "1.0.0",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "author": "David Mark Clements",
  "license": "MIT",
  "repository": {
    "type": "git",
    "url": "git+ssh://git@github.com/davidmarkclements/hsl-to-hex.git"
  },
  "bugs": {
    "url": "https://github.com/davidmarkclements/hsl-to-hex/issues"
  },
  "homepage": "https://github.com/davidmarkclements/hsl-to-hex#readme",
  "description": "",
  "dependencies": {
    "hsl-to-rgb-for-reals": "^1.1.0"
  },
  "devDependencies": {
    "standard": "^6.0.8"
  }
}
We now have a devDependencies field alongside the dependencies field.
When our module is installed as a sub-dependency of another package, only the hsl-to-rgb-for-reals module will be installed while the standard module will be ignored since it's irrelevant to our module's actual implementation.
If this package.json file represented a production system, we could run the install step with the --production flag, as shown:
npm install --production
Alternatively, this can be set in the production environment with the following command:
npm config set production true
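The effect of the production setting can be sketched as a simple selection over the two fields. The packagesToInstall function is a hypothetical illustration of the rule, not how npm is implemented.

```javascript
// Hypothetical helper: lists the package names an install would fetch.
// In production mode the devDependencies are left out.
function packagesToInstall (pkg, production) {
  const deps = Object.keys(pkg.dependencies || {})
  const devDeps = Object.keys(pkg.devDependencies || {})
  return production ? deps : deps.concat(devDeps)
}

const pkg = {
  dependencies: { 'hsl-to-rgb-for-reals': '^1.1.0' },
  devDependencies: { standard: '^6.0.8' }
}

console.log(packagesToInstall(pkg, true)) // production dependencies only
console.log(packagesToInstall(pkg, false)) // both categories
```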
Currently, we can run our linter using the executable installed in the node_modules/.bin folder. Consider this example:
./node_modules/.bin/standard
This is ugly and not at all ideal. Refer to Using npm run scripts for a more elegant approach.
Our package.json file currently has a scripts property that looks like this:
"scripts": {
  "test": "echo \"Error: no test specified\" && exit 1"
},
Let's edit the package.json file and add another field, called lint, as follows:
"scripts": {
  "test": "echo \"Error: no test specified\" && exit 1",
  "lint": "standard"
},
Now, as long as we have standard installed as a development dependency of our module (refer to Installing Development Dependencies), we can run the following command to run a lint check on our code:
npm run-script lint
This can be shortened to the following:
npm run lint
When we run an npm script, the current directory's node_modules/.bin folder is added to the execution context's PATH environment variable. This means that even if we don't have the standard executable in our usual system PATH, we can reference it in an npm script as if it were in our PATH.
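A minimal sketch of that PATH adjustment follows. The scriptPath function is hypothetical, and placing the folder at the front of PATH (so that local executables take precedence) is an assumption of this sketch.

```javascript
const path = require('path')

// Hypothetical helper: returns a PATH value that includes the project's
// node_modules/.bin folder. Putting it first is an assumption made here.
function scriptPath (projectDir, currentPath) {
  const binDir = path.join(projectDir, 'node_modules', '.bin')
  return binDir + path.delimiter + currentPath
}

console.log(scriptPath(process.cwd(), process.env.PATH || ''))
```

Using path.delimiter keeps the sketch portable, since PATH entries are separated by a colon on OS X and Linux but by a semicolon on Windows.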
Some consider lint checks to be a precursor to tests.
Let's alter the scripts.test field, as illustrated:
"scripts": {
"test": "npm run lint",
"lint": "standard"
},
Chaining commands
Later, we can append other commands to the test script using the double ampersand (&&) to run a chain of checks. For instance, "test": "npm run lint && tap test".
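The double ampersand runs the next command only when the previous one exits with code 0, so the chain stops at the first failing check. The following sketch simulates that behavior with plain functions standing in for commands; runChain, lint, and tests are hypothetical names.

```javascript
// Hypothetical helper: runs steps in order and stops at the first one
// that returns a non-zero exit code, like `a && b` in a shell.
function runChain (steps) {
  for (const step of steps) {
    const code = step()
    if (code !== 0) return code
  }
  return 0
}

const ran = []

function lint () {
  ran.push('lint')
  return 1 // pretend the linter found a problem
}

function tests () {
  ran.push('tests')
  return 0
}

const exitCode = runChain([lint, tests])
console.log(exitCode, ran) // the tests step never ran
```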
Now, let's run the test script:
npm run test
Since the test script is special, we can simply run this:
npm test
The npm executable can install both local and global modules. Global modules are mostly installed to allow command-line utilities to be used system-wide.
On OS X and Linux, the default npm setup requires sudo access to install a module.
For example, the following will fail on a typical OS X or Linux system with the default npm setup:
npm -g install cute-stack # <-- oh oh needs sudo
This is unsuitable for several reasons: forgetting to use sudo becomes frustrating, we're trusting npm with root access, and accidentally using sudo for a local install causes permission problems (particularly with the npm local cache).
The prefix setting stores the location for globally installed modules; we can view this with the following:
npm config get prefix
Usually, the output will be /usr/local. To avoid the use of sudo, all we have to do is set ownership permissions on any subfolders in /usr/local used by npm:
sudo chown -R $(whoami) $(npm config get prefix)/{lib/node_modules,bin,share}
Now we can install global modules without root access:
npm -g install cute-stack # <-- now works without sudo
If changing ownership of system folders isn't feasible, we can use a second approach, which involves changing the prefix setting to a folder in our home path:
mkdir ~/npm-global
npm config set prefix ~/npm-global
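We'll also need to add the new bin folder to our PATH. We can append the export line to ~/.profile and then reload it:
echo 'export PATH=$PATH:~/npm-global/bin' >> ~/.profile
source ~/.profile
The source command reloads ~/.profile in the current Terminal session so that the change we've made takes effect.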
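If a globally installed command still isn't found, a quick check is whether the bin folder under our prefix really appears in PATH. The sketch below does this for OS X and Linux, where executables live in the bin subfolder of the prefix; isPrefixOnPath is a hypothetical helper.

```javascript
const os = require('os')
const path = require('path')

// Hypothetical helper: reports whether the bin folder under a given npm
// prefix appears as an entry in a PATH string (OS X and Linux layout).
function isPrefixOnPath (prefix, pathValue) {
  const binDir = path.join(prefix, 'bin')
  return pathValue.split(path.delimiter).indexOf(binDir) !== -1
}

const prefix = path.join(os.homedir(), 'npm-global')
console.log(isPrefixOnPath(prefix, process.env.PATH || ''))
```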