Node.js does not need a new module system. Its existing implementation of a CommonJS module system works great. Even Facebook apparently gave up developing their internal module system, Haste. So the module system I am building has no production value; it's just a fun weekend project.
I named this new module system node-get because get is the global used to load new modules with it.
There's an executable named node-get that can be installed using npm install -g node-get-modules. You can run it just like the node executable.
$ node-get hello.js
Where hello.js is a JavaScript file that uses the node-get module system. Here's an example.
// hello.js
const capitalize = get('capitalize.js');
const hello = capitalize('hello world!');
console.log(hello) // Prints Hello World!
But in this post, I'll use node directly to run it, because that requires no setup and works anywhere.
$ node node-get.js hello.js
I am using Node.js version 6 to build node-get, so the code here uses ES6 syntax. Everything should work in Node.js version 4 as well.
The vm module
First, I want to introduce you to the vm module from Node.js. vm's responsibility is executing JavaScript.
Every single JavaScript file you write in your Node.js app passes through this module at some point to get executed.
vm provides two methods to facilitate this: vm.runInNewContext(someJSCode, theNewContext) and vm.runInThisContext(someJSCode).
Context in these methods refers to the global state. Both methods execute the JavaScript code stored in the string variable someJSCode. They differ only in the global variables they allow someJSCode to use.
runInNewContext makes a brand new set of variables and functions using the entries in the theNewContext object and makes them available to someJSCode as globals.
runInThisContext makes all the globals available to the script that calls it available to someJSCode as well.
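To see the difference in practice, here's a small standalone script. The string used as someJSCode and the contents of theNewContext are made up for illustration; the two vm calls are the real ones described above.

```javascript
const vm = require('vm');

// Hypothetical code to run: reports the type of one custom and one Node.js global
const someJSCode = 'typeof greeting + " / " + typeof process';

// runInNewContext: the only globals are the entries of theNewContext
// (plus JavaScript built-ins such as Object, Math and JSON)
const theNewContext = { greeting: 'hi' };
const fromNewContext = vm.runInNewContext(someJSCode, theNewContext);
console.log(fromNewContext); // string / undefined

// runInThisContext: the code shares this script's global object,
// so process exists, but greeting was never made a global
const fromThisContext = vm.runInThisContext(someJSCode);
console.log(fromThisContext); // undefined / object
```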
I will get to these methods the moment I start building the new module system.
It's the job of a module system to read the contents of JavaScript files and run them using the vm module. It should also help these files communicate by passing the results of the callee to the caller.
The Node.js module system, with require and exports, does just that, and so will my new module system.
I'll start to code; feel free to follow along if you like.
First I'll create two files.
node-get.js will contain the actual code of node-get, my new module system. hello.js will contain the JavaScript code that I will run using node-get. It will demonstrate the features of node-get.
I'll put some code in hello.js.
// hello.js
console.log('hello world!');
I'll start node-get.js with the following code. It uses runInNewContext from the vm module.
const vm = require('vm');
const fs = require('fs');
// Read the module as a UTF-8 string
const moduleJS = fs.readFileSync('./hello.js', 'utf8');
// Create an empty context
const context = {};
// Execute JavaScript from hello.js
vm.runInNewContext(moduleJS, context);
I am reading the content of hello.js into a variable named moduleJS and executing it using vm.runInNewContext, introduced above. Since context is just an empty object, the JavaScript in moduleJS does not have access to any global variables.
I'll run the program to see how it goes.
$ node node-get.js hello.js
Aaaand error!
evalmachine.<anonymous>:3
console.log('hello world!');
^
ReferenceError: console is not defined
console is not JavaScript
When I'm writing JavaScript, whether it's for the browser or Node.js, I use console.log statements a lot, and they work every time. So naturally I thought they would work inside vm. I guess subconsciously I thought that console was a part of JavaScript. But as it turns out, it's just a global provided by the environment.
Above I used runInNewContext, so in this new context there is no console defined. One way to fix it is to add console to the context.
const context = { console }; // Now the context has a console
vm.runInNewContext(moduleJS, context);
This does work for now. But console is not the only global that we may use in our modules. There is a whole list of them documented in the Node.js documentation: process, Buffer and setTimeout, to name a few.
So if I want to pass in all the globals, I'll have to do something like this:
vm.runInNewContext(moduleJS, { console, process, Buffer, setTimeout /* , ...and every other global */ });
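Here's roughly what building that context by hand looks like. The list of names is a hypothetical, deliberately short pick, not the full list from the Node.js docs. The check also includes require, because when code runs from a file, require is local to each CommonJS module rather than a global, so copying globals alone would not hand it to the loaded code.

```javascript
const vm = require('vm');

// Hypothetical, incomplete list of globals to expose to the loaded code
const globalNames = ['console', 'process', 'Buffer', 'setTimeout', 'clearTimeout'];

// Copy each named global from this script's global object into a plain object
const sandbox = {};
globalNames.forEach(name => { sandbox[name] = global[name]; });

const code = '[typeof console, typeof process, typeof Buffer, typeof setTimeout, typeof require].join(",")';
const result = vm.runInNewContext(code, sandbox);
console.log(result); // object,object,function,function,undefined
```

Every global missing from the list stays undefined for the loaded code, which is why keeping such a list in sync quickly becomes a chore.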
But remembering that I have another method from vm at my disposal, I will use it instead.
const vm = require('vm');
const fs = require('fs');
const moduleJS = fs.readFileSync('./hello.js', 'utf8');
vm.runInThisContext(moduleJS);
hello.js now has access to any global available to node-get.js. It works now!
$ node node-get.js hello.js
hello world!
get
I will now add the get global so hello.js can load JavaScript from other files as well.
I will define this function inside node-get.js, but I intend to use it inside hello.js and inside any other module that hello.js might load (get).
Remember that any global available to node-get.js is available to JavaScript code that goes through runInThisContext. So we need to define get as a global inside node-get.js.
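A quick way to convince yourself of this: the script below puts a function on global and keeps another value in a local variable, then checks which of the two runInThisContext code can see. shout is a hypothetical stand-in for get. The local variable stays hidden because runInThisContext never sees the caller's local scope, only the global object.

```javascript
const vm = require('vm');

// A property on the global object, like global.get in node-get.js
global.shout = text => text.toUpperCase();

function runDemo() {
  // A local variable: code run through runInThisContext cannot see it
  const secret = 42;
  return vm.runInThisContext('typeof secret + " " + shout("get")');
}

const seen = runDemo();
console.log(seen); // undefined GET
```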
global.get = filename => {
const loadedJS = fs.readFileSync(filename)
vm.runInThisContext(loadedJS);
}
With that, my node-get.js looks like this.
// node-get.js
const vm = require('vm');
const fs = require('fs');
global.get = filename => {
const loadedJS = fs.readFileSync(filename)
vm.runInThisContext(loadedJS);
}
global.get(process.argv[2])
Note that I am using process.argv[2] to get the entry point to the app instead of a hardcoded hello.js.
This entry point module has access to get as a global, and so will any module that's loaded using get('any JS file'). So, recursively, any module in the example project can use get.
To demonstrate these capabilities of node-get, from hello.js I will get a file named cat.js, and from within cat.js I will get another file named mouse.js. Each file contains a dumb console.log statement.
// hello.js
console.log('hello world!');
get('./cat.js')
// cat.js
console.log('hello, I am a cat.')
get('./mouse.js')
// mouse.js
console.log('hello, I am a mouse.')
I'll run node node-get.js hello.js. Aaaand...
$ node node-get.js hello.js
hello world!
hello, I am a cat.
hello, I am a mouse.
Success!
So, for now, everything seems to work fine. Let's add more JavaScript to our modules. I'll start with variables. I will define a variable named name in each of the cat.js and mouse.js modules.
// cat.js
const name = 'Tom'
console.log(`hello, I am a cat named ${name}`);
// mouse.js
const name = 'Jerry'
console.log(`hello, I am a mouse named ${name}`);
This time, I will get both modules in hello.js.
// hello.js
console.log('hello world!');
get('./cat.js')
get('./mouse.js')
Aaaand run it.
$ node node-get.js hello.js
hello world!
hello, I am a cat named Tom
evalmachine.<anonymous>:1
const name = 'Jerry'
^
SyntaxError: Identifier 'name' has already been declared
Variables defined in a Node.js (CommonJS) module are local to that module. Unless we export them using exports, we can't access them outside the module. But the code in cat.js and mouse.js apparently runs in the same scope.
Just because they live in two separate files does not make them run in two separate scopes. This problem can be traced to this line from node-get.js:
vm.runInThisContext(loadedJS);
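Every single module loaded using our module system goes through this line. runInThisContext runs code in the global context of node-get.js without access to any local scope (not even that of the get function), so every module runs at the same top-level, global scope and none of them gets a scope of its own.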
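Here's the clash reduced to its essentials, with two module sources inlined as strings (an illustration, not node-get's actual files). Because both scripts run at the global scope, the second const declaration collides with the first one.

```javascript
const vm = require('vm');

// Two hypothetical module sources that both declare a top-level `name`
const catJS = "const name = 'Tom';";
const mouseJS = "const name = 'Jerry';";

vm.runInThisContext(catJS); // Declares `name` in the shared global scope

let errorName = null;
try {
  vm.runInThisContext(mouseJS); // Same global scope, so `name` clashes
} catch (err) {
  errorName = err.name;
  console.log(`${err.name}: ${err.message}`);
}
```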
The problem of scopes in JavaScript has been discussed for many years. Before ES6, functions were the only constructs in JavaScript that had a scope of their own. (ES6 added block scoping through let, const and class declarations.) So to give these modules their own scope, I'll have to put each of them inside a function.
I'll write a function called wrap that returns the source code of a JavaScript function containing the code from the module.
const wrap = moduleJS => (
`(() => {${moduleJS}})()` // wrapping moduleJS in a self calling arrow function
)
global.get = filename => {
const loadedJS = fs.readFileSync(filename);
const wrappedJS = wrap(loadedJS)
vm.runInThisContext(wrappedJS);
}
Now the contents of the loaded module are put inside a function, and this function calls itself immediately.
This fixes node-get's scope problem, so I get the desired output.
$ node node-get.js hello.js
hello world!
hello, I am a cat named Tom
hello, I am a mouse named Jerry
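The same fix in isolation: each module source is wrapped with the same wrap function as above and run through runInThisContext, so each const name now lives in its own function scope. The global record helper is hypothetical and only collects what each module sees.

```javascript
const vm = require('vm');

// Same wrapper as node-get.js at this stage: an immediately invoked arrow function
const wrap = moduleJS => `(() => {${moduleJS}})()`;

// Hypothetical global helper so the wrapped modules can report their values
const results = [];
global.record = value => results.push(value);

vm.runInThisContext(wrap("const name = 'Tom'; record(name);"));
vm.runInThisContext(wrap("const name = 'Jerry'; record(name);")); // No clash now

console.log(results); // [ 'Tom', 'Jerry' ]
```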
get and relative paths
So now my module system is working pretty well. Currently, my Hello-Tom&Jerry project's files and node-get's files are all in the same directory. I'll tidy things up a bit by moving the example project's files into a directory aptly named 'example'.
├── example
│ ├── cat.js
│ ├── mouse.js
│ └── hello.js
└── node-get.js
I shouldn't need to change anything inside hello.js, since I used relative paths to get other modules in it, and those relative paths are the same in this directory structure as well. Let's see how that works out.
$ node node-get.js example/hello.js
hello world!
fs.js:640
return binding.open(pathModule._makeLong(path), stringToFlags(flags), mode);
^
Error: ENOENT: no such file or directory, open './cat.js'
I was used to requiring files with paths relative to the module I call require in, so I thought get would work the same way. But it turns out I need to do a bit of work to get it to work that way.
Let's first understand why it does not work this way already.
The fs module is what node-get.js uses to read the contents of loaded modules, and fs resolves relative paths relative to the current working directory of the process, not relative to the file that asked for them. Say I run node-get.js from a directory I'll call cwd. When get('./cat.js') is called inside hello.js (or anywhere else), it looks for cwd/cat.js. It's not going to find a cat.js there, because I just moved it into a directory named example, so it's at cwd/example/cat.js.
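The difference between the two behaviours comes down to which directory a relative path is resolved against. This sketch uses made-up absolute paths and path.posix so the results don't depend on the machine it runs on.

```javascript
const path = require('path');

// Hypothetical locations, written with POSIX separators
const cwd = '/home/me/project';
const caller = '/home/me/project/example/hello.js';

// What fs effectively does with a relative path: resolve it against the cwd
const fromCwd = path.posix.resolve(cwd, './cat.js');

// What require does, and what get should do: resolve it against the caller's directory
const fromCaller = path.posix.resolve(path.posix.dirname(caller), './cat.js');

console.log(fromCwd);    // /home/me/project/cat.js
console.log(fromCaller); // /home/me/project/example/cat.js
```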
I'd like get to resolve relative paths the same way require does. So I want the get in each of my modules to resolve relative paths relative to that module itself. That means get in each module should work in a way that is specific to that module. The best way I could find to achieve this is to provide each module with its own instance of get.
So first I'll change the wrapper function to take a get parameter.
const wrap = moduleJS => (
`(get => {${moduleJS}})`
)
Note that the wrapper function is not self-calling anymore. (I have taken out the () at the end.) Instead, the function itself is returned to the place where runInThisContext is called, so it can be called from there.
Now I'll change the get function.
I have already decided that I need a specific get function for each new module. So instead of one single global get function, I will create a get factory function named createGet, so I can create any number of gets from it. Each created get differs from the others because each one has a caller specific to it.
Here is the createGet function, with a comment above each line describing it.
const createGet = caller => {
return filename => {
// Get the directory the caller is in
const callersDirectory = path.dirname(caller);
// resolve relative path relative to the caller's directory
const filepath = path.resolve(callersDirectory, filename);
// Read the content in loaded file
const loadedJS = fs.readFileSync(filepath);
// wrap it inside the wrapper function. It's not immediately called now
const wrappedJS = wrap(loadedJS)
// Run the content through vm. This returns the wrapped function so we can call it later
const newModule = vm.runInThisContext(wrappedJS);
// Create a new get for the new module using createGet itself, with the loaded file's own path as its caller. Bit of a recursion :)
const newGet = createGet(filepath);
// Call the newModule (wrappedFunction) with the created `get`
newModule(newGet);
}
}
When a get is passed a relative file path, the path is resolved relative to the location of that get's caller.
Here's the latest node-get.js.
// node-get.js
const vm = require('vm');
const fs = require('fs');
const path = require('path');
const wrap = moduleJS => (
`(get => {${moduleJS}})`
)
const createGet = caller => {
return filename => {
const callersDirectory = path.dirname(caller);
const filepath = path.resolve(callersDirectory, filename); // Paths resolved relative to caller's directory
const loadedJS = fs.readFileSync(filepath);
const wrappedJS = wrap(loadedJS)
const newModule = vm.runInThisContext(wrappedJS);
const newGet = createGet(filepath); // The new module's get resolves paths relative to the new module itself
newModule(newGet);
}
}
// The entry point to the app does not have a caller. So we create an artificial one.
const rootCaller = path.join(process.cwd(), '__main__');
const rootGet = createGet(rootCaller);
rootGet(process.argv[2])
Now relative paths work the way we are familiar with and I get the expected output.
$ node node-get.js example/hello.js
hello world!
hello, I am a cat named Tom
hello, I am a mouse named Jerry
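If you'd like to play with per-module get without creating files, here's a sketch of the same createGet idea with fs.readFileSync swapped for a hypothetical in-memory files object and a hypothetical global log function. Only the loading source changes; the path resolution mirrors node-get.js. The nested lib/ directory exercises resolution relative to a module that isn't the entry point.

```javascript
const vm = require('vm');
const path = require('path');

// Hypothetical in-memory "files", keyed by absolute path, standing in for fs.readFileSync
const files = {
  '/app/hello.js': "log('hello'); get('./lib/cat.js');",
  '/app/lib/cat.js': "log('cat'); get('./mouse.js');",
  '/app/lib/mouse.js': "log('mouse');",
};

// Hypothetical global the fake modules use to record that they ran
const messages = [];
global.log = message => messages.push(message);

const wrap = moduleJS => `(get => {${moduleJS}})`;

const createGet = caller => filename => {
  // Resolve relative to the caller's directory, as node-get.js does
  const filepath = path.posix.resolve(path.posix.dirname(caller), filename);
  if (!Object.prototype.hasOwnProperty.call(files, filepath)) {
    throw new Error(`Cannot find ${filepath}`);
  }
  const newModule = vm.runInThisContext(wrap(files[filepath]));
  // The new module's get uses the new module's own path as its caller
  newModule(createGet(filepath));
};

createGet('/app/__main__')('./hello.js');
console.log(messages); // [ 'hello', 'cat', 'mouse' ]
```

Passing filename instead of filepath to createGet would make hello.js resolve ./lib/cat.js against the process's working directory instead of /app, and the lookup would throw.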
give
Currently, when get is used to load another JavaScript file, the contents of that file are executed. But with Node.js modules, we can also return the results of this execution to the caller to be used later (using exports):
const fs = require('fs');
fs.readFileSync('somefile') // Like this.
Now I'll implement the same functionality in node-get.
I'll provide each module with a give function to complement the get it already has. give can be used in the following way.
// capitalize.js
const capitalize = () => { /*function logic*/ }
give(capitalize)
First, I'll change the wrapper function to accept another parameter, give.
const wrap = moduleJS => (
`((get, give) => {${moduleJS}})`
)
I'll implement give in the createGet function itself.
const createGet = caller => {
return filename => {
const callersDirectory = path.dirname(caller);
const filepath = path.resolve(callersDirectory, filename); // Paths resolved relative to caller's directory
const loadedJS = fs.readFileSync(filepath);
const wrappedJS = wrap(loadedJS)
const newModule = vm.runInThisContext(wrappedJS);
const newGet = createGet(filepath);
let givenValue;
const newGive = value => { givenValue = value }
newModule(newGet, newGive); // Pass new give along side new get.
return givenValue;
}
}
It's very simple to implement give. It takes the value passed to it and assigns it to givenValue, which is returned from the outer get function. This means that only the last give call in a module takes effect.
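To see the "last give wins" behaviour on its own, the sketch below runs inline module sources through the same wrapper and give logic as createGet. runModule is a hypothetical helper, and get is passed as null because these modules never call it.

```javascript
const vm = require('vm');

const wrap = moduleJS => `((get, give) => {${moduleJS}})`;

// Hypothetical helper: run one module source and return whatever it gave,
// using the same givenValue/give pattern as createGet
const runModule = moduleJS => {
  let givenValue;
  const give = value => { givenValue = value; };
  vm.runInThisContext(wrap(moduleJS))(null, give);
  return givenValue;
};

console.log(runModule("give('first'); give('second');")); // second
console.log(runModule('const x = 1;'));                    // undefined
```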
This completes my new module system and I feel quite clever!
Here are the files from my example project, updated to demonstrate the latest features of node-get.
// utils/capitalize.js
// Lifted from stackoverflow: http://stackoverflow.com/a/7592235/1150725
const capitalize = text => {
return text.replace(/(?:^|\s)\S/g, function(a) { return a.toUpperCase(); });
}
give(capitalize)
// cat.js
const capitalize = get('./utils/capitalize.js')
const name = 'Tom'
give(capitalize(`hello, I am a cat named ${name}`));
// mouse.js
const capitalize = get('./utils/capitalize.js')
const name = 'Jerry'
give(capitalize(`hello, I am a mouse named ${name}`));
// hello.js
console.log('hello world!');
const catText = get('./cat.js');
console.log(catText);
const mouseText = get('./mouse.js');
console.log(mouseText);
Here is the completed node-get.js.
// node-get.js
const vm = require('vm');
const fs = require('fs');
const path = require('path');
const wrap = moduleJS => (
`((get, give) => {${moduleJS}})`
)
const createGet = parent => {
return filename => {
const parentsDirectory = path.dirname(parent);
const filepath = path.resolve(parentsDirectory, filename); // Paths resolved relative to parent's directory
const loadedJS = fs.readFileSync(filepath);
const wrappedJS = wrap(loadedJS)
const newModule = vm.runInThisContext(wrappedJS);
const newGet = createGet(filepath);
let givenValue;
const newGive = value => { givenValue = value }
newModule(newGet, newGive);
return givenValue;
}
}
// The entry point to the app does not have a parent. So we create an artificial one.
const rootParent = path.join(process.cwd(), '__main__');
const rootGet = createGet(rootParent);
rootGet(process.argv[2])
I'll run node-get one last time.
$ node node-get.js example/hello.js
hello world!
Hello, I Am A Cat Named Tom
Hello, I Am A Mouse Named Jerry
The Node.js module system works very similarly to the module system I just built. It reads a new module using fs.readFileSync and executes its JavaScript using vm.runInThisContext, just the way node-get does.
It also wraps JavaScript files inside a wrapper function to give them a local scope. In fact, you can look at this wrapper using the module module. Let me show you.
$ node
> const m = require('module')
> m.wrap("somejs")
'(function (exports, require, module, __filename, __dirname) { somejs\n});'
See how its signature is quite similar to that of node-get's wrapper function.
Just as node-get does with get, it creates a require function specific to each module and uses this to resolve relative paths relative to the module's location.
These similarities are there because I built node-get using the understanding of the Node.js module system I got by going through its source.
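One way to compare the two wrappers directly is to evaluate both and check how many parameters the resulting functions take. This relies on Module.wrap from the module module, as in the REPL session above, and on the fact that runInThisContext returns the value of the last expression, which here is the wrapper function itself.

```javascript
const vm = require('vm');
const Module = require('module');

// Evaluate Node.js's wrapper around an empty body
const nodeWrapper = vm.runInThisContext(Module.wrap(''));

// node-get's wrapper, as defined in node-get.js
const wrap = moduleJS => `((get, give) => {${moduleJS}})`;
const nodeGetWrapper = vm.runInThisContext(wrap(''));

console.log(nodeWrapper.length);    // 5: exports, require, module, __filename, __dirname
console.log(nodeGetWrapper.length); // 2: get, give
```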
And of course, the Node.js module system has many additional features as well.
Once a module is required, it is cached, so later requires of the same module are faster. This also means that modules act as singletons (a module is executed only once).
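As an example of how caching might be bolted onto node-get, here's a sketch that keeps a cache object keyed by absolute path. The cache and the counter.js demo module are hypothetical additions, not part of node-get.js; the demo writes its module to a temporary directory so the script is self-contained.

```javascript
const vm = require('vm');
const fs = require('fs');
const os = require('os');
const path = require('path');

const wrap = moduleJS => `((get, give) => {${moduleJS}})`;

// Hypothetical addition: given values, keyed by absolute file path
const cache = {};

const createGet = parent => filename => {
  const filepath = path.resolve(path.dirname(parent), filename);
  if (Object.prototype.hasOwnProperty.call(cache, filepath)) {
    return cache[filepath]; // Already loaded: skip reading and running it again
  }
  const loadedJS = fs.readFileSync(filepath, 'utf8');
  const newModule = vm.runInThisContext(wrap(loadedJS));
  let givenValue;
  newModule(createGet(filepath), value => { givenValue = value; });
  cache[filepath] = givenValue;
  return givenValue;
};

// Demo: a module that counts how many times it has been executed
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'node-get-'));
fs.writeFileSync(path.join(dir, 'counter.js'),
  'global.runs = (global.runs || 0) + 1; give({ runs: global.runs });');

const get = createGet(path.join(dir, '__main__'));
const first = get('./counter.js');
const second = get('./counter.js');
console.log(first === second, global.runs); // true 1
```

One design difference worth knowing: Node.js puts a module into its cache before executing it, which is how circular requires get a partially filled exports object. This sketch only caches after the module has run, so two modules that get each other would recurse until the stack overflows.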
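Modules can also be loaded from node_modules. When require is called with a bare module name (neither a relative nor an absolute path), it looks in several locations, including a node_modules directory in the calling module's directory and in each of its parent directories, such as the one at the root of the project.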
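The directory-walking part of that lookup can be sketched as a pure function over paths. nodeModulesPaths is a hypothetical helper that simplifies what Node.js does (for example, it ignores the global folders Node.js also searches) and uses path.posix so the output is predictable.

```javascript
const path = require('path');

// Hypothetical helper: list the node_modules directories to search for a bare
// module name, starting next to the calling file and walking up to the root
const nodeModulesPaths = callerFile => {
  const dirs = [];
  let dir = path.posix.dirname(callerFile);
  while (true) {
    dirs.push(path.posix.join(dir, 'node_modules'));
    const parent = path.posix.dirname(dir);
    if (parent === dir) break; // Reached the filesystem root
    dir = parent;
  }
  return dirs;
};

console.log(nodeModulesPaths('/home/me/project/example/hello.js'));
```

A get extended this way would try each directory in order and load the first match it finds.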
You can also require JSON files with it.
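Similarly, JSON support could be added by branching on the file extension before anything is executed. The .json branch below is a hypothetical addition to createGet, demonstrated with files written to a temporary directory.

```javascript
const vm = require('vm');
const fs = require('fs');
const os = require('os');
const path = require('path');

const wrap = moduleJS => `((get, give) => {${moduleJS}})`;

const createGet = parent => filename => {
  const filepath = path.resolve(path.dirname(parent), filename);
  const loaded = fs.readFileSync(filepath, 'utf8');
  // Hypothetical addition: .json files are parsed, not executed
  if (path.extname(filepath) === '.json') {
    return JSON.parse(loaded);
  }
  let givenValue;
  vm.runInThisContext(wrap(loaded))(createGet(filepath), value => { givenValue = value; });
  return givenValue;
};

// Demo: a module that gets a JSON file sitting next to it
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'node-get-json-'));
fs.writeFileSync(path.join(dir, 'names.json'), '{"cat": "Tom", "mouse": "Jerry"}');
fs.writeFileSync(path.join(dir, 'hello.js'), "give(get('./names.json').cat);");

const catName = createGet(path.join(dir, '__main__'))('./hello.js');
console.log(catName); // Tom
```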
These features are not that complex. I bet you could think of ways to implement them in node-get if needed.
This exercise helped me gain a subtler understanding of Node.js. I hope you enjoyed reading about it.
Despite what is commonly said, I really think that JavaScript is alright. I love Node.js for allowing me to do a great many things with it.
I plan to hack deeper into Node.js, and write about my experiments with it. Stay tuned!