The Resolving Algorithm
Learn about the resolving algorithm, how it prevents dependency hell, and how CommonJS handles circular dependencies.
The term dependency hell describes a situation in which two or more dependencies of a program in turn depend on a shared dependency, but require different, incompatible versions. Node.js solves this problem elegantly by loading a different version of a module depending on where the module is loaded from. The merit of this feature goes to the way Node.js package managers (such as npm or yarn) organize the dependencies of the application, and to the resolving algorithm used in the require() function.
Let’s now get a quick overview of this algorithm. As we saw, the resolve() function takes a module name (which we will call moduleName) as input and returns the full path of the module. This path is then used to load the module’s code and also to identify the module uniquely. The resolving algorithm can be divided into the following three major branches:
• File modules: If moduleName starts with /, it’s already considered an absolute path to the module, and it’s returned as it is. If it starts with ./, then moduleName is considered a relative path, which is calculated starting from the directory of the requiring module.
• Core modules: If moduleName is not prefixed with / or ./, the algorithm first tries to search within the core Node.js modules.
• Package modules: If no core module is found matching moduleName, then the search continues by looking for a matching module in the first node_modules directory that is found by navigating up in the directory structure starting from the requiring module. The algorithm continues to search for a match by looking into the next node_modules directory up in the directory tree, until it reaches the root of the filesystem.
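To make the three branches above more concrete, here is a minimal sketch of how such a resolver could be structured. The resolve() and existsAsModule() functions below are illustrative assumptions, not the actual Node.js implementation, which handles many more cases (other file extensions, the package.json exports field, symlinks, and so on):

const fs = require('fs')
const path = require('path')
const { builtinModules } = require('module')

// Simplified check for the on-disk matches described just below:
// <moduleName>.js, <moduleName>/index.js, or a <moduleName>/package.json
function existsAsModule (candidate) {
  return fs.existsSync(`${candidate}.js`) ||
    fs.existsSync(path.join(candidate, 'index.js')) ||
    fs.existsSync(path.join(candidate, 'package.json'))
}

// A deliberately simplified resolver, NOT the real Node.js algorithm
function resolve (moduleName, requiringDir) {
  // 1. File modules: absolute or relative paths
  if (moduleName.startsWith('/')) {
    return moduleName
  }
  if (moduleName.startsWith('./') || moduleName.startsWith('../')) {
    return path.resolve(requiringDir, moduleName)
  }

  // 2. Core modules: match against the built-in Node.js modules
  if (builtinModules.includes(moduleName)) {
    return moduleName
  }

  // 3. Package modules: walk up the directory tree, looking into each node_modules
  let currentDir = requiringDir
  while (true) {
    const candidate = path.join(currentDir, 'node_modules', moduleName)
    if (existsAsModule(candidate)) {
      return candidate
    }
    const parentDir = path.dirname(currentDir)
    if (parentDir === currentDir) { // reached the root of the filesystem
      throw new Error(`Cannot find module '${moduleName}'`)
    }
    currentDir = parentDir
  }
}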
For file and package modules, both files and directories can match moduleName. In particular, the algorithm tries to match the following:
• <moduleName>.js
• <moduleName>/index.js
• The directory/file specified in the main property of <moduleName>/package.json
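As an illustration of the last case, a hypothetical depA package whose package.json declares a main entry would make require('depA') resolve to that file instead of index.js (the file name here is purely an example):

{
  "name": "depA",
  "main": "lib/depA.js"
}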
The node_modules directory is actually where the package managers install the dependencies of each package. This means that, based on the algorithm we just described, each package can have its own private dependencies. For example, consider the following directory structure:
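Based on the paths used in the example below, the layout can be sketched as follows:

myApp
├── foo.js
└── node_modules
    ├── depA
    │   └── index.js
    ├── depB
    │   ├── bar.js
    │   └── node_modules
    │       └── depA
    │           └── index.js
    └── depC
        ├── foobar.js
        └── node_modules
            └── depA
                └── index.js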
In the previous example, myApp, depB, and depC all depend on depA. However, they all have their own private version of the dependency! Following the rules of the resolving algorithm, using require('depA') loads a different file depending on the module that requires it, for example:
• Calling require('depA') from /myApp/foo.js will load /myApp/node_modules/depA/index.js
• Calling require('depA') from /myApp/node_modules/depB/bar.js will load /myApp/node_modules/depB/node_modules/depA/index.js
• Calling require('depA') from /myApp/node_modules/depC/foobar.js will load /myApp/node_modules/depC/node_modules/depA/index.js
The resolving algorithm is at the core of the robustness of Node.js dependency management, and it makes it possible to have hundreds or even thousands of packages in an application without collisions or version compatibility problems.
Note: The resolving algorithm is applied transparently for us when we invoke the require() function. However, if needed, it can still be used directly by any module by simply invoking require.resolve().
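For example, we can ask for the full path a module name maps to without loading its code (the module name and the printed path below are only illustrative, matching the directory structure above):

// Resolve the full path of a module without loading it
const depAPath = require.resolve('depA')
console.log(depAPath) // e.g. /myApp/node_modules/depA/index.js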
The module cache
Each module is only loaded and evaluated the first time it is required, because any subsequent call to require() will simply return the cached version. This should be clear by looking at the code of our custom require() function. Caching is crucial for performance, but it also has some important functional implications:
• It makes it possible to have cycles within module dependencies.
• It guarantees, to some extent, that the same instance is always returned when requiring the same module from within a given package.
The module cache is exposed via the require.cache variable, so it’s possible to directly access it if needed. A common use case is to invalidate any cached module by deleting the relevant key in the require.cache variable, a practice that can be useful during testing but very dangerous if applied in normal circumstances.
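Here is a small sketch of both behaviors, assuming a hypothetical ./logger module in the same directory:

// Requiring the same module twice returns the exact same (cached) instance
const logger1 = require('./logger')
const logger2 = require('./logger')
console.log(logger1 === logger2) // true

// Deleting the cache entry forces the module to be loaded and evaluated again
delete require.cache[require.resolve('./logger')]
const logger3 = require('./logger')
console.log(logger1 === logger3) // false: a fresh instance was created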
Circular dependencies
Many consider circular dependencies an intrinsic design issue, but it’s something that might actually happen in a real project, so it’s useful for us to know at least how this works with CommonJS. If we look again at our custom require() function, we immediately get a glimpse of how this might work and what its caveats are.
But let’s walk together through an example to see how CommonJS behaves when dealing with circular dependencies. Let’s suppose we have the scenario represented below:
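A rough sketch of the dependency graph, reconstructed from the description that follows:

main.js ──► a.js
main.js ──► b.js
a.js ──► b.js
b.js ──► a.js   (cycle!)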
A module called main.js requires a.js and b.js. In turn, a.js requires b.js, but b.js relies on a.js as well! It’s obvious that we have a circular dependency here, as the a.js module requires the b.js module and the b.js module requires the a.js module. Let’s have a look at the code of these two modules:
The a.js module:

exports.loaded = false;
const b = require('./b')
module.exports = {
  b,
  loaded: true // overrides the previous export
}
The b.js module:

exports.loaded = false;
const a = require('./a');
module.exports = {
  a,
  loaded: true
};
Now, let’s see how these modules are required by the main.js module:

const a = require('./a')
const b = require('./b')
console.log('a ->', JSON.stringify(a, null, 2))
console.log('b ->', JSON.stringify(b, null, 2))
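Running main.js should produce output along these lines:

a -> {
  "b": {
    "a": {
      "loaded": false
    },
    "loaded": true
  },
  "loaded": true
}
b -> {
  "a": {
    "loaded": false
  },
  "loaded": true
}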
The result reveals the caveats of circular dependencies with CommonJS: different parts of our application will have a different view of what is exported by the a.js module and the b.js module, depending on the order in which those dependencies are loaded. While both modules are completely initialized as soon as they’re required from the main.js module, the a.js module will be incomplete when it is loaded from the b.js module. In particular, its state will be the one that it reached the moment b.js was required.
To better understand what happens behind the scenes, let’s analyze step by step how the different modules are interpreted and how their local scope changes along the way:
The steps are as follows:
1. The processing starts in the main.js module, which immediately requires the a.js module.
2. The first thing that the a.js module does is set an exported value called loaded to false.
3. At this point, the a.js module requires the b.js module.
4. Like the a.js module, the first thing that the b.js module does is set an exported value called loaded to false.
5. Now, the b.js module requires a.js (cycle).
6. Since a.js has already been traversed, its currently exported value is immediately copied into the scope of the b.js module.
7. The b.js module finally changes the loaded value to true.
8. Now that the b.js module has been fully executed, control returns to the a.js module, which now holds a copy of the current state of the b.js module in its own scope.
9. The last step of the a.js module is to set its loaded value to true.
10. The a.js module is now completely executed, and control returns to the main.js module, which now has a copy of the current state of the a.js module in its internal scope.
11. The main.js module requires the b.js module, which is immediately loaded from the cache.
12. The current state of the b.js module is copied into the scope of the main.js module, where we can finally see the complete picture of what the state of every module is.
As described earlier, the issue here is that the b.js module has a partial view of the a.js module, and this partial view gets propagated when the b.js module is required in the main.js module. This behavior should spark an intuition that can be confirmed if we swap the order in which the two modules are required in the main.js module. If we actually try this, we’ll see that this time it’ll be the a.js module that receives an incomplete version of the b.js module.
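For instance, with the two requires swapped in main.js (a sketch following the same reasoning as above):

const b = require('./b')
const a = require('./a')
console.log('b ->', JSON.stringify(b, null, 2))
console.log('a ->', JSON.stringify(a, null, 2))

we would expect b.js to be fully initialized this time, while a.js holds only the partial view of b.js captured when the cycle was hit:

b -> {
  "a": {
    "b": {
      "loaded": false
    },
    "loaded": true
  },
  "loaded": true
}
a -> {
  "b": {
    "loaded": false
  },
  "loaded": true
}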
We understand that this can become quite a fuzzy business if we lose control of which module is loaded first, which can happen quite easily if the project is big enough, but don’t worry; we’ll go through everything in detail.