* [Architecture](#architecture)
  * [Design principles](#design-principles)
  * [Overview](#overview)
    * [Scan phase](#scan-phase)
    * [Compile phase](#compile-phase)
  * [Notes about parsing](#notes-about-parsing)
    * [Symbols and scopes](#symbols-and-scopes)
    * [Constant folding](#constant-folding)
    * [TypeScript parsing](#typescript-parsing)
  * [Notes about linking](#notes-about-linking)
    * [CommonJS linking](#commonjs-linking)
    * [ES6 linking](#es6-linking)
    * [Hybrid CommonJS and ES6 modules](#hybrid-commonjs-and-es6-modules)
    * [Scope hoisting](#scope-hoisting)
    * [Converting ES6 imports to CommonJS imports](#converting-es6-imports-to-commonjs-imports)
    * [The runtime library](#the-runtime-library)
    * [Tree shaking](#tree-shaking)
    * [Code splitting](#code-splitting)
  * [Notes about printing](#notes-about-printing)
    * [Symbol minification](#symbol-minification)

# Architecture Documentation

This document covers how esbuild's bundler works. It's intended to aid in understanding the code, in understanding what tricks esbuild uses to improve performance, and hopefully to enable people to modify the code.

Note that there are some design decisions that have been made differently than in other bundlers for performance reasons. These decisions may make the code harder to work with. Keep in mind that this project is an experiment in progress, and is not the result of a comprehensive survey of implementation techniques. The way things work now is not necessarily the best way of doing things.

### Design principles

* **Maximize parallelism**

  Most of the time should be spent doing fully parallelizable work. This can be observed by taking a CPU trace using the `--trace=[file]` flag and viewing it using `go tool trace [file]`.

* **Avoid doing unnecessary work**

  For example, many bundlers have intermediate stages where they write out JavaScript code and read it back in using another tool. This work is unnecessary because if the tools used the same data structures, no conversion would be needed.
* **Transparently support both ES6 and CommonJS module syntax**

  The parser in esbuild processes a superset of both ES6 and CommonJS modules. It doesn't distinguish between ES6 modules and other modules, so you can use both ES6 and CommonJS syntax in the same file if you'd like.

* **Try to do as few full-AST passes as possible for better cache locality**

  Compilers usually have many more passes because separate passes make code easier to understand and maintain. There are currently only three full-AST passes in esbuild because individual passes have been merged together as much as possible:

  1. Lexing + parsing + scope setup + symbol declaration
  2. Symbol binding + constant folding + syntax lowering + syntax mangling
  3. Printing + source map generation

* **Structure things to permit a "watch mode" where compilation can happen incrementally**

  Incremental builds mean only rebuilding changed files to the greatest extent possible. This means not re-running any of the full-AST passes on unchanged files. Data structures that live across builds must be immutable to allow sharing. Unfortunately the Go type system can't enforce this, so care must be taken to uphold this as the code evolves.

## Overview
The build pipeline has two main phases: scan and compile. These both reside in [bundler.go](../internal/bundler/bundler.go).

### Scan phase

This phase starts with a set of entry points and traverses the dependency graph to find all modules that need to be in the bundle. This is implemented in `bundler.ScanBundle()` as a parallel worklist algorithm. The worklist starts off as the list of entry points. Each file in the list is parsed into an AST on a separate goroutine and may add more files to the worklist if it has any dependencies (either ES6 `import` statements, ES6 `import()` expressions, or CommonJS `require()` expressions). Scanning continues until the worklist is empty.

### Compile phase

This phase creates a bundle for each entry point, which involves first "linking" imports with exports, then converting the parsed ASTs back into JavaScript, then concatenating them together to form the final bundled file. This happens in `(*Bundle).Compile()`.

## Notes about parsing

The parser is separate from the lexer. The lexer is called on the fly as the file is parsed instead of lexing the entire input ahead of time. This is necessary due to certain syntactical features such as regular expressions vs. the division operator and JSX elements vs. the less-than operator, where which token is parsed depends on the semantic context.

Lexer lookahead has been kept to one token in almost all cases, with the notable exception of TypeScript, which requires arbitrary lookahead to parse correctly. All such cases are in methods called `trySkipTypeScript*WithBacktracking()` in the parser.

The parser includes a lot of transformations, all of which have been condensed into just two passes for performance:

1. The first pass does lexing and parsing, sets up the scope tree, and declares all symbols in their respective scopes.
2. The second pass binds all identifiers to their respective symbols using the scope tree, substitutes compile-time definitions for their values, performs constant folding, lowers syntax if we're targeting an older version of JavaScript, and performs syntax mangling/compression if we're doing a production build.

Note that, from experience, the overhead of syscalls in import path resolution appears to be very high. Caching syscall results in the resolver and in the file system implementation is a very sizable speedup.

### Symbols and scopes

A symbol is a way to refer to an identifier in a precise way. Symbols are referenced using a 64-bit identifier instead of using the name, which makes them easy to refer to without worrying about scope. For example, the parser can generate new symbols without worrying about name collisions. All identifiers reference a symbol, even "unbound" ones that don't have a matching declaration.

Symbols have to be declared in a separate pass from the pass that binds identifiers to symbols because JavaScript has "variable hoisting", where a child scope can declare a hoisted symbol that can become bound to identifiers in parent and sibling scopes.

Symbols for the whole file are stored in a flat top-level array. That way you can easily traverse over all symbols in the file without traversing the AST. That also lets us easily create a modified AST where the symbols have been changed without affecting the original immutable AST. Because symbols are identified by their index into the top-level symbol array, we can just clone the array to clone the symbols, and we don't need to worry about rewiring all of the symbol references.

The scope tree is not attached to the AST because it's really only needed to pass information from the first pass to the second pass. The scope tree is instead temporarily mapped onto the AST within the parser.
This is done by having the first and second passes both call `pushScope*()` and `popScope()` the same number of times in the same order. Specifically, the first pass calls `pushScopeForParsePass()`, which appends the pushed scope to `scopesInOrder`, and the second pass calls `pushScopeForVisitPass()`, which reads off the scope to push from `scopesInOrder`.

This is mostly pretty straightforward except for a few places where the parser has pushed a scope and is in the middle of parsing a declaration only to discover that it's not a declaration after all. This happens in TypeScript when a function is forward-declared without a body, and in JavaScript when it's ambiguous whether a parenthesized expression is an arrow function or not until we reach the `=>` token afterwards. This could be solved by doing three passes instead of two so we finish parsing before starting to set up scopes and declare symbols, but we're trying to do this in just two passes. So instead we call `popAndDiscardScope()` or `popAndFlattenScope()` instead of `popScope()` to modify the scope tree later if our assumptions turn out to be incorrect.

### Constant folding

Constant folding and compile-time definition substitution are pretty minimal but are enough to handle libraries such as React, which contain code like this:

```js
if (process.env.NODE_ENV === 'production') {
  module.exports = require('./cjs/react.production.min.js');
} else {
  module.exports = require('./cjs/react.development.js');
}
```

Using `--define:process.env.NODE_ENV="production"` on the command line will cause `process.env.NODE_ENV === 'production'` to become `"production" === 'production'`, which will then become `true`. The parser then treats the `else` branch as dead code, which means it ignores calls to `require()` and `import()` inside that branch. The `react.development.js` module is never included in the dependency graph.
### TypeScript parsing

TypeScript parsing has been implemented by augmenting the existing JavaScript parser. Most of it just involves skipping over type declarations as if they are whitespace. Enums, namespaces, and TypeScript-only class features such as parameter properties must all be converted to JavaScript syntax, which happens in the second parser pass. I've attempted to match what the TypeScript compiler does as closely as is reasonably possible.

One TypeScript subtlety is that unused imports in TypeScript code must be removed, since they may be type-only imports. And if all imports in an import statement are removed, the whole import statement itself must also be removed. This has semantic consequences because the import may have side effects. However, it's important for correctness because this is how the TypeScript compiler itself works. The imported package may not actually even exist on disk, since it may only come from a `declare` statement. Tracking used imports is handled by the `tsUseCounts` field in the parser.

## Notes about linking

The main goal of linking is to merge multiple modules into a single file so that imports from one module can reference exports from another module. This is accomplished in several different ways depending on the import and export features used.

Linking performs an optimization called "tree shaking". This is also known as "dead code elimination"; it removes unreferenced code from the bundle to reduce bundle size. Tree shaking is always active and cannot be disabled.

Finally, linking may also involve dividing the input code among multiple chunks. This is known as "code splitting"; it allows both lazy loading of code and sharing of code between multiple entry points. It's disabled by default in esbuild but can be enabled with the `--splitting` flag. This will all be described in more detail below.

### CommonJS linking

If a module uses any CommonJS features (e.g.
references `exports`, references `module`, or uses a top-level `return` statement), then it's considered a CommonJS module. This means it's represented as a separate closure within the bundle. This is similar to how Webpack normally works. Here's a simplified example to explain what this looks like:

**foo.js**

```js
exports.fn = () => 123
```

**bar.js**

```js
const foo = require('./foo')
console.log(foo.fn())
```

**bundle.js**

```js
let __commonJS = (callback, module) => () => {
  if (!module) {
    module = {exports: {}};
    callback(module.exports, module);
  }
  return module.exports;
};

// foo.js
var require_foo = __commonJS((exports) => {
  exports.fn = () => 123;
});

// bar.js
const foo = require_foo();
console.log(foo.fn());
```
### ES6 linking

Here's the equivalent example using ES6 module syntax, where the module statements can instead be merged directly into a single scope:

**foo.js**

```js
export const fn = () => 123
```

**bar.js**

```js
import {fn} from './foo'
console.log(fn())
```

**bundle.js**

```js
// foo.js
const fn = () => 123;

// bar.js
console.log(fn());
```
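Because scope hoisting merges all module bodies into a single scope like this, top-level names from different modules can collide, and the symbol system described earlier makes it safe to rename one of them. A hypothetical sketch of the kind of output this produces (the file names and the `fn2` rename are made up, not actual esbuild output):

```js
// a.js and b.js both declared a top-level `fn`; the linker renames one
// of the symbols, and every identifier bound to that symbol follows along.

// a.js
const fn = () => "a";

// b.js
const fn2 = () => "b";

// entry.js
console.log(fn(), fn2());
```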
### Code splitting

With code splitting enabled, code shared between multiple entry points (here `index.js` and `settings.js`) is split off into a separate chunk that both entry point chunks import:

**Chunk for index.js**

```js
import { api, session } from "./chunk.js";

// net.js
function get(url) {
  return fetch(url).then((r) => r.text());
}

// config.js
function load() {
  return get(api + session);
}

// index.js
let el = document.getElementById("el");
load().then((x) => el.textContent = x);
```

**Chunk for settings.js**

```js
import { api, session } from "./chunk.js";

// net.js
function put(url, body) {
  fetch(url, {method: "PUT", body});
}

// config.js
function save(value) {
  return put(api + session, value);
}

// settings.js
let it = document.getElementById("it");
it.oninput = () => save(it.value);
```

**Chunk for shared code**

```js
// config.js
let session = Math.random();
let api = "/api?session=";

export { api, session };
```
Code splitting gets more subtle when one module assigns to a variable that ends up exported from another chunk. Consider these input files:

**entry1.js**

```js
import {data} from './data'
console.log(data)
```

**entry2.js**

```js
import {setData} from './data'
setData(123)
```

**data.js**

```js
export let data

export function setData(value) {
  data = value
}
```
Splitting this naively would place `setData()` in the chunk for `entry2.js`, but then that chunk would be assigning to `data`, which is an imported binding, and assigning to an import is not allowed in ES6:

**Chunk for entry1.js**

```js
import { data } from "./chunk.js";

// entry1.js
console.log(data);
```

**Chunk for entry2.js**

```js
import { data } from "./chunk.js";

// data.js
function setData(value) {
  data = value;
}

// entry2.js
setData(123);
```

**Chunk for shared code**

```js
// data.js
let data;

export { data };
```
Instead, code that assigns to a shared variable must end up in the same chunk as the variable itself:

**Chunk for entry1.js**

```js
import { data } from "./chunk.js";

// entry1.js
console.log(data);
```

**Chunk for entry2.js**

```js
import { setData } from "./chunk.js";

// entry2.js
setData(123);
```

**Chunk for shared code**

```js
// data.js
let data;

function setData(value) {
  data = value;
}

export { data, setData };
```
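Why must `setData()` live next to `data`? ES6 import bindings are live, read-only views of the exporting module's variable, so only code in the owning chunk can write to it. That live-view behavior can be roughly emulated with a getter (illustrative only; this is not how esbuild actually wires up cross-chunk imports):

```js
// The shared chunk owns the variable, so the write must happen here too.
let data;                                  // lives in the shared chunk
function setData(value) { data = value; }  // must live in the same chunk

// Importing chunks see something like this live, read-only view:
const chunkExports = {
  get data() { return data; },
};

console.log(chunkExports.data); // undefined
setData(123);
console.log(chunkExports.data); // 123 (importers observe the update)
```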
### Symbol minification

Symbol minification renames symbols to shorter names. For example:

**Original code**

```js
function useReducer(reducer, initialState) {
  let [state, setState] = useState(initialState);
  function dispatch(action) {
    let nextState = reducer(state, action);
    setState(nextState);
  }
  return [state, dispatch];
}
```

**Code with symbol minification**

```js
function useReducer(b, c) {
  let [a, d] = useState(c);
  function e(f) {
    let g = b(a, f);
    d(g);
  }
  return [a, e];
}
```
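Generating the short names themselves can be sketched as a base-26 counter over an identifier alphabet. This is a simplified illustration, not esbuild's actual algorithm; a real minifier also has to skip reserved words, use the wider `[a-zA-Z0-9_$]` alphabet for non-leading characters, and may order names so that the most frequently used symbols get the shortest names:

```js
// Map a symbol's index to a short name: 0 → "a", 25 → "z", 26 → "aa", ...
const alphabet = "abcdefghijklmnopqrstuvwxyz";

function numberToName(i) {
  let name = alphabet[i % 26];
  i = Math.floor(i / 26);
  while (i > 0) {
    i -= 1; // shift so the prefix sequence restarts at "a", not "b"
    name = alphabet[i % 26] + name;
    i = Math.floor(i / 26);
  }
  return name;
}

console.log(numberToName(0));  // "a"
console.log(numberToName(26)); // "aa"
```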