Slim down to speed up
Published: 2023-12-14
tl;dr; Memory usage is up to 50% lower, runs are up to 60% faster and you can start using v4 canary today. No “unused class members” for the time being, but this feature is planned to be restored.
Introduction
Honestly, performance has always been a challenge for Knip. A longstanding bottleneck has finally been eliminated and Knip is going to be a lot faster. Skip straight to the bottom to install v4 canary and try it out! Or grab yourself a nice drink and read on if you’re interested in where we are coming from, and where we are heading.
Projects & Workspaces
From the start, Knip has relied on TypeScript for its robust parser for JavaScript and TypeScript files, and for lots of machinery important to Knip, like module resolution and accurately finding references to exported values. Parts of it can be customized, such as the (virtual) file system and the module resolver.
In TypeScript terms, a “project” is like a workspace in a monorepo. Just as each workspace has a package.json, each project has a tsconfig.json. The ts.createProgram() method is used to create a program based on a tsconfig.json and the machinery starts to read and parse source code files, resolve modules, and so on.
Up until v2, when Knip wanted to find unused things in a monorepo, all programs for all workspaces were loaded into memory. Workspaces often depend on each other, so Knip couldn’t load one project, analyze it and dispose of it, because connections across workspaces would be lost.
Shared Workspaces
Knip v2 said goodbye to this approach and implemented its own TypeScript backend (after using ts-morph for this). Based on the compatibility of compilerOptions, workspaces were merged into shared programs whenever possible. Having fewer programs in memory led to significant performance improvements. Yet ultimately it was still a stopgap, since everything was still kept in memory for the duration of the process.
“Why does everything need to stay in memory?”, you may wonder. The answer is that Knip uses findReferences at the end of the process. Knip relied on this TypeScript Language Server method for everything that’s not easy to find. More about that later in the story of findReferences.
Serialization
Fortunately, everything that’s imported and exported from source files (including things like members of namespaces and enums) can be found relatively easily during AST traversal. This way, references to exports don’t have to be “traced back” later on.
It’s mostly class members that are harder to find due to their dynamic nature. Without these, all information can be serialized for storage and retrieval (in memory or on disk). Slimming down by taking class members out of the equation simplifies things a lot and paves the way for all sorts of improvements.
We Have To Slim Down
The relevant part in the linting process can be summarized in 5 steps:
- Collect entry files and feed them to TypeScript
- Read files, resolve modules, and create ASTs
- Traverse ASTs and collect imports & exports
- Match exports against imports to determine what’s unused
- Find references to hard-to-find exported values and members
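To make step 4 a bit more concrete, here is a minimal sketch of matching exports against imports. The type names, the findUnusedExports function and the sample data are assumptions for illustration only, not Knip’s actual internals:

```typescript
// Hypothetical shapes for illustration; Knip's internal data structures may differ.
type FilePath = string;
// Exported identifiers, keyed by the file that declares them
type ExportsByFile = Map<FilePath, Set<string>>;
// Imported identifiers, keyed by the file they are imported FROM
type ImportsByFile = Map<FilePath, Set<string>>;

// Step 4: an export is unused when no file imports that identifier from its module
function findUnusedExports(exported: ExportsByFile, imported: ImportsByFile): Map<FilePath, string[]> {
  const unused = new Map<FilePath, string[]>();
  exported.forEach((identifiers, filePath) => {
    const importedIdentifiers = imported.get(filePath) ?? new Set<string>();
    const unusedInFile = Array.from(identifiers).filter(id => !importedIdentifiers.has(id));
    if (unusedInFile.length > 0) unused.set(filePath, unusedInFile);
  });
  return unused;
}

// Made-up result of step 3 for a tiny project
const exportedIdentifiers: ExportsByFile = new Map<FilePath, Set<string>>([
  ['utils.ts', new Set(['used', 'unused'])],
]);
const importedIdentifiers: ImportsByFile = new Map<FilePath, Set<string>>([
  ['utils.ts', new Set(['used'])],
]);

const unusedExports = findUnusedExports(exportedIdentifiers, importedIdentifiers);
console.log(unusedExports.get('utils.ts'));
```

Since both inputs are plain strings in maps and sets, this step needs neither the program nor the language service to stay around.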
If we held on to reporting unused class members, steps 2 and 5 in particular would be hard to decouple: the program and the language service that hold the source files are needed at the very end to trace back references. So class members had to go. Sometimes you have to slim down to keep moving. One step back, two steps forward.
If you rely on this feature, fear not. I plan to bring it back before the final v4, but possibly behind a flag.
What’s In Store?
So with this out of the way, everything becomes a lot clearer and we can finally really start thinking about significant memory and performance improvements. So what’s in store here? A lot!
- We no longer need to keep everything in memory, so workspaces are read and disposed of in isolation, one at a time. Memory usage will be spread out more evenly. This does not make it faster, but reducing “out of memory” issues is definitely a Good Thing™️ in my book.
- Knip could recover from unexpected exits and continue from the last completed workspace.
- The imports and exports are in a format that can be serialized for storage and retrieval. This opens up interesting opportunities, such as local caching on disk, skipping work in subsequent runs, remote caching, and so on.
- Handling workspaces in isolation and serialization make parallelization a possibility. This becomes essential, as module resolution and AST creation and traversal are now the slowest parts of the process and are not easy to optimize significantly (unless perhaps switching to e.g. Rust).
- No longer relying on findReferences speeds up the export/import matching part significantly. So far I’ve seen improvements of up to 60% on total runtime, and my guess is that some larger codebases may profit even more.
- The serialization format is still being explored and there is no caching yet, but having the steps more decoupled is another Good Thing™️ that future me should be happy about.
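As a rough idea of what “a format that can be serialized” could mean, here is a minimal sketch that round-trips a map of imported identifiers through JSON. The shape and the serializeImports and deserializeImports helpers are assumptions; the actual format is still being explored:

```typescript
// Hypothetical shape; the real serialization format is still being explored.
type SerializableImports = Map<string, Set<string>>;

// Maps and Sets are not JSON-friendly, so convert them to sorted arrays first
function serializeImports(imports: SerializableImports): string {
  const entries: [string, string[]][] = [];
  imports.forEach((identifiers, filePath) => {
    entries.push([filePath, Array.from(identifiers).sort()]);
  });
  return JSON.stringify(entries);
}

// Rebuild the Map of Sets from the stored string
function deserializeImports(json: string): SerializableImports {
  const entries = JSON.parse(json) as [string, string[]][];
  const result: SerializableImports = new Map<string, Set<string>>();
  for (const [filePath, identifiers] of entries) {
    result.set(filePath, new Set(identifiers));
  }
  return result;
}

// Made-up data for a single workspace
const workspaceImports: SerializableImports = new Map<string, Set<string>>([
  ['namespace.ts', new Set(['referencedExport'])],
]);

const cached = serializeImports(workspaceImports); // could be written to disk or a remote cache
const restored = deserializeImports(cached);
console.log(cached);
```

Because the stored value is just a string, it can be kept in memory, written to disk, or handed to another process, which is what makes caching and parallelization realistic next steps.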
Back It Up, Please
I heard you. Here’s some example data. You can get it directly from Knip using the --performance flag when running it on any codebase. Below we have some data after linting the Remix monorepo.
Knip v3
Knip v4
Takeaways
The main takeaways here:
- In v3, findReferences is where Knip potentially spends most of its time
- In v4, total running time is down over 50%
- In v4, memory usage is down 50% (calculated using process.memoryUsage().heapUsed)
- In v4, getImportsAndExports is more comprehensive to compensate for the absence of findReferences (more on that below)
Remember, unused class members are no longer reported by default in v4.
The story of findReferences
Did I mention Knip uses findReferences…? Knip relied on it for everything that’s not easy to find. Here’s an example of an export/import match that is easy to find:
In v2 and v3, Knip collects many such easy patterns. Other patterns are harder to find with static analysis. This is especially true for class members. Let’s take a look at the next example:
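The snippet itself did not survive in this text, so what follows is a hypothetical reconstruction, condensed into a single file. Only the names OtherName and method come from the article; MyClass and the methodCalls counter are assumptions for illustration:

```typescript
// Hypothetical reconstruction, condensed into one file for brevity.
let methodCalls = 0;

class MyClass {
  constructor() {
    // The member is only ever called from within the constructor
    this.method();
  }

  method() {
    methodCalls++;
  }
}

// The class is exposed under an alias (think: `export { MyClass as OtherName }`)
const OtherName = MyClass;

// Only this `new` expression, reached through the alias, makes `method` used
const instance = new OtherName();
console.log(methodCalls);
```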
Without a call or new expression to instantiate OtherName, its method member would not be used (since the constructor would not be executed). Figuring this out using static analysis means going a long way: through export declarations, import declarations, aliases, initializers, call expressions… the list goes on and on. Yet all this magic is exactly what happens when you use “Find all references” or “Go to definition” in VS Code.
Knip used findReferences extensively, but it’s what makes a part of Knip rather slow. TypeScript needs to wire things up (through ts.createLanguageService and program.getTypeChecker) before it can use this, and then it tries hard to find all references to anything you throw at it. It does this very well, but the more class members, enum members and namespaced imports your codebase has, the longer it inevitably takes to complete the process.
Besides letting go of class members, a slightly more comprehensive AST traversal is required to compensate for the absence of findReferences (it’s the getImportsAndExports function in the metrics above). I’d like to give you an idea of what “more comprehensive” means here.
In the following example, referencedExport was stored as export from namespace.ts, but it was not imported directly as such:
Previously, Knip used findReferences() to “trace back” the usage of the exported referencedExport.
The gist of the optimization is to pre-determine all imports and exports. During AST traversal of index.ts, Knip sees that referencedExport is attached to the imported NS namespace, and stores that as an imported identifier of namespace.ts. When matching exports against imports, this lookup comes at no extra cost. Additionally, this can be stored as strings, so it can be serialized too. And that means it can be cached.
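Here is a minimal sketch of that idea. Instead of walking a real TypeScript AST, it takes simplified records of what a traversal of index.ts might yield. The NamespaceImport and PropertyAccess shapes and the collectImportedIdentifiers function are assumptions for illustration, not Knip’s implementation:

```typescript
// Simplified stand-ins for what AST traversal of index.ts might yield.
// Hypothetical shapes; the real implementation inspects TypeScript AST nodes.
interface NamespaceImport {
  localName: string; // e.g. NS in `import * as NS from './namespace'`
  fromFile: string; // resolved module, e.g. namespace.ts
}
interface PropertyAccess {
  object: string; // e.g. NS in `NS.referencedExport`
  property: string; // e.g. referencedExport
}

// Attribute each `NS.member` access to the file the namespace was imported from
function collectImportedIdentifiers(
  namespaceImports: NamespaceImport[],
  accesses: PropertyAccess[]
): Map<string, Set<string>> {
  const fileByNamespace = new Map<string, string>();
  for (const nsImport of namespaceImports) {
    fileByNamespace.set(nsImport.localName, nsImport.fromFile);
  }

  const importedByFile = new Map<string, Set<string>>();
  for (const access of accesses) {
    const fromFile = fileByNamespace.get(access.object);
    if (fromFile === undefined) continue; // not an imported namespace, e.g. console.log
    const identifiers = importedByFile.get(fromFile) ?? new Set<string>();
    identifiers.add(access.property);
    importedByFile.set(fromFile, identifiers);
  }
  return importedByFile;
}

const importedFromNamespaces = collectImportedIdentifiers(
  [{ localName: 'NS', fromFile: 'namespace.ts' }],
  [
    { object: 'NS', property: 'referencedExport' },
    { object: 'console', property: 'log' },
  ]
);
console.log(importedFromNamespaces.get('namespace.ts'));
```

The result contains only strings, so the later export/import matching is a plain set lookup, and the data can be serialized as is.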
Knip already did this for trivial cases as shown in the first example of this article. This has now been extended to cover more patterns. This is also what needs to be tested more extensively before v4 can be released. Knip’s own test suite and the projects in the integration tests are already covered, so we’re well on our way.
For the record, findReferences is an absolute gem of functionality provided by TypeScript. Knip is still backed by TypeScript, and tries to speed things up by shaking things off. In the end it’s all about trade-offs.
Let’s Go!
You can start using Knip v4 today, so feel free to try it out! You might find a false positive that wasn’t there in v3. If so, please report it.
Remember, Knip it before you ship it! Have a great day ☀️
ISC License © 2024 Lars Kappert