fix(resolution): Avoid callback synthesis scans on minified files#1011
Open
caleb-kaiser wants to merge 1 commit into
Addresses #999
Summary
This change fixes a CPU spike in reference resolution caused by the closure-collection synthesizer doing expensive whole-file scans on files that are effectively minified or otherwise packed onto very few lines. To reproduce the conditions from issue 999, I tested against an actual Metronic demo app while developing. In the Metronic app, indexing consistently spent disproportionate time in callback synthesis even though the relevant files were not useful inputs for this heuristic. The fix keeps the synthesizer active for normal source files, but avoids running it on generated files and on files whose line density makes the scan both expensive and low value.
The fix is intentionally narrow and does not address all secondary issues listed on issue 999, in the spirit of keeping the PR focused and limiting its blast radius. It is a narrowly scoped resolver-performance fix, not a broad core-behavior change. The only meaningful functional risk area is closure-collection edge synthesis on pathological packed files, and that tradeoff is intentional.
The bug
The issue was that reference resolution spent too much CPU time inside closureCollectionEdges, which is one of the callback synthesis passes in src/resolution/callback-synthesizer.ts. That pass looks for a specific dynamic-dispatch shape: one method appends a closure into a collection, another method iterates the same collection and invokes each element. When that pattern exists in ordinary source code, the synthesis is useful because it closes a real graph gap.
The problem is that the implementation previously walked method and function nodes one by one and repeatedly read and sliced their source from the backing file. On large packed files, especially minified bundles or generated artifacts with extremely high characters-per-line, this becomes a poor trade. The scan cost stays high, but the chance of producing legitimate closure-collection edges is low. In the Metronic reproduction, that mismatch showed up as a reference-resolution CPU spike.
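For readers unfamiliar with the pattern, here is a minimal illustration of the registrar/dispatcher shape described above. The class and method names are hypothetical and are not taken from CodeGraph or from the Metronic app; the example only shows the kind of code where a static call graph has a gap that the synthesis is meant to close.

```typescript
// Hypothetical example of the closure-collection shape; all names are illustrative.
type Handler = (event: string) => void;

class EventBus {
  private handlers: Handler[] = [];

  // Registrar: appends a closure into the collection.
  register(handler: Handler): void {
    this.handlers.push(handler);
  }

  // Dispatcher: iterates the same collection and invokes each element.
  // There is no direct call from dispatch() to any registered closure,
  // which is the graph gap a synthesized edge would close.
  dispatch(event: string): void {
    for (const handler of this.handlers) {
      handler(event);
    }
  }

  // Not a dispatcher: iterates the collection but never invokes its elements,
  // so it should not cause any edge to be synthesized.
  count(): number {
    let n = 0;
    this.handlers.forEach(() => {
      n += 1;
    });
    return n;
  }
}

const seen: string[] = [];
const bus = new EventBus();
bus.register((e) => {
  seen.push("first:" + e);
});
bus.register((e) => {
  seen.push("second:" + e);
});
bus.dispatch("ready");
```

On a minified bundle the same shape may technically appear, but it is packed onto one or two very long lines, which is where the scan cost and the value of the result diverge.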
Root cause
I found two concrete problems.
First, the pass operated at the node level rather than the file level. It iterated every candidate method or function independently, called ctx.readFile for that node's file, sliced the relevant lines, and then ran the closure-collection regexes. That meant the same file could be re-read and re-processed many times during a single pass.
Second, the pass had no guardrail for pathological file shapes. We already treat generated content carefully in other resolution paths because heuristic synthesis on generated or packed files tends to be expensive and noisy. closureCollectionEdges did not apply that same discipline, so it would still spend time scanning files that were obviously bad inputs for this kind of source-pattern analysis.
The fix
The implementation now groups candidate nodes by filePath before scanning. Each file is read once for the pass, then all method and function nodes in that file are checked against the in-memory content. This removes the repeated read-and-slice behavior that amplified the cost on dense files.
The pass now also skips files that isGeneratedFile(filePath) identifies as generated, which aligns this synthesizer with the rest of the resolver's existing generated-file handling.
Finally, the pass adds a density check before doing any closure-collection work for a file. It computes an approximate characters-per-line value and skips files above a threshold.
This is intentionally simple. The problem here is not semantic correctness on packed artifacts; it is avoiding expensive heuristic work on files that are structurally bad candidates. A dense file threshold is a practical way to reject minified-style inputs without affecting normal source files.
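The three steps above (group by file, skip generated files, skip dense files) can be sketched roughly as follows. This is not the code from the PR: CandidateNode, ReadFile, scannableSources, the isGenerated parameter, and especially the MAX_CHARS_PER_LINE value are assumptions made for illustration, and the real threshold and helper signatures may differ.

```typescript
// Sketch only: every name and the threshold value here are hypothetical,
// not the actual CodeGraph implementation.
interface CandidateNode {
  id: string;
  filePath: string;
  startLine: number; // 1-based, inclusive
  endLine: number; // 1-based, inclusive
}

type ReadFile = (filePath: string) => string | null;

// Assumed cutoff for illustration; the PR's real value is not reproduced here.
const MAX_CHARS_PER_LINE = 500;

// Approximate density: total characters divided by number of lines.
function charsPerLine(content: string): number {
  const lineCount = content.split("\n").length; // always >= 1
  return content.length / lineCount;
}

// Bucket candidate nodes by file so each file is handled once.
function groupByFile(nodes: CandidateNode[]): Map<string, CandidateNode[]> {
  const groups = new Map<string, CandidateNode[]>();
  for (const node of nodes) {
    const bucket = groups.get(node.filePath);
    if (bucket) {
      bucket.push(node);
    } else {
      groups.set(node.filePath, [node]);
    }
  }
  return groups;
}

// Returns node id -> source slice, only for files worth scanning.
function scannableSources(
  nodes: CandidateNode[],
  readFile: ReadFile,
  isGenerated: (filePath: string) => boolean,
): Map<string, string> {
  const sources = new Map<string, string>();
  groupByFile(nodes).forEach((fileNodes, filePath) => {
    if (isGenerated(filePath)) return; // generated file: skip without reading
    const content = readFile(filePath); // a single read per file
    if (content === null) return;
    if (charsPerLine(content) > MAX_CHARS_PER_LINE) return; // dense file: skip
    const lines = content.split("\n");
    for (const node of fileNodes) {
      sources.set(node.id, lines.slice(node.startLine - 1, node.endLine).join("\n"));
    }
  });
  return sources;
}
```

The regex matching would then run over the returned slices only. Checking density once per file, before any per-node work, is what keeps the guard cheap relative to the scan it avoids.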
Note on scope
This does not change the closure-collection synthesis model itself. The same dispatcher and registrar matching logic still runs on real source files, and the existing precision gate remains intact: the dispatcher side must actually invoke each iterated element, so plain collection iteration still does not synthesize edges.
The goal here is to avoid modifying CodeGraph's underlying graph notions and to reduce obviously wasted work while preserving the useful synthesis behavior on ordinary code.
Test coverage
The clean branch includes a focused regression test in __tests__/closure-collection-synthesizer.test.ts.
The existing end-to-end closure-collection test remains in place and still verifies the core behavior: dispatcher-to-registrar synthesis across files and classes, support for both write { $0.append(...) } and direct append(...), and the precision gate that avoids synthesizing from non-invoking forEach loops.
The new test adds a minified-style fixture file containing valid method definitions on a single dense line. It verifies two things: those methods are still extracted and indexed as normal nodes, and the closure-collection synthesizer does not produce heuristic edges from that dense file. That is the behavior we want because the fix should only suppress the expensive synthesis pass for pathological inputs, not break indexing more broadly.
Validation
All existing and new tests passed. My Metronic reproduction (not committed here, obviously) indexed successfully in about 31.5 seconds with 773 files, 13,075 nodes, and 96,457 edges.