fix(resolution): Avoid callback synthesis scans on minified files #1011

Open
caleb-kaiser wants to merge 1 commit into colbymchenry:main from caleb-kaiser:issue-999-clean-fix

caleb-kaiser commented Jun 27, 2026

Addresses #999

Summary

This change fixes a CPU spike in reference resolution caused by the closure-collection synthesizer doing expensive whole-file scans on files that are effectively minified or otherwise packed onto very few lines. To reproduce the conditions from issue 999, I tested against an actual Metronic demo app while developing. In the Metronic app, indexing consistently spent disproportionate time in callback synthesis even though the relevant files were not useful inputs for this heuristic. The fix keeps the synthesizer active for normal source files, but avoids running it on generated files and on files whose line density makes the scan both expensive and low value.

The fix is intentionally narrow and does not address all of the secondary issues listed on issue 999, in the spirit of keeping the PR focused and limiting its blast radius. It is a narrowly scoped resolver-performance fix, not a broad core-behavior change. The only meaningful functional risk is that closure-collection edges are no longer synthesized on pathological packed files, and that tradeoff is intentional.

The bug

The issue was that reference resolution spent too much CPU time inside closureCollectionEdges, which is one of the callback synthesis passes in src/resolution/callback-synthesizer.ts. That pass looks for a specific dynamic-dispatch shape: one method appends a closure into a collection, another method iterates the same collection and invokes each element. When that pattern exists in ordinary source code, the synthesis is useful because it closes a real graph gap.
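For readers unfamiliar with the shape this pass targets, here is a minimal sketch of the registrar/dispatcher pattern. It is written in TypeScript purely for illustration; the class and method names (EventBus, subscribe, publish, count) are hypothetical, and this description does not say which source languages the pass actually matches.

```typescript
// Hypothetical illustration of the closure-collection shape; none of these
// names come from CodeGraph or from the PR.
type Handler = (event: string) => void;

class EventBus {
  private handlers: Handler[] = [];

  // Registrar: appends a closure into a collection.
  subscribe(handler: Handler): void {
    this.handlers.push(handler);
  }

  // Dispatcher: iterates the same collection and invokes each element.
  // This is the shape where a synthesized dispatcher-to-registrar edge
  // closes a real gap in the call graph.
  publish(event: string): void {
    for (const handler of this.handlers) {
      handler(event);
    }
  }

  // Plain iteration that never invokes the elements. Under the precision
  // gate described in this PR, a loop like this should not produce edges.
  count(): number {
    let n = 0;
    this.handlers.forEach(() => {
      n += 1;
    });
    return n;
  }
}
```

Static analysis cannot see a direct call from publish to whatever closure was passed to subscribe, which is the gap the synthesized edge is meant to cover.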

The problem is that the implementation previously walked method and function nodes one by one and repeatedly read and sliced their source from the backing file. On large packed files, especially minified bundles or generated artifacts with extremely high characters-per-line, this becomes a poor trade. The scan cost stays high, but the chance of producing legitimate closure-collection edges is low. In the Metronic reproduction, that mismatch showed up as a reference-resolution CPU spike.

Root cause

I found two concrete problems.

First, the pass operated at the node level rather than the file level. It iterated every candidate method or function independently, called ctx.readFile for that node's file, sliced the relevant lines, and then ran the closure-collection regexes. That meant the same file could be re-read and re-processed many times during a single pass.

Second, the pass had no guardrail for pathological file shapes. We already treat generated content carefully in other resolution paths because heuristic synthesis on generated or packed files tends to be expensive and noisy. closureCollectionEdges did not apply that same discipline, so it would still spend time scanning files that were obviously bad inputs for this kind of source-pattern analysis.

The fix

The implementation now groups candidate nodes by filePath before scanning. Each file is read once for the pass, then all method and function nodes in that file are checked against the in-memory content. This removes the repeated read-and-slice behavior that amplified the cost on dense files.
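The grouping step can be sketched roughly as follows. This is not the code from the PR: CandidateNode, groupByFile, and scanOncePerFile are hypothetical names, the readFile callback only stands in for ctx.readFile, and the 1-based inclusive line convention is an assumption.

```typescript
// Hypothetical node shape; the real CodeGraph node type is not shown here.
interface CandidateNode {
  id: string;
  filePath: string;
  startLine: number; // assumed 1-based, inclusive
  endLine: number; // assumed 1-based, inclusive
}

// Bucket candidate nodes by the file they live in.
function groupByFile(nodes: CandidateNode[]): Map<string, CandidateNode[]> {
  const byFile = new Map<string, CandidateNode[]>();
  for (const node of nodes) {
    const bucket = byFile.get(node.filePath);
    if (bucket) bucket.push(node);
    else byFile.set(node.filePath, [node]);
  }
  return byFile;
}

// Read each file once, then pass every node's source slice to `visit`.
// Returns the number of file reads performed, so the saving is observable.
function scanOncePerFile(
  nodes: CandidateNode[],
  readFile: (filePath: string) => string | null,
  visit: (node: CandidateNode, source: string) => void,
): number {
  let reads = 0;
  groupByFile(nodes).forEach((fileNodes, filePath) => {
    const content = readFile(filePath);
    reads += 1;
    if (content === null) return; // unreadable file: skip its nodes
    const lines = content.split("\n"); // split once per file, not per node
    for (const node of fileNodes) {
      visit(node, lines.slice(node.startLine - 1, node.endLine).join("\n"));
    }
  });
  return reads;
}
```

With this shape, the number of reads scales with the number of distinct files rather than the number of candidate nodes, which is the property the fix relies on.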

The pass now also skips files that isGeneratedFile(filePath) identifies as generated, which aligns this synthesizer with the rest of the resolver's existing generated-file handling.

Finally, the pass adds a density check before doing any closure-collection work for a file. It computes an approximate characters-per-line value and skips files above the threshold:

const newlineCount = (content.match(/\n/g)?.length ?? 0) + 1;
if (content.length / newlineCount > 200) continue;

This is intentionally simple. The problem here is not semantic correctness on packed artifacts; it is avoiding expensive heuristic work on files that are structurally bad candidates. A characters-per-line threshold is a practical way to reject minified-style inputs without affecting normal source files.
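To make the threshold concrete, the check above can be wrapped in a small helper and tried on two inputs. The helper names (charsPerLine, isTooDense) and the sample strings are hypothetical; only the arithmetic and the 200 threshold come from the snippet above.

```typescript
// Threshold taken from the snippet above; the constant name is illustrative.
const MAX_CHARS_PER_LINE = 200;

// Approximate characters per line: total length divided by line count,
// where line count is the number of "\n" characters plus one.
function charsPerLine(content: string): number {
  const lineCount = (content.match(/\n/g)?.length ?? 0) + 1;
  return content.length / lineCount;
}

// True when the file is dense enough that the pass would skip it.
function isTooDense(content: string): boolean {
  return charsPerLine(content) > MAX_CHARS_PER_LINE;
}

// A minified-style input: 5000 characters on a single line.
const packed = "x".repeat(5000);

// An ordinary-looking input: 100 short lines joined by newlines.
const ordinary = Array(100).fill("const value = compute(input);").join("\n");

console.log(isTooDense(packed)); // single line, far above the threshold
console.log(isTooDense(ordinary)); // roughly 30 characters per line
```

Because the value is an average, a file with one very long line among many short ones can still pass the check, while a file that is packed onto a handful of lines will not.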

Note on scope

This does not change the closure-collection synthesis model itself. The same dispatcher and registrar matching logic still runs on real source files, and the existing precision gate remains intact: the dispatcher side must actually invoke each iterated element, so plain collection iteration still does not synthesize edges.

The goal here is to avoid modifying CodeGraph's underlying graph notions and to reduce obviously wasted work while preserving the useful synthesis behavior on ordinary code.

Test coverage

The clean branch includes a focused regression test in __tests__/closure-collection-synthesizer.test.ts.

The existing end-to-end closure-collection test remains in place and still verifies the core behavior: dispatcher-to-registrar synthesis across files and classes, support for both write { $0.append(...) } and direct append(...), and the precision gate that avoids synthesizing from non-invoking forEach loops.

The new test adds a minified-style fixture file containing valid method definitions on a single dense line. It verifies two things: those methods are still extracted and indexed as normal nodes, and the closure-collection synthesizer does not produce heuristic edges from that dense file. That is the behavior we want because the fix should only suppress the expensive synthesis pass for pathological inputs, not break indexing more broadly.

Validation

All existing and new tests passed. My Metronic reproduction (not committed here, obviously) indexed successfully in about 31.5 seconds with 773 files, 13,075 nodes, and 96,457 edges.
