First off — thank you for building CodeGraph! It's a fantastic tool that makes code comprehension so much more accessible. I've been using it on a few projects and ran into a scenario where the 1 MB file size threshold felt a bit too restrictive, so I wanted to share a suggestion for a small, low-risk improvement.
The Scenario
I work with projects that include some legitimate source files between 1 and 5 MB. These aren't minified bundles or vendored blobs; they contain real, meaningful function definitions that would be valuable in the graph.
Currently, these files are skipped during indexing with only a `size_exceeded` warning and no nodes or edges, which means the function graph for the project is incomplete. I completely understand the original design rationale: the 1 MB threshold protects against WASM heap blowup from generated or minified files, and that's a wise default.
The Suggestion
What if the 1 MB default stayed exactly where it is (it's a great safe default!), but users could override it via an environment variable when they know their project has legitimate large source files?

    const MAX_FILE_SIZE = (() => {
      const envVal = process.env.CODEGRAPH_MAX_FILE_SIZE;
      if (envVal !== undefined) {
        // The value is a byte count; parseInt yields NaN for non-numeric input.
        const parsed = parseInt(envVal, 10);
        if (Number.isFinite(parsed) && parsed > 0) {
          return parsed;
        }
      }
      return 1024 * 1024; // 1 MB, unchanged default
    })();

This keeps the existing behavior 100% intact for everyone who doesn't set the variable, while giving users with large-but-legitimate source files an escape hatch.
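One caveat with `parseInt`: it accepts trailing characters, so a value such as `5MB` would be read as 5 bytes. A stricter variant could look like the sketch below, which assumes the variable holds a plain byte count. The helper name `parseMaxFileSize` is hypothetical and not part of the codebase.

```typescript
const DEFAULT_MAX_FILE_SIZE = 1024 * 1024; // 1 MB, the existing default

// Hypothetical helper: returns the override in bytes, or the fallback when
// the value is missing or is not a plain positive integer.
function parseMaxFileSize(
  raw: string | undefined,
  fallback: number = DEFAULT_MAX_FILE_SIZE,
): number {
  if (raw === undefined) return fallback;
  const trimmed = raw.trim();
  // Digits only: rejects "5MB", "2e6", "-1", "" and similar inputs.
  if (!/^\d+$/.test(trimmed)) return fallback;
  const parsed = Number(trimmed);
  return Number.isSafeInteger(parsed) && parsed > 0 ? parsed : fallback;
}
```

With this, `CODEGRAPH_MAX_FILE_SIZE=5242880` (5 × 1024 × 1024) would raise the limit to 5 MB, while a malformed value falls back to the 1 MB default.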
Complementary Measure: Memory Safety for Large Files
If a user raises the threshold, files above 1 MB will be parsed, and these consume significantly more WASM linear memory. Under the WebAssembly specification, linear memory can grow but never shrink, so without additional safeguards, parsing many large files in sequence could keep growing the heap until the process runs out of memory.
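The grow-only behaviour can be observed with the standard `WebAssembly.Memory` JavaScript API, independently of CodeGraph. This is a minimal sketch rather than code from the project, and the page counts are arbitrary.

```typescript
// One WebAssembly page is 64 KiB.
const PAGE_SIZE = 64 * 1024;

const memory = new WebAssembly.Memory({ initial: 1, maximum: 4 });
const bytesBefore = memory.buffer.byteLength; // 1 page

// grow() adds pages and returns the previous size, measured in pages.
const previousPages = memory.grow(2);
const bytesAfter = memory.buffer.byteLength; // 3 pages

// The API exposes grow() but no counterpart for giving pages back.
const hasShrink = "shrink" in memory;

console.log({ bytesBefore, previousPages, bytesAfter, hasShrink });
```

Releasing the memory therefore means dropping the whole instance, which is the idea behind recycling the worker or resetting the parser.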
A complementary change would make this safe: for files exceeding 1 MB, aggressively reclaim WASM memory around each parse, before it in the bulk path and after it in the single-file path.

    const LARGE_FILE_THRESHOLD = 1024 * 1024; // 1 MB, same as the original MAX_FILE_SIZE

Bulk indexing path (worker thread)
Recycle the worker before parsing each large file, ensuring a fresh WASM heap:

    const isLargeFile = stats.size > LARGE_FILE_THRESHOLD;
    if (isLargeFile) {
      recycleWorker(); // Destroy worker → fresh WASM heap
    }
    result = await requestParse(filePath, content);

Single-file extraction path (in-process)
Reset the parser after extracting each large file:

    const result = extractFromSource(relativePath, content, language, frameworkNames);
    if (stats.size > LARGE_FILE_THRESHOLD) {
      resetParser(language); // Delete cached parser instance → reclaim WASM heap
    }

Both `recycleWorker()` and `resetParser()` already exist in the codebase, so no new functions are needed; they only have to be called at the right time.
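How the two constants interact can be summarised per file, as in the sketch below. `classifyFile` and `FileAction` are hypothetical names used only for illustration, and the sketch assumes the existing size check is a strict `size > MAX_FILE_SIZE` comparison.

```typescript
type FileAction = "skip" | "parse" | "parse-and-reclaim";

const LARGE_FILE_THRESHOLD = 1024 * 1024; // 1 MB, as proposed above

// Hypothetical helper describing what the proposal does with one file.
// Assumption: the current check skips files when size > maxFileSize.
function classifyFile(sizeBytes: number, maxFileSize: number): FileAction {
  if (sizeBytes > maxFileSize) return "skip"; // size_exceeded, as today
  if (sizeBytes > LARGE_FILE_THRESHOLD) return "parse-and-reclaim";
  return "parse";
}

const MB = 1024 * 1024;
console.log(classifyFile(2 * MB, 1 * MB)); // default limit: still skipped
console.log(classifyFile(2 * MB, 5 * MB)); // raised limit: parsed, then reclaimed
```

With the default 1 MB limit the `parse-and-reclaim` branch is never reached, so recycling only applies once a user has opted in to a higher limit.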
Required import

    import { ..., resetParser } from './grammars';

Memory Safety Summary

| Concern | Mitigation |
| --- | --- |
| WASM heap growth from large files | `recycleWorker()` before, or `resetParser()` after, each >1 MB parse |
| Cascading OOM after WASM corruption | Worker crash → respawn (existing mechanism) |
| Parse timeout on large files | Already scales: `PARSE_TIMEOUT_MS + content.length / 100KB × 10s` |
| Worker recycling cost | Only for >1 MB files; normal files unaffected |
| User overrides too aggressively | Tunable via env var; 1 MB default remains the safe baseline |

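To get a feel for the timeout row, the scaling can be worked through numerically. The sketch below is only an illustration of the formula as written in the table: it assumes 100 KB means 100 × 1024, that no rounding is applied, and it takes the base timeout as a parameter because the real value of `PARSE_TIMEOUT_MS` is not stated here. The function name is hypothetical.

```typescript
// Illustrative reading of: PARSE_TIMEOUT_MS + content.length / 100KB × 10s
function scaledParseTimeoutMs(baseTimeoutMs: number, contentLength: number): number {
  const HUNDRED_KB = 100 * 1024; // assumption: binary kilobytes
  const MS_PER_100KB = 10_000; // 10 s for every 100 KB of content
  // Multiply before dividing so that whole-number inputs stay exact.
  return baseTimeoutMs + (contentLength * MS_PER_100KB) / HUNDRED_KB;
}

// Extra time granted on top of the base timeout for a 5 MB file.
const extraMs = scaledParseTimeoutMs(0, 5 * 1024 * 1024);
console.log(`${extraMs / 1000} s`);
```

Under these assumptions a 5 MB file would receive 512 s on top of the base timeout, so the existing scaling already leaves ample room for files in the 1 to 5 MB range.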
Changes Summary

| File | Change |
| --- | --- |
| `src/extraction/index.ts` | Make `MAX_FILE_SIZE` configurable via `CODEGRAPH_MAX_FILE_SIZE` env var (default unchanged: 1 MB) |
| `src/extraction/index.ts` | Add `LARGE_FILE_THRESHOLD = 1MB` for memory-reclamation gating |
| `src/extraction/index.ts` | Import `resetParser` from `./grammars` |
| `src/extraction/index.ts` (bulk path) | Call `recycleWorker()` before parsing files > `LARGE_FILE_THRESHOLD` |
| `src/extraction/index.ts` (single-file path) | Call `resetParser(language)` after extracting files > `LARGE_FILE_THRESHOLD` |