native `tgrep` indexer OOM-kills the host on large monorepos (no upper bound / memory cap) #3976

### Describe the bug

When the `copilot_cli_tgrep` experiment is assigned, the built-in grep/search tool replaces ripgrep with the native Rust `tgrep` trigram indexer. At session startup the CLI spawns a persistent daemon:

```
tgrep serve . --index-path ~/.cache/copilot/tgrep-index/<sha(cwd)> --exclude .git
```

over the **entire** working-directory tree. The startup orchestrator gates this on a **lower** file-count threshold only — there is **no upper bound and no memory cap**. The trigram build holds the whole index plus intermediate structures in RAM (the on-disk index is 1.9 GB; live anon RSS reaches ~45 GB, ≈24×). On a large monorepo this exhausts host memory and the **Linux kernel OOM-killer** reaps the process.

Observed in an internal monorepo (370,925 tracked text files) in WSL2 (46 GiB RAM):

- `tgrep` was OOM-killed **twice**, at `anon-rss` 46,125,720 kB and 47,212,960 kB (~44–45 GiB; the kernel's `kB` is KiB), with `total-vm` ~57–58 GiB.
- A third build was caught live: its resident memory rose without interruption (0.3 GB → 4.7 GB over 20 min, every sample ≥ the previous) while it pegged ~95% of one CPU core — the same trajectory as the two kills.

Because the indexer auto-starts during session startup (it runs `tgrep count-files`, sees the repo is over threshold, and spawns `serve`), **merely opening or resuming a session in the repo root triggers it, independent of any prompt or tool call.** As it runs out of memory it can exhaust swap and take unrelated processes down with it, effectively wedging the WSL VM.

This is **not** the same as the existing Node.js `JavaScript heap out of memory` reports (e.g. #841, #1457, #1386, #2132). This is a **native** child process killed by the kernel OOM-killer, with no JS heap message.

Note: the feature was **not** opted into. `USE_TGREP` is unset; it is enabled via a server-side experiment assignment (`copilot_cli_tgrep`) found in `~/.cache/copilot/exp-cache.json`.

### Affected version

GitHub Copilot CLI 1.0.66-2

### Steps to reproduce the behavior

1. Be assigned the `copilot_cli_tgrep` experiment (or set `USE_TGREP=true`).
2. `cd` into a git repository with **≥ 50,000** tracked text files (≥ 10,000 on Windows). The repo used here has **370,925** files (verified: `tgrep count-files .` → `370925 text files (12504 binary skipped, 0 errors)`).
3. Start or resume a session there: `copilot --resume --add-dir <repo-root>`.
4. The CLI logs `Starting tgrep serve (index: …, cwd: <repo-root>)` and spawns the daemon. From then on its resident memory only ever increases — every sampled RSS is ≥ the previous, with no plateau or drop — while it keeps ~one CPU core busy: an active, unbounded in-memory build.
5. On a host without ~50+ GB free, the kernel OOM-killer kills `tgrep`; the CLI keeps polling `tgrep status` (~1/sec) and can respawn into a fresh cold rebuild.
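
The "only ever increases" observation in step 4 can be checked mechanically. This is a minimal sketch, assuming RSS is sampled externally (for example from `ps`) into a list. The `RssSample` shape, the function names, the sample times, and the intermediate value are illustrative; only the 0.3 GB and 4.7 GB endpoints come from this report.

```typescript
// One RSS observation of the indexer process (illustrative shape).
interface RssSample {
  elapsedSec: number; // seconds since the first observation
  rssBytes: number;   // resident set size at that moment
}

// True when every sample is >= the one before it (no drop at any point).
function isNonDecreasing(samples: RssSample[]): boolean {
  let prev = -Infinity;
  for (const s of samples) {
    if (s.rssBytes < prev) return false;
    prev = s.rssBytes;
  }
  return true;
}

// Average growth in bytes per second between the first and last sample.
function growthBytesPerSec(samples: RssSample[]): number {
  let first: RssSample | undefined;
  let last: RssSample | undefined;
  for (const s of samples) {
    if (first === undefined) first = s;
    last = s;
  }
  if (first === undefined || last === undefined) return 0;
  const dt = last.elapsedSec - first.elapsedSec;
  return dt > 0 ? (last.rssBytes - first.rssBytes) / dt : 0;
}

// Endpoints from the report; the middle sample and the times are made up.
const observed: RssSample[] = [
  { elapsedSec: 0, rssBytes: 0.3e9 },
  { elapsedSec: 600, rssBytes: 2.1e9 },
  { elapsedSec: 1200, rssBytes: 4.7e9 },
];
```

A non-decreasing series with a steady positive growth rate is consistent with an unbounded build, but it does not prove one; a plateau before the kill threshold would rule it out.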

Observed daemon command line:

```
tgrep serve . --index-path /home/<user>/.cache/copilot/tgrep-index/ba63a73da095b6da --exclude .git
```

Kernel OOM-killer evidence (two separate kills):

```
Jun 29 21:40:52 kernel: Out of memory: Killed process 2817274 (tgrep) total-vm:59968024kB, anon-rss:46125720kB, file-rss:1336kB, shmem-rss:0kB, UID:1000 pgtables:113820kB
Jun 30 00:46:09 kernel: Out of memory: Killed process 3026540 (tgrep) total-vm:60820144kB, anon-rss:47212960kB, file-rss:1436kB, shmem-rss:0kB, UID:1000 pgtables:115496kB
Jun 30 00:46:09 kernel: oom-kill:constraint=CONSTRAINT_NONE,...,task=tgrep,pid=3026540,uid=1000
```
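
To compare these kernel lines against host RAM, the sizes can be pulled out and converted. This is a minimal sketch, assuming the message format shown above; `parseOomKill` and `toGiB` are illustrative helper names. The kernel's `kB` values are in units of 1024 bytes, so they are converted to GiB.

```typescript
// Fields extracted from an "Out of memory: Killed process ..." line.
interface OomKill {
  pid: number;
  name: string;
  totalVmKiB: number;
  anonRssKiB: number;
}

const KIB_PER_GIB = 1024 * 1024;

// Returns null when the line is not an OOM-kill message in the expected format.
function parseOomKill(line: string): OomKill | null {
  const m = /Killed process (\d+) \(([^)]+)\) total-vm:(\d+)kB, anon-rss:(\d+)kB/.exec(line);
  if (m === null) return null;
  return {
    pid: Number(m[1]),
    name: String(m[2]),
    totalVmKiB: Number(m[3]),
    anonRssKiB: Number(m[4]),
  };
}

// Convert the kernel's kB (KiB) figure to GiB.
function toGiB(kib: number): number {
  return kib / KIB_PER_GIB;
}

const firstKill = parseOomKill(
  "Jun 29 21:40:52 kernel: Out of memory: Killed process 2817274 (tgrep) " +
    "total-vm:59968024kB, anon-rss:46125720kB, file-rss:1336kB, shmem-rss:0kB, UID:1000 pgtables:113820kB"
);
// firstKill.anonRssKiB is 46125720, i.e. about 43.99 GiB on a 46 GiB host.
```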

Live third occurrence (caught mid-build, idle session, no active prompt):

```
tgrep pid=3114713 cpu=94.8% rss=4.70GB elapsed=20:19   # still climbing; on-disk index already 1.9 GB
```

### Expected behavior

The indexer must never be able to exhaust host memory or OOM unrelated processes. Specifically:

- **Upper bound / ceiling.** Add a maximum file-count (and/or total-bytes) above which the tool falls back to ripgrep instead of indexing. Today the only gate is the *lower* threshold (`fileCount < lY` → skip); there is no "too large to index safely" gate, so the biggest repos — exactly where the feature is meant to help — are where it fails hardest.
- **Memory budget for the build.** Cap/stream the index build (chunked, bounded working set) so peak RSS is predictable and far below host RAM. `tgrep` currently exposes no `--max-memory` flag.
- **Graceful degradation + detection.** If the indexer is killed, detect it (cf. #277) and fall back to ripgrep rather than silently respawning into another OOM loop.
- **Safer rollout.** Auto-enabling an unbounded native indexer via experiment on large repos should not be able to take down a developer's machine.

A 50k-file repo and a 370k-file repo currently take the same unbounded code path; only the latter OOMs.
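
The requested upper bound could look roughly like the sketch below. It is not the CLI's actual code: only the lower threshold values and the `skipped_below_threshold` outcome come from the bundled `app.js`. The `skipped_above_ceiling` and `index` outcomes, the function names, and the ceiling value are hypothetical placeholders.

```typescript
type Platform = "win32" | "linux" | "darwin";

type GateOutcome =
  | "skipped_below_threshold" // outcome string quoted in this report
  | "skipped_above_ceiling"   // HYPOTHETICAL: fall back to ripgrep
  | "index";                  // HYPOTHETICAL label for "spawn tgrep serve"

// Lower threshold as reported: 10,000 files on Windows, 50,000 elsewhere.
function lowerThreshold(platform: Platform): number {
  return platform === "win32" ? 1e4 : 5e4;
}

// HYPOTHETICAL ceiling. The value is a placeholder, not a measured safe limit.
const MAX_INDEXABLE_FILES = 200000;

function decideIndexing(
  fileCount: number,
  platform: Platform,
  maxFiles: number = MAX_INDEXABLE_FILES
): GateOutcome {
  if (fileCount < lowerThreshold(platform)) return "skipped_below_threshold";
  if (fileCount > maxFiles) return "skipped_above_ceiling";
  return "index";
}

const outcomeForThisRepo = decideIndexing(370925, "linux"); // "skipped_above_ceiling"
```

A file-count ceiling is the simplest gate because `tgrep count-files` already produces the number; a total-bytes ceiling or a memory budget would be a closer proxy for peak RSS but needs data this report does not have.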

### Additional context

**Root cause (from the bundled `app.js`, pkg 1.0.66-2):**

- Feature flag: `TGREP = "copilot_cli_tgrep"` — *"When true, enables tgrep indexed search for large repositories"* (`sdk/index.d.ts:7310`). Default availability `off`; here enabled via experiment assignment.
- Threshold: `lY = process.platform === "win32" ? 1e4 : 5e4` (10,000 / 50,000 files).
- Gate: after `tgrep count-files`, `else if (u < lY) return { outcome: "skipped_below_threshold" }` — then it immediately sets the root and spawns `tgrep serve`. **No upper-bound check exists between the threshold test and the spawn.**
- Off-switch (verified): `process.env.USE_TGREP === "false"` → `"tgrep disabled via USE_TGREP=false"` → ripgrep fallback. (`USE_BUILTIN_RIPGREP=false` also forces fallback.)
- On-disk index for the repo under test (1JS): **1.9 GB**; peak live anon RSS at kill: **~45 GB** (~24×).

**Workarounds:**

- `export USE_TGREP=false` (forces ripgrep; applies to new sessions).
- Launch from a subdirectory under the 50k-file threshold.
- Cap the WSL VM (`[wsl2] memory=…` in `.wslconfig`) to contain the blast radius.

**Environment:**

- OS: Linux (WSL2), kernel `6.6.87.2-microsoft-standard-WSL2`
- CPU architecture: x86_64 (32 logical CPUs)
- RAM: 46 GiB total, 12 GiB swap
- Terminal: tmux (`TERM=tmux-256color`)
- Shell: zsh (`/usr/bin/zsh`)
- Repo under test: large monorepo, 370,925 tracked text files

**Possibly related:** #277 (CLI doesn't detect that a child process it started was killed) — appears as a secondary symptom of this failure (the kernel kills `tgrep`; the CLI keeps polling `status`).
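
One way to break the kill-and-respawn loop is to treat a `SIGKILL` exit of the indexer as a signal to stop respawning. This is a minimal sketch of such a policy, not the CLI's implementation: the state shape, `onIndexerExit`, and the `maxKills` default are hypothetical. It relies only on the fact that Node's child-process `exit` event reports `(code, signal)`. A `SIGKILL` is not necessarily an OOM kill, so this errs toward falling back.

```typescript
type SearchBackend = "tgrep" | "ripgrep";

// How a child process ended, as reported by Node's 'exit' event:
// `signal` is set (e.g. "SIGKILL") when the process was killed by a signal.
interface ChildExit {
  code: number | null;
  signal: string | null;
}

interface SupervisorState {
  kills: number;          // SIGKILL exits seen so far
  backend: SearchBackend; // backend to use for subsequent searches
}

const initialState: SupervisorState = { kills: 0, backend: "tgrep" };

// HYPOTHETICAL policy: after `maxKills` SIGKILL exits, stop respawning the
// indexer and use ripgrep for the rest of the session.
function onIndexerExit(
  state: SupervisorState,
  exit: ChildExit,
  maxKills: number = 1
): SupervisorState {
  const kills = state.kills + (exit.signal === "SIGKILL" ? 1 : 0);
  const backend: SearchBackend = kills >= maxKills ? "ripgrep" : state.backend;
  return { kills, backend };
}

const afterKill = onIndexerExit(initialState, { code: null, signal: "SIGKILL" });
// afterKill.backend is "ripgrep": no further cold rebuilds are attempted.
```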
