Flaky test: TestPartitionCompactor_ShouldCompactOnlyUsersOwnedByTheInstanceOnShardingEnabledAndMultipleInstancesRunning — 60s CompactionRunsCompleted poll times out

**AI Tool Usage Notice**
If you used an AI tool to help draft this issue,
please make sure you have reviewed and validated all content before submitting.
You are responsible for the accuracy and quality of everything in this report.
Low-quality or unreviewed AI-generated submissions may be closed without further investigation.
See our [Generative AI Contribution Policy](https://github.com/cortexproject/cortex/blob/master/GENAI_POLICY.md) for details.

**Describe the bug**

`TestPartitionCompactor_ShouldCompactOnlyUsersOwnedByTheInstanceOnShardingEnabledAndMultipleInstancesRunning` (`pkg/compactor`) intermittently fails when a compactor does not complete a compaction run within the 60s poll window:

```
--- FAIL: TestPartitionCompactor_ShouldCompactOnlyUsersOwnedByTheInstanceOnShardingEnabledAndMultipleInstancesRunning (81.65s)
    compactor_paritioning_test.go:1217: expected true, got false
```

**Root cause (corrected 2026-06-11):** this issue originally framed the failure as a ring-convergence/timeout-sizing problem. Investigation showed that the ring is ACTIVE and stable *before* the poll starts (the compactor's `starting()` guarantees it). Critically, the non-partition sibling, which uses a **120s** poll, failed in the *same* CI run (124.41s on attempt 1), so a 60s→120s bump alone cannot be the fix.

The dominant cost is the tests' shared testify bucket `ClientMock`: with `numUsers = 100`, every bucket operation does an O(expectations) reflective scan under a single mutex (~3,700/3,208 accumulated expectations; ~38% of CPU in `findExpectedCall`/`Arguments.Diff`). As a result, the first compaction cycle takes ~20s on an idle machine and >120s under the 6-7× starvation seen on busy CI runners.

An aggravating factor: `syncMetasTimeout` is coupled to `CompactionInterval` (5s in these tests), so slow syncs become *failed* runs that never increment `CompactionRunsCompleted`, and no poll budget can outlast that.

**To Reproduce**

Steps to reproduce the behavior:
1. Check out Cortex (recent `master`)
2. Run repeatedly (flaky on starved runners):
   ```
   go test -tags "netgo slicelabels" -race -count=3 -run 'TestPartitionCompactor_ShouldCompactOnlyUsersOwnedByTheInstanceOnShardingEnabledAndMultipleInstancesRunning|TestCompactor_ShouldCompactOnlyUsersOwnedByTheInstanceOnShardingEnabledAndMultipleInstancesRunning' ./pkg/compactor/
   ```

**Expected behavior**

The first compaction cycle completes well inside the poll budget even on starved runners (workload proportional to what the test actually asserts), and the partition/non-partition twins use the same poll budget.

**Environment:**
 - Infrastructure: GitHub Actions CI, `ubuntu-24.04` (amd64), `test` job
 - Deployment tool: N/A (Go unit test)

**Additional Context**

CI evidence: run 26632776611 — attempt 1: partition twin 81.65s + non-partition sibling 124.41s (both failed; job 78485485760); attempt 2: 77.67s/122.49s. (`gh run view --job` serves the latest attempt; attempt-1 logs via the jobs API.)

Fix proposed in #7617: `numUsers` 100→20 in both twins (removes the O(N²) mock cost), 60s→120s alignment as a seatbelt, and `WaitActiveInstanceTimeout` per the #7503 pattern. Note: the *shuffle-sharding* twins are intentionally not covered — the same run's arm64 job failed one of them with a different signature (`group ... owned by multiple compactors`, an ownership-exclusivity race) that deserves its own issue.

_Filed and later corrected from CI failure-log analysis with AI assistance; run links, CPU-profile findings, and cited code paths were reviewed and verified against `master` before submitting._

