Skip to content

Stale dashboard: #469 synchronize→refresh.pull can wedge the pull queue (board freezes until manual refresh) #470

Description

@ianrohde

Summary

PRs #463 and #469 (GitHub approval → CR signoff) introduced a regression path that can leave the dashboard stale even after a hard refresh — matching the "Pulldasher is very out of date" report where merged PRs lingered until each was manually refreshed.

The merge/close logic itself is not broken. The damage comes from #469 routing every synchronize webhook through a full refresh.pull, which awaits a serialized queue whose error handling can wedge. #463 audited clean.

How a merged PR gets stuck on the board

Client pushes go through pull-queue.js. markPullAsDirty only emits when not paused; while paused it silently stashes into dirtyPulls (lib/pull-queue.js:17-24).

updateAllPullData pauses the queue and resumes only on success:

// lib/db-manager.js:39-86
queue.pause();
return dbManager.updatePull(pull).then(function () {
  ...
  return Promise.all(updates).then(function () {
    queue.markPullAsDirty(...);
    queue.resume();   // never runs if any update rejects
  });
});

Any rejected DB write leaves the queue paused forever → all later markPullAsDirty (merges/closes included) are stashed and never emitted. The board freezes.

  • Stale after force reload: reload re-sends the in-memory pulls array (lib/pull-manager.js:41-47), which only updates via the pullsChanged event. Paused → array never updates → hard refresh re-serves stale state.
  • Manual per-card refresh cleared the backlog: the refresh button → socket refreshrefresh.pullupdateAllPullData. The first one that succeeds calls queue.resume(), flushing all accumulated dirtyPulls at once.

This pause/resume asymmetry is a pre-existing latent bug (dates to 2016, 74c1838). #469 made it live by sending every synchronize (every push to every open PR) through updateAllPullData. Before #469 synchronize did only a cheap updatePull — it never paused the queue.

Regression inventory (all from #469's synchronizerefresh.pull)

  1. Board freeze (queue-pause wedge). db-manager.js:81-84queue.resume() only on success. Amplified from rare to per-push.
  2. Refresh queue stall + webhook hang. lib/refresh.js:122-128 ends in a bare .done(onFulfilled) with no reject handler. If parse/updateAllPullData rejects, next() never runs → pullQueue stalls → the resolve thunk is never popped → refresh.pull never settles. Since Honor GitHub approvals carried over a clean master merge #469 awaits it in the webhook chain (controllers/githubHooks.js:99), the handler hangs with no HTTP response → GitHub 10s timeout. All later refreshes stall too.
  3. Webhook 500 + GitHub retries. dbUpdated now rejects when refresh.pull fails → res.status(500) even though updatePull succeeded (githubHooks.js:90-104, 193-204). Pre-Honor GitHub approvals carried over a clean master merge #469 synchronize was a cheap 200. GitHub retries → more load.
  4. Latency / blocking divergence. githubHooks.js:99 is the only refresh.pull that is returned into the response chain. Lines 132/160/224/265-269 are all fire-and-forget. Synchronize alone blocks its 200 on ~8 paginated GitHub calls through a one-at-a-time queue — uniquely exposed to the 10s webhook timeout.
  5. Rate-limit pressure. Every synchronize now runs full parse (~8 paginated calls + job runs, lib/git-manager.js:153-161). More 429s feed paths 1-4.
  6. CR staleness guarantee relaxed. fromGithubApproval CRs are exempt from the commit-date invalidation (git-manager.js:211-219). The code does no diff check — it trusts GitHub's current APPROVED state. Correct only if branch protection "Dismiss stale pull request approvals when new commits are pushed" is enabled on every tracked repo; otherwise approvals are over-honored across real changes.

Verified safe (no regression)

  • Merge/close handling unchanged (closed action's .then is a no-op — reconcileAfterUpdate stays false).
  • Emoji CR/QA parsing unchanged; hasTag(body || "") is a pure no-op.
  • Signoff counting dedups by user.login (frontend/src/pull.ts:233-247) — repeat approvals can't inflate cr_req.
  • Synthetic CR added only when body has no CR tag (no double count per review).
  • dev_block/deploy_block reconstructed correctly by the now-triggered full refresh.
  • QA never synthesized from approvals; added state field inert in parseSignatures.
  • Unit tests pass (3/3, test/signature.test.js).

Proposed fixes (priority order)

  1. updateAllPullData — move queue.resume() into a .finally() / catch so it always runs. Fixes the freeze. Highest leverage.
  2. lib/refresh.js queue pop — add a reject handler that logs and still calls next(), so one bad refresh can't stall the queue or hang the webhook.
  3. Don't await refresh.pull in the synchronize handler — make it fire-and-forget like every other caller (or debounce), so the webhook response isn't coupled to a full GitHub refresh.
  4. Confirm "Dismiss stale pull request approvals" is enabled on all tracked repos, or add a commit-since-approval / diff check before honoring a carried-over approval.

Items 1-2 are pre-existing latent bugs #469 turned live; 3-4 are #469-specific.

Related: #463, #469.

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions