fix: remove pthread_mutex that causes self-deadlock on VLE queries #2433

Merged: jrgemignani merged 1 commit into apache:master from crdv7:fix/remove-pthread-mutex-self-deadlock on Jun 9, 2026

crdv7 (Contributor) commented Jun 9, 2026

Remove pthread_mutex from age_global_graph.c — it causes permanent self-deadlock on VLE queries.

ereport(ERROR) (statement timeout, query cancellation, OOM, etc.) triggers siglongjmp which skips pthread_mutex_unlock(). The mutex remains locked, and every subsequent VLE query on the same backend hangs forever in pthread_mutex_lock() with __owner == own PID.

Closes #2432

Analysis

The mutex was introduced in PR #1881 (fix for #1878). It is both unnecessary and harmful:

  1. Unnecessary: The protected variable is a process-local static, so no concurrent access exists. The test failure in #1878 ("Flaky test age_global_graph fails on slow machines") was a catalog-level race, already fixed by the Assert→runtime check and strndup in the same PR. For cross-backend cache invalidation, PostgreSQL syscache uses sinval callbacks, and PR #2376 ("VLE cache: replace snapshot invalidation with per-graph") already uses lock-free pg_atomic_uint64 version counters in shared memory for this.

  2. Harmful: pthread_mutex is incompatible with PostgreSQL's ereport(ERROR) / siglongjmp error handling. Any ERROR during the mutex-held window permanently locks it. This is the only use of pthread in the entire AGE codebase.
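
To illustrate the lock-free version-counter idea mentioned in point 1, here is a rough sketch using C11 `<stdatomic.h>` as a stand-in for `pg_atomic_uint64`. It is an assumption-laden illustration, not the code from PR #2376: the counter here is an ordinary global rather than shared memory, and `DemoCache` and the `demo_*` functions are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-graph version counter in shared memory. */
static _Atomic uint64_t demo_graph_version = 0;

/* Process-local cache state; only its owning process ever touches it. */
typedef struct DemoCache
{
    bool     valid;     /* has the cache been built at least once? */
    uint64_t built_at;  /* counter value observed when it was built */
    int      rebuilds;  /* number of rebuilds, for demonstration only */
} DemoCache;

/* A writer bumps the counter after modifying the graph. */
void demo_graph_modified(void)
{
    atomic_fetch_add(&demo_graph_version, 1);
}

/* A reader rebuilds its local cache only if the counter has moved.
 * Returns true when a rebuild happened. */
bool demo_cache_refresh(DemoCache *cache)
{
    uint64_t now = atomic_load(&demo_graph_version);

    if (cache->valid && cache->built_at == now)
        return false;               /* cache is still current */

    /* ... rebuild the process-local cache here ... */
    cache->built_at = now;
    cache->valid = true;
    cache->rebuilds++;
    return true;
}
```

Because no lock is ever held, an error that unwinds out of the rebuild step leaves nothing to release; at worst the cache is rebuilt again on the next call.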

Changes

src/backend/utils/adt/age_global_graph.c (1 file, +13 −50):

  • Remove #include <pthread.h>
  • Remove GRAPH_global_context_container struct (with pthread_mutex_t field)
  • Revert to static GRAPH_global_context *global_graph_contexts = NULL
  • Remove all pthread_mutex_lock/unlock calls (20 references)

The other fixes from PR #1881 (Assert→runtime check + strndup) remain untouched.
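
The shape of the reverted data structure can be sketched as follows. This is a simplified illustration under stated assumptions: `DemoGraphContext` and its two fields are hypothetical, the real `GRAPH_global_context` carries more state, and plain `malloc` is used here where PostgreSQL code would normally allocate from a memory context.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified context node; the real struct has more fields. */
typedef struct DemoGraphContext
{
    char *graph_name;
    struct DemoGraphContext *next;
} DemoGraphContext;

/* Process-local list head: each backend process has its own copy,
 * so no mutex is needed to guard it. */
static DemoGraphContext *demo_graph_contexts = NULL;

/* Linear search of the per-process list by graph name. */
DemoGraphContext *demo_find_context(const char *graph_name)
{
    DemoGraphContext *curr;

    for (curr = demo_graph_contexts; curr != NULL; curr = curr->next)
    {
        if (strcmp(curr->graph_name, graph_name) == 0)
            return curr;
    }
    return NULL;
}

/* Returns the existing context, or creates one and pushes it onto the list. */
DemoGraphContext *demo_get_or_create_context(const char *graph_name)
{
    DemoGraphContext *ctx = demo_find_context(graph_name);
    size_t len;

    if (ctx != NULL)
        return ctx;

    ctx = malloc(sizeof(*ctx));
    if (ctx == NULL)
        return NULL;

    /* Keep a private copy of the name rather than borrowing the caller's. */
    len = strlen(graph_name) + 1;
    ctx->graph_name = malloc(len);
    if (ctx->graph_name == NULL)
    {
        free(ctx);
        return NULL;
    }
    memcpy(ctx->graph_name, graph_name, len);

    ctx->next = demo_graph_contexts;
    demo_graph_contexts = ctx;
    return ctx;
}
```

With no lock involved, an early exit from either function cannot leave the list head guarded by a stale lock.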

Verification

  • All regression tests pass (36/36)
  • Deadlock reproduction via statement_timeout: no longer hangs after fix

The pthread_mutex in manage_GRAPH_global_contexts() causes permanent
self-deadlock when ereport(ERROR) triggers siglongjmp while the mutex
is held, skipping pthread_mutex_unlock(). Any subsequent VLE query on
the same backend connection hangs forever in pthread_mutex_lock() with
__owner == own PID.

The mutex was introduced in PR apache#1881 (fix for issue apache#1878) but is
unnecessary: it protects a process-local static variable in
PostgreSQL's single-threaded backend model where no concurrent access
exists. The actual fix for apache#1878 was the Assert-to-runtime-check
conversion and strndup defensive copy, which remain untouched.

Copilot AI left a comment

Pull request overview

This PR removes the pthread_mutex-based locking around the per-backend global graph-context list in age_global_graph.c to prevent permanent self-deadlock when PostgreSQL errors (ereport(ERROR) via siglongjmp, e.g., statement timeout/cancel/OOM) occur while the mutex is held during VLE-related cache operations.

Changes:

  • Removed #include <pthread.h> and the mutex-containing GRAPH_global_context_container.
  • Reverted the per-process context head back to a simple static GRAPH_global_context *global_graph_contexts.
  • Removed all pthread_mutex_lock() / pthread_mutex_unlock() calls from global-context management and lookup paths.

Comment thread src/backend/utils/adt/age_global_graph.c
jrgemignani (Contributor) commented Jun 9, 2026

@crdv7 Please see the above Copilot message. Edit: never mind, this is pre-existing and needs to be addressed separately.

@jrgemignani jrgemignani merged commit 23cbe57 into apache:master Jun 9, 2026
7 of 8 checks passed

Development

Successfully merging this pull request may close these issues.

pthread_mutex in manage_GRAPH_global_contexts causes permanent self-deadlock on VLE queries

3 participants