Skip to content

fix: guard executor shutdown in BaseCaptureStrategy.stop()#5627

Open
tsushanth wants to merge 2 commits into
getsentry:mainfrom
tsushanth:fix-persisting-executor-leak-5564
Open

fix: guard executor shutdown in BaseCaptureStrategy.stop()#5627
tsushanth wants to merge 2 commits into
getsentry:mainfrom
tsushanth:fix-persisting-executor-leak-5564

Conversation

@tsushanth

Copy link
Copy Markdown
Contributor

Fixes #5564

Problem

Each ReplayIntegration.start() constructs a fresh SessionCaptureStrategy or BufferCaptureStrategy. Both inherit BaseCaptureStrategy, which owns a persistingExecutor that was previously declared as a Kotlin lazy delegate.

stop() resets the delegated properties segmentTimestamp and currentReplayId. Their setters call runInBackground, which — when options.threadChecker.isMainThread() is true — accesses persistingExecutor, silently initialising the lazy. stop() never shuts the executor down, so one SentryReplayPersister-* thread is abandoned on every cycle. Kiln benchmark data confirmed monotonically growing thread counts and a GC allocation rate roughly 9× baseline after the repeated start/stop pattern introduced in kiln#73.

Fix

Replace the lazy delegate with an explicit nullable holder (persistingExecutorHolder). The custom get() mirrors the old lazy behaviour (create-on-first-access) while making initialisation detectable. At the end of stop(), if the holder is non-null the executor is shut down via shutdownNow() — non-blocking, safe to call on the main thread — and the holder is cleared so a subsequent start() gets a fresh executor without any residual state.

ReplayExecutorService.shutdown() is intentionally not used here: it blocks for options.shutdownTimeoutMillis and risks an ANR when invoked on the main thread. shutdownNow() interrupts any in-flight persistence write and discards the queue, which is acceptable because cache.close() is called immediately before in the same stop() body.

Test

Added stop shuts down persisting executor so no SentryReplayPersister threads leak across cycles to SessionCaptureStrategyTest. The test stubs threadChecker.isMainThread() to true (forcing the persisting executor path), runs three start/stop cycles, and asserts that no SentryReplayPersister-* threads remain alive after a short drain window.

Each start/stop cycle leaked one SentryReplayPersister-* thread because
stop() reset delegated properties (segmentTimestamp, currentReplayId)
whose setters dispatch to persistingExecutor, initialising the lazy —
but stop() never shut it down.

Replace the lazy delegate with an explicit nullable holder so the
executor is only created when actually needed and can be detected at
stop() time.  Call shutdownNow() (non-blocking) rather than the blocking
shutdown() to avoid ANRs when stop() runs on the main thread.

Fixes getsentry#5564
@runningcode

Copy link
Copy Markdown
Contributor

Thanks for contributing, could you run spotlessApply to fix the formatting failure? Then we'll review this more closely.

Comment on lines +64 to +74
private val persistingExecutor: ScheduledExecutorService
get() {
return persistingExecutorHolder
?: run {
val delegate =
Executors.newSingleThreadScheduledExecutor(
ReplayPersistingExecutorServiceThreadFactory()
)
ReplayExecutorService(delegate, options).also { persistingExecutorHolder = it }
}
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: The custom getter for persistingExecutor is not thread-safe. Concurrent access can lead to a race condition, causing multiple executor instances to be created and resulting in a thread leak.
Severity: HIGH

Suggested Fix

Synchronize access to the persistingExecutor getter to ensure only one instance is created. This can be achieved by using a synchronized block around the check-and-create logic or by adding the @Volatile annotation to the persistingExecutorHolder field and using a double-checked locking pattern. The original lazy delegate, which is thread-safe by default, is a good model for a correct implementation.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.

Location:
sentry-android-replay/src/main/java/io/sentry/android/replay/capture/BaseCaptureStrategy.kt#L63-L74

Potential issue: The custom getter for the `persistingExecutor` property implements an
unsynchronized lazy initialization pattern. Multiple threads, including the main thread
and background `replayExecutor` threads, can access this getter concurrently. If two
threads access the getter when `persistingExecutorHolder` is `null`, both may proceed to
create a new `ScheduledExecutorService` instance. Due to the race condition, only one
instance will be stored in `persistingExecutorHolder`, while the other will be abandoned
without being shut down. This results in a thread leak, which can degrade application
performance and lead to resource exhaustion over time. The backing field
`persistingExecutorHolder` also lacks a `@Volatile` annotation, creating potential
memory visibility issues.

Did we get this right? 👍 / 👎 to inform future reviews.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fix leaked persisting executor in BaseCaptureStrategy

2 participants