HDDS-15521. StreamBlockInputStream fails with TimeoutIOException without retry or failover #10479

Draft
sadanand48 wants to merge 2 commits into apache:master from sadanand48:HDDS-15521

@sadanand48

Contributor

What changes were proposed in this pull request?

StreamBlockInputStream: Added a retry loop in dataAvailableToRead() that catches read failures (including TimeoutIOException) and calls handleExceptions() to release the stream, refresh pipeline/token, and reset requestedLength = position so reads resume from the current offset. On retry, failed datanodes are recorded in a per-stream failedStreamingDatanodes set and excluded from the next init.
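The retry-and-resume behavior described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patch itself: `RetryingReadSketch`, `Replica`, `readFully`, and `maxRetries` are hypothetical names, a plain `IOException` stands in for `TimeoutIOException`, and the real `handleExceptions()` also refreshes the pipeline and token, which is omitted here.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: these types are hypothetical stand-ins, not Ozone classes.
class RetryingReadSketch {

  /** Hypothetical replica: returns `length` bytes starting at `offset`, or throws. */
  interface Replica {
    String name();
    byte[] read(long offset, int length) throws IOException;
  }

  private final List<Replica> replicas;
  private final int maxRetries;
  // Per-stream record of datanodes that failed; consulted before each attempt.
  private final Set<String> failedDatanodes = new LinkedHashSet<>();
  private long position = 0; // bytes already delivered to the caller

  RetryingReadSketch(List<Replica> replicas, int maxRetries) {
    this.replicas = replicas;
    this.maxRetries = maxRetries;
  }

  Set<String> failedDatanodes() {
    return failedDatanodes;
  }

  /** Picks the first replica that has not failed yet. */
  private Replica nextReplica() throws IOException {
    for (Replica r : replicas) {
      if (!failedDatanodes.contains(r.name())) {
        return r;
      }
    }
    throw new IOException("all replicas failed: " + failedDatanodes);
  }

  /** Reads `total` bytes in chunks, failing over to another replica on error. */
  byte[] readFully(int total, int chunk) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int retries = 0;
    while (position < total) {
      Replica replica = nextReplica();
      try {
        int want = (int) Math.min(chunk, total - position);
        byte[] data = replica.read(position, want);
        if (data.length == 0) {
          throw new IOException("replica returned no data: " + replica.name());
        }
        out.write(data, 0, data.length);
        position += data.length;
      } catch (IOException e) {
        if (++retries > maxRetries) {
          throw new IOException("giving up after " + maxRetries + " retries", e);
        }
        failedDatanodes.add(replica.name());
        // position is unchanged, so the next attempt resumes at the same offset.
      }
    }
    return out.toByteArray();
  }

  /** Test double: serves `block` but fails every read at or past `failAt`. */
  static Replica replica(String name, byte[] block, long failAt) {
    return new Replica() {
      public String name() {
        return name;
      }
      public byte[] read(long offset, int length) throws IOException {
        if (offset >= failAt) {
          throw new IOException("simulated timeout on " + name);
        }
        return Arrays.copyOfRange(block, (int) offset, (int) offset + length);
      }
    };
  }

  static String demo() {
    byte[] block = "0123456789".getBytes(StandardCharsets.UTF_8);
    RetryingReadSketch in = new RetryingReadSketch(
        List.of(replica("dn1", block, 4), replica("dn2", block, Long.MAX_VALUE)), 3);
    try {
      byte[] got = in.readFully(block.length, 4);
      return new String(got, StandardCharsets.UTF_8) + " failed=" + in.failedDatanodes();
    } catch (IOException e) {
      return "error: " + e.getMessage();
    }
  }

  public static void main(String[] args) {
    System.out.println(demo()); // prints "0123456789 failed=[dn1]"
  }
}
```

In the demo, `dn1` serves the first four bytes and then fails; the read continues from offset 4 on `dn2` without re-reading or skipping data, which is the property the `requestedLength = position` reset is meant to preserve.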

XceiverClientGrpc: Extended initStreamRead() to accept an excluded-datanode set and skip those nodes when opening a new streaming stream, so retries target a different replica instead of rebinding to the same dead one. Added INFO logging to show which datanode each init selects.
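The exclusion step could look something like the sketch below. It is a hypothetical helper written for illustration, not the `XceiverClientGrpc` code: `selectDatanode` and the string node names are assumptions, and returning an empty `Optional` when every node is excluded is also an assumption, since the description does not say how the patch handles that case.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Illustrative only: a hypothetical helper, not the XceiverClientGrpc implementation.
class ExcludedDatanodeSketch {

  /** Returns the first pipeline node that is not excluded, in pipeline order. */
  static Optional<String> selectDatanode(List<String> pipelineNodes, Set<String> excluded) {
    return pipelineNodes.stream()
        .filter(dn -> !excluded.contains(dn))
        .findFirst();
  }

  static String demo() {
    List<String> pipeline = List.of("dn1", "dn2", "dn3");
    Set<String> excluded = new LinkedHashSet<>();
    StringBuilder log = new StringBuilder();
    for (int attempt = 1; attempt <= 4; attempt++) {
      Optional<String> choice = selectDatanode(pipeline, excluded);
      // Record which node each init selected, similar in spirit to the INFO logging.
      log.append("attempt ").append(attempt).append(" -> ")
          .append(choice.orElse("none")).append('\n');
      // Pretend every selected node fails so the exclusion set grows.
      choice.ifPresent(excluded::add);
    }
    return log.toString();
  }

  public static void main(String[] args) {
    System.out.print(demo());
  }
}
```

Each attempt lands on a different node until the pipeline is exhausted, so a retry never rebinds to a datanode that has already failed for this stream.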

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-15521

How was this patch tested?

Mini-Ozone cluster test added

@adoroszlai adoroszlai left a comment

Contributor

Thanks @sadanand48 for the patch. To reduce the overhead of cluster startup, can you please rewrite the test cases to share a single cluster?

@adoroszlai changed the title from "HDDS-15521. StreamBlockInputStream fails with TimeoutIOException without retry or datanode failover." to "HDDS-15521. StreamBlockInputStream fails with TimeoutIOException without retry or failover" on Jun 10, 2026