Possible HTTP/1.1 per-host permit leak on request-timeout / connect-success race

Related to #2176.

I think there may be a residual race in the same request-timeout / connect-success path fixed in #2176, but with a different symptom: a leaked per-host connection permit for HTTP/1.1.

## Version

Observed with:

- `org.asynchttpclient:async-http-client:3.0.10`
- HTTP/1.1
- finite `maxConnectionsPerHost`
- request timeouts enabled

I also checked current `main` and the relevant ordering appears unchanged.

## Summary

Under a timeout storm against a peer that accepts TCP/TLS connections but does not send a response or close the socket, AHC may leak a per-host semaphore permit permanently.

The suspected race is:

1. A new HTTP/1.1 connection is opened, so a per-host permit is acquired and stored on `NettyResponseFuture.partitionKeyLock`.
2. `NettyConnectListener.onSuccess(...)` starts running on the event loop.
3. `onSuccess(...)` calls `future.takePartitionKeyLock()`, moving the permit token off the future.
4. Before `future.attachChannel(channel, false)` runs, the request timeout fires.
5. `TimeoutTimerTask.expire(...)` calls:
   `requestSender.abort(nettyResponseFuture.channel(), nettyResponseFuture, TimeoutException)`
6. At this point `future.channel()` can still be `null`, because `attachChannel(...)` has not run yet.
7. `NettyRequestSender.abort(...)` therefore cannot close the newly connected channel.
8. The future is aborted and `releasePartitionKeyLock()` releases nothing, because `onSuccess(...)` already took the token.
9. `onSuccess(...)` can then attach the token to `channel.closeFuture()`.
10. If the remote peer keeps the socket established and never sends FIN/RST, `closeFuture()` never fires, so the per-host permit is never released.

Final state: the request future is done/aborted, but the HTTP/1.1 channel is live and holds a per-host permit until the socket or client/JVM is closed.

## Why #2176 may not fully cover this

#2176 correctly handles the case where `timeoutsHolder == null` or `future.isDone()` is observed by `NettyConnectListener.onSuccess(...)`.

This suspected case is narrower:

- timeout abort sees `future.channel() == null`, so it cannot close the concrete channel;
- `onSuccess(...)` has already taken the partition lock from the future;
- `onSuccess(...)` does not necessarily observe `future.isDone()` before attaching the permit to `channel.closeFuture()`;
- after that, the only release path is channel close.

So #2176 fixes an observed NPE / early-exit problem in this race family, but it may still leave a permit-accounting window.

## Source points

Relevant 3.0.10 source paths:

- `NettyConnectListener.onSuccess(...)`
  - takes the partition lock via `future.takePartitionKeyLock()`;
  - later calls `attachSemaphoreToChannelClose(...)`;
  - only after that path does `writeRequest(...)` call `future.attachChannel(channel, false)`.

- `TimeoutTimerTask.expire(...)`
  - calls `requestSender.abort(nettyResponseFuture.channel(), nettyResponseFuture, ...)`.

- `NettyRequestSender.abort(...)`
  - only closes the channel if the passed `channel` is non-null and active.

- `NettyResponseFuture.terminateAndExit(...)`
  - calls `releasePartitionKeyLock()`, but by then the token may already have been taken by `onSuccess(...)`.

- `DefaultChannelPool`
  - TTL / idle cleanup only scans pooled idle channels, not active/orphaned channels that were never returned to the pool.

## Operational symptom

In production, after a burst of request timeouts to one host, we saw sustained `TooManyConnectionsPerHostException` for that same host.

The important shape was:

- timeout burst first;
- then a flat `TooManyConnectionsPerHostException` plateau;
- no gradual decay;
- cleared only when the JVM / AHC client was recycled.

That looks like a fixed number of per-host permits were orphaned and never returned.

## Expected behavior

A request timeout racing with connect success should not be able to leave an HTTP/1.1 per-host permit owned only by a live channel that is no longer associated with an active request.

At minimum, either:

- the concrete channel should be attached/published before timeout abort can observe `future.channel() == null`;
- or timeout/connect race handling should close the just-connected channel when the future was already aborted;
- or the permit should be released if the future has completed before request ownership is fully transferred to the channel.

## Actual behavior

From the source, it looks possible for:

- timeout abort to run with `future.channel() == null`;
- `onSuccess(...)` to have already removed the permit token from the future;
- the future abort path to release nothing;
- the permit to be bound to `channel.closeFuture()`;
- the channel to remain established indefinitely if the peer never closes.

## Question

Is this interleaving possible in 3.0.10 / current `main`?

If yes, should AHC close the channel or release the per-host permit when a request timeout wins before `future.attachChannel(...)` publishes the channel?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Possible HTTP/1.1 per-host permit leak on request-timeout / connect-success race #2189

Version

Summary

Why #2176 may not fully cover this

Source points

Operational symptom

Expected behavior

Actual behavior

Question

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Possible HTTP/1.1 per-host permit leak on request-timeout / connect-success race #2189

Description

Version

Summary

Why #2176 may not fully cover this

Source points

Operational symptom

Expected behavior

Actual behavior

Question

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions