RFR: 8277969: HttpClient SelectorManager shuts down when custom Executor rejects a task

Wed Mar 9 12:35:04 UTC 2022

On Wed, 9 Mar 2022 10:04:57 GMT, Jaikiran Pai <jpai at openjdk.org> wrote:

>> These changes make sure that pending requests are terminated if the selector manager thread exits due to exceptions.
>> This includes:
>>    1. completing CompletableFutures that were returned to the caller code
>>    2. cancelling requests that are in flight
>>    3. calling onError on BodySubscribers that may not have been completed
>> Note that step 3 is necessary as certain CompletableFutures, such as those returned by `BodySubscribers.ofInputStream`, complete immediately, the operation being eventually completed when the last bite of the response is read. Completing a completable future that is already completed has no effect, this case is handled by completing the BodySubscriber too.
>
> src/java.net.http/share/classes/jdk/internal/net/http/Http1AsyncReceiver.java line 501:
> 
>> 499:             debug.log("recorded " + t + "\n\t delegate: " + delegate
>> 500:                       + "\n\t queue.isEmpty: " + queue.isEmpty()
>> 501:                       + " stopRequested: " + stopRequested, ex);
> 
> Should this also be preceded with a `\n\t ` to be consistent with the rest of this log message?

Good point - will do

> src/java.net.http/share/classes/jdk/internal/net/http/Http1Response.java line 366:
> 
>> 364:                         refCountTracker.tryRelease();
>> 365:                     }
>> 366:                 });
> 
> I see that later in this code there's another `finally` block where we have now added a `tryRelease` call which first checks the `acquired` flag. A couple of questions:
> 1. Should this above line too first check the `acquire` flag?
> 2. Should we reset the `acquire` local variable once we `tryRelease` here so as to prevent another `tryRelease` in the other `finally`  block? A short glance at the implementation of `tryRelease` suggests that multiple `tryRelease()` with a single `acquire()` won't cause a problem in its state but it does end up logging some messages, which perhaps could be avoided.

tryRelease can be called multiple time - it will only decrement the ref counting once. It could happen when different threads notice that the operation is finished (usually due to some exceptions being propagated) and try concurrently to decrease the ref counter. It is very important to decrease the ref counter (otherwise the HttpClient may never shutdown when it's released) and equally very important not to decrease it twice for the same operation. Here we don't need to check whether it's been acquired because we know it's been acquired if we reach here. IIRC what we don't want to do is call tryRelease() if it's never been acquired.

> src/java.net.http/share/classes/jdk/internal/net/http/Http1Response.java line 374:
> 
>> 372:                     cf.completeExceptionally(t);
>> 373:                 } finally {
>> 374:                     if (acquired) refCountTracker.tryRelease();
> 
> I see that before this change, we never did a `tryRelease` in this finally block. Was that a separate bug in itself and something that is being fixed as part of this PR?

I've been fighting with trying to get the ref counter back to 0 after Executor::shutdown has been called. 
Throwing TaskRejectedException (especially when `SequentialScheduler::runOrSchedule(Executor)` is called) is very disruptive to the logic.
In practice - it does not matter as much provided that all resources have been released and all body subscribers have been completed - and that's what this PR does.

> src/java.net.http/share/classes/jdk/internal/net/http/Http1Response.java line 647:
> 
>> 645:             if (t instanceof EOFException && parser != null &&
>> 646:                     parser instanceof UnknownLengthBodyParser) {
>> 647:                 ((UnknownLengthBodyParser)parser).complete();
> 
> A minor note - while we are it, perhaps the new `instanceof` feature can be used here to avoid this cast?

Good point.

> src/java.net.http/share/classes/jdk/internal/net/http/Http1Response.java line 728:
> 
>> 726:                 } else {
>> 727:                     if (debug.on()) {
>> 728:                         debug.log("no parser");
> 
> Should we perhaps log the error message (`Throwable::getMessage`) to provide some kind of hints in the logs when this message is logged? Perhaps `Ignoring error since no parser available: + error.getMessage()`?

Good idea

> src/java.net.http/share/classes/jdk/internal/net/http/HttpClientImpl.java line 179:
> 
>> 177:             } catch (Throwable t) {
>> 178:                 errorHandler.accept(command, t);
>> 179:                 ASYNC_POOL.execute(command);
> 
> Does this mean that we will now attempt to use an `ASYNC_POOL` even if a had supplied an `Executor`? The `CompletableFuture#defaultExecutor` which is what the `ASYNC_POOL` represents states:
> 
> Returns the default Executor used for async methods that do not specify an Executor.
> 
> which would now contradict with this code. With the error handler logic in place (one line above), do you think we should just give up on running the `command` instead of passing to the default executor?
> 
> Furthermore, if we do decide to use the default executor in cases like this, is that something that all implementations of the `HttpClient` are expected to do (i.e. would it be a specification) or is this more an implementation detail.

1. Dependent actions added by the caller to the CF returned by the HttpClient are executed in the FJP
2. This is an exceptional case - we're doing this only when we're shutting down the HttpClient. I don't think we should document it - this is an implementation detail. But we should probably document that shutting down the executor while operations are still in progress can lead to unpredictable/unspecified behavior.

Note: If we had VirtualThreads we would probably create a new VirtualThread here, rather than using the FJP.

> src/java.net.http/share/classes/jdk/internal/net/http/HttpClientImpl.java line 199:
> 
>> 197:     // the selector manager thread exists abnormally.
>> 198:     // The request, its id, and its completable future, are stored in a record-like
>> 199:     // PendingRequest object added to the pending requests set (pendingRequests).
> 
> I'm not too familiar with the HttpClient code yet. Before this change, was there ever a place where we would end up holding onto the request instances in memory and prevent them from being garbage collected? Would this change cause unintentional resource constraints in a regular/good scenario (where the `SelectorManager` thread hasn't exited abnormally and is still alive) or in cases where HTTP responses are slow to come back from servers?

In the normal course of action the HttpClient will not shutdown if any operation are still in progress. So holding onto the multiexchange here doesn't prevent the client from shutting down, since the multiexchange will be remove from this list when the operation completes.

> src/java.net.http/share/classes/jdk/internal/net/http/HttpClientImpl.java line 302:
> 
>> 300:             var msg = reason instanceof RejectedExecutionException
>> 301:                     ? reason.getClass() : reason;
>> 302:             client.debug.log("aborting pending requests due to: %s", msg);
> 
> Any reason why we are using a different log message for `RejectedExecutionException`? Would it instead be useful to let the underlying `message` from the `RejectedExecutionException` be logged too just like we are doing for other exception types here?

IIRC the message was too long and not very informative.

> src/java.net.http/share/classes/jdk/internal/net/http/HttpClientImpl.java line 308:
> 
>> 306:         while (!pendingRequests.isEmpty()) {
>> 307:             var pendings = pendingRequests.iterator();
>> 308:             while (pendings.hasNext()) {
> 
> Would there be a case where while we are aborting these current pending requests, some user code does a  `sendAsync` and we end up add a new pending request to the `Set`? Would we then end up holding those new requests in the `Set` never to be cleared again because, I guess, `abortPendingRequests` won't be called again?

I think that's been taken care of. The first thing we do is set up a flag that we're shutting down. I remember asking myself the same question and making sure it doesn't happen - that is - everything will be aborted.

-------------

PR: https://git.openjdk.java.net/jdk/pull/7196