HTTP client API

Tue Nov 29 15:47:00 UTC 2016

Hi Tobias,

Apologies for the long delay in responding. I'll hopefully catch up on 
your subsequent comments soon.
Just to give an overall update on the status of this work first. There 
is a plan afoot to make the JDK 9
module system support something we are calling "incubator modules". 
These are modules that
can be delivered with the JDK, but are not standard parts of the 
platform (Java SE). So, they do not
resolve by default and will have runtime/compile time warnings to 
indicate that the contained APIs
are potentially subject to change. The plan is to make this new http 
client API one of these
incubator modules in JDK 9. It will use a different package namespace to 
reflect this.

The upshot is that while we would like the API to be as complete as 
possible, it will not
be fixed or standardized in JDK 9. But by delivering the API and 
implementation with JDK 9, it
will get some exposure and use, so that we can standardize it in JDK 10, 
and move it from
its incubator module and namespace back into Java SE in the 
java.net.http package.

I've added some answers to your questions below, and will get to the 
followup messages soon.

Thanks,
Michael

On 31/10/2016, 18:13, Tobias Thierer wrote:
> Hi Michael -
>
> thanks a lot for your response! Further comments below.
>
> How would you rank simplicity, versatility and performance as goals 
> for the HTTP Client API? For example the blocking vs. non-blocking API 
> choices will largely depend on trade-offs, so I'd like to understand 
> the target applications and requirements that these trade-offs were 
> based on.
>
To my mind, we should support both blocking and non-blocking in the API, 
and by specifying them both in
the API (rather than relying on blocking behavior being a wrapper around 
non-blocking) then we get the
potential benefits of both (reduced thread switching with blocking, 
versus less thread usage with non-blocking).

> On Fri, Oct 28, 2016 at 12:28 PM, Michael McMahon 
> <michael.x.mcmahon at oracle.com <mailto:michael.x.mcmahon at oracle.com>> 
> wrote:
>
>
>     The intention behind those interfaces is primarily to provide a
>     conversion between ByteBuffers and higher level objects that users
>     are interested in. So, HttpRequest.BodyProcessor generates
>     ByteBuffers for output and HttpResponse.ByteBuffer consumes them
>     on input. For response bodies, there is this additional implicit
>     requirement of needing to do something with the incoming data
>     and that is why the option of discarding it exists (the details of
>     that API are certainly open to discussion)
>
>     On Publisher/Subscriber (Flow) versus InputStream/OutputStream, we
>     wanted a non blocking API
>
>
> I can understand the desire to have a non blocking API, especially if 
> performance is seen as more important for the API than usability. 
> Cronet (the Chromium networking stack) API [documentation 
> <https://chromium.googlesource.com/chromium/src/+/lkgr/components/cronet>] 
> has a
>
>     UrlRequest.read(ByteBuffer) <-> 
> callback.onReadCompleted(UrlRequest, UrlResponseInfo, ByteBuffer)
>
> call cycle that is not too different from OpenJDK 9 HTTP client's
>
>    subscription.request(1) <-> BodyProcessor.onNext(ByteBuffer)
>
> so the general idea is shared by others (I'm meeting with some Cronet 
> folks later this week so I hope to gather their input / views soon). 
> Note that in Cronet, the ByteBuffer is provided as part of each call 
> requesting more data, whereas in OpenJDK 9 it's returned once 
> BodyProcessor.getBuffer().
>
On this point, I'd say that there is a high degree of usability with the 
proposed API when using
the standard handlers/processors. Again, the blocking variant is very 
easy to use. The non-blocking
variant with CompletableFuture is not much harder, once the programmer 
invests some effort in understanding that
model.

Implementing your own processor does require familiarity with the Flow 
types and their underlying philosophy, but I see
those cases as outside the core use cases. It's likely that additional 
processor/handlers would be added in time also.

> At the same time, the first and third goal mentioned in JEP 110 
> <http://openjdk.java.net/jeps/110> are
>
>   * /Must be easy to use for common cases, including a simple blocking
>     mode./
>   * /A simple and concise API which caters for 80-90% of application
>     needs.
>     /
>
> Blocking, pull-based APIs tend to be simpler to use because (a) one 
> can just step through a for loop in a debugger rather than needing 
> breakpoints, and (b) state can be held on the stack rather than in 
> fields [see related blog post 
> <http://code-o-matic.blogspot.co.uk/2011/01/note-on-api-design-call-stack-as-source.html>], 
> so I'm struggling to follow the trade-offs here. Obviously, blocking 
> APIs have some disadvantages as well - e.g. context switching overhead 
> in the case of separate Threads per request. Is there any place other 
> than JEP 110 where I could read up on principles or target 
> applications that drove the trade-offs for this API?
>
But, I think you are talking about debugging the processor or handler 
there rather than the calling application code?

> A concrete example that I tried to prototype was reading an image file 
> header & body from the network, potentially aborting the request 
> depending on some metadata in the file header; using the OpenJDK 9 
> HTTP client API, this involved a ton of boilerplate. Part of the 
> boilerplate was because I wasn't sure if onNext() might get a 
> partially filled ByteBuffer that might cut off the image file header 
> or even an individual int32 value in that header. Is it guaranteed 
> that the ByteBuffer returned by BodyProcessor.getBuffer will be full 
> on each call to onNext(ByteBuffer), except possibly the last? If not, 
> perhaps it should be guaranteed?
>
> Related to this, it appears that the documentation for 
> Subscription.request(long n) 
> <http://cr.openjdk.java.net/%7Emichaelm/httpclient/api.1/java/util/concurrent/Flow.Subscription.html#request-long-> specifies 
> that /n/ specifically /denotes the number of future calls/ to onNext() 
> <http://cr.openjdk.java.net/%7Emichaelm/httpclient/api.1/java/util/concurrent/Flow.Subscriber.html#onNext-T->. 
> If this wasn't the case, BodyProcessor could have defined /n/ to refer 
> to a number of bytes (delivered as a single onNext() call if they fit 
> into the supplied ByteBuffer), which might have been a really neat way 
> for the application to control the sizes of chunks that it gets 
> passed. For example, an application might want to get the file header 
> in one go, and then afterwards process some fixed size data record for 
> each call. Oh well. Do you happen to have an idea for how the 
> application could be more in control of the chunk sizes?
Yes, Subscriptions relate to the number of <T> items that can be 
transferred rather than the number of bytes. There was some discussion 
on that exact question and we need to tighten that up, so that it is 
easier to map from the byte oriented flow control scheme used by HTTP/2 
to the item oriented model that Flow uses. I think basically that 
BodyProcessor needs to commit to specific buffer sizes, and guarantee 
that all calls of onNext() will transfer that exact number of bytes in 
the ByteBuffer (except for the final call of a subscription). With that 
guarantee, it is possible to have a seamless mapping between the two 
flow control schemes. Whether the processor decides the buffer size or 
the http library, I'm not certain, but I think it probably needs to be 
fixed at the start of a particular request/response, rather than be 
allowed to vary.

I'd be interested to look at your prototype to see what other pain 
points there might be.

>     and had originally defined
>     something that was quite similar to the asynchronous model used by
>     Flow, but the Flow types have a nice mechanism for
>     managing flow control asynchronously. So, it made sense to use the
>     standard API.
>
>
> Regarding managing flow control, I assume you're referring to 
> subscription.request() which I assume is tied to WINDOW_UPDATE frames 
> in HTTP 2? Also, I assume that Subscription.cancel() will result in a 
> RST_STREAM frame being sent to the server, is that correct?
>
Yes.
> Is there a recommended way for an application to limit stream 
> concurrency (RFC 7540 section 5.1.2 
> <https://tools.ietf.org/html/rfc7540#section-5.1.2>)? Example use 
> case: An application that loads and decodes .jpeg images from the 
> network might be limited by how many decompressed images it can hold 
> in memory at a time (the application may want to re-compress the 
> images in a different format, e.g. for file format conversion or to 
> send them to the GPU in ETC2 texture compression format). Would such 
> an application manually maintain a limited set of "active" streams, 
> making sure to start calling request() only for those streams that 
> have become active? (With a blocking API, an application could simply 
> use a fixed-size thread pool, or could use a Semaphore to guard access 
> to a limited resource, such as memory).
>
I really wouldn't use the request/response processors for anything other 
than encoding/decoding of bodies. There are a variety of ways of 
limiting request/response concurrency. As you suggest, it is obvious for 
the blocking style. For asynchronous code using CompletableFuture there 
are nice ways to do it also. In one example I wrote a small class called 
RequestLimiter specified as follows:
(it is kind of like an asynchronous version of a semaphore)

// Asynchronous semaphore. Implementation just uses a LinkedList<CompletableFuture<Void>>
// Very simple
class RequestLimiter {
     RequestLimiter(int concurrency);

     // returned CF completes when caller is permitted to send request.
     // 'concurrency' requests are allowed simultaneously
     synchronized CompletableFuture<Void>  obtainPermit();

     // signals that a response has been received and processed
     // permitting another request to start
     synchronized void returnPermit();
}

// The calling code might look as follows

{
     HttpClient client = ... // build the client
     int REQUESTS = 1000; // say
     int CONCURRENCY = 10; // say
     RequestLimiter limiter = new RequestLimiter(CONCURRENCY);

     CompletableFuture<Path>  futures[] = new CompletableFuture[REQUESTS];

     for (int i=0; i<REQUESTS; i++) {
         HttpRequest request = .... // build the request
         Path path = Paths.get("/someroot", request.uri().getPath());
         futures[i] = limiter
             .obtainPermit()
             .thenCompose((v)->  client.sendAsync(request, ResponseBodyHandler.asFile(path)))
             .thenApply((HttpResponse<Path>  response) ->  {
                 limiter.returnPermit();
                 return response.body();
             })
     }
     // wait for all to complete
     CompletableFuture.allOf(futures).join();

>     We're certainly open to suggestions for improved names, but I
>     think it is useful to have a way to explicitly discard a response
>     body, especially since HTTP/2 has an efficient way to do that.
>
>
> Why can't the response body instead be discarded by calling 
> subscription.cancel() during the call to onSubscribe() (before the 
> first ByteBuffer of the body data has been delivered)?
>
That would have a similar effect, except that the subscription wouldn't 
normally be visible to the user of the response processor,
but some processor implementations could expose that capability if it 
makes sense. For instance, a processor which returns
the response body as an InputStream would map InputStream.close() to 
subscription.cancel()

>     On the question of push versus pull, there seems to be differing
>     philosophies around this, but
>     actually the publish/subscribe model provided by the Flow types
>     has characteristics of both push and pull.
>     It is primarily push() as in Subscriber.onNext() delivers the next
>     buffer to the subscriber. But, the subscriber
>     can effectively control it through the Subscription object.
>
>
> What does "effectively control" means? As far as I can see, the 
> subscriber can stop calls temporarily by not calling request(), but 
> e.g. it can not request a minimum number of bytes to deliver (if 
> available before EOF) on the next call. Is that correct or am I 
> missing something? (See above question about whether ByteBuffers are 
> completely filled).
>
Yes, I mean by regulating the calls to request() you control the rate of 
data flow and can stop it temporarily. I think
the question you raised above about minimum numbers of bytes can be 
addressed as discussed above.

>     There are other benefits to having the body available
>     synchronously with the rest of the response object
>     and it is still possible to implement arbitrary streaming of
>     response body data without having to buffer it.
>     A while ago I prototyped a response processor which delivers the
>     body through an InputStream.
>
>
> What's the reason that implementation got dropped? FWIW, I suspect 
> that a lot of applications will be tempted to use 
> HttpResponse.asByteArray().getBody(); if this happens, it'll be 
> strictly worse than a potential HttpResponse.asInputStream() because
>
>   * it holds the entire response body in memory as a byte[], and
>   * processing only starts after the last byte of the response is
>     received from the network (this means that the request can't be
>     aborted part way through)
>

But, if you know in advance that the response body will be small then I 
don't see the problem with reading it all in
to a byte[] or String, or some other application defined type. If there 
is a possibility that response bodies might be very
large or unbounded, then it would be better to use a response processor 
that streams the data,

>>     2.) HttpHeaders: I love that there is a type for this
>>     abstraction! But:
>>
>>      *
>>
>>         Why is the type an interface rather than a concrete, final
>>         class? Since this is a pure value type, there doesn’t seem to
>>         be much point in allowing alternative implementations?
>>
>     You could conceive of different implementations, with different
>     tradeoffs between parsing early or lazily for instance.
>
>
> What parsing are you referring to? I don't see any parsing logic in 
> HttpHeadersImpl other than in firstValueAsLong. It does however have 
> logic that alternative implementations could get wrong / be 
> inconsistent with, such as
>
>   * whether to use case insensitive comparison for header names
>   * whether the Map returned from map() is unmodifiable
>   * whether the List returned by allValues() is unmodifiable (it is
>     modifiable in the current implementation, but I think that's a bug)
>   * the behavior of firstValueAsLong when the value exists but is not
>     parseable as a long
>
> Have you considered making HttpHeaders immutable and adding a Builder 
> implementation? That would make it thread safe at a small cost to the 
> semantic weight of the API.
>
With HTTP/1 you can read the headers in as a blob of bytes and then just 
go looking for headers on demand, or you can parse
the entire header block up front.  We should certainly make sure that we 
don't preclude doing what you suggest here
that HttpHeaders might be buildable and an immutable class.

>>      *
>>
>>         Dothe methods other than map() really pull their weight
>>
>>      *
>>
>>         (provide enough value relative to the semantic API weight
>>         that they introduce)?
>>
>>          o
>>
>>             firstValueAsLong() looks particularly suspect: why would
>>             anyone care particularly about long values?
>>             Especiallysince the current implementation seems to throw
>>             NumberFormatException rather than returning an empty
>>             Optional?
>>
>>
>     Numeric header values would be fairly common I think. Long was
>     just chosen as the widest possible integer type.
>     NumberFormatException is only thrown if the value is present but
>     can't be parsed as a number.
>
>
> But then you'd have to document that NFE is thrown, which an 
> application would have been much less likely to miss if they had 
> called Long.valueOf() themselves. It's also not clear that throwing 
> NFE, rather than returning an empty Optional, is the behavior that 
> most applications prefer.
>
NFE is an unchecked exception. So, I don't think there is much overhead. 
If you are reading a header that is supposed to be
a numeric value I think NFE is reasonable. That is effectively a 
protocol error.
> These methods seem to add semantic weight to the API for very little 
> benefit. Perhaps it'd be better to leave them out, but add them in a 
> future revision of the API if it turns out that a lot of applications 
> duplicate exactly this logic?
>
>>      *
>>
>>         I haven’t made up my mind about whether the existing choices
>>         are the right ones / sufficient. Perhaps if this class used
>>         the typesafe enum pattern from Effective Java 1st edition
>>         rather than being an actual enum, the API would maintain the
>>         option in a future version to allow client-configured
>>         Redirect policies, allowing Redirect for URLs as long as they
>>         are within the same host/domain?
>>
>>
>     As Anthony mentioned, there was a more complicated API originally,
>     but it was felt to be too complicated, which
>     was why we settled on enums (which can be extended in future).
>
>
> The choice of enums limits the versatility because all enum instances 
> need to be known at compile time; for example, it'd be impossible to 
> provide a Redirect policy that allows redirects to a given set of hosts.
>
> The only benefit I can see from using specifically an enum class is 
> that this gives us a serialization format for free. But a conversion 
> to/from String values for the standard policy instances (via a 
> valueOf(String) method or similar) would offer the same at a fairly 
> small cost of semantic weight.
>
> For example, doesn't the existence of the convenience method 
> HttpHeaders.firstValueAsLong() have a much smaller benefit, but 
> similar semantic weight, to e.g. a potential non-enum 
> Redirect.valueOf() method (and even so, that method could be left to a 
> future API version)?
>
>>     4.) HttpClient.Version:
>>
>>      *
>>
>>         Why does a HttpClient need to commit to using one HTTP
>>         version or the other? What if an application wants to use
>>         HTTP/2 for those servers that support it, but fall back to
>>         HTTP/1.1 for those that don’t?
>>
>>
>     Answered.
>
>
> Given that HttpClientBuilder.version(Version) does not actually 
> restrict the client to only that version, should it perhaps be called 
> setMaximumVersion(Version), or changed to enableVersion(Version) / 
> disableVersion(Version), or similar?
Perhaps, requestedVersion() ?
>
>>     5.) CookieManager
>>
>>      *
>>
>>         Is there a common interface we could add without making the
>>         API much more complex to allow us both RFC 2965 (outdated,
>>         implemented by CookieManager) and RFC 6265 (new, real world,
>>         actually used) cookies? Needs prototyping. I think it’s
>>         likely we’ll be able to do something similar to OkHttp’s
>>         CookieJar
>>         <https://square.github.io/okhttp/3.x/okhttp/okhttp3/CookieJar.html>which
>>         can be adapted to RFC 2965 - not 100%, but close enough that
>>         most implementations of CookieManager could be reused by the
>>         new HTTP API, while still taking advantage of RFC 6265 cookies.
>>
>>
>     One suggestion has been to use the low-level CookieHandler API.
>     But, another option would be just to
>     evolve CookieManager to the latest RFCs, which sounds a lot like
>     what you are suggesting.
>
>
> Not exactly what I was suggesting, but similar and might work. For 
> what it's worth, even CookieHandler mentions "Cookie2" headers in the 
> documentation so this would need to be changed and the backwards 
> compatibility risk would need to be estimated. For reference, RFC 7540 
> (HTTP 2) normative [COOKIE] reference is to the (new) cookie RFC 6265 
> which no longer has Cookie2 / Set-Cookie2 headers.
>
> Tobias
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/net-dev/attachments/20161129/ce0d5af6/attachment-0001.html>