HTTP client API
Tobias Thierer
tobiast at google.com
Mon Nov 21 11:25:57 UTC 2016
Hi Michael -
Some follow-up/updates on items from my previous email inline below:
On Mon, Oct 31, 2016 at 6:13 PM, Tobias Thierer <tobiast at google.com> wrote:
>
> On Fri, Oct 28, 2016 at 12:28 PM, Michael McMahon <
> michael.x.mcmahon at oracle.com> wrote:
>
>>
>> The intention behind those interfaces is primarily to provide a
>> conversion between ByteBuffers and higher level objects that users
>> are interested in. So, HttpRequest.BodyProcessor generates ByteBuffers
>> for output and HttpResponse.BodyProcessor consumes them
>> on input. For response bodies, there is this additional implicit
>> requirement of needing to do something with the incoming data
>> and that is why the option of discarding it exists (the details of that
>> API are certainly open to discussion)
>>
>> On Publisher/Subscriber (Flow) versus InputStream/OutputStream, we wanted
>> a non blocking API
>>
>
> I can understand the desire to have a non blocking API, especially if
> performance is seen as more important for the API than usability. Cronet
> (the Chromium networking stack) API [documentation
> <https://chromium.googlesource.com/chromium/src/+/lkgr/components/cronet>]
> has a
>
> UrlRequest.read(ByteBuffer) <-> callback.onReadCompleted(UrlRequest,
> UrlResponseInfo, ByteBuffer)
>
> call cycle that is not too different from OpenJDK 9 HTTP client's
>
> subscription.request(1) <-> BodyProcessor.onNext(ByteBuffer)
>
> so the general idea is shared by others (I'm meeting with some Cronet
> folks later this week so I hope to gather their input / views soon).
>
I've meanwhile met with Cronet folks and now agree with your view that an
async API is useful for applications that prioritize performance over
simplicity. That said, I still think that many applications will prefer the
simplicity of a blocking API - either provided as part of the platform or
as a separate library.
FYI as an experiment to see how easy this could be, I've prototyped a
blocking API on top of the async API, specifically an adapter from OpenJDK
9's HttpResponse.BodyProcessor to an InputStream.
The way I prototyped it was to use a PipedInputStream internally whose
buffer size is 32KBytes (2 * the size of the ByteBuffers used to deliver
the response); the number of bytes free in that internal buffer
(a) decreases when onResponseBodyChunk() copies bytes into the
associated PipedOutputStream,
(b) increases when the application reads bytes from the PipedInputStream.
The PipedInputStream is wrapped in another InputStream that calls
flowController.accept(1) every time (b) frees up enough bytes in the
internal buffer to fit everything that may be delivered in the next call to
onResponseBodyChunk(); this ensures that onResponseBodyChunk() never
blocks. The InputStream wrapper methods also take care of throwing
IOException after onResponseError(Throwable) occurred.
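To make the mechanism concrete, here is a hedged sketch of that adapter. The class and method names (BlockingBodyAdapter, onResponseBodyChunk, the flowController callback) follow the prose above rather than the actual OpenJDK 9 API, and the 16 KB chunk size is an assumption:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.nio.ByteBuffer;
import java.util.function.IntConsumer;

// Sketch only: names mirror the description above, not the real API.
class BlockingBodyAdapter {
    private static final int CHUNK_SIZE = 16 * 1024;   // assumed max delivery size

    private final PipedInputStream pipeIn =
            new PipedInputStream(2 * CHUNK_SIZE);      // 32 KB internal buffer
    private final PipedOutputStream pipeOut;
    private final IntConsumer flowController;          // stand-in for subscription.request(n)
    private int freedSinceLastRequest = 0;

    BlockingBodyAdapter(IntConsumer flowController) throws IOException {
        this.pipeOut = new PipedOutputStream(pipeIn);
        this.flowController = flowController;
        flowController.accept(2);   // the empty 32 KB buffer fits two chunks
    }

    // (a) Runs on the client's delivery thread; never blocks, because a
    // chunk is only requested once CHUNK_SIZE bytes of buffer are free.
    void onResponseBodyChunk(ByteBuffer buf) throws IOException {
        byte[] tmp = new byte[buf.remaining()];
        buf.get(tmp);
        pipeOut.write(tmp);
    }

    void onResponseComplete() throws IOException {
        pipeOut.close();            // reader will see EOF once drained
    }

    // Blocking view for the application.
    InputStream inputStream() {
        return new InputStream() {
            @Override public int read() throws IOException {
                int b = pipeIn.read();                 // (b) frees buffer space
                if (b >= 0 && ++freedSinceLastRequest == CHUNK_SIZE) {
                    freedSinceLastRequest = 0;
                    flowController.accept(1);          // room for one more chunk
                }
                return b;
            }
        };
    }
}
```

The initial accept(2) reflects that the empty buffer can absorb two maximum-size chunks; after that, one chunk is requested per chunk's worth of bytes consumed.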
All of this seemed fairly straightforward, but I did run into some problems:
- I haven't succeeded in building OpenJDK 9 from source, following the
instructions (I always run into a NullPointerException in XMLDTDProcessor
during the BUILD_TOOLS_CORBA part of "make all"), so I've been unable to
empirically try / benchmark my prototype code.
- Obviously, copying data into/out of the PipedInputStream's internal
buffer is overhead; this consciously trades performance for simplicity of
the API. (I note that ResponseContent.copyBuffer() also performs copying
but I haven't checked if this can be avoided).
- flowController.accept(1) is only called when the underlying
PipedInputStream.read() returns, and then there will still be additional
latency before more data is delivered to onResponseBodyChunk(). These
delays will impair performance more than should ideally be required, but I
haven't been able to quantify this through benchmarks because of my
problems building OpenJDK 9 from source.
- Further contributing to this latency is the fact that the wrapper doesn't
have good control over how many bytes will be delivered during the next
onResponseBodyChunk(). Note that I've run into this same limitation in my
previous prototype for the use case of parsing image data. Further note
that Cronet's API *does* give the application control over how many bytes
at most will be delivered during the next onReadCompleted(UrlRequest,
UrlResponseInfo, ByteBuffer) - see my previous email. I previously thought
that it was the BodyProcessor's job to construct the ByteBuffer objects,
but I may be mistaken - in the latest version of the API, this seems to be
an implementation detail of HttpClientImpl and not part of the public API.
>> On the question of push versus pull, there seem to be differing
>> philosophies around this, but
>> actually the publish/subscribe model provided by the Flow types has
>> characteristics of both push and pull.
>> It is primarily push, as in Subscriber.onNext() delivers the next buffer
>> to the subscriber. But, the subscriber
>> can effectively control it through the Subscription object.
>>
>
> What does "effectively control" mean? As far as I can see, the subscriber
> can stop calls temporarily by not calling request(), but e.g. it can not
> request a minimum number of bytes to deliver (if available before EOF) on
> the next call. Is that correct or am I missing something? (See above
> question about whether ByteBuffers are completely filled).
>
I'm still interested in the answer to this question :)
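To illustrate the limitation I mean, here is a minimal sketch using the Java 9 Flow types: request(n) counts *items* (ByteBuffers of whatever size the publisher chooses), so the subscriber can pause delivery by withholding request(), but it has no way to ask for a minimum number of bytes in the next buffer. The class name is invented for the example.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;

// Pulls one buffer at a time; back-pressure is per item, not per byte.
class OneAtATimeSubscriber implements Flow.Subscriber<ByteBuffer> {
    final CountDownLatch done = new CountDownLatch(1);
    volatile int buffersReceived = 0;
    private Flow.Subscription subscription;

    @Override public void onSubscribe(Flow.Subscription s) {
        subscription = s;
        s.request(1);                    // pull exactly one buffer
    }
    @Override public void onNext(ByteBuffer item) {
        buffersReceived++;
        subscription.request(1);         // one more buffer, of unknown size
    }
    @Override public void onError(Throwable t) { done.countDown(); }
    @Override public void onComplete()         { done.countDown(); }
}
```

Wired to a SubmissionPublisher<ByteBuffer>, submitting three buffers produces exactly three onNext() calls, regardless of how many bytes each buffer carries.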
>> 2.) HttpHeaders: I love that there is a type for this abstraction! But:
>>
>> - Why is the type an interface rather than a concrete, final class?
>> Since this is a pure value type, there doesn’t seem to be much point in
>> allowing alternative implementations?
>>
>> You could conceive of different implementations, with different tradeoffs
>> between parsing early or lazily for instance.
>>
>
> What parsing are you referring to?
>
FYI I've meanwhile investigated in more detail whether lazy parsing might
be beneficial as part of HPACK decoding but I don't see how it could be:
- An HTTP/2 implementation MUST (RFC 7540 section 4.3
<https://tools.ietf.org/html/rfc7540#section-4.3>) process header frames
sufficiently to update the decompression state and to signal a
COMPRESSION_ERROR in the case of an error. I take this to mean that at
least the HPACK decoder's Dynamic Table of header entries needs to be kept
up to date.
- In the HPACK decoder, decoding String literals involves negligible
work beyond figuring out where they start/end so there doesn't seem to be a
potential to do work lazily.
- Not even the Huffman decoding seems to offer significant opportunity
for lazy work because HPACK requires that "*A Huffman-encoded string
literal containing the EOS symbol MUST be treated as a decoding error*" (RFC
7541 section 5.2 <https://tools.ietf.org/html/rfc7541#section-5.2>).
Because Huffman codes are generally not self-synchronizing
<https://en.wikipedia.org/wiki/Self-synchronizing_code>, the only way to
figure out on what bit the last symbol starts seems to be to decode the
entire sequence.
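As a toy illustration of the non-self-synchronizing point (this tiny three-symbol code is invented for the example and is unrelated to the RFC 7541 table): a prefix code must be decoded bit by bit from the start, because a symbol boundary is only known once every bit before it has been consumed, and decoding from a different offset silently yields a different parse.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy prefix code for illustration only: 'a' = 0, 'b' = 10, 'c' = 11.
class ToyPrefixDecoder {
    private static final Map<String, Character> CODE =
            Map.of("0", 'a', "10", 'b', "11", 'c');

    // Strictly sequential: there is no way to locate the last symbol's
    // starting bit without decoding everything that precedes it.
    static List<Character> decode(String bits) {
        List<Character> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char bit : bits.toCharArray()) {
            current.append(bit);
            Character symbol = CODE.get(current.toString());
            if (symbol != null) {
                out.add(symbol);
                current.setLength(0);
            }
        }
        if (current.length() > 0)
            throw new IllegalArgumentException("truncated symbol: " + current);
        return out;
    }
}
```

Decoding "01011" from bit 0 gives [a, b, c], but starting one bit later ("1011") gives [b, c] with no error, so nothing short of a full decode can validate where the final symbol (e.g. a forbidden EOS) begins.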
What am I missing? I still don't see why HttpHeaders shouldn't be (or at least
include) an immutable, final value type class. In general, do you disagree
that ease of providing an alternative HttpClient implementation is
important?
Tobias