Reactive streams and aggregate reads and writes
Mark Paluch
mpaluch at paluch.biz
Tue Jan 15 08:58:54 UTC 2019
Hi Douglas,
I’m on the same page as David.
The signal-to-noise ratio with Publisher&lt;Byte&gt; is way off, as it creates too much per-element overhead.
This leaves us with either Publisher<byte[]> or Publisher<ByteBuffer> and I think the latter is more appropriate. For char streams, Publisher<CharBuffer> seems the right fit as it mirrors the binary approach.
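To make the shape concrete, here is a purely hypothetical sketch of how the binary and character cases would mirror each other. The interface and method names (ReactiveBlob, ReactiveClob, publish, subscribe) are illustrative only and not part of any existing API:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.util.concurrent.Flow;

// Hypothetical sketch only: illustrates the shape under discussion,
// not an actual or proposed API surface.
interface ReactiveBlob {
    Flow.Publisher<ByteBuffer> publish();    // chunked binary content, one buffer per data item
    Flow.Subscriber<ByteBuffer> subscribe(); // chunked binary sink
}

interface ReactiveClob {
    Flow.Publisher<CharBuffer> publish();    // mirrors the binary shape for char streams
    Flow.Subscriber<CharBuffer> subscribe();
}
```

With this shape, request(n) always means "n buffers", and the buffer size becomes a tuning knob independent of the demand protocol.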
The underlying question is: What do we want to optimize for?
From my perspective, using BLOB/CLOB is an indicator that we want to work with a lot of data, so single-byte/single-char processing isn’t the majority use case; those scenarios can be produced/consumed with already existing mechanisms.
The last question is how to deal with aggregates: request(1) translates to "request the next chunk", not "request the next byte/char". The granularity is different. While request(1) could map nicely onto bytes/chars, its actual meaning is a number of data items/messages.
The data size itself is unspecified. However, it should be small enough so that a single message does not overwhelm a consumer.
In typical reactive use cases, aggregates go hand in hand with a chunk/fetch size.
It is the producer’s responsibility to respect back pressure and not overwhelm the consumer.
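As a minimal runnable sketch of the chunking idea: the helper below splits a byte[] (a stand-in for BLOB content) into fixed-size ByteBuffer chunks, and the demo publishes them through a SubmissionPublisher to a subscriber that demands one chunk at a time. The class and method names are illustrative, not from any existing API; SubmissionPublisher.submit blocks when the subscriber's buffer is full, which is how back pressure is respected here:

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;

public class ChunkedPublishDemo {

    // Split data into fixed-size ByteBuffer chunks; on the resulting
    // stream, request(1) then means "one chunk", not "one byte".
    public static List<ByteBuffer> chunk(byte[] data, int chunkSize) {
        List<ByteBuffer> chunks = new ArrayList<>();
        for (int off = 0; off < data.length; off += chunkSize) {
            int len = Math.min(chunkSize, data.length - off);
            chunks.add(ByteBuffer.wrap(data, off, len).slice());
        }
        return chunks;
    }

    public static void main(String[] args) throws InterruptedException {
        byte[] blob = new byte[100_000];   // stand-in for BLOB content
        int chunkSize = 16 * 1024;         // 16 KiB per data item

        List<Integer> seen = new ArrayList<>();
        CountDownLatch done = new CountDownLatch(1);

        try (SubmissionPublisher<ByteBuffer> pub = new SubmissionPublisher<>()) {
            pub.subscribe(new Flow.Subscriber<ByteBuffer>() {
                private Flow.Subscription subscription;
                public void onSubscribe(Flow.Subscription s) {
                    subscription = s;
                    s.request(1);                  // demand one chunk at a time
                }
                public void onNext(ByteBuffer buf) {
                    seen.add(buf.remaining());     // record chunk size
                    subscription.request(1);
                }
                public void onError(Throwable t) { done.countDown(); }
                public void onComplete()         { done.countDown(); }
            });
            // submit(..) blocks when the buffer is full, so a slow
            // subscriber cannot be overwhelmed.
            chunk(blob, chunkSize).forEach(pub::submit);
        }
        done.await();
        int total = seen.stream().mapToInt(Integer::intValue).sum();
        System.out.println(seen.size() + " chunks, " + total + " bytes");
        // prints "7 chunks, 100000 bytes"
    }
}
```

Note how the demand protocol stays in units of data items while the byte granularity is controlled entirely by the chunk size.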
Cheers,
Mark
Am 15. Jan. 2019, 03:11 +0100 schrieb Douglas Surber <douglas.surber at oracle.com>:
> Given a reactive source/sink of bytes or chars what is the best way to subscribe/publish byte[]/String?
>
> When working with an Input/OutputStream or a Reader/Writer it is easy to define methods that read both single bytes/chars and byte[]s/Strings. But it is not obvious to me how to do the same with Flow. If one declares a Flow.Publisher<Byte> then the bytes are processed one at a time. In high throughput systems that doesn't work. It is much better to read byte[]s, preferably large ones, on each call rather than single bytes. This can make a really big difference in performance. Same with processing individual chars as opposed to Strings. On the other hand, sometimes processing only a single byte/char is the right thing.
>
> Let's consider a Flow.Publisher<byte[]>. If the subscriber requests 1 element, the Subscriber has no idea how many bytes that element will contain. It could be 1 byte or it could be 1G byte. What the Subscriber wants to do is say "give me 16k bytes" and get one or more calls to next where the total length of all the byte[]s passed to next is 16k, not 16k calls to next. This clearly violates the contract for Flow.Subscription.request.
>
> I can't be the first person to have this question, but my Google-fu is not sufficient to find an answer. I think the problem is I don't know the reactive stream terminology necessary to pose the question.
>
> The particular use case I'm working on is Blob/Clob. It would be easy enough for Blob/Clob to provide publish/subscribe methods that return Flow.Publisher<Byte/Char> and Flow.Subscriber<Byte/Char>, but that's not going to be performant. On the other hand Flow.Publisher<byte[]/String> and Flow.Subscriber<byte[]/String> don't really do the right thing.
>
> Suggestions? How do other reactive stream users address this issue?
>
> Douglas
More information about the jdbc-spec-discuss mailing list