Reactive streams and aggregate reads and writes

Tue Jan 15 01:58:15 UTC 2019

Given a reactive source/sink of bytes or chars what is the best way to subscribe/publish byte[]/String?

When working with an Input/OutputStream or a Reader/Writer it is easy to define methods that read both single bytes/chars and byte[]s/Strings. But it is not obvious to me how to do the same with Flow. If one declares a Flow.Publisher<Byte> then the bytes are processed one at a time. In high throughput systems that doesn't work. It is much better to read byte[]s, preferably large ones, on each call rather than single bytes. This can make a really big difference in performance. Same with processing individual chars as opposed to Strings. On the other hand, sometimes processing only a single byte/char is the right thing.

Let's consider a Flow.Publisher<byte[]>. If the subscriber requests 1 element, the Subscriber has no idea how many bytes that element will contain. It could be 1 byte or it could be 1G byte. What the Subscriber wants to do is say "give me 16k bytes" and get one or more calls to next where the total length of all the byte[]s passed to next is 16k, not 16k calls to next. This clearly violates the contract for Flow.Subscription.request.

I can't be the first person to have this question, but my Google-fu is not sufficient to find an answer. I think the problem is I don't know the reactive stream terminology necessary to pose the question.

The particular use case I'm working on is Blob/Clob. It would be easy enough for Blob/Clob to provide publish/subscribe methods that return Flow.Publisher<Byte/Char> and Flow.Subscriber<Byte/Char>, but that's not going to be performant. On the other hand Flow.Publisher<byte[]/String> and Flow.Subscriber<byte[]/String> don't really do the right thing.

Suggestions? How do other reactive stream users address this issue?

Douglas