Reactive streams and aggregate reads and writes

Viktor Klang v at lightbend.com
Tue Jan 15 09:44:06 UTC 2019


Hi all,

Here's how I think about the problem:

If you think about it, a Byte is a fixed-size batch of bits (8).
Translating that to larger sizes means defining a larger fixed-size batch type.

Now, byte[] is problematic in the sense that it is not fixed-size, and
its max size is much too large to be practical from a backpressure
point of view.
Furthermore, byte[] is less than ideal since it is mutable, and therefore
has implications for its reusability.
One can (rightfully!) argue that fixed-size is most often not desirable, as
the "last" chunk is more than likely smaller than the fixed size.

This leaves the most desirable option: define a variable-size datatype
with a known, practical max size (64k, say).
Pragmatically, one can select a variable-size type with a configurable max
size (such as ByteBuffer).
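
As a rough sketch of that approach (the class, the method names, and the
slicing strategy are mine purely for illustration, not part of any proposed
API), a publisher can expose a large byte[] as read-only ByteBuffer chunks,
each no larger than a configured max size:

import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;

// Illustrative only: slices a byte[] into read-only ByteBuffer chunks of at
// most maxChunkSize bytes and publishes them via SubmissionPublisher, which
// implements the Flow backpressure protocol (submit() blocks while the
// subscriber has no outstanding demand and its buffer is full).
final class ChunkedBytePublisher {

    static List<ByteBuffer> slice(byte[] data, int maxChunkSize) {
        List<ByteBuffer> chunks = new ArrayList<>();
        for (int offset = 0; offset < data.length; offset += maxChunkSize) {
            int length = Math.min(maxChunkSize, data.length - offset);
            // wrap + asReadOnlyBuffer: no copy, and no writes through the buffer
            chunks.add(ByteBuffer.wrap(data, offset, length).asReadOnlyBuffer());
        }
        return chunks;
    }

    // Each subscribe() call gets its own feed; the subscriber is attached
    // before the feeding thread starts, so no chunks are dropped.
    static Flow.Publisher<ByteBuffer> publish(byte[] data, int maxChunkSize) {
        return subscriber -> {
            SubmissionPublisher<ByteBuffer> feed = new SubmissionPublisher<>();
            feed.subscribe(subscriber);
            new Thread(() -> {
                try (feed) { // close() signals onComplete to the subscriber
                    slice(data, maxChunkSize).forEach(feed::submit);
                }
            }).start();
        };
    }
}

Because submit() blocks when demand is exhausted, a slow consumer
transparently slows the feeding thread down.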

As for Byte and Char, they typically are not self-contained (a Byte
typically represents a piece of something larger, and a Char represents a
codepoint), so they are not "superior" choices from that perspective. Worth
noting is that synchronous Publisher-Subscriber pairs can reach a transfer
rate (in elements) of > 180 Mops on a developer laptop.
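
That also suggests an answer to the "give me 16k bytes" part of the quoted
question: keep Subscription.request() expressed in elements (chunks), and let
the Subscriber translate its byte budget into chunk demand. A minimal sketch,
assuming bounded ByteBuffer elements as above (the class name and the
one-chunk-at-a-time strategy are mine, purely illustrative):

import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Flow;

// Illustrative only: accumulates chunks until roughly targetBytes have
// arrived, then cancels the subscription and completes the future.
final class ByteCollectingSubscriber implements Flow.Subscriber<ByteBuffer> {
    private final int targetBytes;
    private final ByteArrayOutputStream collected = new ByteArrayOutputStream();
    private final CompletableFuture<byte[]> result = new CompletableFuture<>();
    private Flow.Subscription subscription;

    ByteCollectingSubscriber(int targetBytes) { this.targetBytes = targetBytes; }

    CompletableFuture<byte[]> result() { return result; }

    @Override public void onSubscribe(Flow.Subscription subscription) {
        this.subscription = subscription;
        subscription.request(1); // demand is expressed in chunks, not bytes
    }

    @Override public void onNext(ByteBuffer chunk) {
        byte[] bytes = new byte[chunk.remaining()];
        chunk.get(bytes);
        collected.write(bytes, 0, bytes.length);
        if (collected.size() >= targetBytes) {
            subscription.cancel();                    // enough data, stop the stream
            result.complete(collected.toByteArray());
        } else {
            subscription.request(1);                  // still short, ask for one more chunk
        }
    }

    @Override public void onError(Throwable t) { result.completeExceptionally(t); }

    @Override public void onComplete() { result.complete(collected.toByteArray()); }
}

Requesting one chunk at a time keeps the sketch short; in practice one would
estimate targetBytes / maxChunkSize and request several chunks up front to
amortize the signalling.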

I hope that helps,

√

On Tue, Jan 15, 2019 at 3:28 AM Douglas Surber <douglas.surber at oracle.com>
wrote:

> Given a reactive source/sink of bytes or chars what is the best way to
> subscribe/publish byte[]/String?
>
> When working with an Input/OutputStream or a Reader/Writer it is easy to
> define methods that read both single bytes/chars and byte[]s/Strings. But
> it is not obvious to me how to do the same with Flow. If one declares a
> Flow.Publisher<Byte> then the bytes are processed one at a time. In high
> throughput systems that doesn't work. It is much better to read byte[]s,
> preferably large ones, on each call rather than single bytes. This can make
> a really big difference in performance. Same with processing individual
> chars as opposed to Strings. On the other hand, sometimes processing only a
> single byte/char is the right thing.
>
> Let's consider a Flow.Publisher<byte[]>. If the subscriber requests 1
> element, the Subscriber has no idea how many bytes that element will
> contain. It could be 1 byte or it could be 1G byte. What the Subscriber
> wants to do is say "give me 16k bytes" and get one or more calls to next
> where the total length of all the byte[]s passed to next is 16k, not 16k
> calls to next. This clearly violates the contract for
> Flow.Subscription.request.
>
> I can't be the first person to have this question, but my Google-fu is not
> sufficient to find an answer. I think the problem is I don't know the
> reactive stream terminology necessary to pose the question.
>
> The particular use case I'm working on is Blob/Clob. It would be easy
> enough for Blob/Clob to provide publish/subscribe methods that return
> Flow.Publisher<Byte/Char> and Flow.Subscriber<Byte/Char>, but that's not
> going to be performant. On the other hand Flow.Publisher<byte[]/String> and
> Flow.Subscriber<byte[]/String> don't really do the right thing.
>
> Suggestions? How do other reactive stream users address this issue?
>
> Douglas



-- 
Cheers,
√

*Deputy Chief Technology Officer, Lightbend, Inc.*
v at lightbend.com
@viktorklang <https://twitter.com/viktorklang>

<https://www.lightbend.com>
