WebSocket client API

Tue Oct 20 22:06:21 UTC 2015

Hi Joakim,

> On 20 Oct 2015, at 14:37, Joakim Erdfelt <joakim.erdfelt at gmail.com> wrote:
> 
> But we *think* we understand what you are trying to do.
> 
> Here's a split UTF8 scenario (just whipped up)
> https://gist.github.com/joakime/e34b727a6989ca7cef94
> 
> So the JVM implementation side will take the raw bytes (presumably as a ByteBuffer), and when the entire message is fully received it will convert it to a CharBuffer using the CharsetDecoder for UTF8 with REPORT logic to capture bad UTF8 sequences.
> 
> Some concerns about this approach.
> 
> 1. You can't fast-fail a large and fragmented TEXT message if the problematic UTF8 sequence occurs early (this is a spec test in the autobahn testsuite btw)
> 2. You can't use CharBuffer with partial TEXT message handling, as UTF8 sequences that are split across Frames will trigger the REPORT processing. (see gist/example above for this scenario) (also a spec test in the autobahn testsuite)
> 3. For each TEXT message, there's 2 data copies (ByteBuffer -> HeapCharBuffer -> String) for it to be practical to use in many 3rd party libs (eg JSON parsing).  For large messages, this can get expensive.

Joakim,

If I tell you that

    1. CharsetDecoder is a stateful object which is capable of incremental
    decoding from any given ByteBuffer into any given CharBuffer. Have a look at
    CharsetDecoder#decode(java.nio.ByteBuffer, java.nio.CharBuffer, boolean)
    method.

    2. String#String(byte[], int, int, java.nio.charset.Charset) uses the same
    machinery underneath. The difference (among others) is that this
    constructor creates Buffer wrappers, whereas we can preallocate a bunch of
    CharBuffers and reuse them (speaking of performance).

    3. Probably the quickest way bytes from a Channel can end up being UTF-8
    decoded chars is through the ByteBuffer, CharsetDecoder and CharBuffer (if
    you know a better way, please tell me).

    4. So if, after all, one needs a String, the pipeline

        Channel --> ByteBuffer --> CharsetDecoder --> CharBuffer --> String

    would be the quickest way to it.

    5. Probably not everyone needs a String. For some users a CharSequence
    would do. Consider appending it (or its subsequence) to
    java.lang.Appendable, or just processing the IntStream from cs.chars(), etc.

would it change any of your concerns? If not, please try to explain these
problems again, in more detail.
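
To make points 1 and 5 concrete, here is a rough sketch of incremental
decoding, not the actual implementation. The class IncrementalUtf8Decoder and
its feed method are names made up for this example; only the java.nio calls
are real API. It assumes one decoder per connection, REPORT for malformed
input, and that any undecoded tail bytes of a fragment are carried over to the
next one.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class IncrementalUtf8Decoder {

    private final CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);

    // Preallocated and reused for every fragment.
    private final CharBuffer chars = CharBuffer.allocate(1024);

    // Bytes of an incomplete UTF-8 sequence left over from the previous fragment.
    private ByteBuffer leftover = ByteBuffer.allocate(0);

    // Decodes one fragment and appends the decoded chars to out.
    // Throws MalformedInputException (an IOException) as soon as a bad sequence is seen.
    void feed(ByteBuffer fragment, boolean last, Appendable out) throws IOException {
        ByteBuffer in = ByteBuffer.allocate(leftover.remaining() + fragment.remaining());
        in.put(leftover);
        in.put(fragment);
        in.flip();

        while (true) {
            CoderResult result = decoder.decode(in, chars, last);
            drain(out);
            if (result.isError()) {
                result.throwException();
            }
            if (result.isUnderflow()) {
                break; // input exhausted, or only an incomplete sequence remains
            }
            // overflow: chars was full and has been drained, so decode again
        }

        if (last) {
            CoderResult result;
            do {
                result = decoder.flush(chars);
                drain(out);
            } while (result.isOverflow());
            decoder.reset(); // ready for the next message
            leftover = ByteBuffer.allocate(0);
        } else {
            // Keep the bytes of a sequence split across fragments.
            ByteBuffer rest = ByteBuffer.allocate(in.remaining());
            rest.put(in);
            rest.flip();
            leftover = rest;
        }
    }

    private void drain(Appendable out) throws IOException {
        chars.flip();
        out.append(chars); // a CharBuffer is a CharSequence
        chars.clear();
    }

    public static void main(String[] args) throws IOException {
        byte[] all = "h\u20ACllo".getBytes(StandardCharsets.UTF_8);
        IncrementalUtf8Decoder d = new IncrementalUtf8Decoder();
        StringBuilder sb = new StringBuilder();
        d.feed(ByteBuffer.wrap(all, 0, 2), false, sb); // splits the euro sign
        d.feed(ByteBuffer.wrap(all, 2, all.length - 2), true, sb);
        System.out.println(sb);
    }
}
```

The split sequence in the first fragment is not reported as an error, because
decode is called with endOfInput set to false and simply leaves the incomplete
bytes unconsumed. A byte that can never start a valid sequence fails on the
fragment it arrives in, so a large fragmented message can still fail fast.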