WebSocket client API
Pavel Rappo
pavel.rappo at oracle.com
Tue Oct 20 22:06:21 UTC 2015
Hi Joakim,
> On 20 Oct 2015, at 14:37, Joakim Erdfelt <joakim.erdfelt at gmail.com> wrote:
>
> But we *think* we understand what you are trying to do.
>
> Here's a split UTF8 scenario (just whipped up)
> https://gist.github.com/joakime/e34b727a6989ca7cef94
>
> So the JVM implementation side will take the raw bytes (presumably as a ByteBuffer), and when the entire message is fully received it will convert it to a CharBuffer using the CharsetDecoder for UTF8 with REPORT logic to capture bad UTF8 sequences.
>
> Some concerns about this approach.
>
> 1. You can't fast-fail a large and fragmented TEXT message if the problematic UTF8 sequence occurs early (this is a spec test in the autobahn testsuite btw)
> 2. You can't use CharBuffer with partial TEXT message handling, as UTF8 sequences that are split across Frames will trigger the REPORT processing. (see gist/example above for this scenario) (also a spec test in the autobahn testsuite)
> 3. For each TEXT message, there's 2 data copies (ByteBuffer -> HeapCharBuffer -> String) for it to be practical to use in many 3rd party libs (eg JSON parsing). For large messages, this can get expensive.
Joakim,
If I tell you that
1. CharsetDecoder is a stateful object which is capable of incremental
decoding from any given ByteBuffer into any given CharBuffer. Have a look at
the CharsetDecoder#decode(java.nio.ByteBuffer, java.nio.CharBuffer, boolean)
method.
2. String#String(byte[], int, int, java.nio.charset.Charset) uses the same
machinery underneath. The difference (among others) is that this constructor
creates its own Buffer wrappers, whereas we can preallocate a bunch of
CharBuffers and reuse them (speaking of performance).
3. Probably the quickest way for bytes from a Channel to end up as UTF-8
decoded chars is through ByteBuffer, CharsetDecoder and CharBuffer (if
you know a better way, please tell me).
4. So if, after all, one needs a String, the pipeline
Channel --> ByteBuffer --> CharsetDecoder --> CharBuffer --> String
would be the quickest way to get it.
5. Probably not everyone needs a String; for some users a CharSequence
would do. Consider appending it (or a subsequence of it) to a
java.lang.Appendable, or just processing the IntStream from cs.chars(), etc.
would that change any of your concerns? If not, please try to explain these
problems again, in more detail.
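A minimal sketch of item 1 applied to concerns 1 and 2 from the quoted message, assuming each frame payload of a TEXT message arrives as its own byte[] (IncrementalUtf8 and decodeFrames are made-up names, and the fixed 1024-char output buffer is only there for brevity). One CharsetDecoder is fed frame by frame: the bytes of a UTF-8 sequence split across frames are left unconsumed by decode and carried over into the next frame's input, and a malformed byte is reported while decoding the frame that contains it, so the rest of the message never needs to be received.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class IncrementalUtf8 {

    // Decodes one TEXT message delivered as a sequence of frame payloads.
    // A UTF-8 sequence split across frames is carried over to the next call;
    // malformed input is reported as soon as the frame containing it is decoded.
    static String decodeFrames(byte[]... frames) {
        if (frames.length == 0) {
            return "";
        }
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        CharBuffer out = CharBuffer.allocate(1024); // fixed capacity, enough for this sketch
        ByteBuffer leftover = ByteBuffer.allocate(0);
        for (int i = 0; i < frames.length; i++) {
            boolean last = i == frames.length - 1;
            // Prepend the bytes of an incomplete sequence left over from the previous frame.
            ByteBuffer in = ByteBuffer.allocate(leftover.remaining() + frames[i].length);
            in.put(leftover).put(frames[i]);
            in.flip();
            // endOfInput is false for every frame but the last one.
            CoderResult r = decoder.decode(in, out, last);
            if (r.isError()) {
                // Fail fast: the remaining frames of the message are never looked at.
                throw new IllegalArgumentException("invalid UTF-8 in frame " + i + ": " + r);
            }
            if (r.isOverflow()) {
                throw new IllegalStateException("output buffer too small for this sketch");
            }
            leftover = in; // unconsumed bytes (at most 3) of an incomplete sequence
        }
        decoder.flush(out); // required by the CharsetDecoder contract after the last decode
        out.flip();
        return out.toString();
    }

    public static void main(String[] args) {
        // "a\u20acb" is 61 E2 82 AC 62; the 3-byte encoding of the euro sign is split.
        byte[] frame1 = {0x61, (byte) 0xE2, (byte) 0x82};
        byte[] frame2 = {(byte) 0xAC, 0x62};
        System.out.println(decodeFrames(frame1, frame2).equals("a\u20acb")); // true
        try {
            decodeFrames(new byte[] {(byte) 0xFF}, new byte[] {0x61});
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // reports frame 0
        }
    }
}
```

Checking the CoderResult of every decode call is what makes the early failure possible; with CodingErrorAction.REPLACE instead of REPORT the decoder would substitute U+FFFD and keep going.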
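Items 3 and 4 can be sketched as the loop below, assuming a blocking ReadableByteChannel and a caller-chosen buffer size (ChannelToString and readAll are names made up for this example). The same ByteBuffer and CharBuffer are reused for every read, in the spirit of item 2; for simplicity the decoded chars are accumulated in a StringBuilder before the final String is built, which is one more copy than the pipeline in item 4 strictly needs.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class ChannelToString {

    // Channel --> ByteBuffer --> CharsetDecoder --> CharBuffer --> String
    static String readAll(ReadableByteChannel ch, int bufferSize) throws IOException {
        if (bufferSize < 4) {
            throw new IllegalArgumentException("must hold at least one complete UTF-8 sequence");
        }
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer bytes = ByteBuffer.allocate(bufferSize); // reused for every read
        CharBuffer chars = CharBuffer.allocate(bufferSize); // reused for every decode step
        StringBuilder result = new StringBuilder();
        boolean eof = false;
        while (!eof) {
            eof = ch.read(bytes) == -1;
            bytes.flip();
            CoderResult r;
            do {
                r = decoder.decode(bytes, chars, eof);
                if (r.isError()) {
                    r.throwException(); // MalformedInputException is an IOException
                }
                drain(chars, result);
            } while (r.isOverflow());
            bytes.compact(); // keep an incomplete trailing sequence for the next read
        }
        CoderResult r;
        do {
            r = decoder.flush(chars);
            drain(chars, result);
        } while (r.isOverflow());
        return result.toString();
    }

    // Moves the decoded chars into the result and makes the buffer writable again.
    private static void drain(CharBuffer chars, StringBuilder result) {
        chars.flip();
        result.append(chars);
        chars.clear();
    }

    public static void main(String[] args) throws IOException {
        String text = "h\u00e9llo w\u00f6rld \u20ac";
        ReadableByteChannel ch = Channels.newChannel(
                new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
        // A 4-byte buffer forces multi-byte sequences to straddle reads.
        System.out.println(readAll(ch, 4).equals(text)); // true
    }
}
```

The compact() call is what keeps a sequence split between two reads intact, which is the same situation as a sequence split between two frames.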
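Item 5 as a small sketch: both helpers below accept any CharSequence, so a CharBuffer coming straight out of a decoder can be consumed without first building a String. CharSequenceConsumers, copyRange and countDigits are illustrative names; Appendable#append(CharSequence, int, int) and CharSequence#chars() are standard Java APIs (the latter since Java 8).

```java
import java.io.IOException;
import java.nio.CharBuffer;

final class CharSequenceConsumers {

    // Copies cs[start, end) into any Appendable (StringBuilder, Writer, ...)
    // without materializing a String first.
    static void copyRange(CharSequence cs, int start, int end, Appendable sink) throws IOException {
        sink.append(cs, start, end);
    }

    // Processes the chars as an IntStream, again without creating a String.
    static long countDigits(CharSequence cs) {
        return cs.chars().filter(Character::isDigit).count();
    }

    public static void main(String[] args) throws IOException {
        CharBuffer decoded = CharBuffer.wrap("order 42 shipped"); // stands in for decoder output
        StringBuilder sink = new StringBuilder();
        copyRange(decoded, 6, 8, sink);
        System.out.println(sink);                 // 42
        System.out.println(countDigits(decoded)); // 2
    }
}
```

If the CharBuffer is a reused one (item 2), whatever consumes it has to finish before the next decode overwrites its contents.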
More information about the net-dev mailing list