WebSocket client API
Joakim Erdfelt
joakim.erdfelt at gmail.com
Tue Oct 20 22:16:55 UTC 2015
I'm done with the UTF8 topic, you seem to have it in hand.
If you feel the existing facilities can handle it, go for it.
Just don't forget to test your impl on Autobahn Testsuite.
On Tue, Oct 20, 2015 at 3:06 PM, Pavel Rappo <pavel.rappo at oracle.com> wrote:
> Hi Joakim,
>
> > On 20 Oct 2015, at 14:37, Joakim Erdfelt <joakim.erdfelt at gmail.com> wrote:
> >
> > But we *think* we understand what you are trying to do.
> >
> > Here's a split UTF8 scenario (just whipped up)
> > https://gist.github.com/joakime/e34b727a6989ca7cef94
> >
> > So the JVM implementation side will take the raw bytes (presumably as a
> > ByteBuffer), and when the entire message is fully received it will convert
> > it to a CharBuffer using the CharsetDecoder for UTF8 with REPORT logic to
> > capture bad UTF8 sequences.
> >
> > Some concerns about this approach.
> >
> > 1. You can't fast-fail a large and fragmented TEXT message if the
> >    problematic UTF8 sequence occurs early (this is a spec test in the
> >    autobahn testsuite btw)
> > 2. You can't use CharBuffer with partial TEXT message handling, as UTF8
> >    sequences that are split across Frames will trigger the REPORT
> >    processing. (see gist/example above for this scenario) (also a spec
> >    test in the autobahn testsuite)
> > 3. For each TEXT message, there are 2 data copies (ByteBuffer ->
> >    HeapCharBuffer -> String) for it to be practical to use in many 3rd
> >    party libs (e.g. JSON parsing). For large messages, this can get
> >    expensive.
>
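To make concern 2 concrete, here is a rough Java sketch of what happens when
each fragment is decoded on its own, as if it were a complete message. The
class and method names (SplitUtf8Demo, decodeWhole, decodesCleanly) are made
up for illustration and are not part of any proposed API. It only shows the
failure mode for a one-shot decode with REPORT; it says nothing about how an
implementation has to behave.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class SplitUtf8Demo {

    // Decodes the given bytes as one complete UTF-8 message.
    // CharsetDecoder#decode(ByteBuffer) treats its input as the whole input,
    // so a trailing incomplete sequence is reported as malformed.
    static String decodeWhole(byte[] bytes) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    // True if the bytes decode without a reported error.
    static boolean decodesCleanly(byte[] bytes) {
        try {
            decodeWhole(bytes);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

With the 5 bytes of "a\u20ACb" (the euro sign takes 3 bytes in UTF-8),
decodesCleanly accepts the whole array but rejects both halves when the array
is cut after the second byte, even though the message as a whole is valid.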
> Joakim,
>
> If I tell you that
>
> 1. CharsetDecoder is a stateful object which is capable of incremental
>    decoding from any given ByteBuffer into any given CharBuffer. Have a
>    look at the
>    CharsetDecoder#decode(java.nio.ByteBuffer, java.nio.CharBuffer, boolean)
>    method.
>
> 2. String#String(byte[], int, int, java.nio.charset.Charset) uses the same
>    machinery underneath. The difference (among others) is that this
>    constructor creates Buffer wrappers, whereas we can preallocate a bunch
>    of CharBuffers and reuse them (speaking of performance).
>
> 3. Probably the quickest way bytes from a Channel can end up being UTF-8
>    decoded chars is through the ByteBuffer, CharsetDecoder and CharBuffer
>    (If you know better, please tell me).
>
> 4. So if, after all, one needs a String, the pipeline:
>
> Channel --> ByteBuffer --> CharsetDecoder --> CharBuffer --> String
>
> would be the quickest way to it.
>
> 5. Not everyone, probably, needs a String. For some users a CharSequence
>    would do. Consider appending it (or its subsequence) to
>    java.lang.Appendable, or just processing Stream from cs.chars(), etc.
>
> would it change any of your concerns? If not, please try to explain these
> problems again, in more detail.
>
>
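For reference, the incremental pipeline described in points 1 to 5 could be
sketched in Java roughly as below. This is only a sketch under assumptions:
the class name IncrementalUtf8, the feed method and the 1024 buffer sizes are
hypothetical and not taken from any proposed API. It relies on
CharsetDecoder#decode(ByteBuffer, CharBuffer, boolean) leaving the bytes of an
incomplete trailing sequence unconsumed in the input buffer when endOfInput is
false, so those bytes are kept (via compact) and completed by the next
fragment. A malformed sequence is reported as soon as the fragment containing
it is fed.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical class, for illustration only. Not thread-safe.
final class IncrementalUtf8 {
    private final CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
    // Kept in "write mode" between calls; may hold up to 3 leftover bytes
    // of a sequence that was split across fragments.
    private final ByteBuffer in = ByteBuffer.allocate(1024);
    // Preallocated and reused for every fragment.
    private final CharBuffer out = CharBuffer.allocate(1024);

    // Decodes one fragment of a TEXT message and appends the chars to sink.
    // Throws CharacterCodingException (an IOException) on bad UTF-8; after
    // that this instance should be discarded.
    void feed(byte[] fragment, boolean last, Appendable sink) throws IOException {
        int off = 0;
        do {
            int n = Math.min(in.remaining(), fragment.length - off);
            in.put(fragment, off, n);
            off += n;
            in.flip();
            // endOfInput is true only for the final bytes of the final fragment.
            boolean end = last && off == fragment.length;
            CoderResult r;
            do {
                r = decoder.decode(in, out, end);
                if (r.isError()) {
                    r.throwException(); // fails fast, mid-message
                }
                drain(sink);
            } while (r.isOverflow());
            in.compact(); // keep the bytes of an incomplete sequence, if any
        } while (off < fragment.length);
        if (last) {
            CoderResult r;
            do {
                r = decoder.flush(out);
                drain(sink);
            } while (r.isOverflow());
            decoder.reset(); // ready for the next message
        }
    }

    // Hands the decoded chars to the sink and empties the reusable buffer.
    private void drain(Appendable sink) throws IOException {
        out.flip();
        sink.append(out);
        out.clear();
    }
}
```

Feeding the 5 bytes of "a\u20ACb" as two fragments cut after the second byte
gives "a" after the first call and the full text after the second. Feeding a
first fragment that contains the byte 0xFF throws right away, without waiting
for the rest of the message. A StringBuilder works as the sink, but any
Appendable would do.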