WebSocket client API
Pavel Rappo
pavel.rappo at oracle.com
Sat Oct 17 22:08:32 UTC 2015
Hi Joakim,
> On 17 Oct 2015, at 22:42, Joakim Erdfelt <joakim.erdfelt at gmail.com> wrote:
>
> You are required, per the RFC6455 spec, to validate that incoming and outgoing TEXT messages are valid UTF-8.
> (also the Handshake and Close Reason messages)
>
> http://tools.ietf.org/html/rfc6455#section-8.1
>
> Relying on the JVM built-in replacement character behavior for invalid UTF-8 sequences will cause many bugs.
> If you rely on the CharsetEncoder and CharBuffer, you'll wind up with situations where you are changing the data.
>
> You need to rely on an implementation that does not use replacement characters and that throws exceptions on a bad write,
> and on a bad received TEXT message you MUST close the connection with a 1007 error code.
The only thing I was trying to say is that, in my opinion, CharSequence or even
String gives us no extra confidence in UTF-8 representability compared to what
CharBuffer does. On the other hand, compared to any other implementation of
CharSequence, or to String, CharBuffer is the most charset-friendly thing we
have: CharsetEncoder and CharsetDecoder speak in CharBuffers.
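
Just to illustrate the shape of that API (the class and variable names below
are made up for the example, not a proposal for the actual WebSocket API): a
CharBuffer can be handed straight to a CharsetEncoder, which hands back a
ByteBuffer, with no detour through String.

    import java.nio.ByteBuffer;
    import java.nio.CharBuffer;
    import java.nio.charset.CharacterCodingException;
    import java.nio.charset.CharsetEncoder;
    import java.nio.charset.StandardCharsets;

    class CharBufferEncodingSketch {
        public static void main(String[] args) throws CharacterCodingException {
            // CharsetEncoder consumes a CharBuffer and produces a ByteBuffer
            CharBuffer payload = CharBuffer.wrap("Hello, WebSocket");
            CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
            ByteBuffer frameData = encoder.encode(payload);
            System.out.println("encoded " + frameData.remaining() + " bytes");
        }
    }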
Sorry, but I don't believe I have proposed relying on the JDK's built-in
replacement characters. On the contrary, being able to tell the decoder/encoder
to throw exceptions (e.g. UnmappableCharacterException) on incorrect input was
one of the main reasons to use CharsetEncoder/CharsetDecoder rather than, say,
String.getBytes(StandardCharsets.UTF_8).
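
As a rough sketch of what I mean (again, the names are illustrative only): a
decoder configured with CodingErrorAction.REPORT throws CharacterCodingException
on malformed input, whereas new String(bytes, UTF_8) and
String.getBytes(StandardCharsets.UTF_8) silently substitute the replacement
character, changing the data.

    import java.nio.ByteBuffer;
    import java.nio.charset.CharacterCodingException;
    import java.nio.charset.CharsetDecoder;
    import java.nio.charset.CodingErrorAction;
    import java.nio.charset.StandardCharsets;

    class Utf8ValidationSketch {
        public static void main(String[] args) {
            // An invalid UTF-8 sequence: 0xC3 must be followed by a continuation byte
            byte[] bad = {(byte) 0xC3, (byte) 0x28};

            // The String constructor silently substitutes U+FFFD -- the data is changed
            System.out.println(new String(bad, StandardCharsets.UTF_8)); // prints "\uFFFD("

            // A decoder configured to REPORT throws instead, which is what would let
            // an implementation fail the connection with the 1007 code mentioned above
            CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT);
            try {
                decoder.decode(ByteBuffer.wrap(bad));
            } catch (CharacterCodingException e) {
                System.out.println("invalid UTF-8 detected: " + e);
            }
        }
    }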
Thanks.