WebSocket client API

Tue Oct 20 08:39:15 UTC 2015

On 10/18/2015 12:08 AM, Pavel Rappo wrote:
> Hi Joakim,
>
>> On 17 Oct 2015, at 22:42, Joakim Erdfelt <joakim.erdfelt at gmail.com> wrote:
>>
>> You are required, per the RFC6455 spec, to validate incoming and outgoing TEXT messages are valid UTF8.
>> (also Handshake and Close Reason Messages)
>>
>> http://tools.ietf.org/html/rfc6455#section-8.1
>>
>> Relying on the JVM built-in replacement character behavior for invalid UTF8 sequences will cause many bugs.
>> If you rely on the CharsetEncoder and CharBuffer you'll wind up with situations where you are changing the data.
>>
>> You need to rely on an implementation that does not use replacement characters and throws exceptions on bad Write,
>> and on bad received TEXT messages you MUST close the connection with a 1007 error code.
> The only thing I was trying to say is that in my opinion there's no extra
> confidence in UTF-8 representability that CharSequence or even String gives us
> compared to what CharBuffer does. On the other hand, compared to any other
> implementation of CharSequence or String, CharBuffer is the most
> charset-friendly thing we have: CharsetEncoder/CharsetDecoder speaks in
> CharBuffers.
>
> Sorry, but I believe I haven't proposed to rely on JDK built-in replacement
> characters. Moreover, being able to tell the decoder/encoder to throw exceptions
> (e.g. UnmappableCharacterException) on incorrect input was one of the main
> reasons to use CharsetEncoder/Decoder. And not, say,
> String.getBytes(StandardCharsets.UTF_8).
>
> Thanks.
>

Hi,

Just to clarify: the onText(..., CharBuffer cb, ...) callback method
receives a CharBuffer whose content has already been UTF-8 decoded from
the message bytes on the wire, right? Anything else would be wrong. So
decoding is performed by the WebSocket implementation, not by the user,
and can therefore be done as RFC 6455 requires. CharBuffer,
CharSequence and String all represent characters, and their APIs have
nothing to do with UTF-8 or any other encoding.
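
To make the point concrete, here is a minimal sketch of how an
implementation might do the strict conversion on both sides using only
java.nio.charset. The class and method names (StrictUtf8, decodeStrict,
encodeStrict, isValidUtf8, isEncodable) are hypothetical and not part of
the proposed API; the sketch only assumes that the decoder and encoder
are configured with CodingErrorAction.REPORT, so that bad input raises
an exception instead of being replaced.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class StrictUtf8 {

    // Decodes wire bytes into characters; throws on malformed UTF-8
    // rather than substituting U+FFFD.
    static CharBuffer decodeStrict(ByteBuffer payload) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(payload);
    }

    // Encodes characters into wire bytes; throws on input that has no
    // UTF-8 form (for example an unpaired surrogate) rather than
    // silently writing a replacement byte.
    static ByteBuffer encodeStrict(CharSequence text) throws CharacterCodingException {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return encoder.encode(CharBuffer.wrap(text));
    }

    // True if the payload is well-formed UTF-8. An implementation could
    // close the connection with status 1007 when this returns false.
    static boolean isValidUtf8(byte[] payload) {
        try {
            decodeStrict(ByteBuffer.wrap(payload));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // True if the text can be sent as a TEXT message without altering it.
    static boolean isEncodable(CharSequence text) {
        try {
            encodeStrict(text);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

With this configuration the user-facing type stays a plain CharBuffer
(or any CharSequence), while the UTF-8 checks happen entirely inside the
implementation at the point where bytes become characters and vice
versa.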

Regards, Peter