Strange behaviour of java.nio.charset.StandardCharsets.UTF_8.newDecoder()
Xueming Shen
xueming.shen at oracle.com
Fri Mar 18 18:45:33 UTC 2016
On 03/18/2016 11:44 AM, Martin Buchholz wrote:
> Coders could have an internal buffer to store incomplete sequences,
> but the design is that they don't.
> https://docs.oracle.com/javase/8/docs/api/java/nio/charset/CoderResult.html
> https://docs.oracle.com/javase/8/docs/api/java/nio/charset/CoderResult.html#UNDERFLOW
> It's more efficient to not have (yet another) intermediate buffer.
>
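Martin's point can be seen in a short sketch (not from the thread; the split "é" bytes are just an illustration): on UNDERFLOW the decoder leaves any incomplete trailing bytes unconsumed in the input buffer, and the caller carries them forward, e.g. with compact():

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

public class UnderflowDemo {
    public static void main(String[] args) {
        CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder();
        ByteBuffer in = ByteBuffer.allocate(8);
        CharBuffer out = CharBuffer.allocate(8);

        // "é" is 0xC3 0xA9 in UTF-8; feed only the first byte.
        in.put((byte) 0xC3).flip();
        CoderResult r = dec.decode(in, out, false);

        // UNDERFLOW, and the dangling 0xC3 is left unconsumed:
        // the decoder keeps no internal buffer for it.
        System.out.println(r.isUnderflow() + ", remaining=" + in.remaining());

        // The caller preserves the leftover byte and appends more input.
        in.compact();
        in.put((byte) 0xA9).flip();
        dec.decode(in, out, true);
        dec.flush(out);
        out.flip();
        System.out.println(out); // é
    }
}
```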
Martin,
There are use cases where the input bytes already arrive in several ByteBuffers
(originally in byte[]s read from sockets, for example, then wrapped into buffers),
and the byte sequence for a single "char" gets cut in the middle by a buffer
boundary ... it is really hard to rearrange the buffers for the next decode
without copying. I'm considering the possibility of adding a
decode(ByteBuffer... bufs) for this kind of use case. Opinion?
Sherman
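A sketch of what the proposed decode(ByteBuffer... bufs) would hide from callers today. decodeAll is hypothetical, not a JDK API, and the fixed 1024-char output buffer (no OVERFLOW handling) is an assumption to keep the sketch short; the allocation of `joined` is exactly the copy Sherman describes:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

public class MultiBufferDecode {

    // Hypothetical helper in the spirit of decode(ByteBuffer... bufs).
    // Assumes the decoded text fits in a 1024-char output buffer.
    static String decodeAll(CharsetDecoder dec, ByteBuffer... bufs)
            throws CharacterCodingException {
        CharBuffer out = CharBuffer.allocate(1024);
        ByteBuffer in = null; // carries leftover bytes between chunks
        for (int i = 0; i < bufs.length; i++) {
            if (in != null && in.hasRemaining()) {
                // The copy the varargs method would avoid: leftover
                // bytes of a split sequence joined with the next chunk.
                ByteBuffer joined =
                    ByteBuffer.allocate(in.remaining() + bufs[i].remaining());
                joined.put(in).put(bufs[i]).flip();
                in = joined;
            } else {
                in = bufs[i];
            }
            CoderResult r = dec.decode(in, out, i == bufs.length - 1);
            if (r.isError()) r.throwException();
        }
        dec.flush(out);
        out.flip();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // "hé!" with the two-byte 'é' cut by the buffer boundary.
        ByteBuffer b1 = ByteBuffer.wrap(new byte[] {0x68, (byte) 0xC3});
        ByteBuffer b2 = ByteBuffer.wrap(new byte[] {(byte) 0xA9, 0x21});
        System.out.println(
            decodeAll(StandardCharsets.UTF_8.newDecoder(), b1, b2)); // hé!
    }
}
```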
> On Fri, Mar 18, 2016 at 11:38 AM, Pavel Rappo <pavel.rappo at oracle.com> wrote:
>> Why is that? I don't think I have to supply only "correct" chunks. After all,
>> decoders are free to maintain an internal state needed for this.
>>
>>> On 18 Mar 2016, at 18:31, Martin Buchholz <martinrb at google.com> wrote:
>>>
>>> trying to decode one byte at a time, which cannot
>>> work? The minimum unit to decode that will work is 4 bytes
More information about the nio-dev mailing list