Strange behaviour of java.nio.charset.StandardCharsets.UTF8.newDecoder()

Martin Buchholz martinrb at google.com
Fri Mar 18 20:24:28 UTC 2016


Please take a look at Utf8.java in protobuf land

http://grepcode.com/file/repo1.maven.org/maven2/com.google.protobuf/protobuf-java/2.5.0/com/google/protobuf/Utf8.java

which supports multiple input segments, but is not a Charset.  UTF-8
only, but nothing else matters anymore, hopefully!

(As always, if you want Google code to be contributed to openjdk, just ask)

On Fri, Mar 18, 2016 at 11:45 AM, Xueming Shen <xueming.shen at oracle.com> wrote:
>
> On 03/18/2016 11:44 AM, Martin Buchholz wrote:
>>
>> Coders could have an internal buffer to store incomplete sequences,
>> but the design is that they don't.
>>
>> https://docs.oracle.com/javase/8/docs/api/java/nio/charset/CoderResult.html
>>
>> https://docs.oracle.com/javase/8/docs/api/java/nio/charset/CoderResult.html#UNDERFLOW
>> It's more efficient to not have (yet another) intermediate buffer.
>>
>
> Martin,
>
> There are use cases where the input bytes are already in several ByteBuffers
> (originally they were in byte[], read from sockets, for example, and then
> wrapped into buffers), and the cluster of bytes for a "char" is cut in the
> middle by a buffer boundary ... it is really hard to rearrange the buffers
> for the next decoding without a copy/paste. I'm considering the possibility
> of having a decode(ByteBuffer... bufs) for this kind of use case. Opinion?
>
> Sherman
>
>
>> On Fri, Mar 18, 2016 at 11:38 AM, Pavel Rappo <pavel.rappo at oracle.com> wrote:
>>>
>>> Why is that? I don't think I have to supply only "correct" chunks.
>>> After all, decoders are free to maintain an internal state needed for this.
>>>
>>>> On 18 Mar 2016, at 18:31, Martin Buchholz <martinrb at google.com> wrote:
>>>>
>>>> trying to decode one byte at a time, which cannot
>>>> work?  The minimum unit to decode that will work is 4 bytes
>
>
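
For readers of the archive, the point discussed above (a decoder returns
UNDERFLOW and leaves the bytes of an incomplete sequence unconsumed, so the
caller has to carry them over) can be sketched roughly as follows. This is
only an illustration under stated assumptions: the class MultiBufferDecode and
its decode(ByteBuffer... bufs) method are hypothetical names, not a JDK API
and not the signature Sherman has in mind. It copies at most a few leftover
bytes per buffer boundary into a small staging buffer instead of re-copying
whole buffers.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

class MultiBufferDecode {

    // Hypothetical helper (not part of the JDK): decodes UTF-8 bytes that
    // are spread over several buffers, even when a multi-byte sequence is
    // split at a buffer boundary.
    static String decode(ByteBuffer... bufs) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();

        int total = 0;
        for (ByteBuffer b : bufs) {
            total += b.remaining();
        }
        // UTF-8 never produces more chars than it consumes bytes.
        CharBuffer out = CharBuffer.allocate(total);

        // Staging area for the bytes of one incomplete sequence
        // (a UTF-8 sequence is at most 4 bytes long).
        ByteBuffer carry = ByteBuffer.allocate(8);

        for (ByteBuffer buf : bufs) {
            // First complete the sequence left over from earlier buffers,
            // feeding it one byte at a time.
            while (carry.position() > 0 && buf.hasRemaining()) {
                carry.put(buf.get());
                carry.flip();
                check(decoder.decode(carry, out, false));
                carry.compact(); // keeps whatever is still undecoded
            }
            // Decode the bulk of this buffer in place, without copying.
            check(decoder.decode(buf, out, false));
            // On UNDERFLOW the decoder leaves an incomplete trailing
            // sequence unconsumed; remember those few bytes.
            carry.put(buf);
        }

        // Signal end of input; a dangling partial sequence is reported here.
        carry.flip();
        check(decoder.decode(carry, out, true));
        check(decoder.flush(out));

        out.flip();
        return out.toString();
    }

    // Turns malformed-input, unmappable or overflow results into an exception.
    private static void check(CoderResult r) {
        if (r.isError() || r.isOverflow()) {
            throw new IllegalArgumentException("decoding failed: " + r);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = "a\u20acb".getBytes(StandardCharsets.UTF_8);
        // The three-byte euro sign is split across all three buffers.
        System.out.println(decode(
                ByteBuffer.wrap(bytes, 0, 2),
                ByteBuffer.wrap(bytes, 2, 1),
                ByteBuffer.wrap(bytes, 3, 2)));
    }
}
```

The sketch relies on the decoder's default REPORT action for malformed input;
a decoder configured with REPLACE would need different handling in check().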


More information about the nio-dev mailing list