Lower overhead String encoding/decoding

Mon Nov 24 18:41:07 UTC 2014

Hi Richard,

Here are some comments regarding the updated webrev.

(1) String(ByteBuffer, String) needs "@throw UEE".
(2) It should be "default replacement byte array" not "replace string" for all
      getBytes() methods when malformed/unmappable
(3) for decoding (new String) from ByteBuffer, since it is guaranteed that all
      bytes in the input ByteBuffer will be decoded, it might be desirable to clearly
      specify that the position of the buffer will be "advanced to its limit" ?
(4) ln#1137 has an extra "*"
(5) StringCoding.decode(cs, bb), why do you want to allocate a direct buffer
      here for "untrusted"? Basically the output buffer "ca" will always be a wrapper
      of a char[], all our charset implementation will have better performance if
      both input and output are "array based" (decode on array directly, instead of
      the "slow" ByteBuffer)

-Sherman

On 11/24/2014 08:45 AM, Richard Warburton wrote:
> Hi all,
>
> I'm sure everyone is very busy, but is there any chance that someone take a
> look at the latest iteration of this patch?
>
> Thanks for taking the time to look at this - most appreciated. I've pushed
>>>>> the latest iteration to http://cr.openjdk.java.net/~
>>>>> rwarburton/string-patch-webrev-8/<http://cr.openjdk.java.net/%
>>>>> 7Erwarburton/string-patch-webrev-8/>.
>>>>>
>>>>>   I think this is looking good.
>> Thanks - I've pushed a new iteration to http://cr.openjdk.java.net/~
>> rwarburton/string-patch-webrev-9.
>>
>> For the constructor then the words "decoding the specified byte buffer",
>>>> it might be a bit clearer as "decoding the remaining bytes in the ...".
>>>>
>>>> For getBytes(ByteBuffer, Charset) then the position is advanced by the
>>>> bytes written, no need to mention the number of chars read here.
>>>>
>>>> In the constructor then you make it clear that malformed/unmappable
>>>> sequences use the default replacement. This is important to state in the
>>>> getBytes methods too because the encoding can fail.
>>>
>> I've made all these changes.
>>
>> Hi Richard, hi all,
>>> Two comments,
>>> You replace the nullcheck in getBytes() by a requireNonNull and at the
>>> same time, you don"t use requireNonNull in String(ByteBuffer,Charset),
>>> no very logical isn't it ?
>>>
>> Thanks for spotting this Remi - I've fixed this issue in my latest
>> iteration.
>>
>> I think you need a supplementary constructor that takes a ByteBuffer and a
>>> charset name as a String,
>>> i may be wrong, but it seems that every constructor of String that takes
>>> a Charset has a dual constructor that takes a Charset as a String.
>>> As far as I remember, a constructor that takes a Charset as a String may
>>> be faster because you can share the character decoder
>>> instead of creating a new one.
>>
>> A good observation. I've added variants which take a String for the
>> charset name as well the charset variants.
>>
>> Having said that - wouldn't it also be a good idea to replicate the
>> caching on the charset versions as well as the charset name? I don't see
>> any obvious reason why this isn't possible but perhaps there's something
>> I'm missing here. Probably cleaner as a separate patch either way.
>>
> regards,
>
>    Richard Warburton
>
>    http://insightfullogic.com
>    @RichardWarburto<http://twitter.com/richardwarburto>