Codereview request: CR 7040220 java/char_encodin Optimize UTF-8 charset for String.getBytes()/toCharArray()
Xueming Shen
xueming.shen at oracle.com
Mon May 2 17:15:45 UTC 2011
On 5/2/2011 7:31 AM, Alan Bateman wrote:
> Xueming Shen wrote:
>> Hi
>>
>> This is motivated by Neil's request to optimize common-case UTF8 path
>> for native ZipFile.getEntry calls [1].
>> As I said in my replying email [2] I believe a better approach might
>> be to "patch" UTF8 charset directly to
>> implement sun.nio.cs.ArrayDecoder/Encoder interface to speed up the
>> coding operation for array based
>> encoding/decoding under certain circumstance, as we did for all
>> single byte charsets in #6636323 [3]. I
>> have a old blog [4] that has some data for this optimization.
>>
>> The original plan was to do the same thing for our new UTF8 [5] as
>> well in JDK7, but then (excuse, excuse)
>> I was just too busy to come back to this topic till 2 days ago. After
>> two days of small tweaking here and there
>> and testing those possible corner cases I can think of, I'm happy
>> with the result and think it might be
>> worth sending it out for a codereview for JDK7, knowing we only have
>> couple days left.
>>
>> The webrev is at
>>
>> http://cr.openjdk.java.net/~sherman/7040220/webrev
> I went through the changes and the approach looks good to me - thanks
> for jumping on this. Thanks Ulf for helping review. Also thanks Neil
> for reporting this and testing with Sherman's change to verify that it
> addresses the performance regression.
>
> Sherman - just a couple of minor comments:
Thanks Alan!
Webrev has been updated accordingly.
I renamed the getBBuffer to getByteBuffer, now it "looks" better:-)
Thanks,
-Sherman
>
> It would be good to put a blank line after the ASCII-only loops so
> that future maintainers can easily distinguish the loops (this would
> make it consistent with decodeArrayLoop for example).
>
> In several places the test is "if (CodingErrorAction.REPLACE !=
> malformedInputAction())". Personally I would swap this to "if
> (malformedInputAction() != CodingErrorAction.REPLACE)". This reminds,
> if a couple of these slipped through into the zip code recently, eg:
> "if (false == inf.ended())", "if (false == streams.isEmpty())".
>
> The caching of the ByteBuffer and getBBuffer for the malformed case
> isn't as nice as the original code. Minimally I would move the
> declaration of bb so that it's not in the middle of dl and dp, and
> rename getBBuffer. Alternatively I would just get rid of it as it
> shouldn't be performance critical.
>
> In TestStringCodeUTF8 it might be cleaner to put the body of main into
> its own method and then call it once, set the security manager, and
> then call it again.
>
> That's all I have,
>
> -Alan.
>
>
>
More information about the core-libs-dev
mailing list