Codereview request: CR 7040220 java/char_encodin Optimize UTF-8 charset for String.getBytes()/toCharArray()

Thu Apr 28 11:01:33 UTC 2011

Xueming Shen wrote:
>  Hi
>
> This is motivated by Neil's request to optimize the common-case UTF8 path
> for native ZipFile.getEntry calls [1].
> As I said in my reply [2], I believe a better approach might
> be to "patch" the UTF8 charset directly to
> implement the sun.nio.cs.ArrayDecoder/Encoder interfaces to speed up the
> coding operation for array-based
> encoding/decoding under certain circumstances, as we did for all
> single-byte charsets in #6636323 [3]. I
> have an old blog post [4] that has some data for this optimization.
>
> The original plan was to do the same thing for our new UTF8 [5] as 
> well in JDK7, but then (excuse, excuse)
> I was just too busy to come back to this topic till 2 days ago. After 
> two days of small tweaking here and there
> and testing the corner cases I can think of, I'm happy with
> the result and think it might be
> worth sending it out for a codereview for JDK7, knowing we only have
> a couple of days left.
I skimmed through the webrev and I agree this is a better approach. I 
will try to do a detailed review before Monday. It would be great if 
others on the list could jump in and help too as we are running out of time.

Neil - I don't know if you've had a chance to look at Sherman's changes 
but I think it's better than checking if mUTF-8 can be used.  If you 
agree then would you have time to run your tests that alerted you to 
this performance regression? There's a patch file in the webrev.

-Alan.