review request for 6798511/6860431: Include functionality of Surrogate in Character

Ulf Zibis Ulf.Zibis at gmx.de
Sat Mar 20 21:52:32 UTC 2010


Am 20.03.2010 01:13, schrieb Martin Buchholz:
> Interesting benchmark results!
>
> Your microbenchmark technique looks unusual, but seems to work.
>    

- yes, warmup is integrated without need for coding extra loop
-- here the more sophisticated version to detect slowdowns caused by GC, 
Hotspot or OS activity (line 245 ...):
    
https://java-nio-charset-enhanced.dev.java.net/source/browse/java-nio-charset-enhanced/branches/j7_EUC_TW/src/sun/nio/cs/ext/EUC_TWBenchmark.java?rev=1008&view=markup
- my technique mostly eliminates influences of irregular system slowdowns
-- this is of special importance on mobile systems, where CPU clock 
could become throttled caused from overheating

> I'm surprised there is that much difference.
>    

I wasn't after studying the disassemblies.

> I would take out the swallowing of Exception.
>    

Thanx, caused by copy'n paste. ;-)

> ---
>
> Your data contains only supplementary characters,
> which we are assuming are very rare.
>    

Yes, on BMP characters all variations have same speed.

> So I don't consider speeding up such a benchmark
> very important,

Yes, even on taiwanese machines, using EUC_TW, surrogates frequency may 
be 1 %, but character processing at all should be frequent on several 
applications.
On the other hand, frequency of other APIs, e.g. array sort, even should 
be low on overall applications. Does that justify stepmotherly 
maintained code, in particular if we can have it for more or less nothing.

>   but....
>
> We can do it for free
> by switching isSupplementaryCodePoint =>  isValidCodePoint,
> so why not?
>    

Yep, I stepmotherly revised other methods, my focus was on String(int[], 
int, int) and outsourcing bond checks.
BTW, what do you think of the latter?

> ---
>
> While checking this, I noticed that Character.toChars can
> be sped up by using our new isBMPCodePoint method
> (always optimize for BMP!)
>    

I guess you have noticed, that the main change I just have done earlier.
But I couldn't imagine, that we would drop the optimized form of 
isSupplementaryCodePoint(). ;-)

> ---
>
> Here's the change I'm making on top of isBMPCodePoint:
> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isBMPCodePoint2/
>
> Ulf, please review.
>    

- have you noticed my change on toSurrogates() in returning the count? 
Useful too in AbstractStringBuilder#appendCodePoint() and plenty of 
sun.nio.cs decoders.

- A little "bug" in javadoc:
   @exception ArrayIndexOutOfBoundsException
   instead    IndexOutOfBoundsException

- String#indexOf(int, int):
              // handle supplementary characters here
              char high = Character.highSurrogate(ch);
              char low = Character.lowSurrogate(ch);
              for ( ; i < max-1; i++)
                  if (v[i] == high && v[i+1] == low)
                          return i - offset;


-Ulf






More information about the core-libs-dev mailing list