review request for 6798511/6860431: Include functionality of Surrogate in Character
Ulf Zibis
Ulf.Zibis at gmx.de
Sat Mar 20 21:52:32 UTC 2010
Am 20.03.2010 01:13, schrieb Martin Buchholz:
> Interesting benchmark results!
>
> Your microbenchmark technique looks unusual, but seems to work.
>
- yes, warmup is integrated without need for coding extra loop
-- here the more sophisticated version to detect slowdowns caused by GC,
Hotspot or OS activity (line 245 ...):
https://java-nio-charset-enhanced.dev.java.net/source/browse/java-nio-charset-enhanced/branches/j7_EUC_TW/src/sun/nio/cs/ext/EUC_TWBenchmark.java?rev=1008&view=markup
- my technique mostly eliminates influences of irregular system slowdowns
-- this is of special importance on mobile systems, where CPU clock
could become throttled caused from overheating
> I'm surprised there is that much difference.
>
I wasn't after studying the disassemblies.
> I would take out the swallowing of Exception.
>
Thanx, caused by copy'n paste. ;-)
> ---
>
> Your data contains only supplementary characters,
> which we are assuming are very rare.
>
Yes, on BMP characters all variations have same speed.
> So I don't consider speeding up such a benchmark
> very important,
Yes, even on taiwanese machines, using EUC_TW, surrogates frequency may
be 1 %, but character processing at all should be frequent on several
applications.
On the other hand, frequency of other APIs, e.g. array sort, even should
be low on overall applications. Does that justify stepmotherly
maintained code, in particular if we can have it for more or less nothing.
> but....
>
> We can do it for free
> by switching isSupplementaryCodePoint => isValidCodePoint,
> so why not?
>
Yep, I stepmotherly revised other methods, my focus was on String(int[],
int, int) and outsourcing bond checks.
BTW, what do you think of the latter?
> ---
>
> While checking this, I noticed that Character.toChars can
> be sped up by using our new isBMPCodePoint method
> (always optimize for BMP!)
>
I guess you have noticed, that the main change I just have done earlier.
But I couldn't imagine, that we would drop the optimized form of
isSupplementaryCodePoint(). ;-)
> ---
>
> Here's the change I'm making on top of isBMPCodePoint:
> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isBMPCodePoint2/
>
> Ulf, please review.
>
- have you noticed my change on toSurrogates() in returning the count?
Useful too in AbstractStringBuilder#appendCodePoint() and plenty of
sun.nio.cs decoders.
- A little "bug" in javadoc:
@exception ArrayIndexOutOfBoundsException
instead IndexOutOfBoundsException
- String#indexOf(int, int):
// handle supplementary characters here
char high = Character.highSurrogate(ch);
char low = Character.lowSurrogate(ch);
for ( ; i < max-1; i++)
if (v[i] == high && v[i+1] == low)
return i - offset;
-Ulf
More information about the core-libs-dev
mailing list