review request for 6798511/6860431: Include functionality of Surrogate in Character

Martin Buchholz martinrb at google.com
Sat Mar 20 00:13:13 UTC 2010


Interesting benchmark results!

Your microbenchmark technique looks unusual, but seems to work.

I'm surprised there is that much difference.

I would take out the swallowing of Exception.

---

Your data contains only supplementary characters,
which we are assuming are very rare.
So I don't consider speeding up such a benchmark
very important, but....

We can do it for free
by switching isSupplementaryCodePoint => isValidCodePoint,
so why not?

---

While checking this, I noticed that Character.toChars can
be sped up by using our new isBMPCodePoint method
(always optimize for BMP!)

---

Here's the change I'm making on top of isBMPCodePoint:
http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isBMPCodePoint2/

Ulf, please review.

diff --git a/src/share/classes/java/lang/Character.java
b/src/share/classes/java/lang/Character.java
--- a/src/share/classes/java/lang/Character.java
+++ b/src/share/classes/java/lang/Character.java
@@ -3099,15 +3099,15 @@
      * @since  1.5
      */
     public static int toChars(int codePoint, char[] dst, int dstIndex) {
-        if (codePoint < 0 || codePoint > MAX_CODE_POINT) {
+        if (isBMPCodePoint(codePoint)) {
+            dst[dstIndex] = (char) codePoint;
+            return 1;
+        } else if (isValidCodePoint(codePoint)) {
+            toSurrogates(codePoint, dst, dstIndex);
+            return 2;
+        } else {
             throw new IllegalArgumentException();
         }
-        if (codePoint < MIN_SUPPLEMENTARY_CODE_POINT) {
-            dst[dstIndex] = (char) codePoint;
-            return 1;
-        }
-        toSurrogates(codePoint, dst, dstIndex);
-        return 2;
     }

     /**
@@ -3127,15 +3127,15 @@
      * @since  1.5
      */
     public static char[] toChars(int codePoint) {
-        if (codePoint < 0 || codePoint > MAX_CODE_POINT) {
+        if (isBMPCodePoint(codePoint)) {
+            return new char[] { (char) codePoint };
+        } else if (isValidCodePoint(codePoint)) {
+            char[] result = new char[2];
+            toSurrogates(codePoint, result, 0);
+            return result;
+        } else {
             throw new IllegalArgumentException();
         }
-        if (codePoint < MIN_SUPPLEMENTARY_CODE_POINT) {
-                return new char[] { (char) codePoint };
-        }
-        char[] result = new char[2];
-        toSurrogates(codePoint, result, 0);
-        return result;
     }

     static void toSurrogates(int codePoint, char[] dst, int index) {
diff --git a/src/share/classes/java/lang/String.java
b/src/share/classes/java/lang/String.java
--- a/src/share/classes/java/lang/String.java
+++ b/src/share/classes/java/lang/String.java
@@ -281,7 +281,7 @@
             int c = codePoints[i];
             if (Character.isBMPCodePoint(c))
                 n += 1;
-            else if (Character.isSupplementaryCodePoint(c))
+            else if (Character.isValidCodePoint(c))
                 n += 2;
             else throw new IllegalArgumentException(Integer.toString(c));
         }
diff --git a/src/share/classes/sun/nio/cs/Surrogate.java
b/src/share/classes/sun/nio/cs/Surrogate.java
--- a/src/share/classes/sun/nio/cs/Surrogate.java
+++ b/src/share/classes/sun/nio/cs/Surrogate.java
@@ -294,7 +294,7 @@
                 dst.put((char)uc);
                 error = null;
                 return 1;
-            } else if (Character.isSupplementaryCodePoint(uc)) {
+            } else if (Character.isValidCodePoint(uc)) {
                 if (dst.remaining() < 2) {
                     error = CoderResult.OVERFLOW;
                     return -1;
@@ -338,7 +338,7 @@
                 da[dp] = (char)uc;
                 error = null;
                 return 1;
-            } else if (Character.isSupplementaryCodePoint(uc)) {
+            } else if (Character.isValidCodePoint(uc)) {
                 if (dl - dp < 2) {
                     error = CoderResult.OVERFLOW;
                     return -1;

Martin

On Fri, Mar 19, 2010 at 14:46, Ulf Zibis <Ulf.Zibis at gmx.de> wrote:
> Am 16.03.2010 23:35, schrieb Xueming Shen:
>>
>> Martin Buchholz wrote:
>>>
>>> The primary theory here is that branches are expensive,
>>> and we are reducing them by one.
>>>
>>
>> There are still two branches in new impl, if you count the "ifeq" and
>> "if_icmpge"(?)
>>
>> We are trying to "optimize" this piece of code with the assumption that
>> the new impl MIGHT help certain vm (hotspot?)
>> to optimize certain use scenario (some consecutive usages), if the
>> compiler and/or the vm are both smart enough at certain
>> point, with no supporting benchmark data?
>
> I've finished the benchmark:
> https://java-nio-charset-enhanced.dev.java.net/source/browse/java-nio-charset-enhanced/branches/JDK-7/j_l_Character_charCount/src/java/lang/CharacterBenchmark.java?rev=1006&view=log
>
> The results:
> time1: 2316,213 ms  ..à la Martin
> time2: 1267,063 ms
> time3: 1245,972 ms  ..using isValidCodePoint
> time4: 1467,570 ms  ..validate version   (slower, because of unreasonable
> HotSpot optimizing, see "C2 optimization bug ?" in hotspot-compiler-dev
> list)
>
> Here see the disassembly snippets:
> https://java-nio-charset-enhanced.dev.java.net/source/browse/java-nio-charset-enhanced/branches/JDK-7/j_l_Character_charCount/log/PA_Character_compare.txt?rev=1007&view=markup
>



More information about the core-libs-dev mailing list