String.lastIndexOf confused by unpaired trailing surrogate
Ulf Zibis
Ulf.Zibis at gmx.de
Thu Mar 25 17:19:06 UTC 2010
On 24.03.2010 09:24, Martin Buchholz wrote:
> Ulf, Sherman, Masayoshi,
> here are changes for you to review.
> Only the patch highSurrogate needs a separate bug filed
> (and CCC, please)
>
> Ulf, I've made some progress on integrating your changes,
> although almost all of them have been somewhat martinized:
>
> Ulf-style tidying, mostly whitespace.
> [mq]: Character-warnings2
> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/Character-warnings2
>
I would prefer this (the continued line is easier to see):
public final class Character
        implements java.io.Serializable, Comparable<Character> {
I would prefer (it indicates that we are in the current class):
#isDigit(char)
instead of
Character#isDigit(char)
but the latter is indeed better than
java.lang.Character#isDigit(char)
> Very minor optimizations. Barely worth doing.
> Note my removal of the need to have n++ inside the loop.
>
I overlooked that. Shame on me, as that's true Ulf-style. Yes, it saves
increments/decrements in the rare supplementary cases.
> imported patch ulf-opto
> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/ulf-opto
>
> Addition of highSurrogate and lowSurrogate
> imported patch highSurrogate
> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/highSurrogate
>
Looks good. Interesting workaround for my "Note:".
I had expected you to drop my highSurrogate(char highCPWord, char
lowCPWord).
Anyway, I'd like to note that I use that shortcut in my EUC_TW$Decoder
twiddling. The following code:
da[dp] = Character.highSurrogate(0x20000 + c);
results in (19 bytes):
0x00b8ae27: add    $0x20000,%ecx      ;*iadd
                                      ; - sun.nio.cs.ext.D_21_d_narrow::decode@98 (line 196)
0x00b8ae2d: mov    %ecx,%ebp
0x00b8ae2f: shr    $0xa,%ebp
0x00b8ae32: add    $0xd7c0,%ebp       ;*isub
                                      ; - java.lang.Character::highSurrogate@9 (line 3343)
                                      ; - sun.nio.cs.ext.D_21_d_narrow::decode@99 (line 196)
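The sequence above is just a shift and an add: for a supplementary code point cp, the high surrogate is (cp >>> 10) + 0xD7C0, where 0xD7C0 = 0xD800 - (0x10000 >>> 10). A small sketch checking that arithmetic against the Java 7 Character API (the class name, the helper highSurrogateByShift and the sample value 0x1234 are illustrative only):

```java
public class HighSurrogateCheck {
    // Same arithmetic as the "shr $0xa" / "add $0xd7c0" pair in the disassembly.
    // 0xD7C0 == Character.MIN_HIGH_SURROGATE - (Character.MIN_SUPPLEMENTARY_CODE_POINT >>> 10)
    static char highSurrogateByShift(int codePoint) {
        return (char) ((codePoint >>> 10) + 0xD7C0);
    }

    public static void main(String[] args) {
        char c = 0x1234;               // hypothetical low 16 bits, e.g. from a decode table
        int codePoint = 0x20000 + c;   // a plane-2 code point, as in the EUC_TW case
        char viaShift = highSurrogateByShift(codePoint);
        char viaApi = Character.highSurrogate(codePoint);      // public since Java 7
        char viaToChars = Character.toChars(codePoint)[0];
        // %X needs an integral type, so widen the chars to int first
        System.out.printf("%04X %04X %04X%n", (int) viaShift, (int) viaApi, (int) viaToChars);
        // prints "D844 D844 D844"
    }
}
```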
da[dp] = Character.highSurrogate((char)0x2, c);
results in (9 bytes):
0x00b899e7: shr    $0xa,%ebp
0x00b899ea: add    $0xd840,%ebp       ;*isub
                                      ; - java.lang.Character::highSurrogate@14 (line 3365)
                                      ; - sun.nio.cs.ext.D_22_d_n_fastSurrogate::decode@97 (line 196)
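Why the two-word form is shorter: because highCPWord << 16 is a multiple of 0x400, shifting the split code point right by 10 needs no carry, so (hi << 16 | lo) >>> 10 == (hi << 6) + (lo >>> 10). With a constant plane of 0x2 the JIT can fold (0x2 << 6) + 0xD7C0 into the single constant 0xD840 seen above. A sketch of such a helper; note that highSurrogate(char, char) here is a hypothetical stand-in for the dropped proposal, not a method of java.lang.Character:

```java
public class TwoWordSurrogate {
    // Hypothetical helper modeled on the dropped highSurrogate(char highCPWord, char lowCPWord);
    // it is NOT part of java.lang.Character.
    static char highSurrogate(char highCPWord, char lowCPWord) {
        return (char) ((highCPWord << 6) + (lowCPWord >>> 10)
                + (Character.MIN_HIGH_SURROGATE - (Character.MIN_SUPPLEMENTARY_CODE_POINT >>> 10)));
    }

    public static void main(String[] args) {
        // Compare against the public one-argument API for every plane-2 code point.
        for (int lo = 0; lo <= 0xFFFF; lo++) {     // int counter: a char counter would never exceed 0xFFFF
            char c = (char) lo;
            if (highSurrogate((char) 0x2, c) != Character.highSurrogate(0x20000 + c)) {
                throw new AssertionError("mismatch at " + lo);
            }
        }
        System.out.println("all plane-2 values match");
    }
}
```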
dst.putInt(Character.highSurrogate((char)0x2, c) << 16 |
        Character.lowSurrogate(c));
would increase performance further. I'm still preparing the
benchmark and disassembly.
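The putInt idea writes the whole surrogate pair with one 32-bit store. A minimal sketch, assuming dst is a java.nio.ByteBuffer in its default big-endian order (so the bytes come out as UTF-16BE); it uses the public one-argument highSurrogate/lowSurrogate, and the class name and sample code point are illustrative:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class PackedSurrogatePair {
    public static void main(String[] args) {
        int codePoint = 0x20000 + 0x1234;          // illustrative plane-2 code point
        char high = Character.highSurrogate(codePoint);
        char low = Character.lowSurrogate(codePoint);

        // One 32-bit store instead of two 16-bit stores; << binds tighter than |.
        // ByteBuffer defaults to big-endian, so this writes the UTF-16BE pair.
        ByteBuffer dst = ByteBuffer.allocate(4);
        dst.putInt(high << 16 | low);

        String decoded = new String(dst.array(), StandardCharsets.UTF_16BE);
        System.out.println(decoded.codePointAt(0) == codePoint);   // true
    }
}
```

On a little-endian buffer (or a UTF-16LE target) the two halves would have to be swapped, so the trick depends on the buffer's byte order.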
This twiddling could be used in all charset coders that process
surrogates, e.g. possibly the UTF_x ones.
If made public, it would also be very useful for developers writing
charset coders for exotic charsets via java.nio.charset.spi.CharsetProvider.
-Ulf
More information about the core-libs-dev mailing list