review request for 6798511/6860431: Include functionality of Surrogate in Character

Ulf Zibis Ulf.Zibis at gmx.de
Thu Aug 27 13:33:41 UTC 2009


Martin, thanks for your review ...


Am 27.08.2009 03:24, Martin Buchholz schrieb:
> On Wed, Aug 26, 2009 at 02:59, Ulf Zibis<Ulf.Zibis at gmx.de> wrote:
>   
>> Hi all,
>>
>> I have put all important things of sun.nio.cs.Surrogate to java.* packages,
>> I guess. We likely could get rid of it.
>>     
>
> I am in principle of adding the method isBMP to Character.java
> (I did write it!) but I was hoping we can find a better name for it...
>   

I had the same thought, but just referred to the elder. ;-)

> Hmmmmm..... The Unicode Glossary uses BMP consistently,
> so maybe isBMP is the best we can come up with?
>
> How about isBMPCodePoint?
>   

This feels best for me too.

> http://unicode.org/glossary/#BMP_code_point
>
> Yeah, I think that's not too bad.
>
> Martin
>
>
>   
>> See: https://bugs.openjdk.java.net/show_bug.cgi?id=100104
>>
>>
>> ================
>> +    public static char[] toChars(int codePoint) {
>> +        if (codePoint >= MIN_CODE_POINT) {
>> +            char[] result;
>> +            if (codePoint < MIN_SUPPLEMENTARY_CODE_POINT) {
>> +                result = C1.clone();
>> ----------------
>> I guess cloning is faster than new instantiation, as fresh one has to be
>> initialized by 0s, but not sure
>>     
>
> Cloning has to initialize as well, with the original contents.
> Often there is special support for zeroing newly allocated objects.
> So I think this is a bad idea.
>
>   

Sounds reasonable.
In case of more complicated objects but arrays, can cloning be faster?

>> ================
>> +    static int toSurrogates(int codePoint, char[] dst, int index) {
>> +        // We write elements "backwards" to guarantee all-or-nothing //
>> TODO: isn't forward faster ?
>> +        dst[index+1] = lowSurrogate(codePoint);
>> +        dst[index] = highSurrogate(codePoint);
>> ----------------
>> IMO this guarantee is nowhere needed, so why potentially waste performance ?
>>     
>
> All-or-nothing is a fundamental software reliability principle.
> Why have your data structure in a corrupted state
> if it is not too onerous to avoid it?
> The difference in behavior is certainly user-visible.
>   

OK, ...but if there is a speed advantage, we could note this in javadoc 
(as is just happens for invalid surrogate input in the calling method 
for the same reason).
Anyway, in current state of HotSpot optimization, there seems to be no 
advantage according Christian Thalinger.

>   
>> ================
>> -        // Pass 2: Allocate and fill in char[]
>> -        char[] v = new char[n];
>> -        for (int i = offset, j = 0; i < offset + count; i++) {
>> -            int c = codePoints[i];
>> -            if (c < Character.MIN_SUPPLEMENTARY_CODE_POINT) {
>> -                v[j++] = (char) c;
>> -            } else {
>> -                Character.toSurrogates(c, v, j);
>> -                j += 2;
>> -            }
>> -        }
>> -
>> -        this.value  = v;
>> +        // Pass 2: Allocate and fill in char[]
>> +        for (int i = offset, j = 0; i < offset + count; i++) {
>> +            int c = codePoints[i];
>> +            if (c < Character.MIN_SUPPLEMENTARY_CODE_POINT)
>> +                value[j++] = (char)c;
>> +            else
>> +                j += Character.toSurrogates(c, value, j);
>>        }
>> ----------------
>> I guess HotSpot is aware of immutability of String, so no need to locally
>> buffer v expletive to save indirection and against concurrent modifications.
>>     
>
> I like the simplification of this method a lot, but it actually helps
> hotspot a little
> to hold the new array in a local and assign it to a field at the end
> of the constructor.
>   

Okay, so we have to wait for HotSpot being able to "see" that
(1) first holding the variable locally would increase speed
(2) this is thread safe, as we are in constructor, so no concurrent 
modification can happen.
(3) no method in class String allows modification of content of array 
"value" after first initialization.

* Even javac could "see" that, and do the optimization on byte code 
level. (maybe anti-confusion technique is needed for the debugger, 
...but we also accept, that debugger runs into a StringBuilder#append() 
loop on base of a _single_ char[] in case of _multiple_ string 
concatenation.)


Martin, what do you think about adding code point accessors to 
java.nio.CharBuffer (see my X-Buffer patch) ?


-Ulf


> This sort of thing (copying a field into a local) is done in
> java.util.concurrent a lot.
>
> ---
>
> It would help to separate cosmetic and "real" changes to get them
> reviewed and accepted.
>
>
> Martin
>
>
>   




More information about the core-libs-dev mailing list