RFR(10): 8181147: JNU_GetStringPlatformChars should have a fast path for UTF-8

Mon Jun 12 23:36:13 UTC 2017

Hi Martin,

thanks for reviewing!

On 2017-06-12 22:14, Martin Buchholz wrote:
> +/* Optimized for char set UTF-8 */
>
> "charset" is a (poor misnomer!) jargon term, not two words.

I got that from the existing use of "char set" in this file, but will 
fix it in all places.

>
> ---
>
> +    for (b = str[len]; b != '\0'; len++, b = str[len]) {
> +        if (isAscii && b & 0x80) {
> +            isAscii = JNI_FALSE;
> +        }
> +    }
>
> I would write this more like
>
> const signed char *p;
> int isAscii;
>
> for (isAscii = 0, p = (const signed char *) str; *p != '\0'; p++) 
> isAscii &= (*p >= 0);

Did you mean for isAscii to be initialized to 1 (true) and then be 
cleared to 0 (false) when *p >= 0 is false?

>
> Then length is (p - str)

How about something like this to hoist the second comparison from the loop:

     int len;
     char asciiCheck;
     for (asciiCheck = 0, p = str; *p != '\0'; p++) {
         asciiCheck |= *p;
     }
     len = (p - str);

     if (asciiCheck & 0x80) {
         // ascii fast-path
         return newSizedString8859_1(env, str, len);
     }

     ...

> ---
>
> +    jbyteArray hab = NULL;
>
> I'm having a hard time decoding the name "hab"

Not sure, but my guess is "heap allocated byteArray".

>
> ---
>
> The code below is not UTF-8 specific. Can it be refactored?
>
> +    hab = (*env)->NewByteArray(env, len);
> +    if (hab != 0) {
> +        jclass strClazz = JNU_ClassString(env);
> +        CHECK_NULL_RETURN(strClazz, 0);
> +        (*env)->SetByteArrayRegion(env, hab, 0, len, (jbyte *)str);
> +        result = (*env)->NewObject(env, strClazz,
> +                                       String_init_ID, hab, jnuEncoding);
> +        (*env)->DeleteLocalRef(env, hab);
> +        return result;
> +    }

Yes, probably. The excessive copy-paste here is due to a crash issue I 
ran into when
Charset.isSupported was called without proper initialization. It has 
since been
resolved, but seems I forgot to revisit this part.

I also realized I haven't added a test for this method. I'll look into 
doing the
refactoring and adding some sanity testing tomorrow.

>
> ---
>
> We probably want to use unicode escapes in out java sources to keep 
> all source files strictly ASCII.

You mean in the test? Sure.

Refreshed the jdk webrev: 
http://cr.openjdk.java.net/~redestad/8181147/jdk.05/ - more to come.

Thanks!

/Claes