RFR(10): 8181147: JNU_GetStringPlatformChars should have a fast path for UTF-8

Tue Jun 13 14:03:31 UTC 2017

<adding build-dev; see 
http://mail.openjdk.java.net/pipermail/core-libs-dev/2017-June/048233.html 
for missing context>

Updated webrevs:
http://cr.openjdk.java.net/~redestad/8181147/jdk.05/
http://cr.openjdk.java.net/~redestad/8181147/top.04/

In this version there are various cleanups based on Martin's
comments, improved test coverage and some small changes to
ensure the updated test builds and runs on all our supported
platforms.

If someone with access to any of the platforms not supported
by Oracle could take this for a spin and ensure linking the newly
added test against libjava works as expected (or suggest how
to make it work if it doesn't) then that'd be much appreciated.

Thanks!

/Claes

On 06/13/2017 02:01 AM, Martin Buchholz wrote:
> I'm hoping Xueming will review as well.
> +    char asciiCheck;
> +    for (asciiCheck = 0, p = str; *p != '\0'; p++) {
> +        asciiCheck |= *p;
> +    }
> +    len = (p - str);
> Probably conversion from ptrdiff_t to int needs a cast.
>
> Someday we might need to worry about input string longer than 2^31, 
> but that's a pre-existing problem.
>
> The signed-ness of char being implementation-defined makes me nervous.
> Maybe do all the bit operations using explicit unsigned char, then the 
> final test becomes simply
>
> asciiCheck <= 0x80
>
>
> On Mon, Jun 12, 2017 at 4:36 PM, Claes Redestad 
> <claes.redestad at oracle.com <mailto:claes.redestad at oracle.com>> wrote:
>
>     Hi Martin,
>
>     thanks for reviewing!
>
>     On 2017-06-12 22:14, Martin Buchholz wrote:
>
>         +/* Optimized for char set UTF-8 */
>
>         "charset" is a (poor misnomer!) jargon term, not two words.
>
>
>     I got that from the existing use of "char set" in this file, but
>     will fix it in all places.
>
>
>         ---
>
>         +    for (b = str[len]; b != '\0'; len++, b = str[len]) {
>         +        if (isAscii && b & 0x80) {
>         +            isAscii = JNI_FALSE;
>         +        }
>         +    }
>
>         I would write this more like
>
>         const signed char *p;
>         int isAscii;
>
>         for (isAscii = 0, p = (const signed char *) str; *p != '\0';
>         p++) isAscii &= (*p >= 0);
>
>
>     Did you mean for isAscii to be initialized to 1 (true) and then be
>     cleared to 0 (false) when *p >= 0 is false?
>
>
>         Then length is (p - str)
>
>
>     How about something like this to hoist the second comparison from
>     the loop:
>
>         int len;
>         char asciiCheck;
>         for (asciiCheck = 0, p = str; *p != '\0'; p++) {
>             asciiCheck |= *p;
>         }
>         len = (p - str);
>
>         if (asciiCheck & 0x80) {
>             // ascii fast-path
>             return newSizedString8859_1(env, str, len);
>         }
>
>         ...
>
>         ---
>
>         +    jbyteArray hab = NULL;
>
>         I'm having a hard time decoding the name "hab"
>
>
>     Not sure, but my guess is "heap allocated byteArray".
>
>
>         ---
>
>         The code below is not UTF-8 specific. Can it be refactored?
>
>         +    hab = (*env)->NewByteArray(env, len);
>         +    if (hab != 0) {
>         +        jclass strClazz = JNU_ClassString(env);
>         +        CHECK_NULL_RETURN(strClazz, 0);
>         +        (*env)->SetByteArrayRegion(env, hab, 0, len, (jbyte
>         *)str);
>         +        result = (*env)->NewObject(env, strClazz,
>         +                                       String_init_ID, hab,
>         jnuEncoding);
>         +        (*env)->DeleteLocalRef(env, hab);
>         +        return result;
>         +    }
>
>
>     Yes, probably. The excessive copy-paste here is due to a crash
>     issue I ran into when
>     Charset.isSupported was called without proper initialization. It
>     has since been
>     resolved, but seems I forgot to revisit this part.
>
>     I also realized I haven't added a test for this method. I'll look
>     into doing the
>     refactoring and adding some sanity testing tomorrow.
>
>
>         ---
>
>         We probably want to use unicode escapes in out java sources to
>         keep all source files strictly ASCII.
>
>
>     You mean in the test? Sure.
>
>     Refreshed the jdk webrev:
>     http://cr.openjdk.java.net/~redestad/8181147/jdk.05/
>     <http://cr.openjdk.java.net/%7Eredestad/8181147/jdk.05/> - more to
>     come.
>
>     Thanks!
>
>     /Claes
>
>