RFR(10): 8181147: JNU_GetStringPlatformChars should have a fast path for UTF-8

Tue Jun 13 20:27:02 UTC 2017

On 2017-06-13 21:09, Martin Buchholz wrote:
> Looks good to me, although still hoping for more review from others.

I expect Sherman to weigh in. :-)

>
> +    if (bytes != 0) {
> Style: use NULL.

Done.

>
> +static jstring newStringJava(JNIEnv *env, const char *str) {
>
> I expected two versions, one that took a length and one that did not, 
> as you did with newString8859_1.

Ah, yes. Refactored and updated in-place.

> ---
> TIL that jbyte is signed char, not plain char, and so is more 
> consistent with java "byte", hence less error prone, at least in theory.
>
> ---
> Can we test different native environments by using 
> -Dsun.jnu.encoding=... ?

I did try, but jtreg currently doesn't appear to support specifying 
flags for main/native @run:s (and helpfully suggests using main/othervm 
instead...) - maybe someone knows a way around this?

/Claes

>
>
>
> On Tue, Jun 13, 2017 at 7:03 AM, Claes Redestad 
> <claes.redestad at oracle.com <mailto:claes.redestad at oracle.com>> wrote:
>
>     <adding build-dev; see
>     http://mail.openjdk.java.net/pipermail/core-libs-dev/2017-June/048233.html
>     <http://mail.openjdk.java.net/pipermail/core-libs-dev/2017-June/048233.html>
>     for missing context>
>
>     Updated webrevs:
>     http://cr.openjdk.java.net/~redestad/8181147/jdk.05/
>     <http://cr.openjdk.java.net/%7Eredestad/8181147/jdk.05/>
>     http://cr.openjdk.java.net/~redestad/8181147/top.04/
>     <http://cr.openjdk.java.net/%7Eredestad/8181147/top.04/>
>
>     In this version there are various cleanups based on Martin's
>     comments, improved test coverage and some small changes to
>     ensure the updated test builds and runs on all our supported
>     platforms.
>
>     If someone with access to any of the platforms not supported
>     by Oracle could take this for a spin and ensure linking the newly
>     added test against libjava works as expected (or suggest how
>     to make it work if it doesn't) then that'd be much appreciated.
>
>     Thanks!
>
>     /Claes
>
>     On 06/13/2017 02:01 AM, Martin Buchholz wrote:
>
>         I'm hoping Xueming will review as well.
>         +    char asciiCheck;
>         +    for (asciiCheck = 0, p = str; *p != '\0'; p++) {
>         +        asciiCheck |= *p;
>         +    }
>         +    len = (p - str);
>         Probably conversion from ptrdiff_t to int needs a cast.
>
>         Someday we might need to worry about input string longer than
>         2^31, but that's a pre-existing problem.
>
>         The signed-ness of char being implementation-defined makes me
>         nervous.
>         Maybe do all the bit operations using explicit unsigned char,
>         then the final test becomes simply
>
>         asciiCheck <= 0x80
>
>
>         On Mon, Jun 12, 2017 at 4:36 PM, Claes Redestad
>         <claes.redestad at oracle.com <mailto:claes.redestad at oracle.com>
>         <mailto:claes.redestad at oracle.com
>         <mailto:claes.redestad at oracle.com>>> wrote:
>
>             Hi Martin,
>
>             thanks for reviewing!
>
>             On 2017-06-12 22:14, Martin Buchholz wrote:
>
>                 +/* Optimized for char set UTF-8 */
>
>                 "charset" is a (poor misnomer!) jargon term, not two
>         words.
>
>
>             I got that from the existing use of "char set" in this
>         file, but
>             will fix it in all places.
>
>
>                 ---
>
>                 +    for (b = str[len]; b != '\0'; len++, b = str[len]) {
>                 +        if (isAscii && b & 0x80) {
>                 +            isAscii = JNI_FALSE;
>                 +        }
>                 +    }
>
>                 I would write this more like
>
>                 const signed char *p;
>                 int isAscii;
>
>                 for (isAscii = 0, p = (const signed char *) str; *p !=
>         '\0';
>                 p++) isAscii &= (*p >= 0);
>
>
>             Did you mean for isAscii to be initialized to 1 (true) and
>         then be
>             cleared to 0 (false) when *p >= 0 is false?
>
>
>                 Then length is (p - str)
>
>
>             How about something like this to hoist the second
>         comparison from
>             the loop:
>
>                 int len;
>                 char asciiCheck;
>                 for (asciiCheck = 0, p = str; *p != '\0'; p++) {
>                     asciiCheck |= *p;
>                 }
>                 len = (p - str);
>
>                 if (asciiCheck & 0x80) {
>                     // ascii fast-path
>                     return newSizedString8859_1(env, str, len);
>                 }
>
>                 ...
>
>                 ---
>
>                 +    jbyteArray hab = NULL;
>
>                 I'm having a hard time decoding the name "hab"
>
>
>             Not sure, but my guess is "heap allocated byteArray".
>
>
>                 ---
>
>                 The code below is not UTF-8 specific. Can it be
>         refactored?
>
>                 +    hab = (*env)->NewByteArray(env, len);
>                 +    if (hab != 0) {
>                 +        jclass strClazz = JNU_ClassString(env);
>                 +        CHECK_NULL_RETURN(strClazz, 0);
>                 +        (*env)->SetByteArrayRegion(env, hab, 0, len,
>         (jbyte
>                 *)str);
>                 +        result = (*env)->NewObject(env, strClazz,
>                 +  String_init_ID, hab,
>                 jnuEncoding);
>                 +        (*env)->DeleteLocalRef(env, hab);
>                 +        return result;
>                 +    }
>
>
>             Yes, probably. The excessive copy-paste here is due to a crash
>             issue I ran into when
>             Charset.isSupported was called without proper
>         initialization. It
>             has since been
>             resolved, but seems I forgot to revisit this part.
>
>             I also realized I haven't added a test for this method.
>         I'll look
>             into doing the
>             refactoring and adding some sanity testing tomorrow.
>
>
>                 ---
>
>                 We probably want to use unicode escapes in out java
>         sources to
>                 keep all source files strictly ASCII.
>
>
>             You mean in the test? Sure.
>
>             Refreshed the jdk webrev:
>         http://cr.openjdk.java.net/~redestad/8181147/jdk.05/
>         <http://cr.openjdk.java.net/%7Eredestad/8181147/jdk.05/>
>             <http://cr.openjdk.java.net/%7Eredestad/8181147/jdk.05/
>         <http://cr.openjdk.java.net/%7Eredestad/8181147/jdk.05/>> -
>         more to
>             come.
>
>             Thanks!
>
>             /Claes
>
>
>
>