RFR(10): 8181147: JNU_GetStringPlatformChars should have a fast path for UTF-8

Xueming Shen xueming.shen at oracle.com
Thu Jun 15 15:09:29 UTC 2017


On 6/15/17, 6:29 AM, Claes Redestad wrote:
> Hi Sherman,
>
>>
>> One nick picking is that it might be better to initialize the 
>> constant "LATIN1" from the String class
>> when intializing String_coder_ID? recently I got a "naked constants" 
>> complain #JDK-8156530, so
>> I guess it might be better to do so at you change, though I don't 
>> expect it's going to change in the
>> foreseeable future, even we finally add a "utf8" variants there.
>>
>> 478 static const jbyte LATIN1 = 0;
>> 479 static const jbyte UTF16  = 1;
>
> Not sure how the "naked constants" issue applies, but maybe we can 
> save us some
> future surprises by adding a precondition check in the new test to 
> ensure the
> constants defined in jni_util matches the constants defined in 
> java.lang.String:
+1

I was thinking of initializing the LATIN1 with the value vai 
GetStaticByteField as you are
doing now in the test.

But the new test is good enough for me, a warning/reminder in the future 
in case we have to
change the constant values in String class.

-Sherman

>
> http://cr.openjdk.java.net/~redestad/8181147/jdk.06/
>
> Does this suffice?
>
> Thanks!
>
> /Claes
>
>>
>> thanks,
>> Sherman
>>
>> On 6/13/17, 1:27 PM, Claes Redestad wrote:
>>>
>>>
>>> On 2017-06-13 21:09, Martin Buchholz wrote:
>>>> Looks good to me, although still hoping for more review from others.
>>>
>>> I expect Sherman to weigh in. :-)
>>>
>>>>
>>>> +    if (bytes != 0) {
>>>> Style: use NULL.
>>>
>>> Done.
>>>
>>>>
>>>> +static jstring newStringJava(JNIEnv *env, const char *str) {
>>>>
>>>> I expected two versions, one that took a length and one that did 
>>>> not, as you did with newString8859_1.
>>>
>>> Ah, yes. Refactored and updated in-place.
>>>
>>>> ---
>>>> TIL that jbyte is signed char, not plain char, and so is more 
>>>> consistent with java "byte", hence less error prone, at least in 
>>>> theory.
>>>>
>>>> ---
>>>> Can we test different native environments by using 
>>>> -Dsun.jnu.encoding=... ?
>>>
>>> I did try, but jtreg currently doesn't appear to support specifying 
>>> flags for main/native @run:s (and helpfully suggests using 
>>> main/othervm instead...) - maybe someone knows a way around this?
>>>
>>> /Claes
>>>
>>>>
>>>>
>>>>
>>>> On Tue, Jun 13, 2017 at 7:03 AM, Claes Redestad 
>>>> <claes.redestad at oracle.com <mailto:claes.redestad at oracle.com>> wrote:
>>>>
>>>> <adding build-dev; see
>>>> http://mail.openjdk.java.net/pipermail/core-libs-dev/2017-June/048233.html 
>>>>
>>>> <http://mail.openjdk.java.net/pipermail/core-libs-dev/2017-June/048233.html> 
>>>>
>>>>     for missing context>
>>>>
>>>>     Updated webrevs:
>>>>     http://cr.openjdk.java.net/~redestad/8181147/jdk.05/
>>>> <http://cr.openjdk.java.net/%7Eredestad/8181147/jdk.05/>
>>>>     http://cr.openjdk.java.net/~redestad/8181147/top.04/
>>>> <http://cr.openjdk.java.net/%7Eredestad/8181147/top.04/>
>>>>
>>>>     In this version there are various cleanups based on Martin's
>>>>     comments, improved test coverage and some small changes to
>>>>     ensure the updated test builds and runs on all our supported
>>>>     platforms.
>>>>
>>>>     If someone with access to any of the platforms not supported
>>>>     by Oracle could take this for a spin and ensure linking the newly
>>>>     added test against libjava works as expected (or suggest how
>>>>     to make it work if it doesn't) then that'd be much appreciated.
>>>>
>>>>     Thanks!
>>>>
>>>>     /Claes
>>>>
>>>>     On 06/13/2017 02:01 AM, Martin Buchholz wrote:
>>>>
>>>>         I'm hoping Xueming will review as well.
>>>>         +    char asciiCheck;
>>>>         +    for (asciiCheck = 0, p = str; *p != '\0'; p++) {
>>>>         +        asciiCheck |= *p;
>>>>         +    }
>>>>         +    len = (p - str);
>>>>         Probably conversion from ptrdiff_t to int needs a cast.
>>>>
>>>>         Someday we might need to worry about input string longer than
>>>>         2^31, but that's a pre-existing problem.
>>>>
>>>>         The signed-ness of char being implementation-defined makes me
>>>>         nervous.
>>>>         Maybe do all the bit operations using explicit unsigned char,
>>>>         then the final test becomes simply
>>>>
>>>>         asciiCheck <= 0x80
>>>>
>>>>
>>>>         On Mon, Jun 12, 2017 at 4:36 PM, Claes Redestad
>>>> <claes.redestad at oracle.com <mailto:claes.redestad at oracle.com>
>>>> <mailto:claes.redestad at oracle.com
>>>> <mailto:claes.redestad at oracle.com>>> wrote:
>>>>
>>>>             Hi Martin,
>>>>
>>>>             thanks for reviewing!
>>>>
>>>>             On 2017-06-12 22:14, Martin Buchholz wrote:
>>>>
>>>>                 +/* Optimized for char set UTF-8 */
>>>>
>>>>                 "charset" is a (poor misnomer!) jargon term, not two
>>>>         words.
>>>>
>>>>
>>>>             I got that from the existing use of "char set" in this
>>>>         file, but
>>>>             will fix it in all places.
>>>>
>>>>
>>>>                 ---
>>>>
>>>>                 +    for (b = str[len]; b != '\0'; len++, b = 
>>>> str[len]) {
>>>>                 +        if (isAscii && b & 0x80) {
>>>>                 +            isAscii = JNI_FALSE;
>>>>                 +        }
>>>>                 +    }
>>>>
>>>>                 I would write this more like
>>>>
>>>>                 const signed char *p;
>>>>                 int isAscii;
>>>>
>>>>                 for (isAscii = 0, p = (const signed char *) str; *p !=
>>>>         '\0';
>>>>                 p++) isAscii &= (*p >= 0);
>>>>
>>>>
>>>>             Did you mean for isAscii to be initialized to 1 (true) and
>>>>         then be
>>>>             cleared to 0 (false) when *p >= 0 is false?
>>>>
>>>>
>>>>                 Then length is (p - str)
>>>>
>>>>
>>>>             How about something like this to hoist the second
>>>>         comparison from
>>>>             the loop:
>>>>
>>>>                 int len;
>>>>                 char asciiCheck;
>>>>                 for (asciiCheck = 0, p = str; *p != '\0'; p++) {
>>>>                     asciiCheck |= *p;
>>>>                 }
>>>>                 len = (p - str);
>>>>
>>>>                 if (asciiCheck & 0x80) {
>>>>                     // ascii fast-path
>>>>                     return newSizedString8859_1(env, str, len);
>>>>                 }
>>>>
>>>>                 ...
>>>>
>>>>                 ---
>>>>
>>>>                 +    jbyteArray hab = NULL;
>>>>
>>>>                 I'm having a hard time decoding the name "hab"
>>>>
>>>>
>>>>             Not sure, but my guess is "heap allocated byteArray".
>>>>
>>>>
>>>>                 ---
>>>>
>>>>                 The code below is not UTF-8 specific. Can it be
>>>>         refactored?
>>>>
>>>>                 +    hab = (*env)->NewByteArray(env, len);
>>>>                 +    if (hab != 0) {
>>>>                 +        jclass strClazz = JNU_ClassString(env);
>>>>                 +        CHECK_NULL_RETURN(strClazz, 0);
>>>>                 +        (*env)->SetByteArrayRegion(env, hab, 0, len,
>>>>         (jbyte
>>>>                 *)str);
>>>>                 +        result = (*env)->NewObject(env, strClazz,
>>>>                 +  String_init_ID, hab,
>>>>                 jnuEncoding);
>>>>                 +        (*env)->DeleteLocalRef(env, hab);
>>>>                 +        return result;
>>>>                 +    }
>>>>
>>>>
>>>>             Yes, probably. The excessive copy-paste here is due to 
>>>> a crash
>>>>             issue I ran into when
>>>>             Charset.isSupported was called without proper
>>>>         initialization. It
>>>>             has since been
>>>>             resolved, but seems I forgot to revisit this part.
>>>>
>>>>             I also realized I haven't added a test for this method.
>>>>         I'll look
>>>>             into doing the
>>>>             refactoring and adding some sanity testing tomorrow.
>>>>
>>>>
>>>>                 ---
>>>>
>>>>                 We probably want to use unicode escapes in out java
>>>>         sources to
>>>>                 keep all source files strictly ASCII.
>>>>
>>>>
>>>>             You mean in the test? Sure.
>>>>
>>>>             Refreshed the jdk webrev:
>>>>         http://cr.openjdk.java.net/~redestad/8181147/jdk.05/
>>>> <http://cr.openjdk.java.net/%7Eredestad/8181147/jdk.05/>
>>>> <http://cr.openjdk.java.net/%7Eredestad/8181147/jdk.05/
>>>> <http://cr.openjdk.java.net/%7Eredestad/8181147/jdk.05/>> -
>>>>         more to
>>>>             come.
>>>>
>>>>             Thanks!
>>>>
>>>>             /Claes
>>>>
>>>>
>>>>
>>>>
>>>
>>
>




More information about the build-dev mailing list