RFR(10): 8181147: JNU_GetStringPlatformChars should have a fast path for UTF-8

Thu Jun 15 03:27:34 UTC 2017

Hi Claes,

The change looks fine. Yes, encoding the 2-byete latin1 at native looks 
reasonable.

One nick picking is that it might be better to initialize the constant 
"LATIN1" from the String class
when intializing String_coder_ID? recently I got a "naked constants" 
complain #JDK-8156530, so
I guess it might be better to do so at you change, though I don't expect 
it's going to change in the
foreseeable future, even we finally add a "utf8" variants there.

478 static const jbyte LATIN1 = 0;
479 static const jbyte UTF16  = 1;

thanks,
Sherman

On 6/13/17, 1:27 PM, Claes Redestad wrote:
>
>
> On 2017-06-13 21:09, Martin Buchholz wrote:
>> Looks good to me, although still hoping for more review from others.
>
> I expect Sherman to weigh in. :-)
>
>>
>> +    if (bytes != 0) {
>> Style: use NULL.
>
> Done.
>
>>
>> +static jstring newStringJava(JNIEnv *env, const char *str) {
>>
>> I expected two versions, one that took a length and one that did not, 
>> as you did with newString8859_1.
>
> Ah, yes. Refactored and updated in-place.
>
>> ---
>> TIL that jbyte is signed char, not plain char, and so is more 
>> consistent with java "byte", hence less error prone, at least in theory.
>>
>> ---
>> Can we test different native environments by using 
>> -Dsun.jnu.encoding=... ?
>
> I did try, but jtreg currently doesn't appear to support specifying 
> flags for main/native @run:s (and helpfully suggests using 
> main/othervm instead...) - maybe someone knows a way around this?
>
> /Claes
>
>>
>>
>>
>> On Tue, Jun 13, 2017 at 7:03 AM, Claes Redestad 
>> <claes.redestad at oracle.com <mailto:claes.redestad at oracle.com>> wrote:
>>
>> <adding build-dev; see
>>     
>> http://mail.openjdk.java.net/pipermail/core-libs-dev/2017-June/048233.html
>> <http://mail.openjdk.java.net/pipermail/core-libs-dev/2017-June/048233.html>
>>     for missing context>
>>
>>     Updated webrevs:
>>     http://cr.openjdk.java.net/~redestad/8181147/jdk.05/
>> <http://cr.openjdk.java.net/%7Eredestad/8181147/jdk.05/>
>>     http://cr.openjdk.java.net/~redestad/8181147/top.04/
>> <http://cr.openjdk.java.net/%7Eredestad/8181147/top.04/>
>>
>>     In this version there are various cleanups based on Martin's
>>     comments, improved test coverage and some small changes to
>>     ensure the updated test builds and runs on all our supported
>>     platforms.
>>
>>     If someone with access to any of the platforms not supported
>>     by Oracle could take this for a spin and ensure linking the newly
>>     added test against libjava works as expected (or suggest how
>>     to make it work if it doesn't) then that'd be much appreciated.
>>
>>     Thanks!
>>
>>     /Claes
>>
>>     On 06/13/2017 02:01 AM, Martin Buchholz wrote:
>>
>>         I'm hoping Xueming will review as well.
>>         +    char asciiCheck;
>>         +    for (asciiCheck = 0, p = str; *p != '\0'; p++) {
>>         +        asciiCheck |= *p;
>>         +    }
>>         +    len = (p - str);
>>         Probably conversion from ptrdiff_t to int needs a cast.
>>
>>         Someday we might need to worry about input string longer than
>>         2^31, but that's a pre-existing problem.
>>
>>         The signed-ness of char being implementation-defined makes me
>>         nervous.
>>         Maybe do all the bit operations using explicit unsigned char,
>>         then the final test becomes simply
>>
>>         asciiCheck <= 0x80
>>
>>
>>         On Mon, Jun 12, 2017 at 4:36 PM, Claes Redestad
>> <claes.redestad at oracle.com <mailto:claes.redestad at oracle.com>
>> <mailto:claes.redestad at oracle.com
>> <mailto:claes.redestad at oracle.com>>> wrote:
>>
>>             Hi Martin,
>>
>>             thanks for reviewing!
>>
>>             On 2017-06-12 22:14, Martin Buchholz wrote:
>>
>>                 +/* Optimized for char set UTF-8 */
>>
>>                 "charset" is a (poor misnomer!) jargon term, not two
>>         words.
>>
>>
>>             I got that from the existing use of "char set" in this
>>         file, but
>>             will fix it in all places.
>>
>>
>>                 ---
>>
>>                 +    for (b = str[len]; b != '\0'; len++, b = 
>> str[len]) {
>>                 +        if (isAscii && b & 0x80) {
>>                 +            isAscii = JNI_FALSE;
>>                 +        }
>>                 +    }
>>
>>                 I would write this more like
>>
>>                 const signed char *p;
>>                 int isAscii;
>>
>>                 for (isAscii = 0, p = (const signed char *) str; *p !=
>>         '\0';
>>                 p++) isAscii &= (*p >= 0);
>>
>>
>>             Did you mean for isAscii to be initialized to 1 (true) and
>>         then be
>>             cleared to 0 (false) when *p >= 0 is false?
>>
>>
>>                 Then length is (p - str)
>>
>>
>>             How about something like this to hoist the second
>>         comparison from
>>             the loop:
>>
>>                 int len;
>>                 char asciiCheck;
>>                 for (asciiCheck = 0, p = str; *p != '\0'; p++) {
>>                     asciiCheck |= *p;
>>                 }
>>                 len = (p - str);
>>
>>                 if (asciiCheck & 0x80) {
>>                     // ascii fast-path
>>                     return newSizedString8859_1(env, str, len);
>>                 }
>>
>>                 ...
>>
>>                 ---
>>
>>                 +    jbyteArray hab = NULL;
>>
>>                 I'm having a hard time decoding the name "hab"
>>
>>
>>             Not sure, but my guess is "heap allocated byteArray".
>>
>>
>>                 ---
>>
>>                 The code below is not UTF-8 specific. Can it be
>>         refactored?
>>
>>                 +    hab = (*env)->NewByteArray(env, len);
>>                 +    if (hab != 0) {
>>                 +        jclass strClazz = JNU_ClassString(env);
>>                 +        CHECK_NULL_RETURN(strClazz, 0);
>>                 +        (*env)->SetByteArrayRegion(env, hab, 0, len,
>>         (jbyte
>>                 *)str);
>>                 +        result = (*env)->NewObject(env, strClazz,
>>                 +  String_init_ID, hab,
>>                 jnuEncoding);
>>                 +        (*env)->DeleteLocalRef(env, hab);
>>                 +        return result;
>>                 +    }
>>
>>
>>             Yes, probably. The excessive copy-paste here is due to a 
>> crash
>>             issue I ran into when
>>             Charset.isSupported was called without proper
>>         initialization. It
>>             has since been
>>             resolved, but seems I forgot to revisit this part.
>>
>>             I also realized I haven't added a test for this method.
>>         I'll look
>>             into doing the
>>             refactoring and adding some sanity testing tomorrow.
>>
>>
>>                 ---
>>
>>                 We probably want to use unicode escapes in out java
>>         sources to
>>                 keep all source files strictly ASCII.
>>
>>
>>             You mean in the test? Sure.
>>
>>             Refreshed the jdk webrev:
>>         http://cr.openjdk.java.net/~redestad/8181147/jdk.05/
>> <http://cr.openjdk.java.net/%7Eredestad/8181147/jdk.05/>
>> <http://cr.openjdk.java.net/%7Eredestad/8181147/jdk.05/
>> <http://cr.openjdk.java.net/%7Eredestad/8181147/jdk.05/>> -
>>         more to
>>             come.
>>
>>             Thanks!
>>
>>             /Claes
>>
>>
>>
>>
>