RFR(10): 8181147: JNU_GetStringPlatformChars should have a fast path for UTF-8

Tue Jun 13 19:09:12 UTC 2017

Looks good to me, although still hoping for more review from others.

+    if (bytes != 0) {

Style: use NULL.

+static jstring newStringJava(JNIEnv *env, const char *str) {

I expected two versions, one that took a length and one that did not, as
you did with newString8859_1.

---

TIL that jbyte is signed char, not plain char, and so is more consistent
with java "byte", hence less error prone, at least in theory.

---

Can we test different native environments by using -Dsun.jnu.encoding=... ?

On Tue, Jun 13, 2017 at 7:03 AM, Claes Redestad <claes.redestad at oracle.com>
wrote:

> <adding build-dev; see http://mail.openjdk.java.net/p
> ipermail/core-libs-dev/2017-June/048233.html for missing context>
>
> Updated webrevs:
> http://cr.openjdk.java.net/~redestad/8181147/jdk.05/
> http://cr.openjdk.java.net/~redestad/8181147/top.04/
>
> In this version there are various cleanups based on Martin's
> comments, improved test coverage and some small changes to
> ensure the updated test builds and runs on all our supported
> platforms.
>
> If someone with access to any of the platforms not supported
> by Oracle could take this for a spin and ensure linking the newly
> added test against libjava works as expected (or suggest how
> to make it work if it doesn't) then that'd be much appreciated.
>
> Thanks!
>
> /Claes
>
> On 06/13/2017 02:01 AM, Martin Buchholz wrote:
>
>> I'm hoping Xueming will review as well.
>> +    char asciiCheck;
>> +    for (asciiCheck = 0, p = str; *p != '\0'; p++) {
>> +        asciiCheck |= *p;
>> +    }
>> +    len = (p - str);
>> Probably conversion from ptrdiff_t to int needs a cast.
>>
>> Someday we might need to worry about input string longer than 2^31, but
>> that's a pre-existing problem.
>>
>> The signed-ness of char being implementation-defined makes me nervous.
>> Maybe do all the bit operations using explicit unsigned char, then the
>> final test becomes simply
>>
>> asciiCheck <= 0x80
>>
>>
>> On Mon, Jun 12, 2017 at 4:36 PM, Claes Redestad <
>> claes.redestad at oracle.com <mailto:claes.redestad at oracle.com>> wrote:
>>
>>     Hi Martin,
>>
>>     thanks for reviewing!
>>
>>     On 2017-06-12 22:14, Martin Buchholz wrote:
>>
>>         +/* Optimized for char set UTF-8 */
>>
>>         "charset" is a (poor misnomer!) jargon term, not two words.
>>
>>
>>     I got that from the existing use of "char set" in this file, but
>>     will fix it in all places.
>>
>>
>>         ---
>>
>>         +    for (b = str[len]; b != '\0'; len++, b = str[len]) {
>>         +        if (isAscii && b & 0x80) {
>>         +            isAscii = JNI_FALSE;
>>         +        }
>>         +    }
>>
>>         I would write this more like
>>
>>         const signed char *p;
>>         int isAscii;
>>
>>         for (isAscii = 0, p = (const signed char *) str; *p != '\0';
>>         p++) isAscii &= (*p >= 0);
>>
>>
>>     Did you mean for isAscii to be initialized to 1 (true) and then be
>>     cleared to 0 (false) when *p >= 0 is false?
>>
>>
>>         Then length is (p - str)
>>
>>
>>     How about something like this to hoist the second comparison from
>>     the loop:
>>
>>         int len;
>>         char asciiCheck;
>>         for (asciiCheck = 0, p = str; *p != '\0'; p++) {
>>             asciiCheck |= *p;
>>         }
>>         len = (p - str);
>>
>>         if (asciiCheck & 0x80) {
>>             // ascii fast-path
>>             return newSizedString8859_1(env, str, len);
>>         }
>>
>>         ...
>>
>>         ---
>>
>>         +    jbyteArray hab = NULL;
>>
>>         I'm having a hard time decoding the name "hab"
>>
>>
>>     Not sure, but my guess is "heap allocated byteArray".
>>
>>
>>         ---
>>
>>         The code below is not UTF-8 specific. Can it be refactored?
>>
>>         +    hab = (*env)->NewByteArray(env, len);
>>         +    if (hab != 0) {
>>         +        jclass strClazz = JNU_ClassString(env);
>>         +        CHECK_NULL_RETURN(strClazz, 0);
>>         +        (*env)->SetByteArrayRegion(env, hab, 0, len, (jbyte
>>         *)str);
>>         +        result = (*env)->NewObject(env, strClazz,
>>         +                                       String_init_ID, hab,
>>         jnuEncoding);
>>         +        (*env)->DeleteLocalRef(env, hab);
>>         +        return result;
>>         +    }
>>
>>
>>     Yes, probably. The excessive copy-paste here is due to a crash
>>     issue I ran into when
>>     Charset.isSupported was called without proper initialization. It
>>     has since been
>>     resolved, but seems I forgot to revisit this part.
>>
>>     I also realized I haven't added a test for this method. I'll look
>>     into doing the
>>     refactoring and adding some sanity testing tomorrow.
>>
>>
>>         ---
>>
>>         We probably want to use unicode escapes in out java sources to
>>         keep all source files strictly ASCII.
>>
>>
>>     You mean in the test? Sure.
>>
>>     Refreshed the jdk webrev:
>>     http://cr.openjdk.java.net/~redestad/8181147/jdk.05/
>>     <http://cr.openjdk.java.net/%7Eredestad/8181147/jdk.05/> - more to
>>     come.
>>
>>     Thanks!
>>
>>     /Claes
>>
>>
>>
>