RFR(10): 8181147: JNU_GetStringPlatformChars should have a fast path for UTF-8
Xueming Shen
xueming.shen at oracle.com
Thu Jun 15 15:09:29 UTC 2017
On 6/15/17, 6:29 AM, Claes Redestad wrote:
> Hi Sherman,
>
>>
>> One nick picking is that it might be better to initialize the
>> constant "LATIN1" from the String class
>> when intializing String_coder_ID? recently I got a "naked constants"
>> complain #JDK-8156530, so
>> I guess it might be better to do so at you change, though I don't
>> expect it's going to change in the
>> foreseeable future, even we finally add a "utf8" variants there.
>>
>> 478 static const jbyte LATIN1 = 0;
>> 479 static const jbyte UTF16 = 1;
>
> Not sure how the "naked constants" issue applies, but maybe we can
> save us some
> future surprises by adding a precondition check in the new test to
> ensure the
> constants defined in jni_util matches the constants defined in
> java.lang.String:
+1
I was thinking of initializing the LATIN1 with the value vai
GetStaticByteField as you are
doing now in the test.
But the new test is good enough for me, a warning/reminder in the future
in case we have to
change the constant values in String class.
-Sherman
>
> http://cr.openjdk.java.net/~redestad/8181147/jdk.06/
>
> Does this suffice?
>
> Thanks!
>
> /Claes
>
>>
>> thanks,
>> Sherman
>>
>> On 6/13/17, 1:27 PM, Claes Redestad wrote:
>>>
>>>
>>> On 2017-06-13 21:09, Martin Buchholz wrote:
>>>> Looks good to me, although still hoping for more review from others.
>>>
>>> I expect Sherman to weigh in. :-)
>>>
>>>>
>>>> + if (bytes != 0) {
>>>> Style: use NULL.
>>>
>>> Done.
>>>
>>>>
>>>> +static jstring newStringJava(JNIEnv *env, const char *str) {
>>>>
>>>> I expected two versions, one that took a length and one that did
>>>> not, as you did with newString8859_1.
>>>
>>> Ah, yes. Refactored and updated in-place.
>>>
>>>> ---
>>>> TIL that jbyte is signed char, not plain char, and so is more
>>>> consistent with java "byte", hence less error prone, at least in
>>>> theory.
>>>>
>>>> ---
>>>> Can we test different native environments by using
>>>> -Dsun.jnu.encoding=... ?
>>>
>>> I did try, but jtreg currently doesn't appear to support specifying
>>> flags for main/native @run:s (and helpfully suggests using
>>> main/othervm instead...) - maybe someone knows a way around this?
>>>
>>> /Claes
>>>
>>>>
>>>>
>>>>
>>>> On Tue, Jun 13, 2017 at 7:03 AM, Claes Redestad
>>>> <claes.redestad at oracle.com <mailto:claes.redestad at oracle.com>> wrote:
>>>>
>>>> <adding build-dev; see
>>>> http://mail.openjdk.java.net/pipermail/core-libs-dev/2017-June/048233.html
>>>>
>>>> <http://mail.openjdk.java.net/pipermail/core-libs-dev/2017-June/048233.html>
>>>>
>>>> for missing context>
>>>>
>>>> Updated webrevs:
>>>> http://cr.openjdk.java.net/~redestad/8181147/jdk.05/
>>>> <http://cr.openjdk.java.net/%7Eredestad/8181147/jdk.05/>
>>>> http://cr.openjdk.java.net/~redestad/8181147/top.04/
>>>> <http://cr.openjdk.java.net/%7Eredestad/8181147/top.04/>
>>>>
>>>> In this version there are various cleanups based on Martin's
>>>> comments, improved test coverage and some small changes to
>>>> ensure the updated test builds and runs on all our supported
>>>> platforms.
>>>>
>>>> If someone with access to any of the platforms not supported
>>>> by Oracle could take this for a spin and ensure linking the newly
>>>> added test against libjava works as expected (or suggest how
>>>> to make it work if it doesn't) then that'd be much appreciated.
>>>>
>>>> Thanks!
>>>>
>>>> /Claes
>>>>
>>>> On 06/13/2017 02:01 AM, Martin Buchholz wrote:
>>>>
>>>> I'm hoping Xueming will review as well.
>>>> + char asciiCheck;
>>>> + for (asciiCheck = 0, p = str; *p != '\0'; p++) {
>>>> + asciiCheck |= *p;
>>>> + }
>>>> + len = (p - str);
>>>> Probably conversion from ptrdiff_t to int needs a cast.
>>>>
>>>> Someday we might need to worry about input string longer than
>>>> 2^31, but that's a pre-existing problem.
>>>>
>>>> The signed-ness of char being implementation-defined makes me
>>>> nervous.
>>>> Maybe do all the bit operations using explicit unsigned char,
>>>> then the final test becomes simply
>>>>
>>>> asciiCheck <= 0x80
>>>>
>>>>
>>>> On Mon, Jun 12, 2017 at 4:36 PM, Claes Redestad
>>>> <claes.redestad at oracle.com <mailto:claes.redestad at oracle.com>
>>>> <mailto:claes.redestad at oracle.com
>>>> <mailto:claes.redestad at oracle.com>>> wrote:
>>>>
>>>> Hi Martin,
>>>>
>>>> thanks for reviewing!
>>>>
>>>> On 2017-06-12 22:14, Martin Buchholz wrote:
>>>>
>>>> +/* Optimized for char set UTF-8 */
>>>>
>>>> "charset" is a (poor misnomer!) jargon term, not two
>>>> words.
>>>>
>>>>
>>>> I got that from the existing use of "char set" in this
>>>> file, but
>>>> will fix it in all places.
>>>>
>>>>
>>>> ---
>>>>
>>>> + for (b = str[len]; b != '\0'; len++, b =
>>>> str[len]) {
>>>> + if (isAscii && b & 0x80) {
>>>> + isAscii = JNI_FALSE;
>>>> + }
>>>> + }
>>>>
>>>> I would write this more like
>>>>
>>>> const signed char *p;
>>>> int isAscii;
>>>>
>>>> for (isAscii = 0, p = (const signed char *) str; *p !=
>>>> '\0';
>>>> p++) isAscii &= (*p >= 0);
>>>>
>>>>
>>>> Did you mean for isAscii to be initialized to 1 (true) and
>>>> then be
>>>> cleared to 0 (false) when *p >= 0 is false?
>>>>
>>>>
>>>> Then length is (p - str)
>>>>
>>>>
>>>> How about something like this to hoist the second
>>>> comparison from
>>>> the loop:
>>>>
>>>> int len;
>>>> char asciiCheck;
>>>> for (asciiCheck = 0, p = str; *p != '\0'; p++) {
>>>> asciiCheck |= *p;
>>>> }
>>>> len = (p - str);
>>>>
>>>> if (asciiCheck & 0x80) {
>>>> // ascii fast-path
>>>> return newSizedString8859_1(env, str, len);
>>>> }
>>>>
>>>> ...
>>>>
>>>> ---
>>>>
>>>> + jbyteArray hab = NULL;
>>>>
>>>> I'm having a hard time decoding the name "hab"
>>>>
>>>>
>>>> Not sure, but my guess is "heap allocated byteArray".
>>>>
>>>>
>>>> ---
>>>>
>>>> The code below is not UTF-8 specific. Can it be
>>>> refactored?
>>>>
>>>> + hab = (*env)->NewByteArray(env, len);
>>>> + if (hab != 0) {
>>>> + jclass strClazz = JNU_ClassString(env);
>>>> + CHECK_NULL_RETURN(strClazz, 0);
>>>> + (*env)->SetByteArrayRegion(env, hab, 0, len,
>>>> (jbyte
>>>> *)str);
>>>> + result = (*env)->NewObject(env, strClazz,
>>>> + String_init_ID, hab,
>>>> jnuEncoding);
>>>> + (*env)->DeleteLocalRef(env, hab);
>>>> + return result;
>>>> + }
>>>>
>>>>
>>>> Yes, probably. The excessive copy-paste here is due to
>>>> a crash
>>>> issue I ran into when
>>>> Charset.isSupported was called without proper
>>>> initialization. It
>>>> has since been
>>>> resolved, but seems I forgot to revisit this part.
>>>>
>>>> I also realized I haven't added a test for this method.
>>>> I'll look
>>>> into doing the
>>>> refactoring and adding some sanity testing tomorrow.
>>>>
>>>>
>>>> ---
>>>>
>>>> We probably want to use unicode escapes in out java
>>>> sources to
>>>> keep all source files strictly ASCII.
>>>>
>>>>
>>>> You mean in the test? Sure.
>>>>
>>>> Refreshed the jdk webrev:
>>>> http://cr.openjdk.java.net/~redestad/8181147/jdk.05/
>>>> <http://cr.openjdk.java.net/%7Eredestad/8181147/jdk.05/>
>>>> <http://cr.openjdk.java.net/%7Eredestad/8181147/jdk.05/
>>>> <http://cr.openjdk.java.net/%7Eredestad/8181147/jdk.05/>> -
>>>> more to
>>>> come.
>>>>
>>>> Thanks!
>>>>
>>>> /Claes
>>>>
>>>>
>>>>
>>>>
>>>
>>
>
More information about the build-dev
mailing list