RFR(10): 8181147: JNU_GetStringPlatformChars should have a fast path for UTF-8
Claes Redestad
claes.redestad at oracle.com
Thu Jun 15 13:29:42 UTC 2017
Hi Sherman,
On 06/15/2017 05:27 AM, Xueming Shen wrote:
> Hi Claes,
>
> The change looks fine. Yes, encoding the 2-byete latin1 at native
> looks reasonable.
Thanks for reviewing!
>
> One nick picking is that it might be better to initialize the constant
> "LATIN1" from the String class
> when intializing String_coder_ID? recently I got a "naked constants"
> complain #JDK-8156530, so
> I guess it might be better to do so at you change, though I don't
> expect it's going to change in the
> foreseeable future, even we finally add a "utf8" variants there.
>
> 478 static const jbyte LATIN1 = 0;
> 479 static const jbyte UTF16 = 1;
Not sure how the "naked constants" issue applies, but maybe we can save
us some
future surprises by adding a precondition check in the new test to
ensure the
constants defined in jni_util matches the constants defined in
java.lang.String:
http://cr.openjdk.java.net/~redestad/8181147/jdk.06/
Does this suffice?
Thanks!
/Claes
>
> thanks,
> Sherman
>
> On 6/13/17, 1:27 PM, Claes Redestad wrote:
>>
>>
>> On 2017-06-13 21:09, Martin Buchholz wrote:
>>> Looks good to me, although still hoping for more review from others.
>>
>> I expect Sherman to weigh in. :-)
>>
>>>
>>> + if (bytes != 0) {
>>> Style: use NULL.
>>
>> Done.
>>
>>>
>>> +static jstring newStringJava(JNIEnv *env, const char *str) {
>>>
>>> I expected two versions, one that took a length and one that did
>>> not, as you did with newString8859_1.
>>
>> Ah, yes. Refactored and updated in-place.
>>
>>> ---
>>> TIL that jbyte is signed char, not plain char, and so is more
>>> consistent with java "byte", hence less error prone, at least in
>>> theory.
>>>
>>> ---
>>> Can we test different native environments by using
>>> -Dsun.jnu.encoding=... ?
>>
>> I did try, but jtreg currently doesn't appear to support specifying
>> flags for main/native @run:s (and helpfully suggests using
>> main/othervm instead...) - maybe someone knows a way around this?
>>
>> /Claes
>>
>>>
>>>
>>>
>>> On Tue, Jun 13, 2017 at 7:03 AM, Claes Redestad
>>> <claes.redestad at oracle.com <mailto:claes.redestad at oracle.com>> wrote:
>>>
>>> <adding build-dev; see
>>> http://mail.openjdk.java.net/pipermail/core-libs-dev/2017-June/048233.html
>>> <http://mail.openjdk.java.net/pipermail/core-libs-dev/2017-June/048233.html>
>>>
>>> for missing context>
>>>
>>> Updated webrevs:
>>> http://cr.openjdk.java.net/~redestad/8181147/jdk.05/
>>> <http://cr.openjdk.java.net/%7Eredestad/8181147/jdk.05/>
>>> http://cr.openjdk.java.net/~redestad/8181147/top.04/
>>> <http://cr.openjdk.java.net/%7Eredestad/8181147/top.04/>
>>>
>>> In this version there are various cleanups based on Martin's
>>> comments, improved test coverage and some small changes to
>>> ensure the updated test builds and runs on all our supported
>>> platforms.
>>>
>>> If someone with access to any of the platforms not supported
>>> by Oracle could take this for a spin and ensure linking the newly
>>> added test against libjava works as expected (or suggest how
>>> to make it work if it doesn't) then that'd be much appreciated.
>>>
>>> Thanks!
>>>
>>> /Claes
>>>
>>> On 06/13/2017 02:01 AM, Martin Buchholz wrote:
>>>
>>> I'm hoping Xueming will review as well.
>>> + char asciiCheck;
>>> + for (asciiCheck = 0, p = str; *p != '\0'; p++) {
>>> + asciiCheck |= *p;
>>> + }
>>> + len = (p - str);
>>> Probably conversion from ptrdiff_t to int needs a cast.
>>>
>>> Someday we might need to worry about input string longer than
>>> 2^31, but that's a pre-existing problem.
>>>
>>> The signed-ness of char being implementation-defined makes me
>>> nervous.
>>> Maybe do all the bit operations using explicit unsigned char,
>>> then the final test becomes simply
>>>
>>> asciiCheck <= 0x80
>>>
>>>
>>> On Mon, Jun 12, 2017 at 4:36 PM, Claes Redestad
>>> <claes.redestad at oracle.com <mailto:claes.redestad at oracle.com>
>>> <mailto:claes.redestad at oracle.com
>>> <mailto:claes.redestad at oracle.com>>> wrote:
>>>
>>> Hi Martin,
>>>
>>> thanks for reviewing!
>>>
>>> On 2017-06-12 22:14, Martin Buchholz wrote:
>>>
>>> +/* Optimized for char set UTF-8 */
>>>
>>> "charset" is a (poor misnomer!) jargon term, not two
>>> words.
>>>
>>>
>>> I got that from the existing use of "char set" in this
>>> file, but
>>> will fix it in all places.
>>>
>>>
>>> ---
>>>
>>> + for (b = str[len]; b != '\0'; len++, b =
>>> str[len]) {
>>> + if (isAscii && b & 0x80) {
>>> + isAscii = JNI_FALSE;
>>> + }
>>> + }
>>>
>>> I would write this more like
>>>
>>> const signed char *p;
>>> int isAscii;
>>>
>>> for (isAscii = 0, p = (const signed char *) str; *p !=
>>> '\0';
>>> p++) isAscii &= (*p >= 0);
>>>
>>>
>>> Did you mean for isAscii to be initialized to 1 (true) and
>>> then be
>>> cleared to 0 (false) when *p >= 0 is false?
>>>
>>>
>>> Then length is (p - str)
>>>
>>>
>>> How about something like this to hoist the second
>>> comparison from
>>> the loop:
>>>
>>> int len;
>>> char asciiCheck;
>>> for (asciiCheck = 0, p = str; *p != '\0'; p++) {
>>> asciiCheck |= *p;
>>> }
>>> len = (p - str);
>>>
>>> if (asciiCheck & 0x80) {
>>> // ascii fast-path
>>> return newSizedString8859_1(env, str, len);
>>> }
>>>
>>> ...
>>>
>>> ---
>>>
>>> + jbyteArray hab = NULL;
>>>
>>> I'm having a hard time decoding the name "hab"
>>>
>>>
>>> Not sure, but my guess is "heap allocated byteArray".
>>>
>>>
>>> ---
>>>
>>> The code below is not UTF-8 specific. Can it be
>>> refactored?
>>>
>>> + hab = (*env)->NewByteArray(env, len);
>>> + if (hab != 0) {
>>> + jclass strClazz = JNU_ClassString(env);
>>> + CHECK_NULL_RETURN(strClazz, 0);
>>> + (*env)->SetByteArrayRegion(env, hab, 0, len,
>>> (jbyte
>>> *)str);
>>> + result = (*env)->NewObject(env, strClazz,
>>> + String_init_ID, hab,
>>> jnuEncoding);
>>> + (*env)->DeleteLocalRef(env, hab);
>>> + return result;
>>> + }
>>>
>>>
>>> Yes, probably. The excessive copy-paste here is due to a
>>> crash
>>> issue I ran into when
>>> Charset.isSupported was called without proper
>>> initialization. It
>>> has since been
>>> resolved, but seems I forgot to revisit this part.
>>>
>>> I also realized I haven't added a test for this method.
>>> I'll look
>>> into doing the
>>> refactoring and adding some sanity testing tomorrow.
>>>
>>>
>>> ---
>>>
>>> We probably want to use unicode escapes in out java
>>> sources to
>>> keep all source files strictly ASCII.
>>>
>>>
>>> You mean in the test? Sure.
>>>
>>> Refreshed the jdk webrev:
>>> http://cr.openjdk.java.net/~redestad/8181147/jdk.05/
>>> <http://cr.openjdk.java.net/%7Eredestad/8181147/jdk.05/>
>>> <http://cr.openjdk.java.net/%7Eredestad/8181147/jdk.05/
>>> <http://cr.openjdk.java.net/%7Eredestad/8181147/jdk.05/>> -
>>> more to
>>> come.
>>>
>>> Thanks!
>>>
>>> /Claes
>>>
>>>
>>>
>>>
>>
>
More information about the build-dev
mailing list