RFR(10): 8181147: JNU_GetStringPlatformChars should have a fast path for UTF-8
Erik Joelsson
erik.joelsson at oracle.com
Wed Jun 14 07:11:45 UTC 2017
Build changes look good to me.
/Erik
On 2017-06-13 22:27, Claes Redestad wrote:
>
>
> On 2017-06-13 21:09, Martin Buchholz wrote:
>> Looks good to me, although still hoping for more review from others.
>
> I expect Sherman to weigh in. :-)
>
>>
>> + if (bytes != 0) {
>> Style: use NULL.
>
> Done.
>
>>
>> +static jstring newStringJava(JNIEnv *env, const char *str) {
>>
>> I expected two versions, one that took a length and one that did not,
>> as you did with newString8859_1.
>
> Ah, yes. Refactored and updated in-place.
>
>> ---
>> TIL that jbyte is signed char, not plain char, and so is more
>> consistent with java "byte", hence less error prone, at least in theory.
>>
>> ---
>> Can we test different native environments by using
>> -Dsun.jnu.encoding=... ?
>
> I did try, but jtreg currently doesn't appear to support specifying
> flags for main/native @run:s (and helpfully suggests using
> main/othervm instead...) - maybe someone knows a way around this?
>
> /Claes
>
>>
>>
>>
>> On Tue, Jun 13, 2017 at 7:03 AM, Claes Redestad
>> <claes.redestad at oracle.com <mailto:claes.redestad at oracle.com>> wrote:
>>
>> <adding build-dev; see
>> http://mail.openjdk.java.net/pipermail/core-libs-dev/2017-June/048233.html
>> <http://mail.openjdk.java.net/pipermail/core-libs-dev/2017-June/048233.html>
>> for missing context>
>>
>> Updated webrevs:
>> http://cr.openjdk.java.net/~redestad/8181147/jdk.05/
>> <http://cr.openjdk.java.net/%7Eredestad/8181147/jdk.05/>
>> http://cr.openjdk.java.net/~redestad/8181147/top.04/
>> <http://cr.openjdk.java.net/%7Eredestad/8181147/top.04/>
>>
>> In this version there are various cleanups based on Martin's
>> comments, improved test coverage and some small changes to
>> ensure the updated test builds and runs on all our supported
>> platforms.
>>
>> If someone with access to any of the platforms not supported
>> by Oracle could take this for a spin and ensure linking the newly
>> added test against libjava works as expected (or suggest how
>> to make it work if it doesn't) then that'd be much appreciated.
>>
>> Thanks!
>>
>> /Claes
>>
>> On 06/13/2017 02:01 AM, Martin Buchholz wrote:
>>
>> I'm hoping Xueming will review as well.
>> + char asciiCheck;
>> + for (asciiCheck = 0, p = str; *p != '\0'; p++) {
>> + asciiCheck |= *p;
>> + }
>> + len = (p - str);
>> Probably conversion from ptrdiff_t to int needs a cast.
>>
>> Someday we might need to worry about input string longer than
>> 2^31, but that's a pre-existing problem.
>>
>> The signed-ness of char being implementation-defined makes me
>> nervous.
>> Maybe do all the bit operations using explicit unsigned char,
>> then the final test becomes simply
>>
>> asciiCheck <= 0x80
>>
>>
>> On Mon, Jun 12, 2017 at 4:36 PM, Claes Redestad
>> <claes.redestad at oracle.com <mailto:claes.redestad at oracle.com>
>> <mailto:claes.redestad at oracle.com
>> <mailto:claes.redestad at oracle.com>>> wrote:
>>
>> Hi Martin,
>>
>> thanks for reviewing!
>>
>> On 2017-06-12 22:14, Martin Buchholz wrote:
>>
>> +/* Optimized for char set UTF-8 */
>>
>> "charset" is a (poor misnomer!) jargon term, not two
>> words.
>>
>>
>> I got that from the existing use of "char set" in this
>> file, but
>> will fix it in all places.
>>
>>
>> ---
>>
>> + for (b = str[len]; b != '\0'; len++, b =
>> str[len]) {
>> + if (isAscii && b & 0x80) {
>> + isAscii = JNI_FALSE;
>> + }
>> + }
>>
>> I would write this more like
>>
>> const signed char *p;
>> int isAscii;
>>
>> for (isAscii = 0, p = (const signed char *) str; *p !=
>> '\0';
>> p++) isAscii &= (*p >= 0);
>>
>>
>> Did you mean for isAscii to be initialized to 1 (true) and
>> then be
>> cleared to 0 (false) when *p >= 0 is false?
>>
>>
>> Then length is (p - str)
>>
>>
>> How about something like this to hoist the second
>> comparison from
>> the loop:
>>
>> int len;
>> char asciiCheck;
>> for (asciiCheck = 0, p = str; *p != '\0'; p++) {
>> asciiCheck |= *p;
>> }
>> len = (p - str);
>>
>> if (asciiCheck & 0x80) {
>> // ascii fast-path
>> return newSizedString8859_1(env, str, len);
>> }
>>
>> ...
>>
>> ---
>>
>> + jbyteArray hab = NULL;
>>
>> I'm having a hard time decoding the name "hab"
>>
>>
>> Not sure, but my guess is "heap allocated byteArray".
>>
>>
>> ---
>>
>> The code below is not UTF-8 specific. Can it be
>> refactored?
>>
>> + hab = (*env)->NewByteArray(env, len);
>> + if (hab != 0) {
>> + jclass strClazz = JNU_ClassString(env);
>> + CHECK_NULL_RETURN(strClazz, 0);
>> + (*env)->SetByteArrayRegion(env, hab, 0, len,
>> (jbyte
>> *)str);
>> + result = (*env)->NewObject(env, strClazz,
>> + String_init_ID, hab,
>> jnuEncoding);
>> + (*env)->DeleteLocalRef(env, hab);
>> + return result;
>> + }
>>
>>
>> Yes, probably. The excessive copy-paste here is due to a
>> crash
>> issue I ran into when
>> Charset.isSupported was called without proper
>> initialization. It
>> has since been
>> resolved, but seems I forgot to revisit this part.
>>
>> I also realized I haven't added a test for this method.
>> I'll look
>> into doing the
>> refactoring and adding some sanity testing tomorrow.
>>
>>
>> ---
>>
>> We probably want to use unicode escapes in out java
>> sources to
>> keep all source files strictly ASCII.
>>
>>
>> You mean in the test? Sure.
>>
>> Refreshed the jdk webrev:
>> http://cr.openjdk.java.net/~redestad/8181147/jdk.05/
>> <http://cr.openjdk.java.net/%7Eredestad/8181147/jdk.05/>
>> <http://cr.openjdk.java.net/%7Eredestad/8181147/jdk.05/
>> <http://cr.openjdk.java.net/%7Eredestad/8181147/jdk.05/>> -
>> more to
>> come.
>>
>> Thanks!
>>
>> /Claes
>>
>>
>>
>>
>
More information about the build-dev
mailing list