RFR(10): 8181147: JNU_GetStringPlatformChars should have a fast path for UTF-8
Martin Buchholz
martinrb at google.com
Tue Jun 13 19:09:12 UTC 2017
Looks good to me, although still hoping for more review from others.
+ if (bytes != 0) {
Style: use NULL.
+static jstring newStringJava(JNIEnv *env, const char *str) {
I expected two versions, one that took a length and one that did not, as
you did with newString8859_1.
---
TIL that jbyte is signed char, not plain char, and so is more consistent
with java "byte", hence less error prone, at least in theory.
---
Can we test different native environments by using -Dsun.jnu.encoding=... ?
On Tue, Jun 13, 2017 at 7:03 AM, Claes Redestad <claes.redestad at oracle.com>
wrote:
> <adding build-dev; see http://mail.openjdk.java.net/p
> ipermail/core-libs-dev/2017-June/048233.html for missing context>
>
> Updated webrevs:
> http://cr.openjdk.java.net/~redestad/8181147/jdk.05/
> http://cr.openjdk.java.net/~redestad/8181147/top.04/
>
> In this version there are various cleanups based on Martin's
> comments, improved test coverage and some small changes to
> ensure the updated test builds and runs on all our supported
> platforms.
>
> If someone with access to any of the platforms not supported
> by Oracle could take this for a spin and ensure linking the newly
> added test against libjava works as expected (or suggest how
> to make it work if it doesn't) then that'd be much appreciated.
>
> Thanks!
>
> /Claes
>
> On 06/13/2017 02:01 AM, Martin Buchholz wrote:
>
>> I'm hoping Xueming will review as well.
>> + char asciiCheck;
>> + for (asciiCheck = 0, p = str; *p != '\0'; p++) {
>> + asciiCheck |= *p;
>> + }
>> + len = (p - str);
>> Probably conversion from ptrdiff_t to int needs a cast.
>>
>> Someday we might need to worry about input string longer than 2^31, but
>> that's a pre-existing problem.
>>
>> The signed-ness of char being implementation-defined makes me nervous.
>> Maybe do all the bit operations using explicit unsigned char, then the
>> final test becomes simply
>>
>> asciiCheck <= 0x80
>>
>>
>> On Mon, Jun 12, 2017 at 4:36 PM, Claes Redestad <
>> claes.redestad at oracle.com <mailto:claes.redestad at oracle.com>> wrote:
>>
>> Hi Martin,
>>
>> thanks for reviewing!
>>
>> On 2017-06-12 22:14, Martin Buchholz wrote:
>>
>> +/* Optimized for char set UTF-8 */
>>
>> "charset" is a (poor misnomer!) jargon term, not two words.
>>
>>
>> I got that from the existing use of "char set" in this file, but
>> will fix it in all places.
>>
>>
>> ---
>>
>> + for (b = str[len]; b != '\0'; len++, b = str[len]) {
>> + if (isAscii && b & 0x80) {
>> + isAscii = JNI_FALSE;
>> + }
>> + }
>>
>> I would write this more like
>>
>> const signed char *p;
>> int isAscii;
>>
>> for (isAscii = 0, p = (const signed char *) str; *p != '\0';
>> p++) isAscii &= (*p >= 0);
>>
>>
>> Did you mean for isAscii to be initialized to 1 (true) and then be
>> cleared to 0 (false) when *p >= 0 is false?
>>
>>
>> Then length is (p - str)
>>
>>
>> How about something like this to hoist the second comparison from
>> the loop:
>>
>> int len;
>> char asciiCheck;
>> for (asciiCheck = 0, p = str; *p != '\0'; p++) {
>> asciiCheck |= *p;
>> }
>> len = (p - str);
>>
>> if (asciiCheck & 0x80) {
>> // ascii fast-path
>> return newSizedString8859_1(env, str, len);
>> }
>>
>> ...
>>
>> ---
>>
>> + jbyteArray hab = NULL;
>>
>> I'm having a hard time decoding the name "hab"
>>
>>
>> Not sure, but my guess is "heap allocated byteArray".
>>
>>
>> ---
>>
>> The code below is not UTF-8 specific. Can it be refactored?
>>
>> + hab = (*env)->NewByteArray(env, len);
>> + if (hab != 0) {
>> + jclass strClazz = JNU_ClassString(env);
>> + CHECK_NULL_RETURN(strClazz, 0);
>> + (*env)->SetByteArrayRegion(env, hab, 0, len, (jbyte
>> *)str);
>> + result = (*env)->NewObject(env, strClazz,
>> + String_init_ID, hab,
>> jnuEncoding);
>> + (*env)->DeleteLocalRef(env, hab);
>> + return result;
>> + }
>>
>>
>> Yes, probably. The excessive copy-paste here is due to a crash
>> issue I ran into when
>> Charset.isSupported was called without proper initialization. It
>> has since been
>> resolved, but seems I forgot to revisit this part.
>>
>> I also realized I haven't added a test for this method. I'll look
>> into doing the
>> refactoring and adding some sanity testing tomorrow.
>>
>>
>> ---
>>
>> We probably want to use unicode escapes in out java sources to
>> keep all source files strictly ASCII.
>>
>>
>> You mean in the test? Sure.
>>
>> Refreshed the jdk webrev:
>> http://cr.openjdk.java.net/~redestad/8181147/jdk.05/
>> <http://cr.openjdk.java.net/%7Eredestad/8181147/jdk.05/> - more to
>> come.
>>
>> Thanks!
>>
>> /Claes
>>
>>
>>
>
More information about the build-dev
mailing list