RFR(10): 8181147: JNU_GetStringPlatformChars should have a fast path for UTF-8
Claes Redestad
claes.redestad at oracle.com
Tue Jun 13 20:27:02 UTC 2017
On 2017-06-13 21:09, Martin Buchholz wrote:
> Looks good to me, although still hoping for more review from others.
I expect Sherman to weigh in. :-)
>
> + if (bytes != 0) {
> Style: use NULL.
Done.
>
> +static jstring newStringJava(JNIEnv *env, const char *str) {
>
> I expected two versions, one that took a length and one that did not,
> as you did with newString8859_1.
Ah, yes. Refactored and updated in-place.
> ---
> TIL that jbyte is signed char, not plain char, and so is more
> consistent with java "byte", hence less error prone, at least in theory.
>
> ---
> Can we test different native environments by using
> -Dsun.jnu.encoding=... ?
I did try, but jtreg currently doesn't appear to support specifying
flags for main/native @run:s (and helpfully suggests using main/othervm
instead...) - maybe someone knows a way around this?
/Claes
>
>
>
> On Tue, Jun 13, 2017 at 7:03 AM, Claes Redestad
> <claes.redestad at oracle.com <mailto:claes.redestad at oracle.com>> wrote:
>
> <adding build-dev; see
> http://mail.openjdk.java.net/pipermail/core-libs-dev/2017-June/048233.html
> <http://mail.openjdk.java.net/pipermail/core-libs-dev/2017-June/048233.html>
> for missing context>
>
> Updated webrevs:
> http://cr.openjdk.java.net/~redestad/8181147/jdk.05/
> <http://cr.openjdk.java.net/%7Eredestad/8181147/jdk.05/>
> http://cr.openjdk.java.net/~redestad/8181147/top.04/
> <http://cr.openjdk.java.net/%7Eredestad/8181147/top.04/>
>
> In this version there are various cleanups based on Martin's
> comments, improved test coverage and some small changes to
> ensure the updated test builds and runs on all our supported
> platforms.
>
> If someone with access to any of the platforms not supported
> by Oracle could take this for a spin and ensure linking the newly
> added test against libjava works as expected (or suggest how
> to make it work if it doesn't) then that'd be much appreciated.
>
> Thanks!
>
> /Claes
>
> On 06/13/2017 02:01 AM, Martin Buchholz wrote:
>
> I'm hoping Xueming will review as well.
> + char asciiCheck;
> + for (asciiCheck = 0, p = str; *p != '\0'; p++) {
> + asciiCheck |= *p;
> + }
> + len = (p - str);
> Probably conversion from ptrdiff_t to int needs a cast.
>
> Someday we might need to worry about input string longer than
> 2^31, but that's a pre-existing problem.
>
> The signed-ness of char being implementation-defined makes me
> nervous.
> Maybe do all the bit operations using explicit unsigned char,
> then the final test becomes simply
>
> asciiCheck <= 0x80
>
>
> On Mon, Jun 12, 2017 at 4:36 PM, Claes Redestad
> <claes.redestad at oracle.com <mailto:claes.redestad at oracle.com>
> <mailto:claes.redestad at oracle.com
> <mailto:claes.redestad at oracle.com>>> wrote:
>
> Hi Martin,
>
> thanks for reviewing!
>
> On 2017-06-12 22:14, Martin Buchholz wrote:
>
> +/* Optimized for char set UTF-8 */
>
> "charset" is a (poor misnomer!) jargon term, not two
> words.
>
>
> I got that from the existing use of "char set" in this
> file, but
> will fix it in all places.
>
>
> ---
>
> + for (b = str[len]; b != '\0'; len++, b = str[len]) {
> + if (isAscii && b & 0x80) {
> + isAscii = JNI_FALSE;
> + }
> + }
>
> I would write this more like
>
> const signed char *p;
> int isAscii;
>
> for (isAscii = 0, p = (const signed char *) str; *p !=
> '\0';
> p++) isAscii &= (*p >= 0);
>
>
> Did you mean for isAscii to be initialized to 1 (true) and
> then be
> cleared to 0 (false) when *p >= 0 is false?
>
>
> Then length is (p - str)
>
>
> How about something like this to hoist the second
> comparison from
> the loop:
>
> int len;
> char asciiCheck;
> for (asciiCheck = 0, p = str; *p != '\0'; p++) {
> asciiCheck |= *p;
> }
> len = (p - str);
>
> if (asciiCheck & 0x80) {
> // ascii fast-path
> return newSizedString8859_1(env, str, len);
> }
>
> ...
>
> ---
>
> + jbyteArray hab = NULL;
>
> I'm having a hard time decoding the name "hab"
>
>
> Not sure, but my guess is "heap allocated byteArray".
>
>
> ---
>
> The code below is not UTF-8 specific. Can it be
> refactored?
>
> + hab = (*env)->NewByteArray(env, len);
> + if (hab != 0) {
> + jclass strClazz = JNU_ClassString(env);
> + CHECK_NULL_RETURN(strClazz, 0);
> + (*env)->SetByteArrayRegion(env, hab, 0, len,
> (jbyte
> *)str);
> + result = (*env)->NewObject(env, strClazz,
> + String_init_ID, hab,
> jnuEncoding);
> + (*env)->DeleteLocalRef(env, hab);
> + return result;
> + }
>
>
> Yes, probably. The excessive copy-paste here is due to a crash
> issue I ran into when
> Charset.isSupported was called without proper
> initialization. It
> has since been
> resolved, but seems I forgot to revisit this part.
>
> I also realized I haven't added a test for this method.
> I'll look
> into doing the
> refactoring and adding some sanity testing tomorrow.
>
>
> ---
>
> We probably want to use unicode escapes in out java
> sources to
> keep all source files strictly ASCII.
>
>
> You mean in the test? Sure.
>
> Refreshed the jdk webrev:
> http://cr.openjdk.java.net/~redestad/8181147/jdk.05/
> <http://cr.openjdk.java.net/%7Eredestad/8181147/jdk.05/>
> <http://cr.openjdk.java.net/%7Eredestad/8181147/jdk.05/
> <http://cr.openjdk.java.net/%7Eredestad/8181147/jdk.05/>> -
> more to
> come.
>
> Thanks!
>
> /Claes
>
>
>
>
More information about the build-dev
mailing list