RFR: 8338257: UTF8 lengths should be size_t not int [v2]

David Holmes dholmes at openjdk.org
Mon Aug 19 23:19:04 UTC 2024


On Thu, 15 Aug 2024 20:43:51 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

> Did you try to compile this with -Wconversion (not -Wsign-conversion) without -Werror?

I hadn't but will do so ...

> src/hotspot/share/classfile/javaClasses.cpp line 586:
> 
>> 584:     ResourceMark rm;
>> 585:     jbyte* position = (length == 0) ? nullptr : value->byte_at_addr(0);
>> 586:     size_t utf8_len = length;
> 
> These are sign conversions.  This is too big to change here and not sure what the fan out would be but java_lang_String::length() should return unsigned and also arrayOop.hpp length.  They're never negative.  We should probably assert that (below).

Sorry I don't follow what you are asking for here. A Java String length is an int even though it can't be negative in length - just as Java array indices are int even though they can't be negative either.

Anyway this conversion should have a cast, though none of the usual complaining compilers complained about it.

> src/hotspot/share/classfile/javaClasses.cpp line 639:
> 
>> 637:   if (length == 0) {
>> 638:     return 0;
>> 639:   }
> 
> Maybe assert length > 0 here?

Why "> 0" ?

> src/hotspot/share/classfile/javaClasses.cpp line 702:
> 
>> 700:   } else {
>> 701:     jbyte* position = (length == 0) ? nullptr : value->byte_at_addr(0);
>> 702:     return UNICODE::as_utf8(position, static_cast<int>(length), buf, buflen);
> 
> Don't you want checked_cast here not static casts.

These lengths are the lengths of Java array so an int >= 0.

> src/hotspot/share/prims/jni.cpp line 2226:
> 
>> 2224:  HOTSPOT_JNI_GETSTRINGUTFLENGTH_ENTRY(env, string);
>> 2225:   oop java_string = JNIHandles::resolve_non_null(string);
>> 2226:   jsize ret = java_lang_String::utf8_length_as_int(java_string);
> 
> So the spec says that this should be jsize (signed int), which is why this is, right?

Yes. Hence the other change to add a new JNI API.

> src/hotspot/share/utilities/utf8.cpp line 512:
> 
>> 510: 
>> 511: template<typename T>
>> 512: int UNICODE::utf8_length_as_int(const T* base, int length) {
> 
> Why was this parameter left as int and not size_t ?

It is the length of a Java array - so int >= 0

> src/hotspot/share/utilities/utf8.hpp line 47:
> 
>> 45: 
>> 46: In the code below if we have latin-1 content then we treat the String's data
>> 47: array as a jbyte[], else a jchar[]. The lengths of these arrays are specified
> 
> jchar is 16-bits right?  So the max length if not latin-1 is INT_MAX/2 ?

The maximum number of jchar characters is INT_MAX/2

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20560#issuecomment-2297676721
PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1722493414
PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1722494136
PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1722494680
PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1722495327
PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1722495875
PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1722496709


More information about the serviceability-dev mailing list