RFR(xs) JDK-8163861 JNI NewString() and GetStringLength() documentation incorrect
George Triantafillou
george.triantafillou at oracle.com
Thu Apr 20 14:25:18 UTC 2017
Hi David,
Thanks for the review. I wasn't aware that the CSR isn't yet available
for 10. Do you know when it will be available?
On 4/19/2017 8:48 PM, David Holmes wrote:
> Hi George,
>
> Apologies in advance - this is not "xs" in the discussion. But this
> needs to go through CSR process (which is not yet available for 10) so
> it can't be pushed yet anyway (sorry). But consider this my CSR
> review. :)
>
> On 20/04/2017 3:43 AM, George Triantafillou wrote:
>> Please review this very small fix to the spec for JDK-8163861:
>>
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8163861
>>
>> webrev: http://cr.openjdk.java.net/~gtriantafill/8163861-webrev/webrev
>> <http://cr.openjdk.java.net/%7Egtriantafill/8163861-webrev/webrev>
>>
>> The change provides a more precise definition for NewString() and
>> GetStringLength(). Thanks.
>
> As the bug report states 'most of the Java documention was updated to
> deal with this', so I think it important to ensure any changes are
> consistent with that updated documentation and terminology.
> Specifically we need to refer to the String class API specification
> and the Character class specification which defines Unicode Character
> Representations for the Java platform.
>
> Looking at the JDK 8 JNI spec as referenced in the bug report we first
> see:
>
> jsize GetStringLength(JNIEnv *env, jstring string);
> Returns the length (the count of Unicode characters) of a Java string.
>
> This should be functionally equivalent to java.lang.String.length().
> If anything the "count of Unicode characters" should just be deleted
> as this method is intended to return the length of the Java string,
> just as String.length does. The phrase "count of Unicode characters"
> is at best "loose" as the reporter notes (and this terminology is
> mis-used in a few of the JNI string functions!). The String.length()
> method states:
>
> "The length is equal to the number of Unicode code units in the string."
>
> where "Unicode code units" is defined in the Character class.
>
> So the correct fix would be to change "characters" to "code units".
>
> However that there are two descriptions of how the function behaves -
> in javadoc terminology we have one description in the main body:
>
> 4168 <p>Returns the length (the count of Unicode
> 4169 characters) of a Java string.</p>
>
> and one equivalent to @returns:
>
> 4180 <h4>RETURNS:</h4>
> 4181
> 4182 <p>Returns the length of the Java string.</p>
>
> The latter is perfectly and accurately correct! If you want to know
> how the length of a Java String is defined then you need to see the
> String class.
>
> So I would say that the correct fix here is to either delete "(the
> count of Unicode characters)", or else change "characters" to "code
> units" as previously discussed.
That sounds reasonable!
>
> ---
>
> Turning to NewString .... the JNI spec states:
>
> "Constructs a new java.lang.String object from an array of Unicode
> characters."
>
> and the @param equivalent for 'len' states:
>
> "len: length of the Unicode string. May be 0."
>
> So initially we have the problematic use of "Unicode characters"
> again; then we have the problem as to what the definition of the
> length of a "Unicode string" is.
>
> I find no help elsewhere in the JNI spec to clarify this (despite a
> very lengthy discussion on its use of UTF-8 encodings). I can only
> assume, based on the implementation, that the expectation is that the
> array of "Unicode characters" is in UTF-16 format. In which case a
> more technically accurate fix would be to say:
>
> "len: length of the Unicode string in UTF-16 code units. May be 0."
>
> Thanks,
> David
>
> PS. I will also add the above to the bug report.
Thanks, I appreciate it.
-George
>
>> -George
More information about the hotspot-dev
mailing list