RFR(xs) JDK-8163861 JNI NewString() and GetStringLength() documentation incorrect
David Holmes
david.holmes at oracle.com
Thu Apr 20 00:48:26 UTC 2017
Hi George,
Apologies in advance - this is not "xs" in the discussion. But this
needs to go through CSR process (which is not yet available for 10) so
it can't be pushed yet anyway (sorry). But consider this my CSR review. :)
On 20/04/2017 3:43 AM, George Triantafillou wrote:
> Please review this very small fix to the spec for JDK-8163861:
>
> JBS: https://bugs.openjdk.java.net/browse/JDK-8163861
>
> webrev: http://cr.openjdk.java.net/~gtriantafill/8163861-webrev/webrev
> <http://cr.openjdk.java.net/%7Egtriantafill/8163861-webrev/webrev>
>
> The change provides a more precise definition for NewString() and
> GetStringLength(). Thanks.
As the bug report states 'most of the Java documention was updated to
deal with this', so I think it important to ensure any changes are
consistent with that updated documentation and terminology. Specifically
we need to refer to the String class API specification and the Character
class specification which defines Unicode Character Representations for
the Java platform.
Looking at the JDK 8 JNI spec as referenced in the bug report we first see:
jsize GetStringLength(JNIEnv *env, jstring string);
Returns the length (the count of Unicode characters) of a Java string.
This should be functionally equivalent to java.lang.String.length(). If
anything the "count of Unicode characters" should just be deleted as
this method is intended to return the length of the Java string, just as
String.length does. The phrase "count of Unicode characters" is at best
"loose" as the reporter notes (and this terminology is mis-used in a few
of the JNI string functions!). The String.length() method states:
"The length is equal to the number of Unicode code units in the string."
where "Unicode code units" is defined in the Character class.
So the correct fix would be to change "characters" to "code units".
However that there are two descriptions of how the function behaves - in
javadoc terminology we have one description in the main body:
4168 <p>Returns the length (the count of Unicode
4169 characters) of a Java string.</p>
and one equivalent to @returns:
4180 <h4>RETURNS:</h4>
4181
4182 <p>Returns the length of the Java string.</p>
The latter is perfectly and accurately correct! If you want to know how
the length of a Java String is defined then you need to see the String
class.
So I would say that the correct fix here is to either delete "(the count
of Unicode characters)", or else change "characters" to "code units" as
previously discussed.
---
Turning to NewString .... the JNI spec states:
"Constructs a new java.lang.String object from an array of Unicode
characters."
and the @param equivalent for 'len' states:
"len: length of the Unicode string. May be 0."
So initially we have the problematic use of "Unicode characters" again;
then we have the problem as to what the definition of the length of a
"Unicode string" is.
I find no help elsewhere in the JNI spec to clarify this (despite a very
lengthy discussion on its use of UTF-8 encodings). I can only assume,
based on the implementation, that the expectation is that the array of
"Unicode characters" is in UTF-16 format. In which case a more
technically accurate fix would be to say:
"len: length of the Unicode string in UTF-16 code units. May be 0."
Thanks,
David
PS. I will also add the above to the bug report.
> -George
More information about the hotspot-dev
mailing list