RFR(xs) JDK-8163861 JNI NewString() and GetStringLength() documentation incorrect

Thu Apr 20 00:48:26 UTC 2017

Hi George,

Apologies in advance - this is not "xs" in the discussion. But this 
needs to go through CSR process (which is not yet available for 10) so 
it can't be pushed yet anyway (sorry). But consider this my CSR review. :)

On 20/04/2017 3:43 AM, George Triantafillou wrote:
> Please review this very small fix to the spec for JDK-8163861:
>
> JBS: https://bugs.openjdk.java.net/browse/JDK-8163861
>
> webrev: http://cr.openjdk.java.net/~gtriantafill/8163861-webrev/webrev
> <http://cr.openjdk.java.net/%7Egtriantafill/8163861-webrev/webrev>
>
> The change provides a more precise definition for NewString() and
> GetStringLength().  Thanks.

As the bug report states 'most of the Java documention was updated to 
deal with this', so I think it important to ensure any changes are 
consistent with that updated documentation and terminology. Specifically 
we need to refer to the String class API specification and the Character 
class specification which defines Unicode Character Representations for 
the Java platform.

Looking at the JDK 8 JNI spec as referenced in the bug report we first see:

jsize GetStringLength(JNIEnv *env, jstring string);
Returns the length (the count of Unicode characters) of a Java string.

This should be functionally equivalent to java.lang.String.length(). If 
anything the "count of Unicode characters" should just be deleted as 
this method is intended to return the length of the Java string, just as 
String.length does. The phrase "count of Unicode characters" is at best 
"loose" as the reporter notes (and this terminology is mis-used in a few 
of the JNI string functions!). The String.length() method states:

"The length is equal to the number of Unicode code units in the string."

where "Unicode code units" is defined in the Character class.

So the correct fix would be to change "characters" to "code units".

However that there are two descriptions of how the function behaves - in 
javadoc terminology we have one description in the main body:

4168 <p>Returns the length (the count of Unicode
4169 characters) of a Java string.</p>

and one equivalent to @returns:

4180 <h4>RETURNS:</h4>
4181
4182 <p>Returns the length of the Java string.</p>

The latter is perfectly and accurately correct! If you want to know how 
the length of a Java String is defined then you need to see the String 
class.

So I would say that the correct fix here is to either delete "(the count 
of Unicode characters)", or else change "characters" to "code units" as 
previously discussed.

---

Turning to NewString .... the JNI spec states:

"Constructs a new java.lang.String object from an array of Unicode 
characters."

and the @param equivalent for 'len' states:

"len: length of the Unicode string. May be 0."

So initially we have the problematic use of "Unicode characters" again; 
then we have the problem as to what the definition of the length of a 
"Unicode string" is.

I find no help elsewhere in the JNI spec to clarify this (despite a very 
lengthy discussion on its use of UTF-8 encodings). I can only assume, 
based on the implementation, that the expectation is that the array of 
"Unicode characters" is in UTF-16 format. In which case a more 
technically accurate fix would be to say:

"len: length of the Unicode string in UTF-16 code units. May be 0."

Thanks,
David

PS. I will also add the above to the bug report.

> -George