RFR(xs) JDK-8163861 JNI NewString() and GetStringLength() documentation incorrect

Thu Apr 20 14:25:18 UTC 2017

Hi David,

Thanks for the review.  I wasn't aware that the CSR process isn't yet available 
for 10.  Do you know when it will be available?

On 4/19/2017 8:48 PM, David Holmes wrote:
> Hi George,
>
> Apologies in advance - this is not "xs" in the discussion. But this 
> needs to go through CSR process (which is not yet available for 10) so 
> it can't be pushed yet anyway (sorry). But consider this my CSR 
> review. :)
>
> On 20/04/2017 3:43 AM, George Triantafillou wrote:
>> Please review this very small fix to the spec for JDK-8163861:
>>
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8163861
>>
>> webrev: http://cr.openjdk.java.net/~gtriantafill/8163861-webrev/webrev
>>
>> The change provides a more precise definition for NewString() and
>> GetStringLength().  Thanks.
>
> As the bug report states 'most of the Java documentation was updated to 
> deal with this', so I think it important to ensure any changes are 
> consistent with that updated documentation and terminology. 
> Specifically we need to refer to the String class API specification 
> and the Character class specification which defines Unicode Character 
> Representations for the Java platform.
>
> Looking at the JDK 8 JNI spec as referenced in the bug report, we first 
> see:
>
> jsize GetStringLength(JNIEnv *env, jstring string);
> Returns the length (the count of Unicode characters) of a Java string.
>
> This should be functionally equivalent to java.lang.String.length(). 
> If anything the "count of Unicode characters" should just be deleted 
> as this method is intended to return the length of the Java string, 
> just as String.length does. The phrase "count of Unicode characters" 
> is at best "loose" as the reporter notes (and this terminology is 
> misused in a few of the JNI string functions!). The String.length() 
> method states:
>
> "The length is equal to the number of Unicode code units in the string."
>
> where "Unicode code units" is defined in the Character class.
>
> So the correct fix would be to change "characters" to "code units".
>
> However, there are two descriptions of how the function behaves; 
> in javadoc terminology, we have one description in the main body:
>
> <p>Returns the length (the count of Unicode
> characters) of a Java string.</p>
>
> and one equivalent to @return:
>
> <h4>RETURNS:</h4>
>
> <p>Returns the length of the Java string.</p>
>
> The latter is entirely accurate! If you want to know 
> how the length of a Java String is defined then you need to see the 
> String class.
>
> So I would say that the correct fix here is to either delete "(the 
> count of Unicode characters)", or else change "characters" to "code 
> units" as previously discussed.
That sounds reasonable!
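
To double-check the terminology, here is a small Java sketch (my own illustration, not part of the webrev; the class name is hypothetical and U+1F600 is just an arbitrary supplementary character). It shows that String.length() counts UTF-16 code units rather than code points. Assuming GetStringLength is meant to mirror String.length(), as you suggest, it would report 3 for this string, not 2.

```java
// Sketch: String.length() counts UTF-16 code units, not Unicode code points.
class CodeUnitsDemo {
    // "a" (one BMP char) followed by U+1F600, which Java stores as a surrogate pair.
    static String sample() {
        return "a" + new String(Character.toChars(0x1F600));
    }

    public static void main(String[] args) {
        String s = sample();
        System.out.println(s.length());                      // 3 UTF-16 code units
        System.out.println(s.codePointCount(0, s.length())); // 2 code points
    }
}
```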

>
> ---
>
> Turning to NewString .... the JNI spec states:
>
> "Constructs a new java.lang.String object from an array of Unicode 
> characters."
>
> and the @param equivalent for 'len' states:
>
> "len: length of the Unicode string. May be 0."
>
> So initially we have the problematic use of "Unicode characters" 
> again; then we have the problem as to what the definition of the 
> length of a "Unicode string" is.
>
> I find no help elsewhere in the JNI spec to clarify this (despite a 
> very lengthy discussion on its use of UTF-8 encodings). I can only 
> assume, based on the implementation, that the expectation is that the 
> array of "Unicode characters" is in UTF-16 format. In which case a 
> more technically accurate fix would be to say:
>
> "len: length of the Unicode string in UTF-16 code units. May be 0."
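
Agreed. Here is a Java-level sketch of why the unit matters. It is only an analog: the newString helper is hypothetical and simply wraps the String(char[], int, int) constructor, while the real NewString takes a const jchar * and a jsize. A single supplementary character needs len == 2:

```java
// Java-level analog (illustration only) of JNI NewString(env, unicodeChars, len),
// where len is assumed to count UTF-16 code units (jchar elements).
class NewStringLenDemo {
    // Hypothetical helper mirroring NewString's (const jchar *, jsize) parameters.
    static String newString(char[] unicodeChars, int len) {
        return new String(unicodeChars, 0, len);
    }

    public static void main(String[] args) {
        // One supplementary code point (U+1F600) encoded as a surrogate pair.
        char[] utf16 = Character.toChars(0x1F600);
        String s = newString(utf16, utf16.length);           // len == 2, not 1
        System.out.println(utf16.length);                    // 2
        System.out.println(s.codePointCount(0, s.length())); // 1
        System.out.println(newString(utf16, 0).isEmpty());   // true: len may be 0
    }
}
```

In this analog, passing len == 1 would not throw. It would produce a string holding a lone high surrogate, which is why spelling out "UTF-16 code units" seems worthwhile.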
>
> Thanks,
> David
>
> PS. I will also add the above to the bug report.
Thanks, I appreciate it.

-George
>
>> -George