Inconsistencies in the return value (type) of string functions toLower() and toUpper().

Brian Goetz brian.goetz at oracle.com
Mon Jun 20 18:24:39 UTC 2022


What you are suggesting is what we call "specifying the implementation", 
which is generally not a very good idea.

The specification for these methods says "returns a String <with the 
following properties>".  That's a good specification!  It says what you 
can expect from the result.  It says nothing about the identity of the 
returned object, whether it is interned, whether it is aliased, etc.  
This is by design. Specifying the implementation not only robs future 
implementers of the ability to improve the implementation (which in turn 
improves all the code that uses it, without having to even recompile), 
and encourages people to do questionable things like make assumptions 
about the identity of the returned String.

To describe this as "non-deterministic" is an abuse of the term.  What 
is going on here is that the specification makes certain promises, and 
multiple implementations choose different ways to get there (and any 
given implementation could choose different ways in different points in 
time.)  Similarly, describing it as "inconsistent" is also an abuse; no 
one promised a non-aliased object.

If your code wants to make assumptions about interning or lack of 
aliasing, you should call "new String(s)".

On 6/19/2022 11:29 PM, Sasi Peri wrote:
> Hello,
>
> *Issue details*
>
> toLower() and toUpper() return a new String object sometimes and 
> sometimes a string literal, based on the input string type (and also 
> sometimes based on the VM/jdk type)
>
> For example
>
> -HotSpot VM, If the input is a string literal, which *already* has all 
> “lower case letters”, toLower would return the same string literal, if 
> not it will convert all letters to lower and returns a new String() 
> object.
>
> -However, openJ9 (e.g. IBM jdk8 ditsro, ) always returns a new String 
> object not a literal.
>
> This behavior is non deterministic, inconsistent, you cannot always 
> predict if the outcome is a new string object OR an interned string 
> from pool (particularly from unit testing stand point).
>
> *Sample code to show case above behavior*
>
> *package*com.bugs;
>
> *import*java.util.Locale;
>
> *public**class*TestStringFunction
>
> {
>
> *  public**static**void*main(String args[])
>
>   {
>
>     String s1= "abc";
>
>     String s2= "ABC";
>
>     System.*/out/*.println("----- case: when string already lower 
> ----------");
>
> /testIfEqualsLower/(s1);
>
>     System.*/out/*.println("----- case: when string with upper case 
> ----------");
>
> /testIfEqualsLower/(s2);
>
>   }
>
> *  private**static**void*testIfEqualsLower(String s)
>
>   {
>
> *     if*(s.toLowerCase() == "abc")
>
>      {
>
>         System.*/out/*.println("YES - literal");
>
>      }
>
> *     if*(s.toLowerCase().equals("abc"))
>
>      {
>
>         System.*/out/*.println("YES – equals func");
>
>      }
>
>    }
>
> }
>
> *_Out put_*
>
> *__*
>
> ----- case: when string already lower ----------
>
> YES - literal
>
> YES – equals func
>
> ----- case: when string with upper case ----------
>
> YES - equals func**
>
> **
>
> *Why this could be an issue or bug prone?*
>
> -Suppose an unit test is written, for a method doAThing(), that has 
> toLower/Upper conversions in the middle of the code somewhere, and 
> apply logic based on that.
> -Though general guidance to compare unknown string (types) is always 
> using equals, sometimes developers can make a mistake (i.e. suppose 
> they used == in a unit test)
> -If the code review did not catch it, this behavior can cause all unit 
> tests passed, as long unit test is written with “small case string” as 
> input.
> -It could potentially make it to prod, and can be realized only when 
> it hits a case when inputstring has all uppercase OR mixed case 
> letters, which could be after multiple sprints, at the point not 
> easily detectable.
>
> *Suggestion*
>
> It would be great if we always can make a new String() and return 
> always a String object not an interned string sometimes, as openJ9/ibm 
> does (with some jdk versions).
>
> It may not be good idea, to always return “.interned” value and fill 
> up the intern pool, for these short-lived objects (as they are most 
> times).
>
> If this is agreed/approved, I can make a change and commit.
>
>
> Regards
>
> - SP
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/core-libs-dev/attachments/20220620/60daec0d/attachment-0001.htm>


More information about the core-libs-dev mailing list