Inconsistencies in the return value (type) of string functions toLower() and toUpper().

Bernd Eckenfels ecki at zusammenkunft.net
Mon Jun 20 03:57:13 UTC 2022


Since Sting instances are immutable Idendity should not matter and It is wrong in most cases to rely on string Idendity (==) and the APIs are intentionally not making any assertions about Idendity and/or interning of strings (also the most efficient implementation would not return a new instance if not needed, this also applies to replace or escape or similar functions - just don’t rely on it).

Gruss
Bernd
--
https://urldefense.com/v3/__http://bernd.eckenfels.net__;!!ACWV5N9M2RV99hQ!OCDLUrMtQ8iDuN5ZBlmPgMeJIq1GGLAOuq8CauKNxaqHh-RiIaLCGL-ScrHHeUpLQyeWzm4wj0SMmsGK9Lhbcm5T1Q$ 
________________________________
Von: core-libs-dev <core-libs-dev-retn at openjdk.org> im Auftrag von Sasi Peri <pvssasikala at gmail.com>
Gesendet: Monday, June 20, 2022 5:45:08 AM
An: core-libs-dev at openjdk.org <core-libs-dev at openjdk.org>
Betreff: Re: Inconsistencies in the return value (type) of string functions toLower() and toUpper().


  *   Attaching the sample code file (same code copied inline in the above issue)
  *   Tested with hotspot vm JDK 17 temu and adopt JDK 8, for the above non-deterministic behavior.
  *   Also tested with opnJ9 (IBM, jdk8), which returned always a new String, so == comparison will always fail and only .equals will work.

On Sun, Jun 19, 2022 at 11:29 PM Sasi Peri <pvssasikala at gmail.com<mailto:pvssasikala at gmail.com>> wrote:
Hello,


Issue details

toLower() and toUpper() return a new String object sometimes and sometimes a string literal, based on the input string type (and also sometimes based on the VM/jdk type)



For example

-       HotSpot VM, If the input is a string literal, which *already* has all “lower case letters”, toLower would return the same string literal, if not it will convert all letters to lower and returns a new String() object.

-       However, openJ9 (e.g. IBM jdk8 ditsro, ) always returns a new String object not a literal.



This behavior is non deterministic, inconsistent, you cannot always predict if the outcome is a new string object OR an interned string from pool (particularly from unit testing stand point).



Sample code to show case above behavior

package com.bugs;

import java.util.Locale;



public class TestStringFunction

{



  public static void main(String args[])

  {

    String s1 = "abc";

    String s2 = "ABC";



    System.out.println("----- case: when string already lower ----------");

    testIfEqualsLower(s1);



    System.out.println("----- case: when string with upper case ----------");

    testIfEqualsLower(s2);





  }



  private static void testIfEqualsLower(String s)

  {



     if(s.toLowerCase() == "abc")

     {

        System.out.println("YES - literal");

     }



     if(s.toLowerCase().equals("abc"))

     {

        System.out.println("YES – equals func");

     }



   }

}





Out put



----- case: when string already lower ----------

YES - literal

YES – equals func



----- case: when string with upper case ----------

YES - equals func





Why this could be an issue or bug prone?

-       Suppose an unit test is written, for a method doAThing(), that has toLower/Upper conversions in the middle of the code somewhere, and apply logic based on that.
-       Though general guidance to compare unknown string (types) is always using equals, sometimes developers can make a mistake (i.e. suppose they used == in a unit test)
-       If the code review did not catch it, this behavior can cause all unit tests passed, as long unit test is written with “small case string” as input.
-       It could potentially make it to prod, and can be realized only when it hits a case when input string has all uppercase OR mixed case letters, which could be after multiple sprints, at the point not easily detectable.



Suggestion

It would be great if we always can make a new String() and return always a String object not an interned string sometimes, as openJ9/ibm does (with some jdk versions).

It may not be good idea, to always return “.interned” value and fill up the intern pool, for these short-lived objects (as they are most times).



If this is agreed/approved, I can make a change and commit.


Regards

- SP
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/core-libs-dev/attachments/20220620/adeb5d5f/attachment-0001.htm>


More information about the core-libs-dev mailing list