Inconsistencies in the return value (type) of string functions toLower() and toUpper().

- liangchenblue at gmail.com
Mon Jun 20 05:22:37 UTC 2022


As an observer, I do not think this is a good idea, and there's just so
many problems in your whole story...

1. The issues or bugs you described arise from bad programming behaviors.
Every time you see ==, always remember that == only compares by identity,
and do not try to use it to check equality of objects! Always use
Objects.equals, or Arrays.equals if you are checking arrays. Java's core
library has tons of objects that have equality distinct from having the
same identity, yet they reuse the same identity when viable, such as in
java.time packages. If you can miss ==, you probably can miss ! in reviews,
and are you going to patch every boolean method to prevent those?
2. The specification of these methods do not require returning new String
instances, so both implementations are proper. If one implementation
accidentally has an unspecified "feature" that helps you, you can't rule
out that that implementation may remove it at any time without prior
notice. Will you ask IBM to restore it if OpenJ9 removes this "feature" you
depend on?
3. If you really, really, really want new strings, just wrap every call to
toLowerCase() or toUpperCase() in a new String(), or use a bytecode
transformation library to patch every call to such factory methods in your
test code. That is easily done in your code reviews, and the lack of such
wraps is more easily spotted than the == misuse, right?


On Sun, Jun 19, 2022 at 10:30 PM Sasi Peri <pvssasikala at gmail.com> wrote:

> Hello,
>
> *Issue details*
>
> toLower() and toUpper() return a new String object sometimes and sometimes
> a string literal, based on the input string type (and also sometimes based
> on the VM/jdk type)
>
>
>
> For example
>
> -       HotSpot VM, If the input is a string literal, which *already* has
> all “lower case letters”, toLower would return the same string literal, if
> not it will convert all letters to lower and returns a new String() object.
>
> -       However, openJ9 (e.g. IBM jdk8 ditsro, ) always returns a new
> String object not a literal.
>
>
>
> This behavior is non deterministic, inconsistent, you cannot always
> predict if the outcome is a new string object OR an interned string from
> pool (particularly from unit testing stand point).
>
>
>
> *Sample code to show case above behavior*
>
> *package* com.bugs;
>
> *import* java.util.Locale;
>
>
>
> *public* *class* TestStringFunction
>
> {
>
>
>
> *  public* *static* *void* main(String args[])
>
>   {
>
>     String s1 = "abc";
>
>     String s2 = "ABC";
>
>
>
>     System.*out*.println("----- case: when string already lower
> ----------");
>
> *    testIfEqualsLower*(s1);
>
>
>
>     System.*out*.println("----- case: when string with upper case
> ----------");
>
> *    testIfEqualsLower*(s2);
>
>
>
>
>
>   }
>
>
>
> *  private* *static* *void* testIfEqualsLower(String s)
>
>   {
>
>
>
> *     if*(s.toLowerCase() == "abc")
>
>      {
>
>         System.*out*.println("YES - literal");
>
>      }
>
>
>
> *     if*(s.toLowerCase().equals("abc"))
>
>      {
>
>         System.*out*.println("YES – equals func");
>
>      }
>
>
>
>    }
>
> }
>
>
>
>
>
> *Out put*
>
>
>
> ----- case: when string already lower ----------
>
> YES - literal
>
> YES – equals func
>
>
>
> ----- case: when string with upper case ----------
>
> YES - equals func
>
>
>
>
>
> *Why this could be an issue or bug prone?*
>
> -       Suppose an unit test is written, for a method doAThing(), that
> has toLower/Upper conversions in the middle of the code somewhere, and
> apply logic based on that.
> -       Though general guidance to compare unknown string (types) is
> always using equals, sometimes developers can make a mistake (i.e. suppose
> they used == in a unit test)
> -       If the code review did not catch it, this behavior can cause all
> unit tests passed, as long unit test is written with “small case string” as
> input.
> -       It could potentially make it to prod, and can be realized only
> when it hits a case when input string has all uppercase OR mixed case
> letters, which could be after multiple sprints, at the point not easily
> detectable.
>
>
>
> *Suggestion*
>
> It would be great if we always can make a new String() and return always a
> String object not an interned string sometimes, as openJ9/ibm does (with
> some jdk versions).
>
> It may not be good idea, to always return “.interned” value and fill up
> the intern pool, for these short-lived objects (as they are most times).
>
>
>
> If this is agreed/approved, I can make a change and commit.
>
>
> Regards
>
> - SP
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/core-libs-dev/attachments/20220620/ef26da6a/attachment-0001.htm>


More information about the core-libs-dev mailing list