Inconsistencies in the return value (type) of string functions toLower() and toUpper().

Sasi Peri pvssasikala at gmail.com
Mon Jun 20 17:19:07 UTC 2022


   1. Like I already said “==” should be avoided in the first place, and
   the issue(s) described, 100 % are the bad programming behaviors. No debate
   there.
   2. And, my point is not to return a new object or literal (I a am not
   trying to drive this one specific behavior one way or another), there could
   be many more implementation ideas as well (to make sense of semantic use of
   operators) may be via “operator overloading” like other programming langs.
      -  And, yes, we could write wrappers of our own or make “linters” of
      our own and scan, auto-detect and/or auto-fix. We can get very creative.
   3. But the bigger point again is not about specific implementation or
   justifying bad programming behavior or ways to deal with it
   (lint/auto-detect/fix),

o   But to make it *“less error-prone*” (around semantics), improve it, so
there are fewer opportunities to make a mistake, in general.

o   That’s the inconsistencies, the non-deterministic nature of the output
(based on a VM-type/input-type/jdk-provider-type) would cause less
experienced developers (or coming from other langs to code in java), easily
make mistakes, that is not easily detectable (as it works in some
cases/unit-tests).

On Mon, Jun 20, 2022 at 1:22 AM - <liangchenblue at gmail.com> wrote:

> As an observer, I do not think this is a good idea, and there's just so
> many problems in your whole story...
>
> 1. The issues or bugs you described arise from bad programming behaviors.
> Every time you see ==, always remember that == only compares by identity,
> and do not try to use it to check equality of objects! Always use
> Objects.equals, or Arrays.equals if you are checking arrays. Java's core
> library has tons of objects that have equality distinct from having the
> same identity, yet they reuse the same identity when viable, such as in
> java.time packages. If you can miss ==, you probably can miss ! in reviews,
> and are you going to patch every boolean method to prevent those?
> 2. The specification of these methods do not require returning new String
> instances, so both implementations are proper. If one implementation
> accidentally has an unspecified "feature" that helps you, you can't rule
> out that that implementation may remove it at any time without prior
> notice. Will you ask IBM to restore it if OpenJ9 removes this "feature" you
> depend on?
> 3. If you really, really, really want new strings, just wrap every call to
> toLowerCase() or toUpperCase() in a new String(), or use a bytecode
> transformation library to patch every call to such factory methods in your
> test code. That is easily done in your code reviews, and the lack of such
> wraps is more easily spotted than the == misuse, right?
>
>
> On Sun, Jun 19, 2022 at 10:30 PM Sasi Peri <pvssasikala at gmail.com> wrote:
>
>> Hello,
>>
>> *Issue details*
>>
>> toLower() and toUpper() return a new String object sometimes and
>> sometimes a string literal, based on the input string type (and also
>> sometimes based on the VM/jdk type)
>>
>>
>>
>> For example
>>
>> -       HotSpot VM, If the input is a string literal, which *already*
>> has all “lower case letters”, toLower would return the same string literal,
>> if not it will convert all letters to lower and returns a new String()
>> object.
>>
>> -       However, openJ9 (e.g. IBM jdk8 ditsro, ) always returns a new
>> String object not a literal.
>>
>>
>>
>> This behavior is non deterministic, inconsistent, you cannot always
>> predict if the outcome is a new string object OR an interned string from
>> pool (particularly from unit testing stand point).
>>
>>
>>
>> *Sample code to show case above behavior*
>>
>> *package* com.bugs;
>>
>> *import* java.util.Locale;
>>
>>
>>
>> *public* *class* TestStringFunction
>>
>> {
>>
>>
>>
>> *  public* *static* *void* main(String args[])
>>
>>   {
>>
>>     String s1 = "abc";
>>
>>     String s2 = "ABC";
>>
>>
>>
>>     System.*out*.println("----- case: when string already lower
>> ----------");
>>
>> *    testIfEqualsLower*(s1);
>>
>>
>>
>>     System.*out*.println("----- case: when string with upper case
>> ----------");
>>
>> *    testIfEqualsLower*(s2);
>>
>>
>>
>>
>>
>>   }
>>
>>
>>
>> *  private* *static* *void* testIfEqualsLower(String s)
>>
>>   {
>>
>>
>>
>> *     if*(s.toLowerCase() == "abc")
>>
>>      {
>>
>>         System.*out*.println("YES - literal");
>>
>>      }
>>
>>
>>
>> *     if*(s.toLowerCase().equals("abc"))
>>
>>      {
>>
>>         System.*out*.println("YES – equals func");
>>
>>      }
>>
>>
>>
>>    }
>>
>> }
>>
>>
>>
>>
>>
>> *Out put*
>>
>>
>>
>> ----- case: when string already lower ----------
>>
>> YES - literal
>>
>> YES – equals func
>>
>>
>>
>> ----- case: when string with upper case ----------
>>
>> YES - equals func
>>
>>
>>
>>
>>
>> *Why this could be an issue or bug prone?*
>>
>> -       Suppose an unit test is written, for a method doAThing(), that
>> has toLower/Upper conversions in the middle of the code somewhere, and
>> apply logic based on that.
>> -       Though general guidance to compare unknown string (types) is
>> always using equals, sometimes developers can make a mistake (i.e. suppose
>> they used == in a unit test)
>> -       If the code review did not catch it, this behavior can cause all
>> unit tests passed, as long unit test is written with “small case string” as
>> input.
>> -       It could potentially make it to prod, and can be realized only
>> when it hits a case when input string has all uppercase OR mixed case
>> letters, which could be after multiple sprints, at the point not easily
>> detectable.
>>
>>
>>
>> *Suggestion*
>>
>> It would be great if we always can make a new String() and return always
>> a String object not an interned string sometimes, as openJ9/ibm does (with
>> some jdk versions).
>>
>> It may not be good idea, to always return “.interned” value and fill up
>> the intern pool, for these short-lived objects (as they are most times).
>>
>>
>>
>> If this is agreed/approved, I can make a change and commit.
>>
>>
>> Regards
>>
>> - SP
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/core-libs-dev/attachments/20220620/10ca49be/attachment-0001.htm>


More information about the core-libs-dev mailing list