Is there a possibility of the string equality operator (==) being fixed?

Brian Goetz brian.goetz at oracle.com
Mon Oct 23 14:37:11 UTC 2023


One of the pleasant side-effects of Project Valhalla is that (a) we'll 
be able to say .equals() on primitives, and (b) the cost of doing so 
will JIT down to that of ==.  Which means that we can tell people "just 
use .equals() everywhere" (except when implementing low-level code like 
IdentityHashMap) and they will never have to wonder which to use.  I 
realize this doesn't solve the "wrong op got the good name" problem, but 
it gives us a path to not having to think about it so often.
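
A minimal sketch of why IdentityHashMap is the stated exception, assuming nothing beyond the standard java.util collections. The class and method names (IdentityVsEquals, entriesAfterTwoPuts) are made up for illustration. HashMap compares keys with equals(), while IdentityHashMap deliberately compares them with ==:

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

public class IdentityVsEquals {
    // Inserts two keys that are equal by state but are distinct objects,
    // then reports how many entries the map ended up with.
    static int entriesAfterTwoPuts(Map<String, Integer> map) {
        String k1 = new String("key"); // 'new' guarantees a fresh object
        String k2 = new String("key");
        map.put(k1, 1);
        map.put(k2, 2);
        return map.size();
    }

    public static void main(String[] args) {
        // HashMap uses equals()/hashCode(): the second put overwrites the first.
        System.out.println(entriesAfterTwoPuts(new HashMap<>()));         // 1
        // IdentityHashMap uses ==: the two keys are different objects.
        System.out.println(entriesAfterTwoPuts(new IdentityHashMap<>())); // 2
    }
}
```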

It may be possible to migrate String to a value class at some point in 
the distant future (though this has a considerable shroud of uncertainty 
surrounding it), which makes this problem recede farther into the 
background for the particular case of String.  Which might be enough to 
make this significantly less of a problem, for the reasons you outline 
here.



On 10/23/2023 5:32 AM, Andrew Dinn wrote:
> Hi Brian,
>
> I think there is also another subtle confusion lying behind this 
> question. Strings are in the unusual position that they can be named 
> via program literal text, e.g. the 7-character sequence "hello" in a 
> program body is a literal reference to an instance of java.lang.String.
>
> That is not the case for any other class of object bar one, the 
> exception being instances of java.lang.Class e.g. the 22 character 
> sequence java.lang.String.class in a program body serves as a literal 
> reference to an instance of java.lang.Class.
>
> It is easy for a novice programmer to draw the conclusion that this 
> literal reference must exist 1-1 with regard to its corresponding 
> literal i.e. that there will only ever be one String whose ordered 
> sequence of characters will be 'h', 'e', 'l', 'l' and 'o'. The fact that
>
>   new String("hello") == "hello"
>
> will evaluate to false is not immediately evident to beginner 
> programmers.
>
> This misconception is helped along by the fact that the JVM ensures that 
> all occurrences of a String literal in disparate class files do end up 
> referring to the same String instance. If method m of class C passes 
> the literal String "foo" to method m2 of class C2 and the latter 
> compares its input to the literal String "foo" using an equality 
> comparison then the result will be true.
>
> class C
> {
>     . . .
>     void m() {
>         C2.m2("foo");
>     }
> }
>
> class C2
> {
>     . . .
>     static void m2(String s) {
>         if (s == "foo") {
>             System.out.println("identity equal");
>         }
>     }
> }
>
> The message printout will always be triggered, i.e. Strings mentioned 
> as program literals in source code and thereby introduced as String 
> constants in bytecode are *deduplicated* to the same String instance 
> when the bytecode is loaded by the JVM.
>
> That is why it takes some work to arrive at a case like the first code 
> snippet above where two Strings can have equal state but not equal 
> identity. At least one of the String instances has to be explicitly 
> created at runtime via new, substring() or some other method that 
> synthesises a String object.
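
The literal-versus-runtime distinction described above can be sketched as follows. The class and helper names (StringIdentityDemo, sameObject) are made up for illustration. The sketch relies only on the documented behaviour of `new String(...)` and `String.intern()`:

```java
public class StringIdentityDemo {
    // Identity comparison: do both references point at the same object?
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal = "hello";
        String built = new String("hello"); // explicitly created at runtime

        // Two occurrences of the same literal share one String instance.
        System.out.println(sameObject(literal, "hello"));        // true
        // 'new' always yields a distinct object, even with equal contents.
        System.out.println(sameObject(literal, built));          // false
        // State comparison ignores identity.
        System.out.println(literal.equals(built));               // true
        // intern() maps the runtime String back to the shared instance.
        System.out.println(sameObject(literal, built.intern())); // true
    }
}
```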
>
> It is interesting to compare this situation with that for class 
> literals where deduplication is either not required or would be 
> incorrect. It requires a much greater feat of ingenuity (equally, 
> carelessness or recklessness), involving the use of 
> application-defined class loaders, to arrive at a situation where the 
> program literal org.my.Foo.class occurring in a method m of class C 
> can identify a different instance of java.lang.Class to the same 
> program literal occurring in a method m2 of class C2. Yet it is possible:
>
> class C
> {
>     . . .
>     void m() {
>         C2.m2(org.my.Foo.class);
>     }
> }
>
> class C2
> {
>     . . .
>     static void m2(Class<?> c) {
>         if (c != org.my.Foo.class) {
>             // Yes, if you misbehave you can end up here!
>             System.out.println("you are in classloader hell!");
>         }
>     }
> }
>
> I guess I could offer instructions as to how to arrive at the 
> situation where m2 prints out its warning message but I'll leave that 
> as an exercise for the expert (or unwary) reader.
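
One possible way to get there, offered as a rough sketch and not as the scenario the message has in mind: a loader that refuses to delegate for one class and defines its own copy from the same bytes. Every name here (ClassLoaderHell, IsolatingLoader, loadSecondFoo, the nested Foo standing in for org.my.Foo) is hypothetical. The sketch assumes the compiled class files are readable as resources on the class path and that Java 9 or later is in use (for InputStream.readAllBytes):

```java
import java.io.IOException;
import java.io.InputStream;

public class ClassLoaderHell {
    // Hypothetical stand-in for org.my.Foo from the message above.
    public static class Foo {}

    // A deliberately misbehaving loader: for Foo it skips parent delegation
    // and defines its own copy from the same class-file bytes.
    static class IsolatingLoader extends ClassLoader {
        IsolatingLoader(ClassLoader parent) {
            super(parent);
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve)
                throws ClassNotFoundException {
            if (!name.equals(Foo.class.getName())) {
                return super.loadClass(name, resolve); // normal delegation
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> loaded = findLoadedClass(name);
                if (loaded != null) {
                    return loaded;
                }
                String resource = name.replace('.', '/') + ".class";
                try (InputStream in = getParent().getResourceAsStream(resource)) {
                    if (in == null) {
                        throw new ClassNotFoundException(name);
                    }
                    byte[] bytes = in.readAllBytes(); // Java 9+
                    return defineClass(name, bytes, 0, bytes.length);
                } catch (IOException e) {
                    throw new ClassNotFoundException(name, e);
                }
            }
        }
    }

    // Returns a second Class object for Foo, defined by the isolating loader.
    static Class<?> loadSecondFoo() {
        ClassLoader parent = ClassLoaderHell.class.getClassLoader();
        try {
            return new IsolatingLoader(parent).loadClass(Foo.class.getName());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Foo class bytes not found on the class path", e);
        }
    }

    public static void main(String[] args) {
        Class<?> first = Foo.class;
        Class<?> second = loadSecondFoo();
        System.out.println(first.getName().equals(second.getName())); // true
        System.out.println(first == second);                          // false
    }
}
```

The two Class objects share a name but have different defining loaders, so the JVM treats them as unrelated types.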
>
> regards,
>
>
> Andrew Dinn
> -----------
>
> On 22/10/2023 22:29, Brian Goetz wrote:
>> First of all, the question is framed in a way that assumes its own 
>> conclusion; that somehow there is something "broken" to be "fixed". 
>> The == operator on object references asks a simple, well-defined, 
>> fundamental question: do these two object references _refer to the 
>> same object_.  There is a similar, related question of "do these two 
>> objects _encode the same domain value_" (which is inherently 
>> class-specific), and that goes by the name of the "equals" method. 
>> These are two different questions, and it is important to be able to 
>> ask each.  One does not replace the other.
>>
>> The presumption that something is "broken" comes from the subjective 
>> perception that the "less important" operation got the "better" 
>> name.  Indeed, without a clear understanding of what these two 
>> questions are, it is easy to make mistakes.  The comparison to C# 
>> illustrates that other languages could make other choices, which 
>> might result in a different category of mistakes that users might or 
>> might not make.
>>
>> While the answer you got said "backward compatibility", this is a 
>> too-simplistic (though often repeated) answer; the answer really is 
>> "because this exactly is how the language was designed to work", 
>> which means this is not something to be "fixed".  If we agreed that 
>> this original intention was wrong-headed, then the issue of 
>> compatibility would come in -- that there are billions of lines of 
>> code that have been written in Java, and turning Java into Java++, 
>> whether "better" or not, would break many of them.  (Sometimes 
>> languages do make incompatible changes because something is so 
>> egregiously broken that it is better to break half the world's code 
>> than continue living with it, but the bar for this is extremely high, 
>> and "I wish the other operation got the good name" doesn't come near 
>> it.)
>>
>> But the eye-rolling of "how much are we going to sacrifice at the 
>> altar of backward compatibility" is misplaced.  The == operator on 
>> object references still has a clearly defined meaning, and it is the 
>> intended meaning.  It may be unfortunate that the "good" name was 
>> taken by the "less common" operation, but programming languages are 
>> full of such things, and one can easily identify such things in each 
>> of the other 19 languages you list.  Ultimately, when there are two 
>> ways to do something (such as identity comparison and state 
>> comparison), someone has to choose which one gets which name, and 
>> sometimes someone doesn't agree with that choice.
>>
>> In the future, when Project Valhalla delivers value types, which are 
>> classes whose instances have no object identity, the == operator will 
>> compare these objects by their state, not their identity (since they 
>> have none.)  But even this would not obviate the need for 
>> Object::equals, since there are many classes that are suitable to be 
>> value types (such as Rational) where multiple distinct 
>> representations (e.g., 1/2 and 2/4) are mathematically equal.  So 
>> even there, we need different ways to spell "same object" and 
>> "equivalent value".
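
The Rational point can be sketched with an ordinary class, since value classes are not available yet. Everything here is illustrative: the class layout and the sameRepresentation method are assumptions, with sameRepresentation standing in for the field-by-field comparison the message expects == to perform on value types. Overflow in the cross-multiplication is ignored for brevity:

```java
public final class Rational {
    private final long num;
    private final long den;

    public Rational(long num, long den) {
        if (den == 0) {
            throw new IllegalArgumentException("zero denominator");
        }
        this.num = num;
        this.den = den;
    }

    // Field-by-field comparison: 1/2 and 2/4 are different representations.
    public boolean sameRepresentation(Rational other) {
        return num == other.num && den == other.den;
    }

    // Domain equality: a/b equals c/d exactly when a*d == c*b.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Rational)) {
            return false;
        }
        Rational r = (Rational) o;
        return num * r.den == r.num * den;
    }

    // Hash the reduced form so that equal values share a hash code.
    @Override
    public int hashCode() {
        long g = gcd(Math.abs(num), Math.abs(den));
        long n = num / g;
        long d = den / g;
        if (d < 0) { // normalise the sign onto the numerator
            n = -n;
            d = -d;
        }
        return 31 * Long.hashCode(n) + Long.hashCode(d);
    }

    private static long gcd(long a, long b) {
        return b == 0 ? a : gcd(b, a % b);
    }

    public static void main(String[] args) {
        Rational half = new Rational(1, 2);
        Rational twoQuarters = new Rational(2, 4);
        System.out.println(half.sameRepresentation(twoQuarters)); // false
        System.out.println(half.equals(twoQuarters));             // true
    }
}
```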
>>
>> In the farther future, if Java ever has operator overloading, one 
>> might be able to overload `==`, but being able to do that brings its 
>> own set of problems and confusions.
>>
>> Which is to say, there really are two questions here, "same object" 
>> and "domain equivalence", and you need ways to ask both.
>>
>>
>>
>>
>> On 10/22/2023 3:29 PM, David Alayachew wrote:
>>> Hello,
>>>
>>> Thank you for reaching out!
>>>
>>> I'm pretty sure that the amber-dev mailing list is not the correct 
>>> place for this type of question. This topic usually goes on at the 
>>> following mailing list instead. I've CC'd it for you. I would also 
>>> encourage you to remove amber-dev from your CC when responding to 
>>> me, or anyone else on this thread.
>>>
>>> discuss at openjdk.org
>>>
>>> To answer your question, this is a very common request, and the 
>>> biggest answer is definitely still the backwards compatibility 
>>> problem. But tbh, the question I have for you is this -- is it such 
>>> a big cost to call the o1.equals(o2) method instead of using ==? And 
>>> if you want to handle nulls too, you can import java.util.Objects 
>>> (that class is full of useful static utility methods) and then just 
>>> say Objects.equals(o1, o2) instead. I am pretty sure that that exact 
>>> method was created in response to your exact question.
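
A short sketch of the difference between the two calls mentioned above. The class name NullSafeEquals and the helper plainEquals are made up for illustration; Objects.equals is the real java.util.Objects method:

```java
import java.util.Objects;

public class NullSafeEquals {
    // o1.equals(o2) throws NullPointerException when o1 is null.
    static boolean plainEquals(String o1, String o2) {
        return o1.equals(o2);
    }

    public static void main(String[] args) {
        String a = null;
        String b = "x";

        // Objects.equals handles null on either side without throwing.
        System.out.println(Objects.equals(a, b));                 // false
        System.out.println(Objects.equals(null, null));           // true
        System.out.println(Objects.equals(b, new String("x")));   // true

        try {
            plainEquals(a, b);
        } catch (NullPointerException e) {
            System.out.println("plain equals() failed on a null receiver");
        }
    }
}
```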
>>>
>>> I understand it might be inconvenient, but making a change like you 
>>> suggested would be very disruptive for very little benefit. All you 
>>> would gain from doing this would be a slightly better syntax for 
>>> representing object equality and a little more ease when it comes to 
>>> teaching somebody Java. Is that really worth the effort?
>>>
>>> As for the class-file api, I'll CC them so that someone can fact 
>>> check me. Assuming I'm not wrong (no one responds to that point 
>>> specifically), I would also drop that mailing list from your CC when 
>>> responding.
>>>
>>> The purpose of the Class-File API was to build and transform class 
>>> files. So that seems unrelated to what you want. You want to 
>>> repurpose old syntax, but syntax stops being relevant after 
>>> compilation, and it is these compiled class files that the 
>>> Class-File API deals in. If we tried to use that API to handle class 
>>> files created with the old syntax, then we would have a migration 
>>> and clarity problem, amongst much more.
>>>
>>> Let us know if you have any more questions.
>>>
>>> Thank you for your time!
>>> David Alayachew
>>>
>>>
>>> On Sun, Oct 22, 2023 at 2:12 PM tzengshinfu <tzengshinfu at gmail.com> 
>>> wrote:
>>>
>>>     Hi, folks:
>>>
>>>     When I switched my primary programming language from C# to Java, I
>>>     found myself perplexed by 'string comparison' (and still do at
>>>     times). While string comparisons can sometimes become quite
>>>     intricate, involving issues like case sensitivity, cultural
>>>     nuances... most of the time, all that's needed is
>>>     string1 == string2.
>>>
>>>     I discovered that a similar question was asked a decade ago
>>> (https://www.reddit.com/r/java/comments/1gjwpu/will_the_equals_operator_ever_be_fixed_with/),
>>> with responses indicating that it's due to 'Backward 
>>> compatibility,' and therefore, unlikely to change. (Backward 
>>> compatibility! We just keep piling new things on top of historical 
>>> baggage, and for users coming from school or from other languages 
>>> like C#, Python, C++, Rust, Golang, Kotlin, Scala, JavaScript, PHP, 
>>> Rlang, Swift, Ruby, Dart... the top 20 languages according to PYPL, 
>>> having to consult the so-called 'Java FAQ' can be frustrating.)
>>>
>>>     But I believe that if something is amiss, it should be corrected
>>>     to keep moving forward. It would be fantastic if this issue could
>>>     be addressed in a new version of Java and an automatic conversion
>>>     feature provided to fix places in user code that use
>>>     String.equals. (Similar to the JVM's preview feature switch) Is
>>>     the Class-File API a potential solution to this problem? Is my
>>>     idea unrealistic?
>>>
>>>     /* GET BETTER EVERY DAY */
>>>
>>
>