Is there a possibility of the string equality operator (==) being fixed?

Andrew Myers acm22 at cornell.edu
Tue Oct 24 02:43:51 UTC 2023


Pedagogically, it would be nice to have some sugar for .equals() that 
makes it remotely close in visual appeal to ==. Ideally it would be = 
but that is already ruined. My best alternative suggestion would be

e1 .= e2

Cheers,

-- Andrew

On 10/23/23 10:37 AM, Brian Goetz wrote:
> One of the pleasant side-effects of Project Valhalla is that (a) we'll 
> be able to say .equals() on primitives, and (b) the cost of doing so 
> will JIT down to that of ==.  Which means that we can tell people 
> "just use .equals() everywhere" (except when implementing low-level 
> code like IdentityHashMap) and they will never have to wonder which to 
> use.  I realize this doesn't solve the "wrong op got the good name" 
> problem, but it gives us a path to not having to think about it so often.
>
> It may be possible to migrate String to a value class at some point in 
> the distant future (though this has a considerable shroud of 
> uncertainty surrounding it), which makes this problem recede farther 
> into the background for the particular case of String.  Which might be 
> enough to make this significantly less of a problem, for the reasons 
> you outline here.
>
>
>
> On 10/23/2023 5:32 AM, Andrew Dinn wrote:
>> Hi Brian,
>>
>> I think there is also another subtle confusion lying behind this 
>> question. Strings are in the unusual position that they can be named 
>> via program literal text, e.g. the 7-character sequence "hello" in a 
>> program body is a literal reference to an instance of java.lang.String.
>>
>> That is not the case for any other class of object bar one, the 
>> exception being instances of java.lang.Class e.g. the 22 character 
>> sequence java.lang.String.class in a program body serves as a literal 
>> reference to an instance of java.lang.Class.
>>
>> It is easy for a novice programmer to draw the conclusion that this 
>> literal reference must exist 1-1 with regard to its corresponding 
>> literal i.e. that there will only ever be one String whose ordered 
>> sequence of characters will be 'h', 'e', 'l', 'l' and 'o'. The fact 
>> that
>>
>>   new String("hello") == "hello"
>>
>> will evaluate to false is not immediately evident to beginner 
>> programmers.
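
A minimal sketch of the comparison above, for anyone who wants to run it. The class and method names are made up for illustration; only String, ==, equals() and intern() come from the language and its library.

```java
// Hypothetical demo class; not part of the original message.
public class LiteralIdentityDemo {

    // Identity comparison: true only if both references point at one object.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    // State comparison: true if both strings hold the same characters.
    static boolean sameChars(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        String literal = "hello";
        String fresh = new String("hello"); // explicitly allocates a new object

        System.out.println(sameObject(fresh, literal));          // false
        System.out.println(sameChars(fresh, literal));           // true
        // intern() returns the canonical pooled instance, which is the literal's.
        System.out.println(sameObject(fresh.intern(), literal)); // true
    }
}
```
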
>>
>> This misconception is helped along by the fact that the JVM ensures that 
>> all occurrences of a String literal in disparate class files do end 
>> up referring to the same String instance. If method m of class C 
>> passes the literal String "hello" to method m2 of class C2 and the 
>> latter compares its input to the literal String "hello" using an 
>> equality comparison then the result will be true.
>>
>> class C
>> {
>>     . . .
>>     void m() {
>>         C2.m2("foo");
>>     }
>> }
>>
>> class C2
>> {
>>     . . .
>>     static void m2(String s) {
>>         if (s == "foo") {
>>             System.out.println("identity equal");
>>         }
>>     }
>> }
>>
>> The message will always be printed, i.e. Strings mentioned 
>> as program literals in source code and thereby introduced as String 
>> constants in bytecode are *deduplicated* to the same String instance 
>> when the bytecode is loaded by the JVM.
>>
>> That is why it takes some work to arrive at a case like the first 
>> code snippet above where two Strings can have equal state but not 
>> equal identity. At least one of the String instances has to be 
>> explicitly created at runtime via new, substring() or some other 
>> method that synthesises a String object.
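
To make the "created at runtime" cases concrete, here is a small sketch. The class and method names are hypothetical, and the substring() result being a distinct object is what current OpenJDK does rather than something I would want to promise for every implementation. The constant-folding case is included for contrast, since "hel" + "lo" is a compile-time constant expression and ends up as the same pooled literal.

```java
// Hypothetical demo class; names are illustrative only.
public class RuntimeStringDemo {

    // Identity comparison on two String references.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    // substring() of a longer string synthesises a String object at run time.
    static String viaSubstring() {
        return "hello world".substring(0, 5);
    }

    // One operand is not a constant, so the concatenation happens at run time
    // and yields a newly created String.
    static String viaRuntimeConcat(String tail) {
        return "hel" + tail;
    }

    // Both operands are compile-time constants, so the compiler folds this
    // into the single literal "hello".
    static String viaConstantConcat() {
        return "hel" + "lo";
    }

    public static void main(String[] args) {
        System.out.println(sameObject(viaSubstring(), "hello"));         // false
        System.out.println(sameObject(viaRuntimeConcat("lo"), "hello")); // false
        System.out.println(sameObject(viaConstantConcat(), "hello"));    // true
        System.out.println(viaSubstring().equals("hello"));              // true
    }
}
```
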
>>
>> It is interesting to compare this situation with that for class 
>> literals where deduplication is either not required or would be 
>> incorrect. It requires a much greater feat of ingenuity (equally, 
>> carelessness or recklessness), involving the use of 
>> application-defined class loaders, to arrive at a situation where the 
>> program literal org.my.Foo.class occurring in a method m of class C 
>> can identify a different instance of java.lang.Class to the same 
>> program literal occurring in a method m2 of class C2. Yet it is 
>> possible:
>>
>> class C
>> {
>>     . . .
>>     void m() {
>>         C2.m2(org.my.Foo.class);
>>     }
>> }
>>
>> class C2
>> {
>>     . . .
>>     static void m2(Class<?> c) {
>>         if (c != org.my.Foo.class) {
>>             // Yes, if you misbehave you can end up here!
>>             System.out.println("you are in classloader hell!");
>>         }
>>     }
>> }
>>
>> I guess I could offer instructions as to how to arrive at the 
>> situation where m2 prints out its warning message but I'll leave that 
>> as an exercise for the expert (or unwary) reader.
>>
>> regards,
>>
>>
>> Andrew Dinn
>> -----------
>>
>> On 22/10/2023 22:29, Brian Goetz wrote:
>>> First of all, the question is framed in a way that assumes its own 
>>> conclusion; that somehow there is something "broken" to be "fixed". 
>>> The == operator on object references asks a simple, well-defined, 
>>> fundamental question: do these two object references _refer to the 
>>> same object_. There is a similar, related question of "do these two 
>>> objects _encode the same domain value_" (which is inherently 
>>> class-specific), and that goes by the name of the "equals" method. 
>>> These are two different questions, and it is important to be able to 
>>> ask each.  One does not replace the other.
>>>
>>> The presumption that something is "broken" comes from the subjective 
>>> perception that the "less important" operation got the "better" 
>>> name.  Indeed, without a clear understanding of what these two 
>>> questions are, it is easy to make mistakes. The comparison to C# 
>>> illustrates that other languages could make other choices, which 
>>> might result in a different category of mistakes that users might or 
>>> might not make.
>>>
>>> While the answer you got said "backward compatibility", this is a 
>>> too-simplistic (though often repeated) answer; the answer really is 
>>> "because this exactly is how the language was designed to work", 
>>> which means this is not something to be "fixed".  If we agreed that 
>>> this original intention was wrong-headed, then the issue of 
>>> compatibility would come in -- that there are billions of lines of 
>>> code that have been written in Java, and turning Java into Java++, 
>>> whether "better" or not, would break many of them.  (Sometimes 
>>> languages do make incompatible changes because something is so 
>>> egregiously broken that it is better to break half the world's code 
>>> than continue living with it, but the bar for this is extremely 
>>> high, and "I wish the other operation got the good name" doesn't 
>>> come near it.)
>>>
>>> But the eye-rolling of "how much are we going to sacrifice at the 
>>> altar of backward compatibility" is misplaced.  The == operator on 
>>> object references still has a clearly defined meaning, and it is the 
>>> intended meaning.  It may be unfortunate that the "good" name was 
>>> taken by the "less common" operation, but programming languages are 
>>> full of such things, and one can easily identify such things in each 
>>> of the other 19 languages you list.  Ultimately, when there are two 
>>> ways to do something (such as identity comparison and state 
>>> comparison), someone has to choose which one gets which name, and 
>>> sometimes someone doesn't agree with that choice.
>>>
>>> In the future, when Project Valhalla delivers value types, which are 
>>> classes whose instances have no object identity, the == operator 
>>> will compare these objects by their state, not their identity (since 
>>> they have none).  But even this would not obviate the need for 
>>> Object::equals, since there are many classes that are suitable to be 
>>> value types (such as Rational) where multiple distinct 
>>> representations (e.g., 1/2 and 2/4) are mathematically equal.  So 
>>> even there, we need different ways to spell "same object" and 
>>> "equivalent value".
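
A rough sketch of the Rational point, under stated assumptions: value classes are not available in released Java, so a record (Java 16 or later) stands in here, and the sameState() method is a hypothetical placeholder for what a state-based == might report. None of these names come from the thread or from Valhalla itself.

```java
import java.util.Objects;

// Hypothetical stand-in for a value class; a record is used because
// value classes are not part of released Java.
public record Rational(long num, long den) {

    public Rational {
        if (den == 0) {
            throw new IllegalArgumentException("zero denominator");
        }
    }

    // Component-wise comparison: roughly what a state-based == would ask.
    public boolean sameState(Rational other) {
        return num == other.num && den == other.den;
    }

    // Domain equivalence: 1/2 and 2/4 denote the same number.
    // (Cross-multiplication can overflow for very large components.)
    @Override
    public boolean equals(Object o) {
        return o instanceof Rational r && num * r.den == r.num * den;
    }

    // Hash the reduced form so that equal values share a hash code.
    @Override
    public int hashCode() {
        long g = gcd(Math.abs(num), Math.abs(den));
        long sign = den < 0 ? -1 : 1;
        return Objects.hash(sign * num / g, sign * den / g);
    }

    private static long gcd(long a, long b) {
        return b == 0 ? a : gcd(b, a % b);
    }
}
```

With this sketch, new Rational(1, 2).sameState(new Rational(2, 4)) is false while new Rational(1, 2).equals(new Rational(2, 4)) is true, which is the distinction between the two questions.
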
>>>
>>> In the farther future, if Java ever has operator overloading, one 
>>> might be able to overload `==`, but being able to do that brings its 
>>> own set of problems and confusions.
>>>
>>> Which is to say, there really are two questions here, "same object" 
>>> and "domain equivalence", and you need ways to ask both.
>>>
>>>
>>>
>>>
>>> On 10/22/2023 3:29 PM, David Alayachew wrote:
>>>> Hello,
>>>>
>>>> Thank you for reaching out!
>>>>
>>>> I'm pretty sure that the amber-dev mailing list is not the correct 
>>>> place for this type of question. This topic usually belongs on the 
>>>> following mailing list instead. I've CC'd it for you. I would also 
>>>> encourage you to remove amber-dev from your CC when responding to 
>>>> me, or anyone else on this thread.
>>>>
>>>> discuss at openjdk.org
>>>>
>>>> To answer your question, this is a very common request, and the 
>>>> biggest answer is definitely still the backwards compatibility 
>>>> problem. But tbh, the question I have for you is this -- is it such 
>>>> a big cost to call the o1.equals(o2) method instead of using ==? 
>>>> And if you want to handle nulls too, you can import 
>>>> java.util.Objects (that class is full of useful static utility 
>>>> methods) and then just say Objects.equals(o1, o2) instead. I am 
>>>> pretty sure that that exact method was created in response to your 
>>>> exact question.
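
For reference, a short sketch of the null-handling difference described above. Objects.equals is the real java.util.Objects method; the demo class and its two helper methods are hypothetical names used only for illustration.

```java
import java.util.Objects;

// Hypothetical demo class; only Objects.equals is a real library call.
public class NullSafeEqualsDemo {

    // Calling equals() directly throws NullPointerException when a is null.
    static boolean directEquals(String a, String b) {
        return a.equals(b);
    }

    // Objects.equals treats two nulls as equal and never throws on null input.
    static boolean nullSafeEquals(String a, String b) {
        return Objects.equals(a, b);
    }

    public static void main(String[] args) {
        System.out.println(nullSafeEquals(null, null));             // true
        System.out.println(nullSafeEquals(null, "x"));              // false
        System.out.println(nullSafeEquals("x", new String("x")));   // true
        try {
            directEquals(null, "x");
        } catch (NullPointerException e) {
            System.out.println("direct call failed on null receiver");
        }
    }
}
```
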
>>>>
>>>> I understand it might be inconvenient, but making a change like you 
>>>> suggested would be very disruptive for very little benefit. All you 
>>>> would gain from doing this would be a slightly better syntax for 
>>>> representing object equality and a little more ease when it comes 
>>>> to teaching somebody Java. Is that really worth the effort?
>>>>
>>>> As for the class-file api, I'll CC them so that someone can fact 
>>>> check me. Assuming I'm not wrong (no one responds to that point 
>>>> specifically), I would also drop that mailing list from your CC 
>>>> when responding.
>>>>
>>>> The purpose of the Class-File API was to build and transform class 
>>>> files. So that seems unrelated to what you want. You want to 
>>>> repurpose old syntax, but syntax stops being relevant after 
>>>> compilation, and it is these compiled class files that the 
>>>> Class-File API deals in. If we tried to use that API to handle 
>>>> class files created with the old syntax, then we would have a 
>>>> migration and clarity problem, among much else.
>>>>
>>>> Let us know if you have any more questions.
>>>>
>>>> Thank you for your time!
>>>> David Alayachew
>>>>
>>>>
>>>> On Sun, Oct 22, 2023 at 2:12 PM tzengshinfu <tzengshinfu at gmail.com> 
>>>> wrote:
>>>>
>>>>     Hi, folks:
>>>>
>>>>     When I switched my primary programming language from C# to Java, I
>>>>     found myself perplexed by 'string comparison' (and still do at
>>>>     times). While string comparisons can sometimes become quite
>>>>     intricate, involving issues like case sensitivity, cultural
>>>>     nuances... most of the time, all that's needed is string1 == string2.
>>>>
>>>>     I discovered that a similar question was asked a decade ago
>>>>     (https://www.reddit.com/r/java/comments/1gjwpu/will_the_equals_operator_ever_be_fixed_with/),
>>>>     with responses indicating that it's due to 'Backward
>>>>     compatibility,' and therefore, unlikely to change. (Backward
>>>>     compatibility! We just keep piling new things on top of historical
>>>>     baggage, and for users coming from school or from other languages
>>>>     like C#, Python, C++, Rust, Golang, Kotlin, Scala, JavaScript, PHP,
>>>>     Rlang, Swift, Ruby, Dart... the top 20 languages according to PYPL,
>>>>     having to consult the so-called 'Java FAQ' can be frustrating.)
>>>>
>>>>     But I believe that if something is amiss, it should be corrected
>>>>     to keep moving forward. It would be fantastic if this issue could
>>>>     be addressed in a new version of Java and an automatic conversion
>>>>     feature provided to fix places in user code that use
>>>>     String.equals. (Similar to the JVM's preview feature switch) Is
>>>>     the Class-File API a potential solution to this problem? Is my
>>>>     idea unrealistic?
>>>>
>>>>     /* GET BETTER EVERY DAY */
>>>>
>>>
>>
>