PROPOSAL: Sameness operators (version 2)

Derek Foster vapor1 at teleport.com
Thu Apr 30 23:45:40 PDT 2009


Per discussion on the Project Coin mailing list, this proposal has been expanded somewhat to discuss the possibility of using alternate operator sets (using # or ~ instead of $), and some discussion has been added on the merits of defining $$ and !$ in terms of Object.equals() instead of Comparable<T>.compareTo(T).


Sameness Operators (Version 2)

AUTHOR:

Derek Foster

OVERVIEW

Many Java objects implement the Comparable<T> interface, and/or override the Object.equals() method, to provide a way of ordering their instances or detecting whether two instances are equivalent. However, the syntax for using these methods has historically been fairly ugly:

// We want to write "if a >= b", but we have to write...
if (a.compareTo(b) >= 0) {
     whatever();
}

// We want to write "if a == b", but we have to write...
if (a.equals(b)) {
     whatever();
}

The ugliness of these methods has often motivated the creation of special-purpose APIs to simplify the use of classes which implement them, such as the 'Date.before(Date)' and 'Date.after(Date)' methods in java.util.Date. However, these methods are inconsistent from class to class, and often not present.

Furthermore, the existing language == and != operators exhibit a strange asymmetry between the behavior of objects and that of primitive types that often catches new users of Java by surprise:

int i = 5;
if (i == 5) {
    // gets here if the VALUE of i is 5
}

String j = new String("abc");
if (j == "abc") {
    // probably never gets here! Comparing the values of the references, not the values of the strings.
}

String k = new String("abc");
if (k.equals("abc")) {
     // gets here if the VALUE of k is "abc".
}

This behavior often confuses newcomers to Java, and is a common source of bugs even for experienced programmers.

This proposal suggests that a new set of relational operators be added to Java which would simplify comparing objects by their declared orderings (as specified by the Comparable<T> interface and Object.equals()), while still yielding syntax that is as simple as using the existing >=, <=, ==, and != operators.

FEATURE SUMMARY:

Adds four new operators to Java: 

a $$ b    "same as":              a==null ? b==null : a.equals(b), or a == b for primitive types.
a !$ b    "not same as":          a==null ? b!=null : !a.equals(b), or a != b for primitive types.
a >$ b    "greater than or same": a.compareTo(b) >= 0, or a >= b for primitive types.
a <$ b    "less than or same":    a.compareTo(b) <= 0, or a <= b for primitive types.

and adds additional overloadings to existing operators:
a < b     a.compareTo(b) < 0, or a < b for primitive types.
a > b     a.compareTo(b) > 0, or a > b for primitive types.
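The operator definitions above can be approximated today with ordinary static methods. The following is only a sketch for reference types: the class name SamenessSketch and its method names are hypothetical and do not appear in this proposal, and primitive operands are not modeled.

```java
// Hypothetical helper class; the names are illustrative and not part of the proposal.
final class SamenessSketch {
    private SamenessSketch() {}

    // a $$ b : null-safe equality via Object.equals()
    static boolean same(Object a, Object b) {
        return a == null ? b == null : a.equals(b);
    }

    // a !$ b : null-safe inequality via Object.equals()
    static boolean notSame(Object a, Object b) {
        return a == null ? b != null : !a.equals(b);
    }

    // a >$ b : throws NullPointerException if a is null
    static <T> boolean greaterOrSame(Comparable<? super T> a, T b) {
        return a.compareTo(b) >= 0;
    }

    // a <$ b
    static <T> boolean lessOrSame(Comparable<? super T> a, T b) {
        return a.compareTo(b) <= 0;
    }

    // a > b (the proposed overloading for reference types)
    static <T> boolean greater(Comparable<? super T> a, T b) {
        return a.compareTo(b) > 0;
    }

    // a < b (the proposed overloading for reference types)
    static <T> boolean less(Comparable<? super T> a, T b) {
        return a.compareTo(b) < 0;
    }

    public static void main(String[] args) {
        String k = new String("abc");
        System.out.println(same(k, "abc"));            // true, unlike k == "abc"
        System.out.println(same(null, null));          // true
        System.out.println(greaterOrSame("b", "a"));   // true
        System.out.println(less("a", "b"));            // true
    }
}
```
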

Note that this proposal specifies several alternatives to the specific operator names chosen (for instance, the proposal could be implemented using ##, !#, >#, and <# instead of $$, !$, >$, and <$) if the use of '$' is deemed too problematic due to possible breaking changes to generated files. This is discussed in the "BREAKING CHANGES" section below.

MAJOR ADVANTAGE:

Use of new operators for these relational tests would simplify code and make it more clear what relational tests are being made, as well as reducing the opportunity for mistakes (such as accidentally typing a.compareTo(b) >=0 when <=0 was intended).

MAJOR BENEFIT:

Clearer code due to the use of infix operators instead of using method calls and an extra pair of parentheses, plus possible extra tests (often accidentally omitted) for nullness around calls to Object.equals(Object).

Future proposals involving limited uses of operator overloading for code clarity (for classes such as BigInteger) would no longer run into the iceberg of "but you can't change the behavior of == in a backwards-compatible fashion!"


MAJOR DISADVANTAGE:

Modifications to the compiler would be required. These do not appear particularly difficult, but would take some effort.

The !$ operator would not be available for Perl-style pattern matching, if that were ever added to Java.


ALTERNATIVES:

Keep using the Comparable.compareTo and Object.equals methods as they are.

It would have been better to define == and != this way in the first place, and use some other operator (perhaps === as in some other languages?) to indicate comparison by object identity. This would have made Java simpler and easier for newcomers to understand. However, that's not how the language is currently defined, and to change the behavior of == now would be a backwards incompatible change.

It might be possible to define $$ and !$ in terms of "Comparable<T>.compareTo(T)" instead of in terms of Object.equals(Object). However, doing so would be less general purpose (it would only work on comparable classes). Although it is possible for these methods to disagree (for instance, BigDecimal.equals() takes scale into account while BigDecimal.compareTo() does not), in practice this rarely matters in the scenarios for which compareTo is typically used to determine ordering (sorting algorithms, etc.).
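One well-known case in which equals() and compareTo() disagree is java.math.BigDecimal, whose equals() compares both value and scale while compareTo() compares value only. The sketch below shows the two possible definitions of $$ giving different answers for the same operands; the class and method names are hypothetical.

```java
import java.math.BigDecimal;

// Hypothetical demonstration class; not part of the proposal.
final class EqualsVersusCompareTo {
    private EqualsVersusCompareTo() {}

    // $$ defined in terms of Object.equals(), as this proposal specifies
    static boolean sameByEquals(BigDecimal a, BigDecimal b) {
        return a.equals(b);
    }

    // $$ defined in terms of compareTo(), the alternative discussed above
    static boolean sameByCompareTo(BigDecimal a, BigDecimal b) {
        return a.compareTo(b) == 0;
    }

    public static void main(String[] args) {
        BigDecimal a = new BigDecimal("1.0");
        BigDecimal b = new BigDecimal("1.00");
        System.out.println(sameByEquals(a, b));     // false: the scales differ
        System.out.println(sameByCompareTo(a, b));  // true: the numeric values match
    }
}
```

Under the equals()-based definition, "a $$ b" and "a >$ b && a <$ b" could therefore give different results for such classes.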

EXAMPLES

SIMPLE EXAMPLE:

void getOrdering(String first, String second) {
    if (first $$ second) {
        System.out.println("They are equal");
    } else if (first !$ second) {
        System.out.println("They are not equal");
    } else if (first > second) {
        System.out.println("The first is after the second");
    } else if (first >$ second) {
        System.out.println("The first is same as or after the second");
    } else if (first < second) {
        System.out.println("The first is before the second");
    } else if (first <$ second) {
        System.out.println("The first is before or the same as the second");
    }
}

ADVANCED EXAMPLE:

Really, the simple example pretty much illustrates the feature.


DETAILS

SPECIFICATION:


The following new tokens 

    $$ !$ >$ <$

shall be added to section 3.12 of the JLS3.

The expression grammar in section 15.20 shall be modified like so:

RelationalExpression:
        ShiftExpression
        RelationalExpression < ShiftExpression
        RelationalExpression > ShiftExpression
        RelationalExpression <= ShiftExpression
        RelationalExpression >= ShiftExpression
        RelationalExpression >$ ShiftExpression
        RelationalExpression <$ ShiftExpression
        RelationalExpression instanceof ReferenceType

The expression grammar in section 15.21 shall be modified like so:

    EqualityExpression:
            RelationalExpression
            EqualityExpression == RelationalExpression
            EqualityExpression != RelationalExpression
            EqualityExpression $$ RelationalExpression
            EqualityExpression !$ RelationalExpression

Semantically, the behavior of these new operators is as follows:


$$ and !$ operators:

Evaluation of these operators shall occur exactly as they do for the == and != operators, as specified in section 15.21 of the JLS3 ("Equality Operators"), with the exception that for the purposes of these operators, section 15.21.3 ("Reference equality operators == and !=") shall be disregarded and replaced with:

If the operands of a sameness operator are both of either reference type or the null type, then the operation is object equivalence. The behavior described below is for the $$ operator. The !$ operator shall behave identically except that it shall return false when the $$ operator would return true, and vice versa. The procedure for evaluating the $$ operator is as follows:

If both operands are null, then the result shall be 'true'.

Otherwise, if the left operand is null and the right operand is not null, then the result shall be 'false'.

Otherwise, the return value shall be the value of the expression "left.equals(right)", evaluated using the java.lang.Object.equals(Object) method.
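The three-step procedure above can be written out as a plain method. This is a sketch only; the class and method names are hypothetical. Note that when only the right operand is null, the result comes from left.equals(null), which the Object.equals() contract requires to be false. For comparison, java.util.Objects.equals(Object, Object), which was added in Java 7 after this proposal was written, gives the same results for any equals() implementation that honors that contract, although it also short-circuits when both references are identical.

```java
import java.util.Objects;

// Hypothetical sketch of the $$ / !$ evaluation procedure for reference operands.
final class SameAsProcedure {
    private SameAsProcedure() {}

    // left $$ right
    static boolean sameAs(Object left, Object right) {
        if (left == null && right == null) {
            return true;                 // both operands are null
        }
        if (left == null) {
            return false;                // only the left operand is null
        }
        return left.equals(right);       // only the left operand's equals() is consulted
    }

    // left !$ right is defined as the negation of $$
    static boolean notSameAs(Object left, Object right) {
        return !sameAs(left, right);
    }

    public static void main(String[] args) {
        Object[][] pairs = {
            { null, null }, { null, "x" }, { "x", null }, { "abc", new String("abc") }
        };
        for (Object[] p : pairs) {
            // Prints the sketch's result next to Objects.equals() for the same operands.
            System.out.println(sameAs(p[0], p[1]) + " " + Objects.equals(p[0], p[1]));
        }
    }
}
```
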



>, >$, <, and <$ operators:

Evaluation of these operators shall occur exactly as it does for corresponding >, >=, <, and <= operators, in section 15.20.1 of the JLS3 ("Numerical Comparison Operators <, <=, >, and >=") with the exception of the following.

The text "The type of each of the operands of a numerical comparison operator must be a type that is convertible (§5.1.8) to a primitive numeric type, or a compile-time error occurs." shall be replaced with the text "If the type of both of the operands of a numerical comparison operator is a type that is convertible (§5.1.8) to a primitive numeric type, the following algorithm is used to evaluate the operator. Otherwise, the operation is object ordering (See §15.20.1.1)"

A new section 15.20.1.1 shall be added consisting of the following text:

If one or both operands of a relational operator cannot be converted to primitive numeric types, then boxing conversions shall be used to convert both operands to object types.

After this conversion, if the type of the left operand does not implement the raw type java.lang.Comparable, and also does not implement java.lang.Comparable<T> for some type T which is equal to or a supertype of the type of the right operand, then a compile-time error shall be reported.

Otherwise, the result of the comparison shall be evaluated at runtime as follows:

If the left operand is null, a NullPointerException shall be thrown.

Otherwise, the method left.compareTo(right) shall be called. The following table shall be used to determine the result of the operator evaluation, based on the value returned from this method:

                left.compareTo(right)
            <0      ==0     >0
    >       false   false   true
    <       true    false   false
    >$      false   true    true
    <$      true    true    false
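The object ordering rules and the table above can be sketched as a single method. The enum Op and the class name are hypothetical stand-ins for the four operators. The generic signature is only an approximation of the compile-time check described above (the left operand must be Comparable to the right operand's type or one of its supertypes); the raw-type case is not modeled.

```java
// Hypothetical sketch of "object ordering" evaluation; not part of the proposal.
final class ObjectOrdering {
    private ObjectOrdering() {}

    // Stand-ins for the operators >, <, >$ and <$ respectively
    enum Op { GT, LT, GE, LE }

    static <T> boolean evaluate(Op op, Comparable<? super T> left, T right) {
        if (left == null) {
            // The proposal requires a NullPointerException for a null left operand.
            throw new NullPointerException("left operand is null");
        }
        int c = left.compareTo(right);
        switch (op) {
            case GT: return c > 0;    // row ">"
            case LT: return c < 0;    // row "<"
            case GE: return c >= 0;   // row ">$"
            case LE: return c <= 0;   // row "<$"
            default: throw new AssertionError(op);
        }
    }

    public static void main(String[] args) {
        System.out.println(evaluate(Op.GT, "b", "a"));   // true
        System.out.println(evaluate(Op.GE, "a", "a"));   // true
        System.out.println(evaluate(Op.LT, "b", "a"));   // false
    }
}
```
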


COMPILATION: 

The body of the example given above would be desugared as follows:

    if (first==null ? second==null : first.equals(second)) {
        System.out.println("They are equal");
    } else if (first==null ? second != null : !first.equals(second)) {
        System.out.println("They are not equal");
    } else if (first.compareTo(second) > 0) {
        System.out.println("The first is after the second");
    } else if (first.compareTo(second) >= 0) {
        System.out.println("The first is same as or after the second");
    } else if (first.compareTo(second) < 0) {
        System.out.println("The first is before the second");
    } else if (first.compareTo(second) <= 0) {
        System.out.println("The first is before or the same as the second");
    }


TESTING:

The feature can be tested by ensuring, first of all, that the new operators return the same results as existing operators when invoked on operands which are both convertible to numeric primitive types.

Secondly, that the new operators return the same results as their desugaring equivalents when invoked in circumstances where one or both operands are not convertible to numeric primitive types.

Thirdly, that the relational operators throw NullPointerExceptions when the left operand is null.

Fourthly, that compiler errors are reported exactly in the cases when the desugared equivalents of the operators would not compile.

LIBRARY SUPPORT:

No library changes are needed for this feature.

REFLECTIVE APIS:

No changes to reflective APIs are needed for this feature.

OTHER CHANGES:

No other parts of the platform need to be updated.

MIGRATION:

A tool such as Eclipse or IntelliJ IDEA might identify for the user existing calls to .compareTo or .equals which could be converted to use the new operators, and could offer to perform the change automatically at the user's request. Or a user could simply refactor code to use the new operators as desired.

COMPATIBILITY

BREAKING CHANGES:

The default set of operators ($$, !$, >$, and <$) suggested in this proposal was chosen because the "$" has an easy mnemonic meaning of "sameness". However, there are some rare circumstances in which a valid program that uses the $ character within identifiers (typically only done in machine-generated code) might become invalid according to this proposal.

Specifically, a program which, in source code, used the $ character at the start of an identifier might not compile correctly, if it contained code such as:

if (a<$something) {
}

meaning "if ( a < $something )" rather than "if ( a <$ something )", which is how it would be parsed according to this proposal. This would almost certainly result in a compiler error about a missing variable "something", particularly if the body of generated code was large, so the odds of this resulting in correctly compiling but silently misinterpreted code are small.

This would only happen in generated code (the only place a $ character is supposed to be used, according to the JLS), and is expected to be an extremely rare phenomenon, since generated code rarely starts identifiers with dollar signs, and also usually generates code with spaces around operators for readability. This problem can be easily fixed when upgrading to Java 7 by altering the code generator to put a space in front of any such identifiers, either globally or only when they follow a < or > character. Note that internal synthetic variables generated by a compiler in a class file would be immune from this problem since they would never appear in source code form.
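The change in parsing comes from the lexer's rule of always taking the longest possible token. The toy tokenizer below is a sketch of that rule only, not of javac's actual lexer. Its class name and method are hypothetical, and it deliberately handles just identifiers, whitespace, and the operators <, >, !, <$, >$ and !$ (the $$ token is left out to keep the sketch small).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy lexer illustrating longest-match tokenization; not javac's lexer.
final class LongestMatchSketch {
    private LongestMatchSketch() {}

    private static final String[] NEW_OPERATORS = { "<$", ">$", "!$" };
    private static final String[] OLD_OPERATORS = { "<", ">", "!" };

    static List<String> tokenize(String source, boolean withSamenessOperators) {
        List<String> tokens = new ArrayList<String>();
        int i = 0;
        while (i < source.length()) {
            char c = source.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
                continue;
            }
            String op = matchOperator(source, i, withSamenessOperators);
            if (op != null) {
                tokens.add(op);
                i += op.length();
            } else if (Character.isJavaIdentifierStart(c)) {
                // '$' is a legal identifier start, so "$something" is one identifier.
                int start = i;
                while (i < source.length() && Character.isJavaIdentifierPart(source.charAt(i))) {
                    i++;
                }
                tokens.add(source.substring(start, i));
            } else {
                tokens.add(String.valueOf(c));   // any other character is its own token
                i++;
            }
        }
        return tokens;
    }

    // Tries the two-character operators first so that the longest match wins.
    private static String matchOperator(String source, int at, boolean withSamenessOperators) {
        if (withSamenessOperators) {
            for (String op : NEW_OPERATORS) {
                if (source.startsWith(op, at)) return op;
            }
        }
        for (String op : OLD_OPERATORS) {
            if (source.startsWith(op, at)) return op;
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("a<$something", false));  // [a, <, $something]
        System.out.println(tokenize("a<$something", true));   // [a, <$, something]
        System.out.println(tokenize("a < $something", true)); // [a, <, $something]
    }
}
```

The last call shows the suggested fix: a space before the identifier keeps "<" and "$something" as separate tokens even with the new operators enabled.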

Alternatively, this proposal could be altered to use another character instead of $ for its tokens. One good candidate for this would be "~", which lends itself to being read as "is equivalent to" (rather than "$" for "is the same as"). There are, however, rare cases that this could break as well, such as "if (a<~b) { ... }". Although relational tests against bitwise-complemented values are quite rare (since combining bitwise operations and relationals is usually a nonsensical operation), unfortunately this breakage could occur in non-generated code, which is much more common than generated code.

Yet another option would be "#", which somewhat resembles an equals sign. Thus, instead of the $$, !$, <$, and >$ operators, this proposal could be implemented to use the ##, !#, <#, and ># operators. The # operator has the advantage of being unused in Java, and so is guaranteed not to break code. However, it is highly sought after by writers of language change proposals (such as closures, method pointers, and others), and so is under heavy competition as to its future meaning. The #-related operators would certainly be the safest set of operators to use to implement this proposal, although some care might be required to ensure that this use would not interfere with some other future desired meaning of these operators. Fortunately, most proposals for using the "#" in code have attempted to use it as an infix operator with identifiers on either side rather than as a prefix operator, which reduces the risk of conflicting with this proposal. Also, if the above symbols were defined as tokens before any such code exists, it should be easy to ensure that any future use of the # character simply never falls into the pattern of one of these tokens.


EXISTING PROGRAMS:

Except for the minor, rare, breaking change listed above, this change should not create incompatibilities with existing source or class files.

REFERENCES

EXISTING BUGS:

There are a variety of proposals in the Bug Database related to various people's desires to support operator overloading for certain built-in mathematical classes such as BigInteger. A couple of such proposals are listed below.

"Add [], -, +, *, /  operators to core classes as appropriate" (related)
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=5099780

"BigInteger should support autoboxing"
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6407464

Also, see the discussion on the Project Coin mailing list regarding the proposal "Draft proposal: allow the use of relational operators on Comparable classes," particularly with regard to why that proposal was withdrawn (namely, inability to make the == and != operators work properly on Comparable classes).
http://mail.openjdk.java.net/pipermail/coin-dev/2009-March/000361.html


URL FOR PROTOTYPE (optional):

None.



