One-sided instantiation (diamond, collection literals)

Mon Nov 2 14:27:27 PST 2009

I've watched the recent diamond and collection literal threads with interest, but also with a certain bemusement. I believe there is a remarkably simple language change, entirely free of ambiguity, complexity and (maybe even!) controversy, which removes the need for either while still giving us a more concise yet clear Java language.

In either case, the principal motivation is the desire to be able to declare and instantiate an object without unnecessary verbosity, and to improve the clarity of such statements.

Let me propose a simple language change which achieves this not only when declaring collections and generically defined objects, but for all cases where an object is declared and instantiated.

Any statement of the form

Foo foo=new Foo(myBar, myBaz);

can be replaced by

Foo foo(myBar, myBaz); 

With this simple change (which, like most language changes, will look a little odd the first few times you see it, but will very quickly become natural), plus the addition of varargs constructors to the major collections (ArrayList, LinkedList, HashSet, TreeSet and LinkedHashSet are probably enough), we have concise and unambiguous collection instantiation, with or without generics.

ArrayList<String> myList(myOtherStringList); // uses #ArrayList(Collection) ctr

LinkedHashSet<String> mySet("one", "two", "three"); // uses varargs ctr

MyNonGenericClassWithLongName myObject(someArgument); // improves readability and conciseness for any instantiating declaration

There is no inference necessary regarding the type of either the LHS *or* RHS, because there *is* no LHS or RHS ("one-sided instantiation").
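For comparison, here is a rough sketch of what the first two declarations above would stand for in today's Java, assuming one-sided instantiation is pure shorthand for `new`. The class and method names (OneSidedEquivalents, copyList, threeStrings) are made up for illustration, and because LinkedHashSet has no varargs constructor today, Arrays.asList stands in for the proposed one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Illustrative only: present-day equivalents of the proposed declarations.
class OneSidedEquivalents {

    // Proposed: ArrayList<String> myList(myOtherStringList);
    static ArrayList<String> copyList(List<String> myOtherStringList) {
        ArrayList<String> myList = new ArrayList<String>(myOtherStringList);
        return myList;
    }

    // Proposed: LinkedHashSet<String> mySet("one", "two", "three");
    // Assumption: Arrays.asList substitutes for the varargs constructor
    // that the proposal would add to LinkedHashSet.
    static LinkedHashSet<String> threeStrings() {
        LinkedHashSet<String> mySet =
            new LinkedHashSet<String>(Arrays.asList("one", "two", "three"));
        return mySet;
    }

    public static void main(String[] args) {
        System.out.println(copyList(Arrays.asList("a", "b")));
        System.out.println(threeStrings());
    }
}
```

In each case the declared type and the instantiated type are the same, which is exactly the repetition the one-sided form drops.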

I can foresee some objections that may be raised here ...

(i) I can't declare using the interface/base class but instantiate with a subclass.

Good. If it's a TreeSet, say it's a TreeSet. Thankfully, Java isn't Groovy or Scala, and in Java-land we think it is as important to say what we mean as it is to mean what we say. Obviously, in API declarations like Collections.sort(X), X should be the most generally applicable interface or base class. Not so for a local variable declaration or field declaration - the author of the code knows which (e.g.) Set subclass is actually used, and should state so clearly - the reader of the code then also knows, from the get-go, the characteristics of the subclass in use.

The cases where it is actually useful to say

Map<T, U> map=new HashMap<T, U>();
// (do something)
map=new TreeMap<T, U>();

(or anything similar) are incredibly rare, in my experience, and where you feel the need to write code like this, then you can still do so - just not using one-sided instantiation.

(ii) I can't pass a collection literal straight into a method without using new-varargs or a generic static method.

Again, I say good.

foo(Immutable.list("one", "two", "three"));

or 

foo(new TreeSet("x", "y", "z"))

are admittedly more verbose than

foo({"x", "y", "z"}) 

or 

foo(["a", "b", "c"])

but avoid all of the ambiguity (already discussed at length on this list) regarding the exact type of collection. Again, don't just mean what you say, say what you mean.
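To be clear, Immutable.list above is not an existing JDK method. A minimal sketch of how such a generic static method might look, assuming it simply copies its arguments and wraps them in an unmodifiable view (the class names Immutable and ImmutableDemo and the stub foo are hypothetical):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper; no class named Immutable exists in the JDK.
final class Immutable {
    private Immutable() {}

    // Copies the arguments so later writes to the caller's array cannot
    // show through, then wraps the copy in an unmodifiable view.
    @SafeVarargs
    static <T> List<T> list(T... elements) {
        return Collections.unmodifiableList(
            new ArrayList<T>(Arrays.asList(elements)));
    }
}

class ImmutableDemo {
    // Stand-in for the foo(...) used in the examples above.
    static int foo(List<String> words) {
        return words.size();
    }

    public static void main(String[] args) {
        System.out.println(foo(Immutable.list("one", "two", "three")));
    }
}
```

The call site names the kind of collection it passes, so the reader does not have to guess what a bare literal would have produced.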

I don't think that having to clearly state the type of something which you pass to a method annoys or inhibits the productivity of "Joe Java", whereas I do think that stating everything twice, as in 

HashMap<Foo, HashMap<Bar, Baz>> map=new HashMap<Foo, HashMap<Bar, Baz>>(...);

does.

(iii) I can't use diamond to infer the type arguments in a method call.

True, the proposal does not enable one to say

foo(new ArrayList<>(bar, baz));

and have the compiler perform some elaborate inference algorithm (with all the pitfalls that entails; see many mails to this list) to determine the type argument. It does allow one to say

ArrayList<Widget> widgets(bar, baz);
foo(widgets);

I think, again, it is preferable to say what you mean than for the compiler to infer what you meant. The next developer who reads your code is not a compiler. 

(iv) In C++, that notation means "instantiate on the stack", not "instantiate on the heap".

So what? I think we long, long ago passed the point where familiarity to/acceptance by C or C++ programmers was a language design goal. All objects go on the heap in Java, so no change there.

(v) It doesn't address Map literals at all.

True. I'm really not sold on the need for them, personally, but my approach doesn't *preclude* using some special syntax for Map.Entry (though I would be against it), e.g.

HashMap<String, Integer> wordCounts("the":25, "of":16, "and":12);   

So one simple change makes *all* combined declare-and-instantiate statements as concise as (in fact more concise than) in Groovy/Scala/whatever, whilst at the same time retaining/strengthening Java's strong typing, *and* maintaining Java's "blue collar" philosophy: that readability and maintainability by one's colleagues are more important than saving keystrokes.

A huge benefit of this proposal is that the change to the compiler code is almost trivial. I've only recently started hacking javac, but even I was able to figure out what needed to be changed, and where, in about an hour. In Parser#variableDeclaratorRest(...), replace

        // ...
        if (S.token() == EQ) {
            S.nextToken();
            init = variableInitializer();
        }
        else if (reqInit) // ...    

with

        // ...
        if (S.token() == EQ) {
            S.nextToken();
            init = variableInitializer();
        }
        // BEGIN CHANGE
        else if (S.token() == LPAREN || S.token() == LT) {
            List<JCExpression> typeArguments = null;
            if (S.token() == LT) {
                // Explicit type args on constructor.
                // Extremely rarely used, replaces e.g. Foo foo=new <String>Foo();
                // with Foo foo<String>();
                typeArguments=typeArguments();
            }
            JCExpression newClass = classCreatorRest(S.pos(), null, typeArguments, type);
            init = newClass;
        }
        // END CHANGE
        else if (reqInit) // ...

(plus of course the trivial changes required to add varargs ctrs to the principal collections).
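As a rough illustration of what such a varargs constructor amounts to, here is a sketch written as a subclass, only because the JDK class itself can't be edited in an example. The names VarargsLinkedHashSet and VarargsDemo are hypothetical; the real change would add the constructor to LinkedHashSet directly.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;

// Hypothetical stand-in for a LinkedHashSet with a varargs constructor.
class VarargsLinkedHashSet<E> extends LinkedHashSet<E> {
    private static final long serialVersionUID = 1L;

    // Delegates to the existing LinkedHashSet(Collection) constructor,
    // so elements are added in argument order and duplicates collapse.
    @SafeVarargs
    VarargsLinkedHashSet(E... elements) {
        super(Arrays.asList(elements));
    }
}

class VarargsDemo {
    public static void main(String[] args) {
        // With the proposal this would read:
        //   LinkedHashSet<String> mySet("one", "two", "three");
        LinkedHashSet<String> mySet =
            new VarargsLinkedHashSet<String>("one", "two", "three");
        System.out.println(mySet);
    }
}
```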

It seems to me that this seven-line compiler change does as much or more for Java's conciseness and readability as diamond and collection literals put together.

And it doesn't (so far as I can see) introduce any "gotchas", nor require "Joe Java" to wrangle with mysterious synthetic collection types or wonder what three pages of JLS gobbledegook about generic type inference actually means.

I realize that the official window for suggestions for Java 7 language enhancements passed some time ago. But having followed the debate on apparently the two most contentious proposals and noted a simple alternative to both which I think is far preferable, I thought it was appropriate to post it here.

(PS. My apologies if this idea has already been discussed here before I started watching the list and dismissed for various reasons, in which case please direct me to that discussion.) 

Barney
