list literal gotcha and suggestion

Reinier Zwitserloot reinier at zwitserloot.com
Mon Sep 28 23:14:28 PDT 2009


If we insist on having both shorthand set and list literals, then
some people are necessarily going to be confused about the syntax,
regardless of whether {} is used for sets and [] for lists or vice
versa, for reasons already covered: [] is lists and {} is sets in
other languages, but in Java, {} is more closely associated with lists
due to array literals. There's no way out of this unless we either
eliminate set literals or force the target type to be mentioned
somehow, e.g. something like "Set["a", "b", "c"]" versus
"List["a", "b", "c"]", which is quite a price to pay to eliminate the
confusion. Perhaps if the literal syntax defaults to Lists when you
omit the type, this is a workable alternative.
Unfortunately, this is hard to reconcile with the parser: Is "Set[10]"
to be interpreted as "please create a Set<Integer> and populate it
with the number 10", or as "'Set' is a variable that points to an
array, and I want the 11th item in this array"? (Variables ought to
start with a lowercase letter, but this is merely convention and not
something the parser can rely on.) So, such a change would probably
move this proposal beyond the scope of coin.
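
To make the ambiguity concrete, here is a minimal sketch showing that
"Set[10]" is already a legal array access in today's Java whenever a
variable happens to be named Set. The class and method names
(AmbiguityDemo, eleventhItem) are made up for illustration.

```java
class AmbiguityDemo {
    // Returns the 11th item of an array held in a variable named "Set".
    static int eleventhItem() {
        // Legal Java: the lowercase-first-letter rule is only a convention.
        int[] Set = new int[11];
        Set[10] = 42;
        // Existing meaning of this expression: array access, not a literal.
        return Set[10];
    }

    public static void main(String[] args) {
        System.out.println(eleventhItem());
    }
}
```

A parser that also accepted Set[10] as a typed literal would need to
know whether Set names a type or a variable before it could decide
what the expression means.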

So, let's turn this argument on its head: Why are we trying so hard to  
make set literals work? Why don't we just remove them from the  
proposal? The need for them seems minor compared to lists. When the
collection size is small (below about 100), O(1) lookup performance
is irrelevant (and even if it were relevant, due to the extra
housekeeping that Sets have to do, Lists tend to actually beat Sets in
performance, even for contains(), if the list/set is small!), and yet,
if the initial list or set is being created via a literal, it will
most likely remain small. If only list literals existed, creating a
set would still be much cleaner than what you get now:

new HashSet<>({"a", "b", "c"});

versus:

new HashSet<String>(Arrays.asList("a", "b", "c"));

This doesn't work if you reverse the scenario; you can't make lists
from set literals (as the duplicates would have already been removed
at that point). Even in this longer form of set literal, you've
eliminated the biggest problems in the status quo: a reliance on a
completely unrelated class (Arrays - what does the arrays utility
class have to do with the creation of collections from explicit
values? It SHOULD have no relation whatsoever), and extreme
wordiness - the latter partly thanks to the accepted diamond proposal.
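
A small sketch of why the conversion only works in one direction. It
uses today's Arrays.asList as a stand-in for the proposed list literal
(an assumption, since the literal syntax does not exist yet); the
class name DirectionDemo is made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DirectionDemo {
    // Stand-in for a list literal: keeps order and duplicates.
    static List<String> listLiteral() {
        return Arrays.asList("a", "b", "a");
    }

    // List -> Set: the set drops the duplicate, as any set would.
    static Set<String> setFromList() {
        return new HashSet<String>(listLiteral());
    }

    // Set -> List: the duplicate "a" is gone before the list is built.
    static List<String> listFromSet() {
        return new ArrayList<String>(setFromList());
    }

    public static void main(String[] args) {
        System.out.println(listLiteral().size());  // 3 elements
        System.out.println(setFromList().size());  // 2 elements
        System.out.println(listFromSet().size());  // 2 elements, not 3
    }
}
```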

Perhaps some research needs to be done into how often "set literals"
are created now in real-life Java code. Search for the patterns:

new HashSet<T>(Arrays.asList(T...));
googlecollections.ImmutableSet.of(T...);
HashSet/Set<T> x = new HashSet<T>(); x.add(t); //repeated 1 or more times.

If not very often, then isn't the right answer here to just leave them  
out entirely, eliminating the confusion in the process?
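
For anyone who wants to run such a survey, here is a rough sketch of
how the first two patterns could be counted with java.util.regex. The
class name SetLiteralSearch and both regular expressions are my own
guesses at a reasonable approximation, not a vetted tool: they ignore
nested generics such as HashSet<List<String>>, comments, and string
contents, and the third pattern (repeated add calls) is not covered.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SetLiteralSearch {
    // Approximates: new HashSet<T>(Arrays.asList(
    // Assumption: the type argument contains no nested angle brackets.
    static final Pattern AS_LIST = Pattern.compile(
            "new\\s+HashSet\\s*<[^<>]*>\\s*\\(\\s*Arrays\\s*\\.\\s*asList\\s*\\(");

    // Approximates: ImmutableSet.of(
    static final Pattern IMMUTABLE_OF = Pattern.compile(
            "\\bImmutableSet\\s*\\.\\s*of\\s*\\(");

    // Counts non-overlapping matches of the pattern in the given source text.
    static int count(Pattern pattern, String source) {
        Matcher matcher = pattern.matcher(source);
        int hits = 0;
        while (matcher.find()) {
            hits++;
        }
        return hits;
    }

    public static void main(String[] args) {
        String sample =
                "Set<String> s = new HashSet<String>(Arrays.asList(\"a\"));\n"
              + "Set<String> t = ImmutableSet.of(\"a\", \"b\");\n";
        System.out.println(count(AS_LIST, sample));
        System.out.println(count(IMMUTABLE_OF, sample));
    }
}
```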

I may have missed it, but I can't remember seeing the technical  
details on this proposal. What does a list literal construct? A  
mutable ArrayList, or an immutable, unspecified implementation of
List? I would strongly suggest that these literals be immutable by
default, particularly because making them mutable is easy:
new ArrayList<>({"a", "b", "c"});.
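
A sketch of the behaviour I have in mind, with
Collections.unmodifiableList standing in for an immutable literal
(an assumption; the proposal may specify something else). The class
and method names are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class ImmutableLiteralDemo {
    // Stand-in for an immutable list literal.
    static List<String> literal() {
        return Collections.unmodifiableList(Arrays.asList("a", "b", "c"));
    }

    // Returns true if the list refuses to accept a new element.
    static boolean rejectsAdd(List<String> list) {
        try {
            list.add("d");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    // Opting in to mutability is a single copy-constructor call.
    static List<String> mutableCopy() {
        List<String> copy = new ArrayList<String>(literal());
        copy.add("d");
        return copy;
    }
}
```

Going the other way, from a mutable default to an immutable list,
would take an explicit wrapping call at every use site.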

If only the static methods in interfaces proposal had been taken more  
seriously, this could have been solved decently with a library,  
especially because of the acceptance of the easier varargs invocation  
proposal:

List.of("a", "b", "c");

is even better than:

["a", "b", "c"];

because it avoids the "Is it a Set or a List" issue entirely, and  
doesn't require taking up valuable parser flexibility the way real  
literals would.

Even in complex situations:

List<Set<String>> complicated = List.of(Set.of("a", "b", "a"), null,  
Set.empty());

methodCall(List.of("a", "b", "c"));

compared to:

List<Set<String>> complicated = [{"a", "b", "a"}, null, {}];
methodCall(["a", "b", "c"]);

the interface-based static methods are probably slightly worse overall  
than true literals in these slightly more complex situations, but the  
difference isn't that big, and the static methods still win in  
eliminating list/set confusion. Right now the static method based  
solution would cause a warning you can safely ignore, and is thus  
useless (try it yourself with the google collections API's "of"  
methods), but as mentioned the simpler varargs invocation proposal  
eliminates the unnecessary warnings, making that form a decent
alternative.
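
A sketch of what such library methods could look like. It assumes a
compiler that accepts static methods in interfaces and the
@SafeVarargs annotation (Java 8 or later does), and it uses a
hypothetical interface Literals with methods listOf and setOf rather
than List.of and Set.of, so that nothing here depends on what the JDK
itself ships.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical holder for the factory methods.
interface Literals {
    // @SafeVarargs suppresses the unchecked generic-array warning at call sites.
    @SafeVarargs
    static <T> List<T> listOf(T... items) {
        return Collections.unmodifiableList(new ArrayList<T>(Arrays.asList(items)));
    }

    @SafeVarargs
    static <T> Set<T> setOf(T... items) {
        // LinkedHashSet keeps insertion order and silently drops duplicates.
        return Collections.unmodifiableSet(new LinkedHashSet<T>(Arrays.asList(items)));
    }
}

class StaticFactoryDemo {
    // Mirrors the "complicated" example from the text.
    static List<Set<String>> complicated() {
        return Literals.listOf(
                Literals.setOf("a", "b", "a"), null, Literals.<String>setOf());
    }
}
```

Whether duplicates and nulls should be accepted silently, as this
sketch does, is a design decision the real methods would have to make.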

NB: I doubt it'll help at this point in time, but I'll commit to
writing up the JLS patch and a prototype compiler, delivered within a
month after greenlighting the idea. The implementation would work  
along the lines of the proposal I handed in for coin to allow static  
methods in interfaces without requiring any changes to the JVM. The  
strategy boils down to creating a new inner class in the interface,
named "$Methods", and moving all static methods into this new class.
These static methods can then be called only via their original
interface name, and not on instances or subtypes (every style
checker I know of generates a warning if you call static methods via
an instance anyway, so I don't consider this a big loss). The compiler
translates any static method call on an interface type from
InterfaceName.methodName(params) to
InterfaceName.$Methods.methodName(params). As plenty of other
languages running on the JVM do allow static methods in interfaces,
there's a modicum of benefit for JVM language interop if the proposal
is accepted as well, by standardizing the approach.
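
Written out by hand, the translated form would look roughly like the
sketch below. Only the nested class name "$Methods" comes from the
strategy described above; the interface Greeter, the method polite,
and the class DesugarDemo are invented for illustration, and a real
compiler would generate the nested class rather than have you type it.

```java
interface Greeter {
    String greet(String name);

    // What the compiler would generate: a holder for the static methods.
    // A class nested in an interface is implicitly public and static.
    final class $Methods {
        private $Methods() {
        }

        // Source form would have been: static Greeter polite() inside Greeter.
        public static Greeter polite() {
            return new Greeter() {
                public String greet(String name) {
                    return "Hello, " + name;
                }
            };
        }
    }
}

class DesugarDemo {
    static String run() {
        // Source form: Greeter.polite().greet("coin")
        // Translated form, as the compiler would emit it:
        return Greeter.$Methods.polite().greet("coin");
    }
}
```

Nothing in this form needs JVM support beyond ordinary nested classes,
which is the point of the strategy.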

  --Reinier Zwitserloot

On 2009/29/09, at 07:34, Joshua Bloch wrote:

> Paul,
>
> On Mon, Sep 28, 2009 at 10:28 PM, Paul Benedict
> <pbenedict at apache.org> wrote:
>
>> Josh,
>>
>> I think using braces or brackets to indicate the correct type is
>> hardly intuitive or easy to remember. Choosing the wrong syntax by
>> accident will instantiate the wrong type, and the difference between
>> the brace or bracket is pretty subtle visually.
>
>
> Usually it won't compile: you can't assign a Set to a List or
> vice-versa. Nick's example was carefully chosen: he invoked a
> constructor that took a Collection, which admits either a Set or a
> List.
>
>
>
>> If Java developers have to begin saying, "Which syntax do I need to
>> use for a List vs. Set?", then I question the whole
>> cost-to-benefit-ratio of this "small" (i.e, coin) proposal.
>
>
> Agreed, I do think the syntax we settled on is reasonably evocative,
> memorable, and consistent with other languages.  Braces (AKA curly
> braces) are used to represent a Set in mathematical notation, and
> square brackets are used to index into and to declare arrays, which
> are list-like.
>
>
>> I can see the JDK 7 certification tests already asking this
>> question -- it's a good gotcha question. Not being a language
>> expert, and recognizing that other languages already use what's
>> being proposed, the syntax still doesn't pass my common sense meter.
>> Do the technical justifications really outweigh simplicity?
>>
>
> I think it's probably the best that we can do, but I could be wrong.
> I will investigate other options.
>
>             Josh
>




More information about the coin-dev mailing list