list literal gotcha and suggestion

Derek Foster vapor1 at teleport.com
Thu Oct 8 01:58:28 PDT 2009


I've been watching this conversation fly by with interest. A few points:

First of all, I do use set literals from time to time. I want there to be a convenient and reasonably concise syntax for them. Having to use an 'import' statement is not, IMHO, an acceptable option. I would vastly prefer any of these:

   {1, 2, 3}
   [1, 2, 3]
   {1,2,3}.toSet()

to this:

   import java.util.Collections;
   Collections.unmodifiableSetOf(1, 2, 3);

or this:

   import java.util.HashSet;
   HashSet.of(1, 2, 3);

or even worse, this:

   import java.util.Collections;
   import java.util.HashSet;
   Collections.unmodifiableSet(HashSet.of(1, 2, 3));

That's just gross. Perl, Ruby and Python programmers would be laughing at us for such verbose syntax to do such a simple thing. (Well, they probably do anyway, but even more so.) Having to deal with adding import statements and long expressions involving multiple method calls just to create simple constant values of classes that are for all practical purposes built into the language seems to me to be a HUGE annoyance.

The whole point of this proposal is to add syntactic sugar and make common patterns easier to use. The resulting code needs to be CONCISE. Or people simply won't use it. I hardly ever see people use "Collections.unmodifiableSet()" in the first place, just because it's too long to type (plus the hassle of importing 'Collections'), and thus most programmers I have met just don't consider the benefits of using to be worth the trouble.

Overall, I like Neal's suggestion of the magic compiler type, because it leads to very simple syntax. It also has some potential for some hidden compiler magic that could also be used for some other interesting cases which have not yet been brought up in this discussion. For instance, the compiler could decide based on the context and the type of the elements, whether to convert the magic type into any of the following:

    class ImmutableList extends List<T> { ... }
    class ImmutableSet extends Set<T> { ... }
    class ImmutableMap extends Map<K, V> { ... }
    class ImmutableCollection extends Collection<K, V> { ... }
    T[]  // Preferably, immutable using compiler magic in the same way as E.values() is for enum types
    class ImmutableEnumSet extends EnumSet<T> { ... } // For enum types only, O(1) lookup
    class ImmutableEnumMap extends EnumMap<K, V> { ... } // For enum types only, O(1) lookup
    class ImmutableSortedSet extends SortedSet<T> { ... } // Optimal balanced tree constructed at compile time?
    class ImmutableSortedMap extends SortedMap<T> { ... } // Optimal balanced tree constructed at compile time?

Conceivably, the compiler might even be smart enough to perform more sophisticated optimizations based on the number of elements. For instance, if five or fewer elements existed, it might create a Set<T> implementation that did a linear search over an array, and for a greater number of elements, it might use a hash-based implementation. (Of course, both of these would have to be members of the standard library.)

The nice thing about all of these is that the user doesn't need to specify. The compiler can choose the most efficient implementation, while the user still gets to use a simple syntax and ignore this complexity.

The possibility of conversion to an array is particularly interesting. This would mean that the following existing legal syntax:

     Integer[] foo = {1, 2, 3};

could be retroactively interpreted to simply be a special case of a more general syntax that would also allow uses like this:

    void framulate(Integer[] xValues, Integer[yValues]);

    framulate({1,2,3}, {4,5,6});

It also might allow forms like this to be done much more efficiently than if a collection type were used:

    for (Integer x : {1,2,3}} {
       ... 
    }

if the context of a 'foreach' loop were understood to be an array context for the purposes of these conversions. (Otherwise, I think it should be a 'Collections' context.)

Another thing that I have not seen much discussion of is the limitation in the proposal that the constants used to create the literal must be compile-time constant. This seems like a severe limitation to me. I really REALLY want this limitation to go away. That is, I want this:

     void foo(int bar) {
        List<Integer> x = {1, 2, 3}; // All compile-time constant initializers
        List<Integer> y = {4, bar, 6}; // At least one initializer is not a constant
        for (Integer z : {7, 8, 9}) { // All compile-time constant initializers
            ...
        }
     }

to be desugared to something like this:

     private static final List<Integer> $INITIALIZER1 = ImmutableList.of( 1, 2, 3 );
     private static final Integer[] $INITIALIZER2 = {7, 8, 9};
     void foo(int bar) {
        List<Integer> x = $INITIALIZER1;
        List<Integer> y = ImmutableList.of(4, bar, 6);
        for (Integer z : $INITIALIZER2) {
            ...
        }
     }

so that I don't have to waste the CPU's (and the garbage collector's) time creating duplicate instances of the collections which won't ever change, every time this function is executed. Note that fancier desugarings might even get the internal data structures allocated at compile time so there is no need to iterate over elements when constructing these instances:

     private static final Integer[] $INITIALIZER1_CONTENTS = { 1, 2, 3 };
     private static final List<Integer> $INITIALIZER1 = ImmutableList.withContents($INITIALIZER1_CONTENTS);
     void foo(int bar) {
        List<Integer> x = $INITIALIZER1;
        ...
     }

Similar tricks could be used to pre-allocate hashtables or balanced trees for Map and Set instances.

Derek












More information about the coin-dev mailing list