RFC: draft API for JEP 269 Convenience Collection Factories

Stuart Marks stuart.marks at oracle.com
Thu Oct 8 23:39:33 UTC 2015


Hi all,

Please review and comment on this draft API for JEP 269, Convenience Collection 
Factories. For this review I'd like to focus on the API, and set aside 
implementation issues and discussion for later.


JEP:

	http://openjdk.java.net/jeps/269

javadoc:

	http://cr.openjdk.java.net/~smarks/reviews/jep269/api.20151008.mod/

specdiff:

	http://cr.openjdk.java.net/~smarks/reviews/jep269/api.20151008.specdiff/overview-summary.html


Most of the API is pretty straightforward, with fixed-arg and varargs "of()" 
factories for List, Set, ArrayList, and HashSet; and with fixed-arg "of()" 
factories and varargs "ofEntries()" factories for Map and HashMap.
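
To make the shape of the API concrete, here is a rough sketch of call sites for the interface-level factories. It assumes a JDK on which List.of, Set.of, Map.of, Map.ofEntries, and Map.entry are available with the signatures shown in the draft javadoc. The class name FactorySketch and its sample data are made up for illustration, and the ArrayList, HashSet, and HashMap factories are left out.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

import static java.util.Map.entry;

// Hypothetical demo class; only the factory calls themselves come from the draft API.
public class FactorySketch {
    // Fixed-arg list factory.
    static List<String> colors() {
        return List.of("red", "green", "blue");
    }

    // Fixed-arg set factory.
    static Set<Integer> smallPrimes() {
        return Set.of(2, 3, 5, 7);
    }

    // Fixed-arg map factory: arguments alternate key, value, key, value, ...
    static Map<String, Integer> widths() {
        return Map.of("byte", 8, "short", 16, "int", 32);
    }

    // Varargs map factory: each pair is wrapped in an entry.
    static Map<String, Integer> widthsFromEntries() {
        return Map.ofEntries(
            entry("byte", 8),
            entry("short", 16),
            entry("int", 32));
    }

    public static void main(String[] args) {
        System.out.println(colors());
        System.out.println(widths().equals(widthsFromEntries()));
    }
}
```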

There are a few issues on which I'd like to solicit discussion.

1. Number of fixed-arg overloads.

I've somewhat arbitrarily provided up to 5 fixed-arg overloads for the lists and 
sets, and up to 8 pairs for the fixed-arg map factories. The rationale for 8 
pairs is that there are 8 primitives, and various language processing tools 
often have maps for the primitive types. (But such tools also often need to 
handle the Void type, which exceeds the limit of 8. So this might need to change 
if we want to follow this rationale.)

I also note that Guava's immutable factories provide 11 fixed-arg overloads for 
list, 5 for set, and 5 pairs for map. I'd be curious as to the rationale for 
this, and whether it also would apply to the JDK.
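
For readers less familiar with how fixed-arg and varargs overloads interact, the following is a minimal sketch, not the draft implementation. OverloadSketch, its of() methods, and the lastOverload field are hypothetical names used only to show which overload the compiler selects. A call whose argument count matches a fixed-arg overload binds to it and does not go through a varargs array; anything beyond the largest fixed-arg overload falls through to varargs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the draft factories, cut down to two fixed-arg
// overloads plus varargs so that overload selection is easy to observe.
public class OverloadSketch {
    // Records which overload ran; present for illustration only.
    static String lastOverload = "";

    static <E> List<E> of(E e1) {
        lastOverload = "fixed-1";
        return Collections.singletonList(e1);
    }

    static <E> List<E> of(E e1, E e2) {
        lastOverload = "fixed-2";
        List<E> list = new ArrayList<>(2);
        list.add(e1);
        list.add(e2);
        return Collections.unmodifiableList(list);
    }

    @SafeVarargs
    static <E> List<E> of(E... es) {
        lastOverload = "varargs";
        // Copy the array so later changes to it cannot affect the list.
        return Collections.unmodifiableList(new ArrayList<>(Arrays.asList(es)));
    }

    public static void main(String[] args) {
        of("a", "b");
        System.out.println(lastOverload);   // the two-argument overload is chosen
        of("a", "b", "c");
        System.out.println(lastOverload);   // three arguments fall through to varargs
    }
}
```

Wherever the cutoff ends up (5, 8, or 11), the call site looks the same on either side of it; only the overload that gets bound changes.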

2. Other concrete collection factories.

I've chosen to provide factories for the concrete collections ArrayList, 
HashSet, and HashMap, since those seem to be the most commonly used. Is there a 
need to provide factories for other concrete collections, such as LinkedHashMap?

3. Duplicate handling.

My current thinking is for the Set and Map factories to throw 
IllegalArgumentException if a duplicate element or key is detected. The current 
draft specification is silent on this point. It needs to be specified, one way 
or another.

The rationale for throwing an exception is that if these factories are used in a 
"literal like" fashion, then having a duplicate is almost certainly a 
programming error. Consider this example:

     Map<String,TypeUse> m = Map.ofEntries(
         entry("CDATA",       CBuiltinLeafInfo.NORMALIZED_STRING),
         entry("ENTITY",      CBuiltinLeafInfo.TOKEN),
         entry("ENTITIES",    CBuiltinLeafInfo.STRING.makeCollection()),
         entry("ENUMERATION", CBuiltinLeafInfo.STRING.makeCollection()),
         entry("NMTOKEN",     CBuiltinLeafInfo.TOKEN),
         entry("NMTOKENS",    CBuiltinLeafInfo.STRING.makeCollection()),
         entry("ID",          CBuiltinLeafInfo.ID),
         entry("IDREF",       CBuiltinLeafInfo.IDREF),
         entry("IDREFS",
                   TypeUseFactory.makeCollection(CBuiltinLeafInfo.IDREF)),
         entry("ENUMERATION", CBuiltinLeafInfo.TOKEN));

(derived from [1])

If duplicates were silently ignored, this might result in hard-to-spot errors.
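
As a minimal sketch of the throwing behavior under discussion, the helper below builds a map from entries and raises IllegalArgumentException on the first repeated key. This is an assumption about how such a check could look, not the draft implementation: StrictMapSketch and its ofEntries() and entry() methods are hypothetical, and the string values merely stand in for the TypeUse objects in the example above.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not the draft implementation: rejects duplicate keys
// instead of silently keeping one of the conflicting values.
public class StrictMapSketch {
    static <K, V> Map.Entry<K, V> entry(K k, V v) {
        return new SimpleImmutableEntry<>(k, v);
    }

    @SafeVarargs
    static <K, V> Map<K, V> ofEntries(Map.Entry<? extends K, ? extends V>... entries) {
        Map<K, V> map = new HashMap<>();
        for (Map.Entry<? extends K, ? extends V> e : entries) {
            // Check before inserting so the first mapping is never overwritten.
            if (map.containsKey(e.getKey())) {
                throw new IllegalArgumentException("duplicate key: " + e.getKey());
            }
            map.put(e.getKey(), e.getValue());
        }
        return Collections.unmodifiableMap(map);
    }

    // Returns true if a repeated "ENUMERATION" key is rejected.
    static boolean rejectsDuplicates() {
        try {
            ofEntries(
                entry("ENUMERATION", "STRING collection"),
                entry("ID", "ID"),
                entry("ENUMERATION", "TOKEN"));
            return false;
        } catch (IllegalArgumentException ex) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(rejectsDuplicates());
    }
}
```

With a check like this, the mistake surfaces when the map is constructed, rather than later as a lookup that returns an unexpected value.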

There's also the matter of which value ends up being used in the case of 
duplicate map keys, and whether this should be specified. A fairly obvious 
policy would be "last one wins" but I'm reluctant to specify that, as it starts 
to place unnecessary constraints on implementations. However, the alternative of 
leaving it unspecified is also unpalatable.
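
For comparison, "last one wins" is what a plain sequence of put() calls already gives you today. The sketch below is illustration only; LastWinsSketch and the string values are made up, standing in for the ENUMERATION entries in the example above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: emulates a last-wins "literal" with ordinary put() calls.
public class LastWinsSketch {
    static Map<String, String> build() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("ENUMERATION", "STRING collection");
        m.put("NMTOKEN", "TOKEN");
        // Same key again: the earlier value is replaced and nothing is reported.
        m.put("ENUMERATION", "TOKEN");
        return m;
    }

    public static void main(String[] args) {
        Map<String, String> m = build();
        System.out.println(m.size());              // prints 2
        System.out.println(m.get("ENUMERATION"));  // prints TOKEN
    }
}
```

Three mappings were written but only two survive, and nothing at the call site indicates which of the two ENUMERATION values was intended.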

I'm aware that very few programming systems with similar constructs will signal 
an error on duplicate elements. Python, Ruby, Groovy, Scala, and Perl all seem 
to allow duplicates in maps or equivalent, apparently with a last-wins policy. 
(Though sometimes it's hard to tell if the policy is specified.)

The only system I've been able to find that explicitly rejects duplicates is 
Clojure, and this policy isn't without controversy. [2] The main rationale is to 
prevent programming errors.

There is a Python bug [3] where it was proposed that duplicates in a dict should 
raise an error or warning, also in order to catch programming errors. The 
request was rejected, not necessarily because it was a bad idea, but primarily 
because it would be a backward incompatible change.

The easiest thing to do would simply be to require last-wins, since "everybody else 
is doing it" ... but that doesn't mean it's right. Since we're introducing a new 
API here, there is no compatibility issue. Throwing an exception for duplicates 
seems like a good way to prevent a certain class of programming errors.

What do people think?

s'marks

[1] 
http://hg.openjdk.java.net/jdk8/jdk8/jaxws/file/d03dd22762db/src/share/jaxws_classes/com/sun/tools/internal/xjc/reader/dtd/TDTDReader.java#l420

[2] http://dev.clojure.org/display/design/Allow+duplicate+map+keys+and+set+elements

[3] https://bugs.python.org/issue16385



