RFR(m): 8177290 add copy factory methods for unmodifiable List, Set, Map
(also includes 8184690: add Collectors for collecting into unmodifiable List, Set, and Map) Hi all, Here's an updated webrev for this changeset; the previous review thread is here: http://mail.openjdk.java.net/pipermail/core-libs-dev/2017-September/049261.h... This webrev includes the following: * specification revisions to provide clearer definitions of "view" collections, "unmodifiable" collections, and "unmodifiable views" * new List.copyOf(), Set.copyOf(), and Map.copyOf() "copy factory" methods * new Collectors.toUnmodifiableList, Set, and Map methods * tests for the new API methods I've added some assertions that require some independence between the source collection (or map) and the result of the copyOf() method. I've made a small but significant change to Set.copyOf compared to the previous round. Previously, it specified that the first of any equal elements was preserved. Now, it is explicitly unspecified which of any equals elements is preserved. This is consistent with Set.addAll, Collectors.toSet, and the newly added Collectors.toUnmodifiableSet, none of which specify which of duplicate elements is preserved. (The outlier here is Stream.distinct, which specifies that the first element of any duplicates is preserved, if the stream is ordered.) I've also made some minor wording/editorial changes in response to suggestions from David Holmes and Roger Riggs. I've kept the wording changes that give emphasis to "unmodifiable" over "immutable." The term "immutable" is inextricably intertwined with "persistent" when it comes to data structures, and I believe we'll be explaining this forever if Java's "immutable" means something different from everybody else's. Webrev: http://cr.openjdk.java.net/~smarks/reviews/8177290/webrev.1/ Bugs: https://bugs.openjdk.java.net/browse/JDK-8177290 add copy factory methods for unmodifiable List, Set, Map https://bugs.openjdk.java.net/browse/JDK-8184690 add Collectors for collecting into unmodifiable List, Set, and Map Thanks, s'marks
In "View Collections" Most collections contain elements themselves. By contrast, <i>view collections</i> 112 * themselves do not contain elements, but instead rely on a backing collection to 113 * store the actual elements. This should be: "most collections manage the storage for the elements they contain. By contrast, ...". (All collections contain their elements -- just ask the `contains()` method!) Similarly, in: Correspondingly, any changes made to 126 * the view collection are written through to the backing collection. you might want to make the connection that views might "handle" the operation by prohibiting it; view collections need not support changes if they don't want to. In <b>Unmodifiable view collections.</b> You might want to say explicitly "such as Collections.unmodifiable{List,Set,...}" when describing unmodifiable view collections. THat's of course what you're getting at, but it could be said more explicitly. Returns an <a href="#unmodifiable">unmodifiable List</a> containing the elements 1045 * of the given Collection, in iteration order. In the _given collection's_ iteration order. Returns an <a href="#unmodifiable">unmodifiable Map</a> containing the entries 1664 * of the given Map. the given Map must not be null, and it must not contain any 1665 * null keys or values. If the given Map is subsequently modified, the returned 1666 * Map will not reflect such modifications. Second sentence starts with lower case later. I might quibble with "subsequently modified", as there is some ambiguity (what if the map is concurrently modified, subsequent to making the copyOf() call, but prior to the return from copyOf()?) But I won't ;) In 296 Collector<T, ?, List<T>> toUnmodifiableList() { 297 return new CollectorImpl<>((Supplier<List<T>>) ArrayList::new, List::add, 298 (left, right) -> { left.addAll(right); return left; }, 299 list -> (List<T>)List.of(list.toArray()), 300 CH_NOID); why the intermediate array, and not just use copyOf()? (Is it because you added copyOf later and didn't go back and update this?) Same for other collectors. On 10/30/2017 6:50 PM, Stuart Marks wrote:
(also includes 8184690: add Collectors for collecting into unmodifiable List, Set, and Map)
Hi all,
Here's an updated webrev for this changeset; the previous review thread is here:
http://mail.openjdk.java.net/pipermail/core-libs-dev/2017-September/049261.h...
This webrev includes the following:
* specification revisions to provide clearer definitions of "view" collections, "unmodifiable" collections, and "unmodifiable views"
* new List.copyOf(), Set.copyOf(), and Map.copyOf() "copy factory" methods
* new Collectors.toUnmodifiableList, Set, and Map methods
* tests for the new API methods
I've added some assertions that require some independence between the source collection (or map) and the result of the copyOf() method.
I've made a small but significant change to Set.copyOf compared to the previous round. Previously, it specified that the first of any equal elements was preserved. Now, it is explicitly unspecified which of any equals elements is preserved. This is consistent with Set.addAll, Collectors.toSet, and the newly added Collectors.toUnmodifiableSet, none of which specify which of duplicate elements is preserved.
(The outlier here is Stream.distinct, which specifies that the first element of any duplicates is preserved, if the stream is ordered.)
I've also made some minor wording/editorial changes in response to suggestions from David Holmes and Roger Riggs. I've kept the wording changes that give emphasis to "unmodifiable" over "immutable." The term "immutable" is inextricably intertwined with "persistent" when it comes to data structures, and I believe we'll be explaining this forever if Java's "immutable" means something different from everybody else's.
Webrev:
http://cr.openjdk.java.net/~smarks/reviews/8177290/webrev.1/
Bugs:
https://bugs.openjdk.java.net/browse/JDK-8177290 add copy factory methods for unmodifiable List, Set, Map
https://bugs.openjdk.java.net/browse/JDK-8184690 add Collectors for collecting into unmodifiable List, Set, and Map
Thanks,
s'marks
Thanks for the review. Editorial comments accepted. Discussion and responses to some items below. On 10/31/17 7:25 AM, Brian Goetz wrote:
Returns an <a href="#unmodifiable">unmodifiable Map</a> containing the entries 1664 * of the given Map. the given Map must not be null, and it must not contain any 1665 * null keys or values. If the given Map is subsequently modified, the returned 1666 * Map will not reflect such modifications.
I might quibble with "subsequently modified", as there is some ambiguity (what if the map is concurrently modified, subsequent to making the copyOf() call, but prior to the return from copyOf()?) But I won't ;)
Well I actually did think about this, but I couldn't think of anything better to say. If concurrent modification is permitted, it's conceivable that a modification made any time during the copy might end up in the returned map. It depends on the source map's implementation and concurrent modification policy. So I think the only statement that can be made is about the state of the returned map after copyOf() returns. That's what "subsequent" means to me, but I could clarify this if you think it's worth it. (Similar reasoning for List and Set.)
296 Collector<T, ?, List<T>> toUnmodifiableList() { 297 return new CollectorImpl<>((Supplier<List<T>>) ArrayList::new, List::add, 298 (left, right) -> { left.addAll(right); return left; }, 299 list -> (List<T>)List.of(list.toArray()), 300 CH_NOID); why the intermediate array, and not just use copyOf()? (Is it because you added copyOf later and didn't go back and update this?) Same for other collectors.
The copyOf method is mostly forced to make a defensive copy, so it's no better. I think what should happen here eventually is that the elements should be collected into something like a SpinedBuffer, and then a private interface should be plumbed into the unmodifiable list implementation that can copy elements directly from the buffer. This avoids the extra copy. But I'll take care of that later. For toUnmodifiableSet and toUnmodifiableMap, I can imagine similar private interfaces to avoid excess copies, but I think they each still need to build up full intermediate HashSet/HashMap in order to get rid of duplicates. s'marks
On 10/30/2017 6:50 PM, Stuart Marks wrote:
(also includes 8184690: add Collectors for collecting into unmodifiable List, Set, and Map)
Hi all,
Here's an updated webrev for this changeset; the previous review thread is here:
http://mail.openjdk.java.net/pipermail/core-libs-dev/2017-September/049261.h...
This webrev includes the following:
* specification revisions to provide clearer definitions of "view" collections, "unmodifiable" collections, and "unmodifiable views"
* new List.copyOf(), Set.copyOf(), and Map.copyOf() "copy factory" methods
* new Collectors.toUnmodifiableList, Set, and Map methods
* tests for the new API methods
I've added some assertions that require some independence between the source collection (or map) and the result of the copyOf() method.
I've made a small but significant change to Set.copyOf compared to the previous round. Previously, it specified that the first of any equal elements was preserved. Now, it is explicitly unspecified which of any equals elements is preserved. This is consistent with Set.addAll, Collectors.toSet, and the newly added Collectors.toUnmodifiableSet, none of which specify which of duplicate elements is preserved.
(The outlier here is Stream.distinct, which specifies that the first element of any duplicates is preserved, if the stream is ordered.)
I've also made some minor wording/editorial changes in response to suggestions from David Holmes and Roger Riggs. I've kept the wording changes that give emphasis to "unmodifiable" over "immutable." The term "immutable" is inextricably intertwined with "persistent" when it comes to data structures, and I believe we'll be explaining this forever if Java's "immutable" means something different from everybody else's.
Webrev:
http://cr.openjdk.java.net/~smarks/reviews/8177290/webrev.1/
Bugs:
https://bugs.openjdk.java.net/browse/JDK-8177290 add copy factory methods for unmodifiable List, Set, Map
https://bugs.openjdk.java.net/browse/JDK-8184690 add Collectors for collecting into unmodifiable List, Set, and Map
Thanks,
s'marks
Hi Stuart, Collection.java: The mix of "should" and "must" in Collection.java can be confusing. Typically "should" is interpreted as "must" from a conformance point of view. For example, 147 * <p>An <i>unmodifiable collection</i> is a collection, all of whose 148 * mutator methods (as defined above) are specified to throw 149 * {@code UnsupportedOperationException}. Such a collection thus cannot be 150 * modified by calling any methods on it. For a collection to be properly 151 * unmodifiable, any view collections derived from it *must *also be unmodifiable. 152 * For example, if a List is unmodifiable, the List returned by 153 * {@link List#subList List.subList} *should *also be unmodifiable. That leaves room for a non-compliant collection which could not technically be called "unmodifiable". 138 * does not support the operation. Such methods *should *(but are not required 139 * to) throw an {@code UnsupportedOperationException} if the invocation would 140 * have no effect on the collection. For example, invoking the 141 * {@link #addAll addAll} method on a collection that does not support 142 * the {@link #add add} operation *should *throw the exception if 143 * the collection passed as an argument is empty. These two sentences both allow not throwing UOE and requiring UOE. Can they be worded to be consistent (or similarly qualified.) Set.java: 706: Would using "unique elements" clarify the contents of the result? Saying "Duplicate elements are permitted," may be misleading. Perhaps "The given Collection may contain duplicate elements"... Similarly in Collectors.to toUnmodifiableSet: "Duplicate elements are allowed," can be misleading, especially the word allowed. Collectors.toUnmodifiableMap * If the mapped keys 1474 * might have duplicates, use {@link #toUnmodifiableMap(Function, Function, BinaryOperator)} 1475 * instead. It would helpful to say why to use the version with the BinaryOperator, for example to say" "To determine which of the duplicate elements to retain". or similar. Collectors:1606: impl note: With three different NPE throws, it is helpful to use the 2-arg form of requireNonNull to indicate which parameter offends. Roger On 10/30/2017 6:50 PM, Stuart Marks wrote:
(also includes 8184690: add Collectors for collecting into unmodifiable List, Set, and Map)
Hi all,
Here's an updated webrev for this changeset; the previous review thread is here:
http://mail.openjdk.java.net/pipermail/core-libs-dev/2017-September/049261.h...
This webrev includes the following:
* specification revisions to provide clearer definitions of "view" collections, "unmodifiable" collections, and "unmodifiable views"
* new List.copyOf(), Set.copyOf(), and Map.copyOf() "copy factory" methods
* new Collectors.toUnmodifiableList, Set, and Map methods
* tests for the new API methods
I've added some assertions that require some independence between the source collection (or map) and the result of the copyOf() method.
I've made a small but significant change to Set.copyOf compared to the previous round. Previously, it specified that the first of any equal elements was preserved. Now, it is explicitly unspecified which of any equals elements is preserved. This is consistent with Set.addAll, Collectors.toSet, and the newly added Collectors.toUnmodifiableSet, none of which specify which of duplicate elements is preserved.
(The outlier here is Stream.distinct, which specifies that the first element of any duplicates is preserved, if the stream is ordered.)
I've also made some minor wording/editorial changes in response to suggestions from David Holmes and Roger Riggs. I've kept the wording changes that give emphasis to "unmodifiable" over "immutable." The term "immutable" is inextricably intertwined with "persistent" when it comes to data structures, and I believe we'll be explaining this forever if Java's "immutable" means something different from everybody else's.
Webrev:
http://cr.openjdk.java.net/~smarks/reviews/8177290/webrev.1/
Bugs:
https://bugs.openjdk.java.net/browse/JDK-8177290 add copy factory methods for unmodifiable List, Set, Map
https://bugs.openjdk.java.net/browse/JDK-8184690 add Collectors for collecting into unmodifiable List, Set, and Map
Thanks,
s'marks
On 10/31/17 11:24 AM, Roger Riggs wrote:
Hi Stuart,
Collection.java:
The mix of "should" and "must" in Collection.java can be confusing. Typically "should" is interpreted as "must" from a conformance point of view.
"Must" is a requirement, and "should" is a recommendation to implementors, and advice to clients that some property often but doesn't necessarily hold.
For example,
147 * <p>An <i>unmodifiable collection</i> is a collection, all of whose 148 * mutator methods (as defined above) are specified to throw 149 * {@code UnsupportedOperationException}. Such a collection thus cannot be 150 * modified by calling any methods on it. For a collection to be properly 151 * unmodifiable, any view collections derived from it *must *also be unmodifiable. 152 * For example, if a List is unmodifiable, the List returned by 153 * {@link List#subList List.subList} *should *also be unmodifiable.
That leaves room for a non-compliant collection which could not technically be called "unmodifiable".
I agree that "should" is poor here; I'll reword this to say that the returned list "is also unmodifiable."
138 * does not support the operation. Such methods *should *(but are not required 139 * to) throw an {@code UnsupportedOperationException} if the invocation would 140 * have no effect on the collection. For example, invoking the 141 * {@link #addAll addAll} method on a collection that does not support 142 * the {@link #add add} operation *should *throw the exception if 143 * the collection passed as an argument is empty.
These two sentences both allow not throwing UOE and requiring UOE. Can they be worded to be consistent (or similarly qualified.)
For this case, implementations are recommended to throw UOE, but they are not required to do so. Some implementations will throw, while others will not throw. There are examples of both in the JDK. I'll clarify the example, but I really do mean "should" here.
Set.java: 706: Would using "unique elements" clarify the contents of the result? Saying "Duplicate elements are permitted," may be misleading. Perhaps "The given Collection may contain duplicate elements"...
I'll clarify this to say that if the given Collection contains duplicate elements, an arbitrary element of the duplicates is preserved.
Similarly in Collectors.to toUnmodifiableSet: "Duplicate elements are allowed," can be misleading, especially the word allowed.
Will update similarly.
Collectors.toUnmodifiableMap * If the mapped keys 1474 * might have duplicates, use {@link #toUnmodifiableMap(Function, Function, BinaryOperator)} 1475 * instead.
It would helpful to say why to use the version with the BinaryOperator, for example to say" "To determine which of the duplicate elements to retain". or similar.
Will add words about handling the merging of values.
Collectors:1606: impl note: With three different NPE throws, it is helpful to use the 2-arg form of requireNonNull to indicate which parameter offends.
Will update. Thanks, s'marks
Roger
On 10/30/2017 6:50 PM, Stuart Marks wrote:
(also includes 8184690: add Collectors for collecting into unmodifiable List, Set, and Map)
Hi all,
Here's an updated webrev for this changeset; the previous review thread is here:
http://mail.openjdk.java.net/pipermail/core-libs-dev/2017-September/049261.h...
This webrev includes the following:
* specification revisions to provide clearer definitions of "view" collections, "unmodifiable" collections, and "unmodifiable views"
* new List.copyOf(), Set.copyOf(), and Map.copyOf() "copy factory" methods
* new Collectors.toUnmodifiableList, Set, and Map methods
* tests for the new API methods
I've added some assertions that require some independence between the source collection (or map) and the result of the copyOf() method.
I've made a small but significant change to Set.copyOf compared to the previous round. Previously, it specified that the first of any equal elements was preserved. Now, it is explicitly unspecified which of any equals elements is preserved. This is consistent with Set.addAll, Collectors.toSet, and the newly added Collectors.toUnmodifiableSet, none of which specify which of duplicate elements is preserved.
(The outlier here is Stream.distinct, which specifies that the first element of any duplicates is preserved, if the stream is ordered.)
I've also made some minor wording/editorial changes in response to suggestions from David Holmes and Roger Riggs. I've kept the wording changes that give emphasis to "unmodifiable" over "immutable." The term "immutable" is inextricably intertwined with "persistent" when it comes to data structures, and I believe we'll be explaining this forever if Java's "immutable" means something different from everybody else's.
Webrev:
http://cr.openjdk.java.net/~smarks/reviews/8177290/webrev.1/
Bugs:
https://bugs.openjdk.java.net/browse/JDK-8177290 add copy factory methods for unmodifiable List, Set, Map
https://bugs.openjdk.java.net/browse/JDK-8184690 add Collectors for collecting into unmodifiable List, Set, and Map
Thanks,
s'marks
Updated webrev, based on comments from Brian and Roger: http://cr.openjdk.java.net/~smarks/reviews/8177290/webrev.2/ s'marks On 10/30/17 3:50 PM, Stuart Marks wrote:
(also includes 8184690: add Collectors for collecting into unmodifiable List, Set, and Map)
Hi all,
Here's an updated webrev for this changeset; the previous review thread is here:
http://mail.openjdk.java.net/pipermail/core-libs-dev/2017-September/049261.h...
This webrev includes the following:
* specification revisions to provide clearer definitions of "view" collections, "unmodifiable" collections, and "unmodifiable views"
* new List.copyOf(), Set.copyOf(), and Map.copyOf() "copy factory" methods
* new Collectors.toUnmodifiableList, Set, and Map methods
* tests for the new API methods
I've added some assertions that require some independence between the source collection (or map) and the result of the copyOf() method.
I've made a small but significant change to Set.copyOf compared to the previous round. Previously, it specified that the first of any equal elements was preserved. Now, it is explicitly unspecified which of any equals elements is preserved. This is consistent with Set.addAll, Collectors.toSet, and the newly added Collectors.toUnmodifiableSet, none of which specify which of duplicate elements is preserved.
(The outlier here is Stream.distinct, which specifies that the first element of any duplicates is preserved, if the stream is ordered.)
I've also made some minor wording/editorial changes in response to suggestions from David Holmes and Roger Riggs. I've kept the wording changes that give emphasis to "unmodifiable" over "immutable." The term "immutable" is inextricably intertwined with "persistent" when it comes to data structures, and I believe we'll be explaining this forever if Java's "immutable" means something different from everybody else's.
Webrev:
http://cr.openjdk.java.net/~smarks/reviews/8177290/webrev.1/
Bugs:
https://bugs.openjdk.java.net/browse/JDK-8177290 add copy factory methods for unmodifiable List, Set, Map
https://bugs.openjdk.java.net/browse/JDK-8184690 add Collectors for collecting into unmodifiable List, Set, and Map
Thanks,
s'marks
Having a List.of(List) copy constructor would save an additional array copy in the collector Which uses (List<T>)List.of(list.toArray()) Gruss Bernd -- http://bernd.eckenfels.net ________________________________ From: core-libs-dev <core-libs-dev-bounces@openjdk.java.net> on behalf of Stuart Marks <stuart.marks@oracle.com> Sent: Wednesday, November 1, 2017 12:49:56 AM To: core-libs-dev Subject: Re: RFR(m): 8177290 add copy factory methods for unmodifiable List, Set, Map Updated webrev, based on comments from Brian and Roger: http://cr.openjdk.java.net/~smarks/reviews/8177290/webrev.2/ s'marks On 10/30/17 3:50 PM, Stuart Marks wrote:
(also includes 8184690: add Collectors for collecting into unmodifiable List, Set, and Map)
Hi all,
Here's an updated webrev for this changeset; the previous review thread is here:
http://mail.openjdk.java.net/pipermail/core-libs-dev/2017-September/049261.h...
This webrev includes the following:
* specification revisions to provide clearer definitions of "view" collections, "unmodifiable" collections, and "unmodifiable views"
* new List.copyOf(), Set.copyOf(), and Map.copyOf() "copy factory" methods
* new Collectors.toUnmodifiableList, Set, and Map methods
* tests for the new API methods
I've added some assertions that require some independence between the source collection (or map) and the result of the copyOf() method.
I've made a small but significant change to Set.copyOf compared to the previous round. Previously, it specified that the first of any equal elements was preserved. Now, it is explicitly unspecified which of any equals elements is preserved. This is consistent with Set.addAll, Collectors.toSet, and the newly added Collectors.toUnmodifiableSet, none of which specify which of duplicate elements is preserved.
(The outlier here is Stream.distinct, which specifies that the first element of any duplicates is preserved, if the stream is ordered.)
I've also made some minor wording/editorial changes in response to suggestions from David Holmes and Roger Riggs. I've kept the wording changes that give emphasis to "unmodifiable" over "immutable." The term "immutable" is inextricably intertwined with "persistent" when it comes to data structures, and I believe we'll be explaining this forever if Java's "immutable" means something different from everybody else's.
Webrev:
http://cr.openjdk.java.net/~smarks/reviews/8177290/webrev.1/
Bugs:
https://bugs.openjdk.java.net/browse/JDK-8177290 add copy factory methods for unmodifiable List, Set, Map
https://bugs.openjdk.java.net/browse/JDK-8184690 add Collectors for collecting into unmodifiable List, Set, and Map
Thanks,
s'marks
On 10/31/17 5:52 PM, Bernd Eckenfels wrote:
Having a List.of(List) copy constructor would save an additional array copy in the collector Which uses (List<T>)List.of(list.toArray())
The quickest way to get all the elements from the source collection is to call toArray() on it. Some copy constructors (like ArrayList's) simply use the returned array for internal storage. This saves a copy, but it relies on the source collection's toArray() implementation to be correct. In particular, the returned array must be freshly created, and that array must be of type Object[]. If either of these is violated, it will break the ArrayList. The "immutable" collections behind List.of/copyOf/etc. are fastidious about allocating their internal arrays in order to ensure that they are referenced only from within the newly created collection. This requires making an "extra" copy of the array returned by toArray(). An alternative is for the new collection to preallocate its internal array using the source's size(), and then to copy the elements out. But the source's size might change during the copy (e.g., if it's a concurrent collection) so this complicates things. I think the only safe way to avoid the extra copy is to create a private interface that can be used by JDK implementations. s'marks
Am 01.11.2017 um 20:25 schrieb Stuart Marks <stuart.marks@oracle.com>:
On 10/31/17 5:52 PM, Bernd Eckenfels wrote:
Having a List.of(List) copy constructor would save an additional array copy in the collector Which uses (List<T>)List.of(list.toArray())
The quickest way to get all the elements from the source collection is to call toArray() on it. Some copy constructors (like ArrayList's) simply use the returned array for internal storage. This saves a copy, but it relies on the source collection's toArray() implementation to be correct. In particular, the returned array must be freshly created, and that array must be of type Object[]. If either of these is violated, it will break the ArrayList.
The "immutable" collections behind List.of/copyOf/etc. are fastidious about allocating their internal arrays in order to ensure that they are referenced only from within the newly created collection. This requires making an „extra" copy of the array returned by toArray().
An alternative is for the new collection to preallocate its internal array using the source's size(), and then to copy the elements out. But the source’s size might change during the copy (e.g., if it’s a concurrent collection) so this complicates things.
I think the array „overhead“ would be only for the cases of zero, one and two value implementations. That seems to me not worth of optimizing…
I think the only safe way to avoid the extra copy is to create a private interface that can be used by JDK implementations.
s'marks
-Patrick
I disagree, actually. Collections with size zero and one are significantly more common than bigger collections. In Guava's immutable collection factories (ImmutableList.of(...) etc.), we observed a roughly exponential decline in the number of users of factory methods of each size: if N people created empty lists, ~N/2 created singleton lists, ~N/4 created two-element lists, etc. We got noticeable pushback from RAM-sensitive customers when we eliminated some specialized singleton collection implementations. Our experience has been that specializing for N=0 and N=1 does pay off. Probably not N=2, though? On Wed, Nov 1, 2017 at 12:45 PM Patrick Reinhart <patrick@reini.net> wrote:
Am 01.11.2017 um 20:25 schrieb Stuart Marks <stuart.marks@oracle.com>:
On 10/31/17 5:52 PM, Bernd Eckenfels wrote:
Having a List.of(List) copy constructor would save an additional array copy in the collector Which uses (List<T>)List.of(list.toArray())
The quickest way to get all the elements from the source collection is to call toArray() on it. Some copy constructors (like ArrayList's) simply use the returned array for internal storage. This saves a copy, but it relies on the source collection's toArray() implementation to be correct. In particular, the returned array must be freshly created, and that array must be of type Object[]. If either of these is violated, it will break the ArrayList.
The "immutable" collections behind List.of/copyOf/etc. are fastidious about allocating their internal arrays in order to ensure that they are referenced only from within the newly created collection. This requires making an „extra" copy of the array returned by toArray().
An alternative is for the new collection to preallocate its internal array using the source's size(), and then to copy the elements out. But the source’s size might change during the copy (e.g., if it’s a concurrent collection) so this complicates things.
I think the array „overhead“ would be only for the cases of zero, one and two value implementations. That seems to me not worth of optimizing…
I think the only safe way to avoid the extra copy is to create a private interface that can be used by JDK implementations.
s'marks
-Patrick
In this case I would prefer a non static copyOf() method on the list to create a unmodifiable list/set/map, where the optimal factory method can be called. This would also solve the problem of a concurrent implementation. -Patrick
Am 01.11.2017 um 21:05 schrieb Louis Wasserman <lowasser@google.com>:
I disagree, actually. Collections with size zero and one are significantly more common than bigger collections.
In Guava's immutable collection factories (ImmutableList.of(...) etc.), we observed a roughly exponential decline in the number of users of factory methods of each size: if N people created empty lists, ~N/2 created singleton lists, ~N/4 created two-element lists, etc. We got noticeable pushback from RAM-sensitive customers when we eliminated some specialized singleton collection implementations.
Our experience has been that specializing for N=0 and N=1 does pay off. Probably not N=2, though?
On Wed, Nov 1, 2017 at 12:45 PM Patrick Reinhart <patrick@reini.net <mailto:patrick@reini.net>> wrote:
Am 01.11.2017 um 20:25 schrieb Stuart Marks <stuart.marks@oracle.com <mailto:stuart.marks@oracle.com>>:
On 10/31/17 5:52 PM, Bernd Eckenfels wrote:
Having a List.of(List) copy constructor would save an additional array copy in the collector Which uses (List<T>)List.of(list.toArray())
The quickest way to get all the elements from the source collection is to call toArray() on it. Some copy constructors (like ArrayList's) simply use the returned array for internal storage. This saves a copy, but it relies on the source collection's toArray() implementation to be correct. In particular, the returned array must be freshly created, and that array must be of type Object[]. If either of these is violated, it will break the ArrayList.
The "immutable" collections behind List.of/copyOf/etc. are fastidious about allocating their internal arrays in order to ensure that they are referenced only from within the newly created collection. This requires making an „extra" copy of the array returned by toArray().
An alternative is for the new collection to preallocate its internal array using the source's size(), and then to copy the elements out. But the source’s size might change during the copy (e.g., if it’s a concurrent collection) so this complicates things.
I think the array „overhead“ would be only for the cases of zero, one and two value implementations. That seems to me not worth of optimizing…
I think the only safe way to avoid the extra copy is to create a private interface that can be used by JDK implementations.
s'marks
-Patrick
Hi, It would also support a more fluent use of the APIs. Though if we start down that road it may become littered with convenience methods. Something to ponder for a bit. Roger On 11/1/2017 4:29 PM, Patrick Reinhart wrote:
In this case I would prefer a non static copyOf() method on the list to create a unmodifiable list/set/map, where the optimal factory method can be called. This would also solve the problem of a concurrent implementation.
-Patrick
Am 01.11.2017 um 21:05 schrieb Louis Wasserman <lowasser@google.com>:
I disagree, actually. Collections with size zero and one are significantly more common than bigger collections.
In Guava's immutable collection factories (ImmutableList.of(...) etc.), we observed a roughly exponential decline in the number of users of factory methods of each size: if N people created empty lists, ~N/2 created singleton lists, ~N/4 created two-element lists, etc. We got noticeable pushback from RAM-sensitive customers when we eliminated some specialized singleton collection implementations.
Our experience has been that specializing for N=0 and N=1 does pay off. Probably not N=2, though?
On Wed, Nov 1, 2017 at 12:45 PM Patrick Reinhart <patrick@reini.net <mailto:patrick@reini.net>> wrote:
Am 01.11.2017 um 20:25 schrieb Stuart Marks <stuart.marks@oracle.com <mailto:stuart.marks@oracle.com>>:
On 10/31/17 5:52 PM, Bernd Eckenfels wrote:
Having a List.of(List) copy constructor would save an additional array copy in the collector Which uses (List<T>)List.of(list.toArray()) The quickest way to get all the elements from the source collection is to call toArray() on it. Some copy constructors (like ArrayList's) simply use the returned array for internal storage. This saves a copy, but it relies on the source collection's toArray() implementation to be correct. In particular, the returned array must be freshly created, and that array must be of type Object[]. If either of these is violated, it will break the ArrayList.
The "immutable" collections behind List.of/copyOf/etc. are fastidious about allocating their internal arrays in order to ensure that they are referenced only from within the newly created collection. This requires making an „extra" copy of the array returned by toArray().
An alternative is for the new collection to preallocate its internal array using the source's size(), and then to copy the elements out. But the source’s size might change during the copy (e.g., if it’s a concurrent collection) so this complicates things. I think the array „overhead“ would be only for the cases of zero, one and two value implementations. That seems to me not worth of optimizing…
I think the only safe way to avoid the extra copy is to create a private interface that can be used by JDK implementations.
s'marks -Patrick
Am 02.11.2017 um 20:08 schrieb Roger Riggs:
Hi,
It would also support a more fluent use of the APIs. Though if we start down that road it may become littered with convenience methods. Something to ponder for a bit.
One one hand I see your obligation, on the other hand we would open the chance to write more optimal implementations for a library implementers though. That seems to me also an considerable point here. For instance an unmodifiable List could basically return a fast copy of itself here using an internal constructor passing the values from itself or a Set implementation has not to check for duplicate entries or similar. -Patrick
On 11/1/2017 4:29 PM, Patrick Reinhart wrote:
In this case I would prefer a non static copyOf() method on the list to create a unmodifiable list/set/map, where the optimal factory method can be called. This would also solve the problem of a concurrent implementation.
-Patrick
Am 01.11.2017 um 21:05 schrieb Louis Wasserman <lowasser@google.com>:
I disagree, actually. Collections with size zero and one are significantly more common than bigger collections.
In Guava's immutable collection factories (ImmutableList.of(...) etc.), we observed a roughly exponential decline in the number of users of factory methods of each size: if N people created empty lists, ~N/2 created singleton lists, ~N/4 created two-element lists, etc. We got noticeable pushback from RAM-sensitive customers when we eliminated some specialized singleton collection implementations.
Our experience has been that specializing for N=0 and N=1 does pay off. Probably not N=2, though?
On Wed, Nov 1, 2017 at 12:45 PM Patrick Reinhart <patrick@reini.net <mailto:patrick@reini.net>> wrote:
Am 01.11.2017 um 20:25 schrieb Stuart Marks <stuart.marks@oracle.com <mailto:stuart.marks@oracle.com>>:
On 10/31/17 5:52 PM, Bernd Eckenfels wrote:
Having a List.of(List) copy constructor would save an additional array copy in the collector Which uses (List<T>)List.of(list.toArray()) The quickest way to get all the elements from the source collection is to call toArray() on it. Some copy constructors (like ArrayList's) simply use the returned array for internal storage. This saves a copy, but it relies on the source collection's toArray() implementation to be correct. In particular, the returned array must be freshly created, and that array must be of type Object[]. If either of these is violated, it will break the ArrayList.
The "immutable" collections behind List.of/copyOf/etc. are fastidious about allocating their internal arrays in order to ensure that they are referenced only from within the newly created collection. This requires making an „extra" copy of the array returned by toArray().
An alternative is for the new collection to preallocate its internal array using the source's size(), and then to copy the elements out. But the source’s size might change during the copy (e.g., if it’s a concurrent collection) so this complicates things. I think the array „overhead“ would be only for the cases of zero, one and two value implementations. That seems to me not worth of optimizing…
I think the only safe way to avoid the extra copy is to create a private interface that can be used by JDK implementations.
s'marks -Patrick
On Nov 3, 2017, at 1:26 AM, Patrick Reinhart <patrick@reini.net> wrote:
On 11/1/2017 4:29 PM, Patrick Reinhart wrote:
In this case I would prefer a non static copyOf() method on the list to create a unmodifiable list/set/map, where the optimal factory method can be called. This would also solve the problem of a concurrent implementation.
Such methods would, unfortunately, be a disastrous mistake. We are encouraging users to rely on the unmodifiability of the value-based lists, sets, and maps returned by copyOf, the same way they rely on the unmodifiability of Strings. But if we made copyOf be non-static, then an attacker (or just a buggy program) could introduce a broken implementation of List that returned a time-varying List. This would break the contract of copyOf, but it would be impossible to detect. The result would be a less reliable copyOf API, and eventually some TOCTTOU security vulnerabilities. This kind of surprise mutability is red meat for security researchers. So it's got to be a static method, although if List were a class we could also choose to make copyOf be a final method. I said "unfortunately" above because I too would prefer fluent syntax like ls.unmodifiableCopy() or ls.frozen() instead of List.copyOf(ls), since fluent syntax reads and composes cleanly from left to right. I think a good way to get what we want may be to introduce a way for selected methods of an interface to mark themselves (somehow) as "fluent statics", meaning that they can be invoked with their first argument in receiver position, before the dot. In effect, they would act like "final default" methods but without having to damage interfaces with final methods. I don't have a syntax in mind for defining a "fluent static", but I have thought for some time that allowing such sugar for interface statics is the best way to adapt the notion of class final methods to interfaces. This is not easy to do, since it is a language change, and only applies to a small (though influential) number of methods, those which should *not* be overridable but *should* (for some reason of API design) support virtual (fluent, postfix, left-to-right) notation. For more details please see JDK-8191530 and if you think of a good use case for these thingies, let me know so I can add it to the write-up. — John
Hi John, i strongly believe that static fluent methods have no place in a blue collar language, we can do better, in fact the have already done better by introducing default method instead of extension methods like in C# or Kotlin*. The main issue with this kind of method is that it quacks like a duck but it's not a duck, i.e. you call those static methods as virtual methods but there are no virtual methods because you can not override them. Adding extension methods as feature is like a black hole, extension methods allow you to write clunky DSLs so it's a feature that will force the language to move to add more new features to be able to write decent DSLs (think implicit object as an example), and so on. So it sucks your design budget like a black hole. In term of performance, the code inside a static extension methods tends to become megamorphic, in C#, Data Link (Link for Collection API) is slow because of that, in Kotlin you can workaround that by using the keyword inline which will copy the code into the callsite avoiding profile pollution. So in my opinion, the only possible option is to introduce final default methods, i fully understand that this is non trivial to introduce this kind of methods in the VM but this discussion is a good use case for their introduction. regards, Rémi * i will not criticize Kotlin designers on that, they wanted to enhance Java APIs without owning them so they had little options. ----- Mail original -----
De: "John Rose" <john.r.rose@oracle.com> À: "Patrick Reinhart" <patrick@reini.net> Cc: "core-libs-dev" <core-libs-dev@openjdk.java.net> Envoyé: Samedi 18 Novembre 2017 07:40:05 Objet: Re: RFR(m): 8177290 add copy factory methods for unmodifiable List, Set, Map
On Nov 3, 2017, at 1:26 AM, Patrick Reinhart <patrick@reini.net> wrote:
On 11/1/2017 4:29 PM, Patrick Reinhart wrote:
In this case I would prefer a non static copyOf() method on the list to create a unmodifiable list/set/map, where the optimal factory method can be called. This would also solve the problem of a concurrent implementation.
Such methods would, unfortunately, be a disastrous mistake.
We are encouraging users to rely on the unmodifiability of the value-based lists, sets, and maps returned by copyOf, the same way they rely on the unmodifiability of Strings. But if we made copyOf be non-static, then an attacker (or just a buggy program) could introduce a broken implementation of List that returned a time-varying List. This would break the contract of copyOf, but it would be impossible to detect. The result would be a less reliable copyOf API, and eventually some TOCTTOU security vulnerabilities. This kind of surprise mutability is red meat for security researchers.
So it's got to be a static method, although if List were a class we could also choose to make copyOf be a final method.
I said "unfortunately" above because I too would prefer fluent syntax like ls.unmodifiableCopy() or ls.frozen() instead of List.copyOf(ls), since fluent syntax reads and composes cleanly from left to right.
I think a good way to get what we want may be to introduce a way for selected methods of an interface to mark themselves (somehow) as "fluent statics", meaning that they can be invoked with their first argument in receiver position, before the dot. In effect, they would act like "final default" methods but without having to damage interfaces with final methods.
I don't have a syntax in mind for defining a "fluent static", but I have thought for some time that allowing such sugar for interface statics is the best way to adapt the notion of class final methods to interfaces.
This is not easy to do, since it is a language change, and only applies to a small (though influential) number of methods, those which should *not* be overridable but *should* (for some reason of API design) support virtual (fluent, postfix, left-to-right) notation.
For more details please see JDK-8191530 and if you think of a good use case for these thingies, let me know so I can add it to the write-up.
— John
I'm with Remi on this. Sent from my MacBook Wheel
On Nov 18, 2017, at 5:41 AM, Remi Forax <forax@univ-mlv.fr> wrote:
Hi John, i strongly believe that static fluent methods have no place in a blue collar language, we can do better, in fact the have already done better by introducing default method instead of extension methods like in C# or Kotlin*.
The main issue with this kind of method is that it quacks like a duck but it's not a duck, i.e. you call those static methods as virtual methods but there are no virtual methods because you can not override them.
Adding extension methods as feature is like a black hole, extension methods allow you to write clunky DSLs so it's a feature that will force the language to move to add more new features to be able to write decent DSLs (think implicit object as an example), and so on. So it sucks your design budget like a black hole.
In term of performance, the code inside a static extension methods tends to become megamorphic, in C#, Data Link (Link for Collection API) is slow because of that, in Kotlin you can workaround that by using the keyword inline which will copy the code into the callsite avoiding profile pollution.
So in my opinion, the only possible option is to introduce final default methods, i fully understand that this is non trivial to introduce this kind of methods in the VM but this discussion is a good use case for their introduction.
regards, Rémi
* i will not criticize Kotlin designers on that, they wanted to enhance Java APIs without owning them so they had little options.
----- Mail original -----
De: "John Rose" <john.r.rose@oracle.com> À: "Patrick Reinhart" <patrick@reini.net> Cc: "core-libs-dev" <core-libs-dev@openjdk.java.net> Envoyé: Samedi 18 Novembre 2017 07:40:05 Objet: Re: RFR(m): 8177290 add copy factory methods for unmodifiable List, Set, Map
On Nov 3, 2017, at 1:26 AM, Patrick Reinhart <patrick@reini.net> wrote:
On 11/1/2017 4:29 PM, Patrick Reinhart wrote: In this case I would prefer a non static copyOf() method on the list to create a unmodifiable list/set/map, where the optimal factory method can be called. This would also solve the problem of a concurrent implementation.
Such methods would, unfortunately, be a disastrous mistake.
We are encouraging users to rely on the unmodifiability of the value-based lists, sets, and maps returned by copyOf, the same way they rely on the unmodifiability of Strings. But if we made copyOf be non-static, then an attacker (or just a buggy program) could introduce a broken implementation of List that returned a time-varying List. This would break the contract of copyOf, but it would be impossible to detect. The result would be a less reliable copyOf API, and eventually some TOCTTOU security vulnerabilities. This kind of surprise mutability is red meat for security researchers.
So it's got to be a static method, although if List were a class we could also choose to make copyOf be a final method.
I said "unfortunately" above because I too would prefer fluent syntax like ls.unmodifiableCopy() or ls.frozen() instead of List.copyOf(ls), since fluent syntax reads and composes cleanly from left to right.
I think a good way to get what we want may be to introduce a way for selected methods of an interface to mark themselves (somehow) as "fluent statics", meaning that they can be invoked with their first argument in receiver position, before the dot. In effect, they would act like "final default" methods but without having to damage interfaces with final methods.
I don't have a syntax in mind for defining a "fluent static", but I have thought for some time that allowing such sugar for interface statics is the best way to adapt the notion of class final methods to interfaces.
This is not easy to do, since it is a language change, and only applies to a small (though influential) number of methods, those which should *not* be overridable but *should* (for some reason of API design) support virtual (fluent, postfix, left-to-right) notation.
For more details please see JDK-8191530 and if you think of a good use case for these thingies, let me know so I can add it to the write-up.
— John
On Nov 18, 2017, at 11:10 AM, Brian Goetz <brian.goetz@oracle.com> wrote:
I'm with Remi on this.
Which suggestion of his two? Blue-collar language with bad DSLs, or final defaults? — John
Use-site static extension methods = bad. Sent from my MacBook Wheel
On Nov 18, 2017, at 11:12 AM, John Rose <john.r.rose@oracle.com> wrote:
On Nov 18, 2017, at 11:10 AM, Brian Goetz <brian.goetz@oracle.com> wrote:
I'm with Remi on this.
Which suggestion of his two? Blue-collar language with bad DSLs, or final defaults?
— John
On Nov 18, 2017, at 11:16 AM, Brian Goetz <brian.goetz@oracle.com> wrote:
Use-site static extension methods = bad.
Right; that wasn't on the table from either Remy or me. All def-site stuff, so the designer of the defining interface has full control of the DSL. JDK-8191530 is def-site extension methods, a compromise that meets most or all of our requirements. (Side note: We could enable third-party DSL stuff with value types and specialization by using the value object as a view, and the real type of the extended object/value as a generic parameter. The value type acts just like a C++ smart pointer and can warp the original API around however it likes.)
Sent from my MacBook Wheel
On Nov 18, 2017, at 11:12 AM, John Rose <john.r.rose@oracle.com <mailto:john.r.rose@oracle.com>> wrote:
On Nov 18, 2017, at 11:10 AM, Brian Goetz <brian.goetz@oracle.com <mailto:brian.goetz@oracle.com>> wrote:
I'm with Remi on this.
Which suggestion of his two? Blue-collar language with bad DSLs, or final defaults?
— John
On Nov 18, 2017, at 8:10 AM, Brian Goetz <brian.goetz@oracle.com> wrote:
I'm with Remi on this.
Sent from my MacBook Wheel
On Nov 18, 2017, at 5:41 AM, Remi Forax <forax@univ-mlv.fr <mailto:forax@univ-mlv.fr>> wrote:
Hi John, i strongly believe that static fluent methods have no place in a blue collar language ...
So in my opinion, the only possible option is to introduce final default methods, i fully understand that this is non trivial to introduce this kind of methods in the VM but this discussion is a good use case for their introduction.
We have four choices on the table with respect to the occasional need for securable API points in fluent APIs: 0. Blue collar language: Pick only one of fluency or security. 1. Secure a default method by marking it final. 2. Secure a fluent API point by binding to a static in the same type. 4. Extension methods: Anybody can "import" new statics into any type. I think #0 hurts security, which is why I keep objecting to it. Brian thinks #1 puts too many restrictions on implementors. Although it would seem to allow everybody to do whatever they want, #4 is not a fit for Java. APIs in java.base are carefully curated, and allowing third parties to add "nice" methods would interfere with that curation. Option #2 has the known good properties of C# extension methods (decoupling from receiver dispatch, natural notation), without the known bad properties of C# extension methods (lack of curation). So #2 is the least ugly solution for an ugly problem, except possibly #1 Which I would accept also. I would also be glad to see a #5 that would please everybody. — John P.S. Security is a big concern for us developers of java.base. And it is surely a concern for everyone else, at least in part. If you program behind only a firewall, consider that program integrity and robustness are corollaries of security. Your past self and teammates are throwing buggy code at your present self; you want to be secure from that even if you don't fear nefarious attackers. When we secure the Java stack from nefarious attackers we are also preventing large numbers of unintentional bugs.
De: "John Rose" <john.r.rose@oracle.com> À: "Brian Goetz" <brian.goetz@oracle.com>, "Rémi Forax" <forax@univ-mlv.fr> Cc: "core-libs-dev" <core-libs-dev@openjdk.java.net> Envoyé: Mercredi 22 Novembre 2017 04:31:52 Objet: Re: RFR(m): 8177290 add copy factory methods for unmodifiable List, Set, Map
On Nov 18, 2017, at 8:10 AM, Brian Goetz < [ mailto:brian.goetz@oracle.com | brian.goetz@oracle.com ] > wrote:
I'm with Remi on this.
Sent from my MacBook Wheel
On Nov 18, 2017, at 5:41 AM, Remi Forax < [ mailto:forax@univ-mlv.fr | forax@univ-mlv.fr ] > wrote:
Hi John, i strongly believe that static fluent methods have no place in a blue collar language ... So in my opinion, the only possible option is to introduce final default methods, i fully understand that this is non trivial to introduce this kind of methods in the VM but this discussion is a good use case for their introduction.
We have four choices on the table with respect to the occasional need for securable API points in fluent APIs:
0. Blue collar language: Pick only one of fluency or security.
1. Secure a default method by marking it final.
2. Secure a fluent API point by binding to a static in the same type.
4. Extension methods: Anybody can "import" new statics into any type.
I think #0 hurts security, which is why I keep objecting to it.
Brian thinks #1 puts too many restrictions on implementors.
The whole point of final on methods is to add security and as a side effect, it bother the implementors. Security always bother people that doesn't have to handle security issues. So you can have security even in a blue collar language. I do not see the point to add a new semantics if you have a way to solve the current issue by applying a know semantics 'final' on an already existing feature 'default method'.
Although it would seem to allow everybody to do whatever they want, #4 is not a fit for Java. APIs in java.base are carefully curated, and allowing third parties to add "nice" methods would interfere with that curation.
Option #2 has the known good properties of C# extension methods (decoupling from receiver dispatch, natural notation), without the known bad properties of C# extension methods (lack of curation).
So #2 is the least ugly solution for an ugly problem, except possibly #1 Which I would accept also.
#2 also works by accident, the fact that by inheritance you can see static method is ugly. The lambda EG has made clear that the fact that you can see a static method by inheritance was a mistake by making static methods in interface non visible by inheritance.
I would also be glad to see a #5 that would please everybody.
or a #3, there is no #3 in your list.
— John
Rémi
P.S. Security is a big concern for us developers of java.base. And it is surely a concern for everyone else, at least in part. If you program behind only a firewall, consider that program integrity and robustness are corollaries of security. Your past self and teammates are throwing buggy code at your present self; you want to be secure from that even if you don't fear nefarious attackers. When we secure the Java stack from nefarious attackers we are also preventing large numbers of unintentional bugs.
I think everybody on this list agree about that point.
Hello! Set.of: + if (coll instanceof ImmutableCollections.AbstractImmutableSet) { + return (Set<E>)coll; + } else { + return (Set<E>)Set.of(coll.stream().distinct().toArray()); I think that good old Set.of(new HashSet<>(coll).toArray()) would produce less garbage. distinct() also maintains HashSet internally, but it removes the SIZED characteristic, so instead of preallocated array you will have a SpinedBuffer which is less efficient than AbstractCollection.toArray() implementation which just allocates the array of exact size. What do you think? Collectors: + static final Set<Collector.Characteristics> CH_UNORDERED_NOID + = Collections.unmodifiableSet(EnumSet.of(Collector.Characteristics.UNORDERED)); Is it really more efficient currently than Set.of(Collector.Characteristics.UNORDERED)? At least less objects will be allocated with Set.of + Collector<T, ?, List<T>> toUnmodifiableList() { + return new CollectorImpl<>((Supplier<List<T>>) ArrayList::new, List::add, + (left, right) -> { left.addAll(right); return left; }, + list -> (List<T>)List.of(list.toArray()), + CH_NOID); + } Isn't it reasonable to use `e -> List.add(Objects.requireNonNull(e))` instead of simply `List::add`? In this case if null is added, then failure will occur much earlier, and the failure stacktrace would be more relevant. The same for Set/Map. + map -> (Map<K,U>)Map.ofEntries(map.entrySet().toArray(new Map.Entry[0]))); It's the same lambda in two versions of toUnmodifiableMap. Isn't it better to extract it to the constant to prevent duplication in the bytecode (or at least to method and refer to it via method reference)? With best regards, Tagir Valeev. With best regards, Tagir Valeev. On Wed, Nov 1, 2017 at 12:49 AM, Stuart Marks <stuart.marks@oracle.com> wrote:
Updated webrev, based on comments from Brian and Roger:
http://cr.openjdk.java.net/~smarks/reviews/8177290/webrev.2/
s'marks
On 10/30/17 3:50 PM, Stuart Marks wrote:
(also includes 8184690: add Collectors for collecting into unmodifiable List, Set, and Map)
Hi all,
Here's an updated webrev for this changeset; the previous review thread is here:
http://mail.openjdk.java.net/pipermail/core-libs-dev/2017-September/049261.h...
This webrev includes the following:
* specification revisions to provide clearer definitions of "view" collections, "unmodifiable" collections, and "unmodifiable views"
* new List.copyOf(), Set.copyOf(), and Map.copyOf() "copy factory" methods
* new Collectors.toUnmodifiableList, Set, and Map methods
* tests for the new API methods
I've added some assertions that require some independence between the source collection (or map) and the result of the copyOf() method.
I've made a small but significant change to Set.copyOf compared to the previous round. Previously, it specified that the first of any equal elements was preserved. Now, it is explicitly unspecified which of any equals elements is preserved. This is consistent with Set.addAll, Collectors.toSet, and the newly added Collectors.toUnmodifiableSet, none of which specify which of duplicate elements is preserved.
(The outlier here is Stream.distinct, which specifies that the first element of any duplicates is preserved, if the stream is ordered.)
I've also made some minor wording/editorial changes in response to suggestions from David Holmes and Roger Riggs. I've kept the wording changes that give emphasis to "unmodifiable" over "immutable." The term "immutable" is inextricably intertwined with "persistent" when it comes to data structures, and I believe we'll be explaining this forever if Java's "immutable" means something different from everybody else's.
Webrev:
http://cr.openjdk.java.net/~smarks/reviews/8177290/webrev.1/
Bugs:
https://bugs.openjdk.java.net/browse/JDK-8177290 add copy factory methods for unmodifiable List, Set, Map
https://bugs.openjdk.java.net/browse/JDK-8184690 add Collectors for collecting into unmodifiable List, Set, and Map
Thanks,
s'marks
On 11/1/17 10:45 AM, Tagir Valeev wrote:
Set.of:
+ if (coll instanceof ImmutableCollections.AbstractImmutableSet) { + return (Set<E>)coll; + } else { + return (Set<E>)Set.of(coll.stream().distinct().toArray());
I think that good old Set.of(new HashSet<>(coll).toArray()) would produce less garbage. distinct() also maintains HashSet internally, but it removes the SIZED characteristic, so instead of preallocated array you will have a SpinedBuffer which is less efficient than AbstractCollection.toArray() implementation which just allocates the array of exact size. What do you think?
Oh yes, good point. I had initially used stream().distinct() because I wanted to use distinct()'s semantics of preserving the first equal element among duplicates. But since I removed that requirement from the spec, using a HashSet as you suggest is much simpler.
Collectors:
+ static final Set<Collector.Characteristics> CH_UNORDERED_NOID + = Collections.unmodifiableSet(EnumSet.of(Collector.Characteristics.UNORDERED));
Is it really more efficient currently than Set.of(Collector.Characteristics.UNORDERED)? At least less objects will be allocated with Set.of
Maybe. I'm just following the same pattern as the nearby characteristics. They're singletons, so this hardly makes any difference. What's really needed is an unmodifiable EnumSet, which I hope to get to at some point.
+ Collector<T, ?, List<T>> toUnmodifiableList() { + return new CollectorImpl<>((Supplier<List<T>>) ArrayList::new, List::add, + (left, right) -> { left.addAll(right); return left; }, + list -> (List<T>)List.of(list.toArray()), + CH_NOID); + }
Isn't it reasonable to use `e -> List.add(Objects.requireNonNull(e))` instead of simply `List::add`? In this case if null is added, then failure will occur much earlier, and the failure stacktrace would be more relevant. The same for Set/Map.
Interesting. Initally I thought this would be a good idea, but then I tried it out. With the implementation in my latest webrev, the stack trace is this: jshell> Arrays.asList(1, 2, null, 4).stream().collect(Collectors.toUnmodifiableList()) | java.lang.NullPointerException thrown | at Objects.requireNonNull (Objects.java:221) | at ImmutableCollections$ListN.<init> (ImmutableCollections.java:234) | at List.of (List.java:1039) | at Collectors.lambda$toUnmodifiableList$6 (Collectors.java:299) | at ReferencePipeline.collect (ReferencePipeline.java:515) | at (#2:1) With the lambda and requireNonNull(), the stack trace is this: jshell> Arrays.asList(1, 2, null, 4).stream().collect(Collectors.toUnmodifiableList()) | java.lang.NullPointerException thrown | at Objects.requireNonNull (Objects.java:221) | at Collectors.lambda$toUnmodifiableList$5 (Collectors.java:298) | at ReduceOps$3ReducingSink.accept (ReduceOps.java:169) | at Spliterators$ArraySpliterator.forEachRemaining (Spliterators.java:948) | at AbstractPipeline.copyInto (AbstractPipeline.java:484) | at AbstractPipeline.wrapAndCopyInto (AbstractPipeline.java:474) | at ReduceOps$ReduceOp.evaluateSequential (ReduceOps.java:913) | at AbstractPipeline.evaluate (AbstractPipeline.java:234) | at ReferencePipeline.collect (ReferencePipeline.java:511) | at (#2:1) It's true that with lambda+requireNonNull, the NPE will be thrown earlier in processing. But the stack trace isn't any clearer (the only user code is at the very bottom at location "#2:1") and it's still within the context of the same call to the stream's terminal collect() call. So, doesn't seem like it makes much difference.
+ map -> (Map<K,U>)Map.ofEntries(map.entrySet().toArray(new Map.Entry[0])));
It's the same lambda in two versions of toUnmodifiableMap. Isn't it better to extract it to the constant to prevent duplication in the bytecode (or at least to method and refer to it via method reference)?
It might be better to do that, but given that this is the code I want to replace with a private interface, I don't think it's worth fiddling around with at this point. Thanks, s'marks
With best regards, Tagir Valeev.
With best regards, Tagir Valeev.
On Wed, Nov 1, 2017 at 12:49 AM, Stuart Marks <stuart.marks@oracle.com> wrote:
Updated webrev, based on comments from Brian and Roger:
http://cr.openjdk.java.net/~smarks/reviews/8177290/webrev.2/
s'marks
On 10/30/17 3:50 PM, Stuart Marks wrote:
(also includes 8184690: add Collectors for collecting into unmodifiable List, Set, and Map)
Hi all,
Here's an updated webrev for this changeset; the previous review thread is here:
http://mail.openjdk.java.net/pipermail/core-libs-dev/2017-September/049261.h...
This webrev includes the following:
* specification revisions to provide clearer definitions of "view" collections, "unmodifiable" collections, and "unmodifiable views"
* new List.copyOf(), Set.copyOf(), and Map.copyOf() "copy factory" methods
* new Collectors.toUnmodifiableList, Set, and Map methods
* tests for the new API methods
I've added some assertions that require some independence between the source collection (or map) and the result of the copyOf() method.
I've made a small but significant change to Set.copyOf compared to the previous round. Previously, it specified that the first of any equal elements was preserved. Now, it is explicitly unspecified which of any equals elements is preserved. This is consistent with Set.addAll, Collectors.toSet, and the newly added Collectors.toUnmodifiableSet, none of which specify which of duplicate elements is preserved.
(The outlier here is Stream.distinct, which specifies that the first element of any duplicates is preserved, if the stream is ordered.)
I've also made some minor wording/editorial changes in response to suggestions from David Holmes and Roger Riggs. I've kept the wording changes that give emphasis to "unmodifiable" over "immutable." The term "immutable" is inextricably intertwined with "persistent" when it comes to data structures, and I believe we'll be explaining this forever if Java's "immutable" means something different from everybody else's.
Webrev:
http://cr.openjdk.java.net/~smarks/reviews/8177290/webrev.1/
Bugs:
https://bugs.openjdk.java.net/browse/JDK-8177290 add copy factory methods for unmodifiable List, Set, Map
https://bugs.openjdk.java.net/browse/JDK-8184690 add Collectors for collecting into unmodifiable List, Set, and Map
Thanks,
s'marks
Hi, On 11/02/2017 12:51 AM, Stuart Marks wrote:
On 11/1/17 10:45 AM, Tagir Valeev wrote:
Set.of:
+ if (coll instanceof ImmutableCollections.AbstractImmutableSet) { + return (Set<E>)coll; + } else { + return (Set<E>)Set.of(coll.stream().distinct().toArray());
I think that good old Set.of(new HashSet<>(coll).toArray()) would produce less garbage. distinct() also maintains HashSet internally, but it removes the SIZED characteristic, so instead of preallocated array you will have a SpinedBuffer which is less efficient than AbstractCollection.toArray() implementation which just allocates the array of exact size. What do you think?
Oh yes, good point. I had initially used stream().distinct() because I wanted to use distinct()'s semantics of preserving the first equal element among duplicates. But since I removed that requirement from the spec, using a HashSet as you suggest is much simpler.
Why not using: coll.stream().collect(Collectors.toImmutableSet()) As Collectors.toImmutableSet() is currently implemented, with serial Stream it will create a single HashSet, add all the elements to it and call Set.of(HashSet.toArray()) with it. Pretty much the same as what Tagir proposes, but the Collector could be made more efficient in the future and with it, the optimization would automatically extend to Set.copyOf()... Regards, Peter
Why not using:
coll.stream().collect(Collectors.toImmutableSet())
As Collectors.toImmutableSet() is currently implemented, with serial Stream it will create a single HashSet, add all the elements to it and call Set.of(HashSet.toArray()) with it. Pretty much the same as what Tagir proposes, but the Collector could be made more efficient in the future and with it, the optimization would automatically extend to Set.copyOf()... This is mainly about whether Set.copyOf() is implemented in terms of Collectors.toUnmodifiableSet(), or vice-versa, which then calls Set.of(T[]) to do the actual creation. Some future optimization will probably replace both of these implementations with calls to JDK internal methods that can bypass the extra copying, so it doesn't really matter which one of these calls the other right now.
s'marks
Hi Stuart, After having thought over your arguments about copyOf versus unmodifiableCopyOf length discussion (also with my colleague) I was thinking about why not create additional X.of(X) methods instead of X.copyOf(X). It would seem to me a logical enhancement in the sense of the existing API though. -Patrick
Am 02.11.2017 um 23:04 schrieb Stuart Marks <stuart.marks@oracle.com>:
Why not using:
coll.stream().collect(Collectors.toImmutableSet())
As Collectors.toImmutableSet() is currently implemented, with serial Stream it will create a single HashSet, add all the elements to it and call Set.of(HashSet.toArray()) with it. Pretty much the same as what Tagir proposes, but the Collector could be made more efficient in the future and with it, the optimization would automatically extend to Set.copyOf()... This is mainly about whether Set.copyOf() is implemented in terms of Collectors.toUnmodifiableSet(), or vice-versa, which then calls Set.of(T[]) to do the actual creation. Some future optimization will probably replace both of these implementations with calls to JDK internal methods that can bypass the extra copying, so it doesn't really matter which one of these calls the other right now.
s'marks
Hi Patrick, On 11/10/2017 09:49 AM, Patrick Reinhart wrote:
Hi Stuart,
After having thought over your arguments about copyOf versus unmodifiableCopyOf length discussion (also with my colleague) I was thinking about why not create additional X.of(X) methods instead of X.copyOf(X). It would seem to me a logical enhancement in the sense of the existing API though.
-Patrick
I'm sure Stuart thought of that, but decided against it because of usual ambiguity problems that occur when two candidate .of() methods apply at the same time, for example: List<String> strings = List.of("a", "b", "c"); List<?> foo = List.of(strings); What did programmer want to end up in foo, List<List<String>> or List<String>? What did javac choose? Answers may differ. Regards, Peter
Am 02.11.2017 um 23:04 schrieb Stuart Marks <stuart.marks@oracle.com>:
Why not using:
coll.stream().collect(Collectors.toImmutableSet())
As Collectors.toImmutableSet() is currently implemented, with serial Stream it will create a single HashSet, add all the elements to it and call Set.of(HashSet.toArray()) with it. Pretty much the same as what Tagir proposes, but the Collector could be made more efficient in the future and with it, the optimization would automatically extend to Set.copyOf()... This is mainly about whether Set.copyOf() is implemented in terms of Collectors.toUnmodifiableSet(), or vice-versa, which then calls Set.of(T[]) to do the actual creation. Some future optimization will probably replace both of these implementations with calls to JDK internal methods that can bypass the extra copying, so it doesn't really matter which one of these calls the other right now.
s'marks
Hi Stuart, A few editorial comments: Collection.java: Lines 110, 133, 166 The bold labels probably want to be on their own lines and not terminated by "." to look like headings (or be headings if the CSS supports them) List.java: Consistency of markup references to unmodifiable List|Set|Map. The List.of constructors put the reference on a separate line, but the copyOf constructor does not. You could probably omit the blank line. (BTW, the copyOf constructor does not always create a copy; I'm not sure if the method name will result in an incorrect assumption but it may be misleading or a spec issue.) The same observations are true for Map and Set constructors. Thanks, Roger On 10/31/2017 7:49 PM, Stuart Marks wrote:
Updated webrev, based on comments from Brian and Roger:
http://cr.openjdk.java.net/~smarks/reviews/8177290/webrev.2/
s'marks
On 10/30/17 3:50 PM, Stuart Marks wrote:
(also includes 8184690: add Collectors for collecting into unmodifiable List, Set, and Map)
Hi all,
Here's an updated webrev for this changeset; the previous review thread is here:
http://mail.openjdk.java.net/pipermail/core-libs-dev/2017-September/049261.h...
This webrev includes the following:
* specification revisions to provide clearer definitions of "view" collections, "unmodifiable" collections, and "unmodifiable views"
* new List.copyOf(), Set.copyOf(), and Map.copyOf() "copy factory" methods
* new Collectors.toUnmodifiableList, Set, and Map methods
* tests for the new API methods
I've added some assertions that require some independence between the source collection (or map) and the result of the copyOf() method.
I've made a small but significant change to Set.copyOf compared to the previous round. Previously, it specified that the first of any equal elements was preserved. Now, it is explicitly unspecified which of any equals elements is preserved. This is consistent with Set.addAll, Collectors.toSet, and the newly added Collectors.toUnmodifiableSet, none of which specify which of duplicate elements is preserved.
(The outlier here is Stream.distinct, which specifies that the first element of any duplicates is preserved, if the stream is ordered.)
I've also made some minor wording/editorial changes in response to suggestions from David Holmes and Roger Riggs. I've kept the wording changes that give emphasis to "unmodifiable" over "immutable." The term "immutable" is inextricably intertwined with "persistent" when it comes to data structures, and I believe we'll be explaining this forever if Java's "immutable" means something different from everybody else's.
Webrev:
http://cr.openjdk.java.net/~smarks/reviews/8177290/webrev.1/
Bugs:
https://bugs.openjdk.java.net/browse/JDK-8177290 add copy factory methods for unmodifiable List, Set, Map
https://bugs.openjdk.java.net/browse/JDK-8184690 add Collectors for collecting into unmodifiable List, Set, and Map
Thanks,
s'marks
On 11/1/17 1:50 PM, Roger Riggs wrote:
Collection.java: Lines 110, 133, 166 The bold labels probably want to be on their own lines and not terminated by "." to look like headings (or be headings if the CSS supports them) I'll change these to be actual headings. List.java: Consistency of markup references to unmodifiable List|Set|Map. The List.of constructors put the reference on a separate line, but the copyOf constructor does not. You could probably omit the blank line. Yeah, I should update all of these at some point. Given that there are 30-odd more methods to change, plus I have a pile of other collections doc changes to work on, I think I'll do these in a separate doc pass. (BTW, the copyOf constructor does not always create a copy; I'm not sure if the method name will result in an incorrect assumption but it may be misleading or a spec issue.) Right. I haven't been able to come up with any names that had the right semantics and that didn't also have connotations that were misleading in some other way. I observe that Guava's Immutable collections have copyOf() methods with pretty much the same semantics. The Guava docs do note that their methods try to avoid making a copy if they can, but they explicitly say that the circumstances under which this occurs are unspecified. I've considered adding an @apiNote to this effect, but I haven't been able to convince myself that it would be helpful to do so. It would seem to raise new issues that we're unwilling to answer, such as exactly when is a copy is made vs. when the argument is returned. Better, I think, to have people make a copy whenever they think they need a copy, and have the implementation short-circuit this when it can.
s'marks
The same observations are true for Map and Set constructors.
Thanks, Roger
On 10/31/2017 7:49 PM, Stuart Marks wrote:
Updated webrev, based on comments from Brian and Roger:
http://cr.openjdk.java.net/~smarks/reviews/8177290/webrev.2/
s'marks
On 10/30/17 3:50 PM, Stuart Marks wrote:
(also includes 8184690: add Collectors for collecting into unmodifiable List, Set, and Map)
Hi all,
Here's an updated webrev for this changeset; the previous review thread is here:
http://mail.openjdk.java.net/pipermail/core-libs-dev/2017-September/049261.h...
This webrev includes the following:
* specification revisions to provide clearer definitions of "view" collections, "unmodifiable" collections, and "unmodifiable views"
* new List.copyOf(), Set.copyOf(), and Map.copyOf() "copy factory" methods
* new Collectors.toUnmodifiableList, Set, and Map methods
* tests for the new API methods
I've added some assertions that require some independence between the source collection (or map) and the result of the copyOf() method.
I've made a small but significant change to Set.copyOf compared to the previous round. Previously, it specified that the first of any equal elements was preserved. Now, it is explicitly unspecified which of any equals elements is preserved. This is consistent with Set.addAll, Collectors.toSet, and the newly added Collectors.toUnmodifiableSet, none of which specify which of duplicate elements is preserved.
(The outlier here is Stream.distinct, which specifies that the first element of any duplicates is preserved, if the stream is ordered.)
I've also made some minor wording/editorial changes in response to suggestions from David Holmes and Roger Riggs. I've kept the wording changes that give emphasis to "unmodifiable" over "immutable." The term "immutable" is inextricably intertwined with "persistent" when it comes to data structures, and I believe we'll be explaining this forever if Java's "immutable" means something different from everybody else's.
Webrev:
http://cr.openjdk.java.net/~smarks/reviews/8177290/webrev.1/
Bugs:
https://bugs.openjdk.java.net/browse/JDK-8177290 add copy factory methods for unmodifiable List, Set, Map
https://bugs.openjdk.java.net/browse/JDK-8184690 add Collectors for collecting into unmodifiable List, Set, and Map
Thanks,
s'marks
Hi Stuart, only a minor nit. In Map.java, the class-level Javadoc anchor id="immutable" doesn't match the href="#unmodifiable" used in the methods. + * <h2><a id="immutable">Unmodifiable Maps</a></h2> vs. + * See <a href="#unmodifiable">Unmodifiable Maps</a> for details. Regards, Stefan 2017-11-01 0:49 GMT+01:00 Stuart Marks <stuart.marks@oracle.com>:
Updated webrev, based on comments from Brian and Roger:
http://cr.openjdk.java.net/~smarks/reviews/8177290/webrev.2/
s'marks
On 10/30/17 3:50 PM, Stuart Marks wrote:
(also includes 8184690: add Collectors for collecting into unmodifiable List, Set, and Map)
Hi all,
Here's an updated webrev for this changeset; the previous review thread is here:
http://mail.openjdk.java.net/pipermail/core-libs-dev/2017-September/049261.h...
This webrev includes the following:
* specification revisions to provide clearer definitions of "view" collections, "unmodifiable" collections, and "unmodifiable views"
* new List.copyOf(), Set.copyOf(), and Map.copyOf() "copy factory" methods
* new Collectors.toUnmodifiableList, Set, and Map methods
* tests for the new API methods
I've added some assertions that require some independence between the source collection (or map) and the result of the copyOf() method.
I've made a small but significant change to Set.copyOf compared to the previous round. Previously, it specified that the first of any equal elements was preserved. Now, it is explicitly unspecified which of any equals elements is preserved. This is consistent with Set.addAll, Collectors.toSet, and the newly added Collectors.toUnmodifiableSet, none of which specify which of duplicate elements is preserved.
(The outlier here is Stream.distinct, which specifies that the first element of any duplicates is preserved, if the stream is ordered.)
I've also made some minor wording/editorial changes in response to suggestions from David Holmes and Roger Riggs. I've kept the wording changes that give emphasis to "unmodifiable" over "immutable." The term "immutable" is inextricably intertwined with "persistent" when it comes to data structures, and I believe we'll be explaining this forever if Java's "immutable" means something different from everybody else's.
Webrev:
http://cr.openjdk.java.net/~smarks/reviews/8177290/webrev.1/
Bugs:
https://bugs.openjdk.java.net/browse/JDK-8177290 add copy factory methods for unmodifiable List, Set, Map
https://bugs.openjdk.java.net/browse/JDK-8184690 add Collectors for collecting into unmodifiable List, Set, and Map
Thanks,
s'marks
Good catch! Thanks, I'll fix it. s'marks On 11/1/17 4:01 PM, Stefan Zobel wrote:
Hi Stuart,
only a minor nit. In Map.java, the class-level Javadoc anchor id="immutable" doesn't match the href="#unmodifiable" used in the methods.
+ * <h2><a id="immutable">Unmodifiable Maps</a></h2>
vs.
+ * See <a href="#unmodifiable">Unmodifiable Maps</a> for details.
Regards, Stefan
2017-11-01 0:49 GMT+01:00 Stuart Marks <stuart.marks@oracle.com>:
Updated webrev, based on comments from Brian and Roger:
http://cr.openjdk.java.net/~smarks/reviews/8177290/webrev.2/
s'marks
On 10/30/17 3:50 PM, Stuart Marks wrote:
(also includes 8184690: add Collectors for collecting into unmodifiable List, Set, and Map)
Hi all,
Here's an updated webrev for this changeset; the previous review thread is here:
http://mail.openjdk.java.net/pipermail/core-libs-dev/2017-September/049261.h...
This webrev includes the following:
* specification revisions to provide clearer definitions of "view" collections, "unmodifiable" collections, and "unmodifiable views"
* new List.copyOf(), Set.copyOf(), and Map.copyOf() "copy factory" methods
* new Collectors.toUnmodifiableList, Set, and Map methods
* tests for the new API methods
I've added some assertions that require some independence between the source collection (or map) and the result of the copyOf() method.
I've made a small but significant change to Set.copyOf compared to the previous round. Previously, it specified that the first of any equal elements was preserved. Now, it is explicitly unspecified which of any equals elements is preserved. This is consistent with Set.addAll, Collectors.toSet, and the newly added Collectors.toUnmodifiableSet, none of which specify which of duplicate elements is preserved.
(The outlier here is Stream.distinct, which specifies that the first element of any duplicates is preserved, if the stream is ordered.)
I've also made some minor wording/editorial changes in response to suggestions from David Holmes and Roger Riggs. I've kept the wording changes that give emphasis to "unmodifiable" over "immutable." The term "immutable" is inextricably intertwined with "persistent" when it comes to data structures, and I believe we'll be explaining this forever if Java's "immutable" means something different from everybody else's.
Webrev:
http://cr.openjdk.java.net/~smarks/reviews/8177290/webrev.1/
Bugs:
https://bugs.openjdk.java.net/browse/JDK-8177290 add copy factory methods for unmodifiable List, Set, Map
https://bugs.openjdk.java.net/browse/JDK-8184690 add Collectors for collecting into unmodifiable List, Set, and Map
Thanks,
s'marks
Late to the party, but these lines rub me the wrong way: @return the new {@code List} @return the new {@code Set} @return the new {@code Map} The word "new" is a loaded term, which usually means (or can be easily mistaken to mean) that a new object identity is guaranteed. Thus, "new" shouldn't be used to specify the behavior of value-based classes. Given that that the underlying objects are of VBCs, and that we are encouraging programmers to rely on the efficiency of chained copies, it should say something like this instead: @return a {@code List} containing the same elements as the given collection @return a {@code Set} containing the same elements as the given collection @return a {@code Map} containing the same mappings as the given map (Or even s/return a/return an unmodifiable/.) — John
On 11/17/17 9:43 PM, John Rose wrote:
Late to the party, but these lines rub me the wrong way:
@return the new {@code List} @return the new {@code Set} @return the new {@code Map}
The word "new" is a loaded term, which usually means (or can be easily mistaken to mean) that a new object identity is guaranteed. Thus, "new" shouldn't be used to specify the behavior of value-based classes.
Given that that the underlying objects are of VBCs, and that we are encouraging programmers to rely on the efficiency of chained copies, it should say something like this instead:
@return a {@code List} containing the same elements as the given collection @return a {@code Set} containing the same elements as the given collection @return a {@code Map} containing the same mappings as the given map
(Or even s/return a/return an unmodifiable/.)
OK, per your other message, we'll stick with Collectors.toUnmodifiableList/Set/Map, in the hope that in the future we can work on improving various incarnations of toList() to have more desirable properties. Meanwhile, I agree that "@return the new {@code List}" is wrong and I'll adjust it (respectively, Set and Map). Another point from our discussions is that the copyOf() methods should get an @implNote that says they'll try not to make unnecessary copies. Also included are some markup changes and a small Set.copyOf implementation change suggested upthread. Final(?) webrev: http://cr.openjdk.java.net/~smarks/reviews/8177290/webrev.3/ Specdiff: http://cr.openjdk.java.net/~smarks/reviews/8177290/specdiff.3/overview-summa... s'marks
On Oct 30, 2017, at 6:50 PM, Stuart Marks <stuart.marks@oracle.com> wrote:
(also includes 8184690: add Collectors for collecting into unmodifiable List, Set, and Map)
Now I'm going to be picky about the names of the Collectors; please bear with me a moment. Consider `toUnmodifiableList` and its two cousins (`toUSet`, `toUMap`). The most natural names `toList` (etc.) are already taken. They mean "do whatever is most useful for the collector, perhaps returning unmodifiable or modifiable results". So the problem here is giving the user the option to say, "no, I really want a value-based-class-style list here; I know you can do this for me". `Collectors.toList` can't ever be upgraded in place, since the programmer is getting a Collector's Choice result, and programmers will be inadvertently relying on whatever that (undocumented) choice happens to be. I think this makes the name `toList` significantly less useful, and I for one will never want to use `toList` again, once `toUnmodifiableList` is available. Meanwhile, `toUnmodifiableList` is so ugly to the ear, eye, and fingers that it will be less used just because of its long spelling. Users should be allowed to code as if unmodifiability is a common practice, not something that must be re-announced every time I make a new temporary list. That's why `copyOf` and not `unmodifiableCopyOf` is the right move, and `toUnmodifiableList` is not. (See also Gafter's blog[1] on making NEW FEATURES look NEW. The NEW UNMODIFIABILITY will soon cease to be new but in its present form it will shout forever like an aging rocker.) To get out of this bind, I suggest picking a shorter name that emphasizes that the result is the _elements_ in the list, not the list object per se. This is a VBC[2] move, one which we need to learn to make more often, especially as value types arrive. So I suggest these names for the collectors: `elements` (for a list), `elementSet`, and `elementMap`. Same signatures, of course. Another approach to this, which recycles the old names but doesn't have the VBC vibe, is to create a second set of factories for list/set/map collectors that look just like the old ones, but behave more predictably. For example, one might try to add methods to `Collector` named `asModifiable` and `asUnmodifiable`, so people can write `toList().asModifiable()`. I don't think this works, nor is it very pretty (looks too "NEW"), although it provides a more principled fix to the Collector's Choice problem. If we go forward with the change as written, I think people will only want to remember the one `toList` and forget about his ugly brother. Only zealots like me will remember to add the extra incantation ("NEW! now Unmodifiable!"). Another, better way to recycle the name `toList` is to provide `Collectors.Modifiable` and `Collectors.Unmodifiable`, and then have users import `Collectors.Unmodifiable.toList`. This at least would move the shouting up to the static imports list and out of real code. But I think `toList` is broken and should be avoided. So I think it should be `elements`, `elementSet`, and `elementMap`. Even if you think that unmodifiability should be mentioned every time a collector collects some stuff, observe that `toUnmodifiableList` is not a strong parallel with the older API point, `Collections.unmodifiableList`. That guy creates a _view_ while `copyOf` and `elements` (my name) create non-view unmodifiable (VBC, shallowly immutable) lists. So `Collectors.toUnmodifiableList` is both ugly and ill-sorted with `copyOf`. — John [1]: http://gafter.blogspot.com/2017/06/making-new-language-features-stand-out.ht... [2]: https://docs.oracle.com/javase/8/docs/api/java/lang/doc-files/ValueBased.htm...
On Nov 18, 2017, at 7:34 PM, John Rose <john.r.rose@oracle.com> wrote:
On Oct 30, 2017, at 6:50 PM, Stuart Marks <stuart.marks@oracle.com> wrote:
(also includes 8184690: add Collectors for collecting into unmodifiable List, Set, and Map)
Now I'm going to be picky about the names of the Collectors; please bear with me a moment. Consider `toUnmodifiableList` and its two cousins (`toUSet`, `toUMap`).
The most natural names `toList` (etc.) are already taken. They mean "do whatever is most useful for the collector, perhaps returning unmodifiable or modifiable results".
Let me adjust my position here, FTR. I am now aware (thanks Brian) that `toList` is not broken but intentionally under-specified, pending future changes. It is my personal hope that the future changes will specify that the result of `toList` is safely publishable and an unmodifiable non-view. This issue is tracked as JDK-8180352. (Logically possible alternatives would seem to include an otherwise unconstrained mutable list, an ArrayList, or a continuation of the "Chef's Choice" policy in effect today. I suspect we could find advocates for all of those positions, as evidenced by comments on JDK-8180352. I just added mine for the record.) If at some point in the future `toList` does produce the same kind of safe list as `List.of`, then I won't have anything to be picky about. Other folks can use `toCollection(ArrayList::new)` or some other explicit op-in for mutability. For now, a security-conscious user like me can work with `Collections.toUnmodifiableList`. That API point will (I hope) have a short career, ending when `toList` does something at least as useful. If at some point in the future `toList` switches to guarantee the mutable ArrayList (as it seems to supply today), then the hope for a simple and safe `toList` will be dashed, and we will have to look for something else that is explicitly oriented towards value based classes, such as `Collectors.valueList` and/or `Stream.values`. Value-based classes are an important "attractor" for API design, because they can be simultaneously safe and performant, compared to Java arrays and ArrayList. (The performance comes from the elimination of copies under the hood, as well as structure flattening, optimizations which potential modification makes impossible. The safety comes from reduction of difficult-to-find TOCTTOU attack surfaces, as beyond the usual claimed benefits of "FP style".) When we get to Valhalla value types, there will be many more value-based classes in the world, since a value type cannot be anything other than a value-based class. Whatever we do with `toList` in the future, it should take into account value-based usages. They are here and more are coming. One more point: When programming with value-based classes, the container is almost irrelevant, and the contents are the whole story. Thus, API points which return value-based multiple values should probably not mention the container type (List) unless there is a real danger of ambiguity. If I have a tree node type and I want to use a value-based List to encode its children, it should look like this: List<Node> children() { return this.children; } or this: List<Node> children() { return List.copyOf(this.children); } not these: List<Node> childrenList() ... List<Node> listOfChildren() ... List<Node> childrenUnmodifiableList() ... private List<Node> secretChildrenArray() ... (etc.) An API in the value-based style could say, in a class header or package intro page, that collection values are value-based if not otherwise specified, and then simply use unmodifiable building blocks everywhere. In any case, I think the de-emphasis of container identity enabled by value-based types provides a subtle but strong push on API design, towards plural nouns, and away from explicit discussion of container details. — John P.S. Ten years ago I designed the MethodType API, which talks about a single return type and multiple parameter types. The corresponding API points are returnType and (not parameterTypes but) parameterList *and* parameterArray. Because at the time arrays and lists were both heavily used carriers for multiple values, we had to focus on the box types (List, Array). Also we didn't have VBC rules yet. In the future I hope similar API designs can (a) ignore legacy arrays, and (b) just use VBC lists, and then (c) be a little less noisy about how the data is carried around. P.P.S. Fully retiring arrays will require dislodging them from their favored position with varargs and similar spots in the stack.
On 19 November 2017 at 03:34, John Rose <john.r.rose@oracle.com> wrote:
Meanwhile, `toUnmodifiableList` is so ugly to the ear, eye, and fingers that it will be less used just because of its long spelling.
Many developers are already using Google Guava which has added toImmutableList() and friends. More widely, we use the toXxx() form for all the collectors we've written (in OpenGamma Strata). As such, I think the toXxx() naming is well established at this point, and toUnmodifiableList() will be fine. Stephen
I think i prefer toImmutableList() than toUnmodifiableList() because the List is truly immutable and not an unmodifiable proxy in front of a mutable List (like Collections.unmodifiableList() does). Rémi ----- Mail original -----
De: "Stephen Colebourne" <scolebourne@joda.org> À: "core-libs-dev" <core-libs-dev@openjdk.java.net> Envoyé: Mercredi 22 Novembre 2017 16:42:56 Objet: Re: RFR(m): 8177290 add copy factory methods for unmodifiable List, Set, Map
On 19 November 2017 at 03:34, John Rose <john.r.rose@oracle.com> wrote:
Meanwhile, `toUnmodifiableList` is so ugly to the ear, eye, and fingers that it will be less used just because of its long spelling.
Many developers are already using Google Guava which has added toImmutableList() and friends. More widely, we use the toXxx() form for all the collectors we've written (in OpenGamma Strata). As such, I think the toXxx() naming is well established at this point, and toUnmodifiableList() will be fine.
Stephen
On 11/22/17 8:45 AM, Remi Forax wrote:
I think i prefer toImmutableList() than toUnmodifiableList() because the List is truly immutable and not an unmodifiable proxy in front of a mutable List (like Collections.unmodifiableList() does).
Immutability is like wine. If you put a spoonful of wine in a barrel full of sewage, you get sewage. If you put a spoonful of sewage in a barrel full of wine, you get sewage. (Schopenhauer's Law of Entropy) Similarly, if you add a little immutability to something mutable, you get mutability. And if you add a little mutability to something immutable, you get mutability. To get the desired properties of immutability (e.g., thread-safety, or the ability to hand around references safely without making defensive copies), it's insufficient for a collection to be "immutable"; you also have to make some statements about the contents. It's thus more precise to say that a collection is unmodifiable, and to provide a clear definition of an unmodifiable collection. Although unmodifiable collections have been around since the collections framework was introduced, oddly enough the concept has never had a good definition. The current proposal includes definitions for unmodifiability, unmodifiable collections, and unmodifiable views, and it clearly specifies which APIs return unmodifiable views and which return unmodifiable collections. s'marks
Guava's "immutable collections" are very popular https://github.com/google/guava/wiki/ImmutableCollectionsExplained and it's not a good idea to fight their advertised notion of "immutable". No generic container class can promise s'marks-style immutability (until valhalla perhaps?) so it's not that useful a concept in this domain. We like to think of Optional as an immutable value based class but you can't stop anyone who really wants to mutate the contained element from creating an Optional<AtomicReference<Object>> We could do a better job of clarifying consequences of element mutation. E.g. do we ever say that hash based collections/maps are broken if elements are ever mutated in such a way that their hash code changes while they are in the collection/map?
On 11/29/17 10:09 AM, Martin Buchholz wrote:
Guava's "immutable collections" are very popular https://github.com/google/guava/wiki/ImmutableCollectionsExplained and it's not a good idea to fight their advertised notion of "immutable".
No generic container class can promise s'marks-style immutability (until valhalla perhaps?) so it's not that useful a concept in this domain. We like to think of Optional as an immutable value based class but you can't stop anyone who really wants to mutate the contained element from creating an Optional<AtomicReference<Object>>
Sure, you can't prevent anybody from doing something like this. The big problem with "immutable" is that to many people it seems to imply what I'd call "persistent". That is, modification operations will modify and return a new data structure, preserving the old one (unmodified), often with structural sharing. A web search for "immutable data structure" matches this. Java is the outlier in trying to define "immutable" as meaning "throws UnsupportedOperationException" on modification. I really think "unmodifiable" is the better word here.
We could do a better job of clarifying consequences of element mutation. E.g. do we ever say that hash based collections/maps are broken if elements are ever mutated in such a way that their hash code changes while they are in the collection/map?
Not exactly, but Set has the following in the class doc:
Note: Great care must be exercised if mutable objects are used as set elements. The behavior of a set is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is an element in the set. A special case of this prohibition is that it is not permissible for a set to contain itself as an element.
(Map has something similar.) There isn't anything in SortedSet/SortedMap that talks about mutation vs. its effect on the Comparator-imposed total ordering. Perhaps there should be. s'marks
OK ! you convince me. Rémi ----- Mail original -----
De: "Stuart Marks" <stuart.marks@oracle.com> À: "Remi Forax" <forax@univ-mlv.fr> Cc: "core-libs-dev" <core-libs-dev@openjdk.java.net> Envoyé: Mercredi 29 Novembre 2017 06:21:41 Objet: Re: RFR(m): 8177290 add copy factory methods for unmodifiable List, Set, Map
On 11/22/17 8:45 AM, Remi Forax wrote:
I think i prefer toImmutableList() than toUnmodifiableList() because the List is truly immutable and not an unmodifiable proxy in front of a mutable List (like Collections.unmodifiableList() does).
Immutability is like wine.
If you put a spoonful of wine in a barrel full of sewage, you get sewage. If you put a spoonful of sewage in a barrel full of wine, you get sewage. (Schopenhauer's Law of Entropy)
Similarly, if you add a little immutability to something mutable, you get mutability. And if you add a little mutability to something immutable, you get mutability.
To get the desired properties of immutability (e.g., thread-safety, or the ability to hand around references safely without making defensive copies), it's insufficient for a collection to be "immutable"; you also have to make some statements about the contents. It's thus more precise to say that a collection is unmodifiable, and to provide a clear definition of an unmodifiable collection.
Although unmodifiable collections have been around since the collections framework was introduced, oddly enough the concept has never had a good definition. The current proposal includes definitions for unmodifiability, unmodifiable collections, and unmodifiable views, and it clearly specifies which APIs return unmodifiable views and which return unmodifiable collections.
s'marks
participants (14)
-
Bernd Eckenfels
-
Brian Goetz
-
forax@univ-mlv.fr
-
John Rose
-
Louis Wasserman
-
Martin Buchholz
-
Patrick Reinhart
-
Peter Levart
-
Remi Forax
-
Roger Riggs
-
Stefan Zobel
-
Stephen Colebourne
-
Stuart Marks
-
Tagir Valeev