ReversibleCollection proposal
Tagir Valeev
amaembo at gmail.com
Sun Apr 18 01:33:36 UTC 2021
> Adding a REVERSIBLE characteristic to spliterators is easy enough
Actually not. There are already tons of delegating spliterators in the
wild, and many of them delegate characteristics in one or another way.
Usually, it's like 'copy all the source characteristics except A and
B' (i.e. `source.characteristics() & (~(A|B))`), so if we introduce a
new one, some existing spliterators may start reporting it without
actually providing the claimed functionality. As the author of the
library that contains dozens of custom spliterator, I think, this is
the largest problem. I don't buy an infinite streams argument, as
`sorted()` is equally not applicable to infinite streams, yet it's
possible to call it (btw reversed() after sorted() could be optimized,
even if the original source is not reversible). Also, in fact,
infinite streams are not widely used. I think, 99% of real-life
streams start from collection, array, or integral range. Yes, my idea
was to fall back to drain-line implementation.
> But, Stuart's proposal gives us much of this with less fuss. IF a collection is reversible, you can just say
>
> c.reversed().stream()
>
> and off you go, plus a similar method for arrays (Arrays.reversedStream). Given that, what is the value of pushing this further into stream?
Arrays.reversedStream should be added for Object/int/long/double and
probably for array slices. Also, IntStream.reversedRange, please! And
LongStream.reversedRange. And probably `reversedRangeClosed`. So we
already have 12 methods to add (aside from collections). Adding single
.reversed() to the Stream is a much smaller API surface. Though I
admit that it requires much more work. On the other hand, I believe
(efforts/usefulness) for this feature is much bigger than, say, for
parallel streams.
In StreamEx, I have
public static <T> StreamEx<T> ofReversed(List<? extends T> list)
public static <T> StreamEx<T> ofReversed(T[] array)
(did not bother to add IntStreamEx ofReversed(int[] array) and friends)
the List version just assumes that every list is random access and
does something like IntStream.range(0, size).mapToObj(idx ->
list.get(size - idx - 1))
I've found 16 usages of them in IntelliJ IDEA sources (10 for List and
6 for array), which is not very small, given the fact that only a few
of our developers in our project know/use StreamEx.
I also have three-arg IntStreamEx.range(int startInclusive, int
endExclusive, int step) (and rangeClosed, and the corresponding
LongStreamEx), which can be used for reverse range (with step = -1)
I've found a couple of usages with step = -1 to iterate over a slice
of an array in reverse order. Though step = 2 is more popular.
I also checked the uses of Collections::reverse. There are hundreds of
them. Often it's traversal in tree-like structure through the parent
chain from leaf to root, like Stream.iterate(leaf, Objects::nonNull,
Node::getParent),
then the result is dumped into the list (probably with some
intermediate ops like map(Node::getName)), then reversed to get "from
root to leaf" order. Sometimes the resulting list is just returned (so
it's possible to create
Collectors.toReversedList() to avoid extra reversal step), sometimes
it's converted to an array and returned, sometimes it's
iterated/streamed again. In one case, it's even followed by
left-to-right reduction,
so foldRight is what was actually wanted there.
One more case is a simple parser of the Java stack trace:
String[] lines = stack.split("\n");
List<JavaMethodCall> calls =
StreamEx.of(lines).filter(s -> s.startsWith("\tat")).map(method ->
getMethodCall(method, ...)).nonNull().toList();
Collections.reverse(calls);
Here's an example of a stream where reversibility is preserved during
the whole pipeline. It's possible to rewrite it using
StreamEx.ofReversed() (probably the code author was not aware of this
method). But it also could be stack.lines().reversed() (and it's
possible to make String::lines reversible without extra buffering!),
so with the dedicated reversed() method it could be completely lazy.
I also see patterns like fill LinkedHashSet, dump it to the List, then
reverse the list and iterate from it. Clearly, it can benefit from the
Stuart's proposal.
With best regards,
Tagir Valeev.
>
> On 4/17/2021 7:49 AM, Tagir Valeev wrote:
>
> Great proposal, thanks!
>
> It has most functionality from my previous proposal, except the
> ability to iterate LinkedHashSet starting from a specific element.
> Well, probably we can skip this for now.
>
> Some people really want to reverse Streams as well. E. g., the
> corresponding StackOverflow question [1] has 171 upvotes and 200k+
> views. Also, if the stream were reversible, it would be possible to
> implement efficient foldRight/reduceRight operation. See the
> discussion at [2]. It would be nice to explore the possibility of
> extending Stream API to take into account the source reversibility.
> Some existing ordered sources that have no underlying collection (e.g.
> IntStream.range, or Arrays.stream) can be easily reversed as well. Map
> and filter operations preserve reversibility and flatMap is reversible
> if the supplied stream is reversible as well.
>
> With best regards,
> Tagir Valeev.
>
> [1] https://stackoverflow.com/q/24010109/4856258
> [2] https://bugs.openjdk.java.net/browse/JDK-8133680
>
> On Sat, Apr 17, 2021 at 12:41 AM Stuart Marks <stuart.marks at oracle.com> wrote:
>
> This is a proposal to add a ReversibleCollection interface to the Collections
> Framework. I'm looking for comments on overall design before I work on detailed
> specifications and tests. Please send such comments as replies on this email thread.
>
> Here's a link to a draft PR that contains the code diffs. It's prototype quality,
> but it should be good enough to build and try out:
>
> https://github.com/openjdk/jdk/pull/3533
>
> And here's a link to a class diagram showing the proposed additions:
>
>
> https://cr.openjdk.java.net/~smarks/ReversibleCollection/ReversibleCollectionDiagram.pdf
>
> Thanks,
>
> s'marks
>
>
> # Ordering and Reversibility
>
>
> A long-standing concept that's been missing from collections is that of the
> positioning, sequencing, or arrangement of elements as a structural property of a
> collection. (This is sometimes called the "iteration order" of a collection.) For
> example, a HashSet is not ordered, but a List is. This concept is mostly not
> manifested in the collections API.
>
> Iterating a collection produces elements one after another in *some* sequence. The
> concept of "ordered" determines whether this sequence is defined or whether it's a
> coincidence of implementation. What does "having an order" mean? It implies that
> there is a first element and that each element has a successor. Since collections
> have a finite number of elements, it further implies that there is a last element
> that has no successor. However, it is difficult to discern whether a collection has
> a defined order. HashSet generally iterates its elements in the same undefined
> order, and you can't actually tell that it's not a defined order.
>
> Streams do have a notion of ordering ("encounter order") and this is preserved,
> where appropriate, through the stream pipeline. It's possible to detect this by
> testing whether its spliterator has the ORDERED characteristic. Any collection with
> a defined order will have a spliterator with this characteristic. However, this is
> quite a roundabout way to determine whether a collection has a defined order.
> Furthermore, knowing this doesn't enable any additional operations. It only provides
> constraints on the stream's implementations (keeping the elements in order) and
> provides stronger semantics for certain operations. For example, findFirst() on an
> unordered stream is the same as findAny(), but actually finds the first element if
> the stream is ordered.
>
> The concept of ordering is thus present in the system but is surfaced only in a
> fairly indirect way. We can strengthen abstraction of ordering by making a few
> observations and considering their implications.
>
> Given that there is a first element and a last element, the sequence of elements has
> two ends. It's reasonable to consider operations (add, get, remove) on either end.
> Indeed, the Deque interface has a full complement of operations at each end. This is
> an oft-requested feature on various other collections.
>
> Each element except for the last has a successor, implying that each element except
> for the first has a predecessor. Thus it's reasonable to consider iterating the
> elements from first to last or from last to first (that is, in forward or reverse
> order). Indeed, the concept of iterating in reverse order appears already in bits
> and pieces in particular places around the collections:
>
> - List has indexOf() and lastIndexOf()
> - Deque has removeFirstOccurrence() and removeLastOccurrence()
> - List has a ListIterator with hasPrevious/previous methods
> - Deque and NavigableSet have descendingIterator methods
>
> Given an ordered collection, though, there's no general way to iterate it in reverse
> order. Reversed iteration isn't the most common operation, but there are some
> frequent use cases, such as operating on elements in most-recently-added order.
> Questions and bug reports about this have come up repeatedly over the years.
>
> Unfortunately, iterating in reverse order is much harder than iterating in forward
> order. There are a variety of ways to iterate in forward order. For example, given a
> List, one can do any of the following:
>
> for (var e : list) { ... }
> list.forEach(...)
> list.stream()
> list.toArray()
>
> However, to iterate a list in reverse order, one must use an explicit loop over
> ListIterator:
>
> for (var it = list.listIterator(list.size()); it.hasPrevious(); ) {
> var e = it.previous();
> ...
> }
>
> Streaming the elements of a List in reverse order is even worse. One approach would
> be to implement a reverse-ordered Iterator that wraps a ListIterator and delegates
> hasNext/next calls to the ListIterator's hasPrevious/previous methods. Then, this
> Iterator can be turned into a Spliterator, which is then turned into a Stream. (The
> code to do this is left as an exercise for the reader.) Obtaining the elements in
> reverse via other means -- forEach() or toArray() -- is similarly difficult.
>
> Obtaining the elements in reverse order can be accomplished by adopting a concept
> from NavigableSet's descendingSet() method, which provides a reverse-ordered view
> collection. This view is also a NavigableSet, which means that any operation that
> can be performed on the original set can also be applied to the reverse-ordered
> view. If it were possible to obtain a similar reverse-ordered view on any kind of
> ordered collection, this would enable the elements to be processed in reverse order
> in any fashion, not just special-purposes APIs such as ListIterator or
> descendingIterator().
>
>
> # LinkedHashSet
>
>
> The main feature of LinkedHashSet is that it does have a defined ordering, as
> opposed to HashSet, which does not. LinkedHashSet clearly has a first and a last
> element. However, LinkedHashSet is deficient in a number of ways:
>
> - access to and removal of the first element is reasonable (using an iterator) but
> it is not possible to add at the front
>
> - it is possible to add an element at the end, but access to and removal of the
> last element are expensive
>
> - it's not possible to iterate in reverse without copying the entire collection
>
> Most frustratingly, LinkedHashSet is implemented using a doubly-linked list, so
> there is no fundamental implementation reason that prevents these operations from
> being supported. The main reason these operations aren't supported is probably that
> there hasn't been a good place to push such APIs.
>
>
> # Proposal
>
>
> Introduce a new ReversibleCollection<E> interface. This provides a new method to
> obtain a reverse-ordered view. It also contains several methods (copied from Deque)
> that allow operating on elements at both ends.
>
>
> interface ReversibleCollection<E> extends Collection<E> {
> ReversibleCollection<E> reversed();
> default void addFirst(E e)
> default void addLast(E e)
> default E getFirst()
> default E getLast()
> default E removeFirst()
> default E removeLast()
> }
>
>
> The List, Deque, and SortedSet interfaces, and the LinkedHashSet class are
> retrofitted to implement ReversibleCollection. They provide covariant overriding
> implementations of the reversed() method. For example, List.reversed() returns a
> List. Default implementations for all of these methods are provided in the
> appropriate interfaces.
>
> Covariant overrides are also provided in several other classes. This presents a
> difficulty for LinkedHashSet, as there's no suitable return type for reversed(). To
> remedy this, we add another interface
>
>
> interface ReversibleSet<E> extends ReversibleCollection<E>, Set<E> { }
>
>
> This adds no new methods but is essentially an intersection of ReversibleCollection
> and Set. As such, it's suitable as the return type of LinkedHashSet.reversed() and
> the set views of LinkedHashMap.
>
> SortedSet::addFirst and addLast throw UnsupportedOperationException. This is because
> SortedSet's ordering is determined by the comparison method and cannot be controlled
> explicitly.
>
> LinkedHashSet::addFirst and addLast add the element at the appropriate position or
> reposition the element if it is already present in the set.
>
> New methods are added to LinkedHashMap
>
> V putFirst(K, V)
> V putLast(K, V)
>
> which put the mapping at the designated position or reposition the mapping if is
> already present in the map.
>
>
> # Design & Implementation Issues
>
>
> Covariant overrides for reversed() are provided where possible. However, there is a
> conflict between List and Deque, as there are examples in the JDK (such as
> LinkedList) that implement both List and Deque. Since List is much more popular than
> Deque, I've decided against adding a covariant override to Deque. Instead, a method
> Deque::reversedDeque is added that returns a Deque, and Deque::reversed returns
> ReversibleCollection.
>
> There is no ReversibleMap interface. Most iteration over maps is over the entrySet,
> keySet, or values views. NavigableMap already has descendingMap(), and LinkedHashMap
> provides ReversibleX views, which cover most of the cases already.
>
> Introducing new methods into an interface hierarchy always raises the possibility of
> name clashes. I've done some searching but I haven't found any major issues, but we
> should be on the lookout for this. I'll continue to do more searching for such
> conflicts.
>
> Introducing covariant overrides on existing methods (notably LinkedHashMap entrySet,
> keySet, and values) could present incompatibility issues with subclasses that
> override these methods. I've done some searching and again I haven't found major
> problems, but this is nonetheless a point of concern. I'll do more searching here, too.
>
> The default implementations for List::reversed, Deque::reversed, and
> SortedSet::reversed return reverse-ordered views that are implemented using only
> public methods of the original interface. This demonstrates the feasibility of
> retrofitting reverse ordering onto any instance of these interfaces. Similarly, the
> various add/get/remove methods' default implementations are all implementable in
> terms of an iterator or reverse-ordered iterator. That said, some concurrent
> collection implementations will need to override these default implementations in
> order to preserve their safety invariants. It would probably also be wise to add
> optimized implementations to the more commonly-used collection implementations.
>
> Given that a ReversibleCollection has a defined ordering, any spliterator obtained
> from such a collection should have the Spliterator.ORDERED characteristic.
>
> This proposal is related to a previous discussion on a similar topic, where Tagir
> Valeev had proposed OrderedSet and OrderedMap:
>
> http://mail.openjdk.java.net/pipermail/core-libs-dev/2020-April/066028.html
>
> I think there were some good ideas in there and in the ensuing discussion, but it
> didn't go anywhere. Several of the concepts in this proposal are an evolution of
> some ideas discussed in that thread.
>
> One of the ideas from that thread was to use Deque as a common interface for many
> collections to support reversibility and operations at the ends. We've abandoned
> that idea in this proposal. The reasons are that it's cluttered with a bunch of
> methods that are inherited from Queue. Also, some methods are null-returning instead
> of throwing, which introduces confusion for collections that can contain nulls.
>
> # END
>
>
More information about the core-libs-dev
mailing list