[External] : Sequenced Collections

forax at univ-mlv.fr forax at univ-mlv.fr
Fri Feb 18 08:09:54 UTC 2022


----- Original Message -----
> From: "Stuart Marks" <stuart.marks at oracle.com>
> To: "Remi Forax" <forax at univ-mlv.fr>
> Cc: "core-libs-dev" <core-libs-dev at openjdk.java.net>, "Tagir Valeev" <amaembo at gmail.com>
> Sent: Tuesday, February 15, 2022 6:06:54 AM
> Subject: Re: [External] : Sequenced Collections

>> Here is the first sentence of the javadoc for java.util.List "An ordered
>> collection (also known as a sequence)."
>> And the first paragraph of java.util.RandomAccess "Marker interface used by List
>> implementations to indicate that they support fast (generally constant time)
>> random access. The primary purpose of this interface is to allow generic
>> algorithms to alter their behavior to provide good performance when applied to
>> either random or sequential access lists"
>> 
>> You can find that the actual design, mixing ordered collection and indexed
>> collection into one interface named List not great, but you can not say that
>> this is not the actual design.
> 
> Hi Rémi,
> 
> You have a talent for omitting pieces of the specification that are inconvenient
> to
> your argument. The complete first paragraph of the List specification is
> 
>     An ordered collection (also known as a sequence). The user of this interface
>     has precise control over where in the list each element is inserted. The user
>     can access elements by their integer index (position in the list), and search
>     for elements in the list.
> 
> Clearly, a List is not *merely* an ordered collection or sequence. Positioning
> (which I refer to as external ordering) and access by index are also inherent to
> List.

I don't disagree, my point is that java.util.List is both an ordered collection or sequence and an indexed collection.
So from the POV of an implementer it's both, but from the POV of a user, if you want only an ordered collection or sequence, you will use List.

Introducing SequencedCollection may be ok in term of implementation, but it will cause trouble in term of usage,
an alarmist view is to think that introducing SequencedCollection will cause chaos, but i think it will simple than that, people will quickly learn that as OrderedSet, NavigableSet or Queue, SequencedCollection is a second class citizen interface and should rarely used in the signature of public methods.

> 
> The original design does support "random access" and "sequential" (linked)
> lists.
> However, 20 years of experience with LinkedList, and with alternative algorithms
> that check the RandomAccess marker interface, have shown that this doesn't work
> very
> well. It would be a bad idea to extend that design by making LinkedHashSet
> implement
> List or for it to provide a List view.

Yes, you right about that, but it does not mean that SequencedCollection will solve anything.

> 
> SortedSet is internally ordered and is therefore semantically incompatible with
> the
> external ordering of List. This incompatibility exists regardless of whether
> SortedSet implements List directly or merely provides a List view.

If it's a view, you now that a view keeps the semantics of the original collection, this is how a view works, so i agree that a SortedSet can not implement List directly, but it do not think it's an issue if SortedSet provides a view.

> 
> In object design you can always take a sledgehammer to something and pound on it
> until it kind of looks like what you want. Indeed, you could do that with List
> in
> the way that you suggest, but the result will be confusing to work with and
> burdensome to implement. So, no, I'm not going to do that.

I don't think there is a good solution here,
I'm wary about the fact that you want to fix a 20 year old design bug by trying to shoehorn SequencedCollection with the hope that it will fix that design error,
my experience is that this kind of refactoring does more harm than good.
And we can not use List because we dpo not want to introduce new sequential implementations of List.  

And you did not answer about providing a bidirectional iterator of the fact that LinkedHashMap does not implement SequencedMap correctly, as an example
the following code may throw an AssertionError with a SequencedSet created from a LinkedHashMap constructed with true as last argument.

  void brokenInvariant(SequencedSet<String> set) {
    var first = set.getFirst();    
    set.contains("foo");
    assertEquals(first, set.getFirst());
  }

> 
> s'marks

Rémi

> 
> 
> 
> 
> On 2/13/22 9:40 AM, forax at univ-mlv.fr wrote:
>> 
>> 
>> ------------------------------------------------------------------------------------
>> 
>>     *From: *"Tagir Valeev" <amaembo at gmail.com>
>>     *To: *"Stuart Marks" <stuart.marks at oracle.com>
>>     *Cc: *"Remi Forax" <forax at univ-mlv.fr>, "core-libs-dev"
>>     <core-libs-dev at openjdk.java.net>
>>     *Sent: *Saturday, February 12, 2022 4:24:24 AM
>>     *Subject: *Re: [External] : Sequenced Collections
>> 
>>     Wow, I missed that the Sequenced Collections JEP draft was posted!
>>     Of course, I strongly support this initiative and am happy that my proposal got
>>     some love and is moving forward. In general, I like the JEP in the way it is. I
>>     have only two slight concerns:
>>     1. I'm not sure that having addition methods (addFirst, addLast, putFirst,
>>     putLast) is a good idea, as not every mutable implementation may support them.
>>     While this adds some API unification, it's quite a rare case when this could be
>>     necessary. I think most real world applications of Sequenced* types would be
>>     around querying, or maybe draining (so removal operations are ok). Probably it
>>     would be enough to add addFirst, addLast, putFirst, putLast directly to the
>>     compatible implementations/subinterfaces like List, LinkedHashSet, and
>>     LinkedHashMap removing them from the Sequenced* interfaces. In this case,
>>     SortedSet interface will not be polluted with operations that can never be
>>     implemented. Well my opinion is not very strong here.
>> 
>>     2. SequencedCollection name is a little bit too long. I think every extra letter
>>     adds a hesitation for users to use the type, especially in APIs where it could
>>     be the most useful. I see the Naming section and must admit that I don't have
>>     better ideas. Well, maybe just Sequenced would work? Or too vague?
>> 
>>     Speaking of Remi's suggestion, I don't think it's a good idea. Maybe it could be
>>     if we designed the Collection API from scratch.
>> 
>> 
>> ??
>> Here is the first sentence of the javadoc for java.util.List "An ordered
>> collection
>> (also known as a /sequence/)."
>> And the first paragraph of java.util.RandomAccess "Marker interface used by
>> |List|
>> implementations to indicate that they support fast (generally constant time)
>> random
>> access. The primary purpose of this interface is to allow generic algorithms to
>> alter their behavior to provide good performance when applied to either random
>> or
>> sequential access lists"
>> 
>> You can find that the actual design, mixing ordered collection and indexed
>> collection into one interface named List not great, but you can not say that
>> this is
>> not the actual design.
>> 
>>     But given the current state of Java collections, it's better to add new
>>     interfaces than to put the new semantics to the java.util.List and greatly
>>     increase the amount of non-random-accessed lists in the wild.
>>     There are tons of code that implicitly assume fast random access of every
>>     incoming list (using indexed iteration inside). The suggested approach could
>>     become a performance disaster.
>> 
>> 
>> If you take several Java developers, some will stick to the javadoc definition,
>> a
>> List is either sequential or random access and some will think that a List is
>> only
>> random access. Because of that, adding more sequential implementations under the
>> List interface is an issue.
>> 
>> Introducing SequencesCollection (more on the name later), a super interface of
>> List
>> solves that issue, the new implementations will only implement the sequential
>> part
>> of interface List.
>> But it does not solve the other problems, mainly adding 4 interfaces when one is
>> enough, not being backward compatible because of inference and the weird
>> semantics
>> of LinkedHashMap.
>> 
>> We still need SortedSet or LinkedHashSet to not directly implement
>> SequencesCollection but to use delegation and a have a method returning a view.
>> The
>> same reasoning applied to SortedMap, LinkedHashMap.
>> By using views, there is no need to the two other proposed interfaces
>> SequenceSet
>> and SequenceMap.
>> 
>> Another question is ListIterator, a list can be iterated forward and backward, a
>> SequenceCollection can do almost the same thing, with iterator() and
>> reversed().iterator(). It's not exactly the same semantics but i don't think it
>> exist an implementation of SequenceCollection that can be implemented without
>> the
>> property that given one element, it's predecessor and successor can be found in
>> O(1).
>> Do we need a new SequenceCollectionIterator that provides the method
>> next/hasNext/previous/hasPrevious but not add/set/nextIndex/previousIndex ?
>> 
>> For the name, Java uses single simple name of one syllable for the important
>> interface List, Set, Map or Deque (the javadoc of Deque insist that Deque should
>> be
>> pronounced using one syllable).
>> So the name should be Seq.
>> The main issue with the name "Seq" is that it is perhaps too close to the name
>> "Set".
>> Also, it can not be "Sequence" because of CharSequence.
>> 
>> interface Seq<E> extends Collection<E> {
>>     void addFirst();
>>     void addLast();
>>     E getFirst();
>>     E getLast();
>>     E removeFirst();
>>     E removeLast();
>>     Seq<E> reversed();
>> }
>> 
>> interface List<E> extends Seq<E> { }
>> 
>> interface SortedSet<E> implements Set<E> {  // or NavigableSet
>>     // new methods
>>     Seq<E> asSeq();
>> }
>> 
>> interface SortedMap<K,V> implements Map<K,V> {  // or NavigableMap
>>     // new methods
>>     Seq<K> keySeq();  // do not use covariant return type
>>     Seq<V> valueSeq();
>>     Seq<Map.Entry<K,V>> entrySeq();
>> }
>> 
>> I'm still not sure that introducing an interface like Seq instead of using List
>> is
>> the way to go.
>> If we do that, there will be a lot of blog post/bikeshedding about when to use
>> List
>> vs Seq and a lot of github issues about taking a Seq instead of a List as
>> parameter
>> of a method of a library.
>> 
>> 
>>     With best regards,
>>     Tagir Valeev.
>> 
>> 
>> Rémi
>> 
>> 
>>     On Sat, Feb 12, 2022 at 2:26 AM Stuart Marks <stuart.marks at oracle.com
>>     <mailto:stuart.marks at oracle.com>> wrote:
>> 
>>         Hi Rémi,
>> 
>>         I see that you're trying to reduce the number of interfaces introduced by
>>         unifying
>>         things around an existing interface, List. Yes, it's true that List is an
>>         ordered
>>         collection. However, your analysis conveniently omits other facts about List
>>         that
>>         make it unsuitable as a general "ordered collection" interface. Specifically:
>> 
>>         1) List supports element access by int index; and
>> 
>>         2) List is externally ordered. That is, its ordering is determined by a
>>         succession
>>         of API calls, irrespective of element values. This is in contrast to
>>         SortedSet et al
>>         which are internally ordered, in that the ordering is determined by the
>>         element values.
>> 
>>         The problem with indexed element access is that it creates a bunch of hidden
>>         performance pitfalls for any data structure where element access is other
>>         than O(1).
>>         So get(i) degrades to O(n), binarySearch degrades from O(log n) to O(n).
>>         (This is in
>>         the sequential implementation; the random access implementation degrades to
>>         O(n log
>>         n)). Apparently innocuous indexed for-loops degrade to quadratic. This is
>>         one of the
>>         reasons why LinkedList is a bad List implementation.
>> 
>>         If we refactor LinkedHashSet to implement List, we basically have created
>>         another
>>         situation just like LinkedList. That's a step in the wrong direction.
>> 
>>         Turning to internal ordering (SortedSet): it's fundamentally incompatible with
>>         List's external ordering. List has a lot of positional mutation operations
>>         such as
>>         add(i, obj); after this call, you expect obj to appear at position i. That
>>         can't
>>         work with a SortedSet.
>> 
>>         There is implicit positioning semantics in other methods that don't have index
>>         arguments. For example, replaceAll replaces each element of a List with the
>>         result
>>         of calling a function on that element. Crucially, the function result goes
>>         into the
>>         same location as the original element. That to cannot work with SortedSet.
>> 
>>         Well, we can try to deal with these issues somehow, like making certain methods
>>         throw UnsupportedOperationException, or by relaxing the semantics of the
>>         methods so
>>         that they no longer have the same element positioning semantics. Either of
>>         these
>>         approaches contorts the List interface to such an extent that it's no longer
>>         a List.
>> 
>>         So, no, it's not useful or effective to try to make List be the common "ordered
>>         collection" interface.
>> 
>>         s'marks
>> 
>> 
>> 
>>         On 2/10/22 3:14 AM, Remi Forax wrote:
>>          > I've read the draft of the JEP on sequenced collection, and i think the
>>         proposed design can be improved.
>>          > https://bugs.openjdk.java.net/browse/JDK-8280836
>>         <https://bugs.openjdk.java.net/browse/JDK-8280836>
>>          >
>>          > I agree with the motivation, there is a need for an API to consider the
>>         element of a list, a sorted set and a linked hash set as an ordered sequence
>>         of elements with a simple way to access/add/remove the first/last element
>>         and also reverse the elements as view.
>>          >
>>          > I disagree about the conclusion that we need to introduce 4 new
>>         interfaces for that matter.
>>          >
>>          > Here are the reasons
>>          > 1/ Usually an ordered collection is called a list. Introducing an
>>         interface SequencedCollection for something which is usually called a list
>>         will cause more harm than good. Or maybe we should rename LISP to SEQP :)
>>          >
>>          > 2/ There is already an interface List in Java, that represents an ordered
>>         sequence of elements, with LinkedList being the name of the the double
>>         linked list implementation. You can argue that there is a slight difference
>>         between the semantics of java.util.List and the proposed syntax of
>>         java.util.SequencedCollection, but given that people already have
>>         difficulties to understand basic data structure concepts, as a teacher i
>>         dread to have a discussion on those slight differences that are only true in
>>         Java.
>>          >
>>          > If the collection API was not already existing, we may discuss about
>>         having the same interface java.util.List to both indexed collection and
>>         ordered collection, but that boat has sailed a long time ago.
>>          >
>>          > So in first approach, we should refactor sorted set and linked hash set
>>         to directly implement java.util.List and all the proposed methods into
>>         java.util.List. But as you hint in the Risks and Assumptions section, this
>>         will cause regression due to inference and also we will have trouble with
>>         LinkedHashMap (see below).
>>          >
>>          > 3/ LinkedHashMap mixes 3 implementations in one class, some of these
>>         implementations does not conform to the semantics of SequencedMap.
>>          > - You can opt-out having the key sequentially ordered as defined by
>>         SequencedMap by using the constructor LinkedHashMap(int initialCapacity,
>>         float loadFactor, boolean accessOrder) and passing true as last parameter.
>>          > - You can opt-out having the key sequentially ordered as defined by
>>         SequencedMap by overriding removeEldestEntry(), removing the first entry at
>>         the same time you add a new one.
>>          >
>>          > Because all these reasons, i think we should move to another design,
>>         using delegation instead of inheritance, which for the collection framework
>>         means exposing new way to access/modify sorted set and linked hash set
>>         through java.util.List views.
>>          >
>>          > The concept of views is not a new concept, it's used in Arrays.asList(),
>>         List.subList() or Map.keySet()/values()/entrySet() (and more). The idea is
>>         not that a sorted set is a list but that it provides a method to see it as a
>>         list. It solves our problem of compatibility by not adding super types to
>>         existing type and also the problem of the semantics of LinkedHashMap because
>>         a view keeps the semantics of the data structure it originated.
>>          >
>>          > Here is the proposed new methods in List, SortedSet and SortedMap.
>>          >
>>          > interface List<E> extends Collection<E> {
>>          >    // new methods
>>          >    void addFirst();
>>          >    void addLast();
>>          >    E getFirst();
>>          >    E getLast();
>>          >    E removeFirst();
>>          >    E removeLast();
>>          >    List<E> reversedList(); // or descendingList() ??
>>          > }
>>          >
>>          > interface SortedSet<E> implements Set<E> {
>>          >    // new methods
>>          >    List<E> asList();
>>          > }
>>          >
>>          > interface SortedMap<K,V> implements Map<K,V> {
>>          >    // new methods
>>          >    List<K> keyList();  // do not use covariant return type
>>          >    List<Map.Entry<K,V>> entryList();  // same
>>          > }
>>          >
>>          > I believe this design is objectively better than the one proposed because
>>         as a user being able to use new interfaces is a slow process, the
>>         libraries/dependencies must be updated to take the new interfaces as
>>         parameter before the new types can be used. By contrast, the proposed design
>>         only enhance existing interfaces so people will enjoy the new methods
>>         directly when introduced.
>>          >
>>          > Rémi
>>          >
>> 


More information about the core-libs-dev mailing list