[External] : Sequenced Collections

forax at univ-mlv.fr forax at univ-mlv.fr
Fri Feb 11 21:51:27 UTC 2022


----- Original Message -----
> From: "Stuart Marks" <stuart.marks at oracle.com>
> To: "Remi Forax" <forax at univ-mlv.fr>
> Cc: "core-libs-dev" <core-libs-dev at openjdk.java.net>
> Sent: Friday, February 11, 2022 8:25:19 PM
> Subject: Re: [External] : Sequenced Collections

> Hi Rémi,
> 
> I see that you're trying to reduce the number of interfaces introduced by
> unifying
> things around an existing interface, List. Yes, it's true that List is an
> ordered
> collection. However, your analysis conveniently omits other facts about List
> that
> make it unsuitable as a general "ordered collection" interface. Specifically:
> 
> 1) List supports element access by int index; and
> 
> 2) List is externally ordered. That is, its ordering is determined by a
> succession
> of API calls, irrespective of element values. This is in contrast to SortedSet
> et al
> which are internally ordered, in that the ordering is determined by the element
> values.
> 
> The problem with indexed element access is that it creates a bunch of hidden
> performance pitfalls for any data structure where element access is other than
> O(1).
> So get(i) degrades to O(n), binarySearch degrades from O(log n) to O(n). (This
> is in
> the sequential implementation; the random access implementation degrades to O(n
> log
> n)). Apparently innocuous indexed for-loops degrade to quadratic. This is one of
> the
> reasons why LinkedList is a bad List implementation.
> 
> If we refactor LinkedHashSet to implement List, we basically have created
> another
> situation just like LinkedList. That's a step in the wrong direction.


Everybody is focused on LinkedList but that's not the problem, the problem is that, unlike every other languages, in Java, doing an indexed loop on a List is a bad idea.
There are plenty of other implementations of AbstractSequentialList, other than LinkedList, that exhibits the same bad performance when used in an indexed loop.
A simple search on github find a lot of classes that implements AbstractSequentialList
  https://github.com/search?l=Java&p=6&q=%22extends+AbstractSequentialList%22+-luni+-LinkedList&type=Code

In fact, the problem of performance is not something inherent to the List API, it's true for any interface of the collection API,
By example, AbstractSet.contains() is linear, CopyOnWriteArrayList/CopyOnWriteArraySet.add() is linear etc.

The whole idea of using interfaces in the collection API is at the same time
- great because you have more interropt
- awful because you have usually no idea of the complexity of the method you call.

BTW, if we want to be serious and solve the issue of indexed Loop on List,
we should not have stop half way with the enhanced for loop, and also provides a way to get an index for an element when looping
And obviously, there is also a need for a compiler warning to explain that doing an indexed loop on a List is bad.

> 
> Turning to internal ordering (SortedSet): it's fundamentally incompatible with
> List's external ordering. List has a lot of positional mutation operations such
> as
> add(i, obj); after this call, you expect obj to appear at position i. That can't
> work with a SortedSet.

I don't propose that SortedSet implements List, i propose to add a new method asList() to SortedSet that return a view of the sorted set as a List.
Obviously, like the Set returned by Map.keySet() does not implement all the methods of Set, calling add() throws an UnsupportedOperationException,
the method List.add(int,E) will throw an UnsupportedOperationException when called on the result of SortedSet.asList()

> 
> There is implicit positioning semantics in other methods that don't have index
> arguments. For example, replaceAll replaces each element of a List with the
> result
> of calling a function on that element. Crucially, the function result goes into
> the
> same location as the original element. That to cannot work with SortedSet.

Not with SortedSet, like above, replace() is an optional method, it will not be implemented by the List returned by SortedSet.asList(),
but LinkedHashSet.asList() may return a List that implements replace(). 

> 
> Well, we can try to deal with these issues somehow, like making certain methods
> throw UnsupportedOperationException, or by relaxing the semantics of the methods
> so
> that they no longer have the same element positioning semantics. Either of these
> approaches contorts the List interface to such an extent that it's no longer a
> List.

As said above, it's a list view, like by example the List returned by Arrays.asList() does not implement add(E)/add(int,E) or remove(int)/remove(E),
the view of the List does not have to implement the methods that do a mutation using an index.

> 
> So, no, it's not useful or effective to try to make List be the common "ordered
> collection" interface.

It is because this is how people are using List actually. It's the default type which is used to say, the elements are ordered by insertion order.
You may think that this is wrong, but i think it's better to be more pragmatic here. Trying to change how people think given that this way of
thinking is already enshrined in existing codes is futile.

Here are some examples

Let suppose we have a parking containing cars, a straightforward implementation is
  class Parking {
    private final ArrayList<Car> cars = new ArrayList<>();

    public void add(Car car) {
      cars.add(car);
    }

    public Optional<Car> findCarByLicencePlate(String plate) {
      return cars.stream().filter(...).findFirst();
    }

    public List<Car> cars() {
      return Collections.unmodifiableList(cars);
    }
  }

After some perf analysis, you find that findCarByLicencePlate() is used a lot and should be optimized,
the easiest refactoring is to use a hash map, exactly a linked hash map to keep the order correct.
Here being able to get the list of values as a view of the linked hash map becomes very handy.

  class Parking {
    private final LinkedHashMap<String, Car> carMap = new LinkedHashMap<>();

    public void add(Car car) {
      carMap.put(car.getLicencePlate(), car);
    }

    public Optional<Car> findCarByLicencePlate(String plate) {
      return Optional.ofNullable(carMap.get(plate));
    }

    public List<Car> cars() {
      return Collections.unmodifiableList(carMap.valueList());
    }
  }
  

Another example,
I've an array of values and i want a list of the element without the duplicates, one can write
  List<String> list = Arrays.stream(array).distinct().toList();
but it creates a Set (used internally by distinct) and a list.

With this new API, you can use the view
  List<String> list = new LinkedHashSet<>(Arrays.asList(array)).asList();


> 
> s'marks

Rémi

> 
> 
> 
> On 2/10/22 3:14 AM, Remi Forax wrote:
>> I've read the draft of the JEP on sequenced collection, and i think the proposed
>> design can be improved.
>>    https://bugs.openjdk.java.net/browse/JDK-8280836
>> 
>> I agree with the motivation, there is a need for an API to consider the element
>> of a list, a sorted set and a linked hash set as an ordered sequence of
>> elements with a simple way to access/add/remove the first/last element and also
>> reverse the elements as view.
>> 
>> I disagree about the conclusion that we need to introduce 4 new interfaces for
>> that matter.
>> 
>> Here are the reasons
>> 1/ Usually an ordered collection is called a list. Introducing an interface
>> SequencedCollection for something which is usually called a list will cause
>> more harm than good. Or maybe we should rename LISP to SEQP :)
>> 
>> 2/ There is already an interface List in Java, that represents an ordered
>> sequence of elements, with LinkedList being the name of the the double linked
>> list implementation. You can argue that there is a slight difference between
>> the semantics of java.util.List and the proposed syntax of
>> java.util.SequencedCollection, but given that people already have difficulties
>> to understand basic data structure concepts, as a teacher i dread to have a
>> discussion on those slight differences that are only true in Java.
>> 
>> If the collection API was not already existing, we may discuss about having the
>> same interface java.util.List to both indexed collection and ordered
>> collection, but that boat has sailed a long time ago.
>> 
>> So in first approach, we should refactor sorted set and linked hash set to
>> directly implement java.util.List and all the proposed methods into
>> java.util.List. But as you hint in the Risks and Assumptions section, this will
>> cause regression due to inference and also we will have trouble with
>> LinkedHashMap (see below).
>> 
>> 3/ LinkedHashMap mixes 3 implementations in one class, some of these
>> implementations does not conform to the semantics of SequencedMap.
>> - You can opt-out having the key sequentially ordered as defined by SequencedMap
>> by using the constructor LinkedHashMap(int initialCapacity, float loadFactor,
>> boolean accessOrder) and passing true as last parameter.
>> - You can opt-out having the key sequentially ordered as defined by SequencedMap
>> by overriding removeEldestEntry(), removing the first entry at the same time
>> you add a new one.
>> 
>> Because all these reasons, i think we should move to another design, using
>> delegation instead of inheritance, which for the collection framework means
>> exposing new way to access/modify sorted set and linked hash set through
>> java.util.List views.
>> 
>> The concept of views is not a new concept, it's used in Arrays.asList(),
>> List.subList() or Map.keySet()/values()/entrySet() (and more). The idea is not
>> that a sorted set is a list but that it provides a method to see it as a list.
>> It solves our problem of compatibility by not adding super types to existing
>> type and also the problem of the semantics of LinkedHashMap because a view
>> keeps the semantics of the data structure it originated.
>> 
>> Here is the proposed new methods in List, SortedSet and SortedMap.
>> 
>> interface List<E> extends Collection<E> {
>>    // new methods
>>    void addFirst();
>>    void addLast();
>>    E getFirst();
>>    E getLast();
>>    E removeFirst();
>>    E removeLast();
>>    List<E> reversedList(); // or descendingList() ??
>> }
>> 
>> interface SortedSet<E> implements Set<E> {
>>    // new methods
>>    List<E> asList();
>> }
>> 
>> interface SortedMap<K,V> implements Map<K,V> {
>>    // new methods
>>    List<K> keyList();  // do not use covariant return type
>>    List<Map.Entry<K,V>> entryList();  // same
>> }
>> 
>> I believe this design is objectively better than the one proposed because as a
>> user being able to use new interfaces is a slow process, the
>> libraries/dependencies must be updated to take the new interfaces as parameter
>> before the new types can be used. By contrast, the proposed design only enhance
>> existing interfaces so people will enjoy the new methods directly when
>> introduced.
>> 
>> Rémi


More information about the core-libs-dev mailing list