Sequenced Collections

Ernie Rael errael at raelity.com
Mon Sep 26 18:31:34 UTC 2022


A SequencedCollection listIterator() is missed; it's useful for 
List.equals, or to implement a List in general. LinkedHashSet seems 
incomplete without it. Even something like "ListIterator 
Collections.forwardListIterator(int index, iterator)", if compatible 
with equals() of the various list types, would be handy. Doesn't forward 
only work for sublist?. Only traversal, optional modification. 
Modification is tricky; an add might have an implicit remove for a Set. 
If not part of SequencedCollection, maybe "interface 
ListIteratorProvider { ListIterator listIterator(int index); }" to tag 
as needed.

This message is primarily about "definition of change and equals" and 
"pluggable eviction policy"; they're discussed through a simple MRU 
example implemented with a SequencedSet and a capacity limit. The 
'changed' concept comes from Collection<>.

// MRU implementation recommendations:
//      - implement List<> for equals
//      - modifications only permitted through direct MRU methods
public interface MRU<E> extends Collection<E> {
     boolean addItem(E key);        // return changed
     boolean removeItem(E key);     // remove(E) contract
     boolean setCapacity(int size); // return changed
     int getCapacity();
     void clear();
     boolean addAll(Collection< ? extends E> c);
     public static <E> MRU<E> synchronizedMRU(MRU<E> mru) {...}
}

To check out SequencedCollection I sketched out an implementation of 
MRU, see below. Currently I have two implementations based on LinkedList 
and LinkedHashSet, each has some ugly stuff. The implementation based on 
SequencedSet below is clean. All the MRU methods are easily implemented 
without consideration of the actual implementation, but AddAll() is 
problematic. removeItem() is included to allow external EvictionPolicy.

Is there a preferred direction for performance? I'm guessing the only 
difference might be going through a "reversed()" view.

Definition of "change" and "equals()" in a SequencedCollection
==============================================================
The SequencedCollection uses "void addFirst(e)", there's no indication 
if the collection is changed; this is the only place I'm aware of in the 
Collections universe where a functional add operation does not indicate 
changed when it may or may not change the collection (I didn't look, 
just guessing). List and Set both have "changed" consistent with their 
"equals()".

Consistent with List equals, the MRU addItem() implementation returns 
changed true if either sequence or content changed.

The MRU implementation as shown may have an inaccurate return from 
addAll(); it's possible that "changed == true && before.equals(after)". 
Collection specifies the addAll() return. It's expensive to produce an 
accurate result from addAll(). I suppose it could be argued that the 
definition of changed is ambiguous in this case. Document that AddAll() 
may return a false positive.

Thinking out loud... In SequencedSet a method getChanged() returning an 
EnumSet with SEQUENCE_CHANGED, CONTENTS_CHANGED bits. Could 
automatically clear the bits on every operation or require an explicit 
call to clear the bits. This wouldn't help the addAll() problem, but 
supports a workaround for addFirst/addLast.

pluggable Eviction Policy
=========================
In my earlier message, when I shifted to refer to an eviction policy, I 
acknowledge that even a simple builtin policy was inappropriate. While 
this thread may not be the best place to discuss it, I'll make this 
comment here; can start another thread if there's reason.

It's tricky to extend a Collection and implement an Eviction Policy 
without support from the collection. For example, if there are several 
ways to add to the collection, then in general each must be intercepted 
and handled. It may be that simply overriding add(E) is sufficient, but 
the source for the implementation must be examined for each 
implementation and on every release.

This suggests an "interface EvictionPolicy" (or EvictionPolicySupport) 
which, at a minimum, is invoked whenever the collection's size 
increases; could be invoked whenever the collection is modified. 
Something like an optional setEvictionPolicy(ep), or solely by 
implementation.

MRU implementation example
==========================

/** MRU SequenceSet implementation. Newest/mru at start of list */
public abstract class AbstractMRU<E>
extends AbstractSequentialList<E> implements MRU<E> {
     protected final SequencedSet<E> seqset;
     protected int capacity = Integer.MAX_VALUE;

     public AbstractMRU(SequencedSet<E> c, int capacity) {
         this.seqset = c;
         setCapacity(capacity);
     }

     @Override public int size() { return seqset.size(); }
     @Override public Iterator() { return seqset.iterator(); }
     @Override public boolean addAll(Collection<? extends E> c) { 
addAllInFront(c); }
     @Override public void clear() { seqset.clear(); }
     @Override public boolean addItem(E key) {
         if(key == null)
             throw new IllegalArgExcept();
         if(Objects.equals(key, peekMRU())) return false; // no change
         seqset.addFirst(key);
         evict();
         return true; // must have changed since not early return
     }
     @Override public boolean removeItem(E key) { return 
seqset.remove(key); }

     @Override boolean setCapacity(int capacity) { // return changed
         if(capacity < 0)
             throw new IllegalArgumentException("Size must be >= 0");
         this.capacity = capacity;
         return evict();
     }
     @Override public int getCapacity() { return capacity; }

     protected E peekMRU() { return seqset.isEmpty ? null : 
seqset.getFirst(); }
     protected boolean evict() { // return true if size changed
         int extra = size() - capacity;
         if(extra <= 0)
             return false;
         while(--extra >= 0) seqset.removeLast();
         return true;
     }
     protectd abstract boolean addAllInFront(Collection<? extends E> c);
}

/**
  * Newest at the front of the list (opposite of LinkedHashset defaults).
  * This is not dependent on how SeqSet is implemented,
  * but just put it down here to get the mess out of the way.
  */
public LinkedHashSetMRU<E> extends AbstractMRU<E> {
     public LinkedHashSetMRU(int numElements) {
super(LinkedHashSet.newLinkedHashSet<E>(numElements), numElements);
     }

     protected boolean addAllInFront(Collection<? extends E> c) {
         // TODO: check null item
         boolean modified;
         if(c instanceof SequencedCollection sq) {
             for(e : sq.reversed())
                 modified |= seqset.addFirst(e);
         } else {
             // lazy
             for(e : new ArrayList(c).reversed())
                 modified |= seqset.addFirst(e);
         }
         return evict() || modified;
     }

     // TODO:    implement ListIterator's iterator(index),
     //          only forward iteration should be OK for equals.
     // ListIterator listIterator(int index)
     //     { return Collections.forwardListIterator(index, 
seqset.iterator()); }

     // Could handle equals for both List,Set.
     // Just List equals seems OK. Use MRU.containsAll directly for Set 
equals.
     @Override public boolean equals(Object o) {
         if (o == this)
             return true;
         if (!(o instanceof List))
             return false;
         Iterator<E> e1 = seqset.iterator();
         Iterator<?> e2 = ((List<?>) o).iterator();
         while (e1.hasNext() && e2.hasNext())
             if(!Objects.equals(e1.next(), e2.next())
                 return false;
         return !(e1.hasNext() || e2.hasNext());
     }
}


On 9/21/22 5:38 PM, Stuart Marks wrote:
> Hi, yes, this is the right place to discuss Sequenced Collections. I'm 
> glad you find it promising.
>
> Note that Sequenced Collections actually has very little 
> implementation in it, aside from various reversed views of things. The 
> actual data is still stored in existing concrete collections such as 
> ArrayList, ArrayDeque, LinkedHashMap, and TreeMap.
>
> I think Sequenced Collections has the right set of abstractions in it 
> as it stands, and I don't want to expand its scope by talking about 
> additional concepts like size limits or eviction policy.
>
> However, those things are quite reasonable to discuss independently of 
> the current Sequenced Collections proposal. Having a maximum size on a 
> collection seems independent of sequencing. An eviction policy *might* 
> be based on the sequence, but it might not; consider the various 
> eviction policies available for a cache library such as Caffeine [1].
>
> [1] https://github.com/ben-manes/caffeine/wiki
>
> However, I'm somewhat skeptical of trying to build things like 
> eviction policies directly into collections. It's tempting to add a 
> simple thing like a size and just throw away things in some 
> well-defined order whenever the size is exceeded. The problem is that 
> if this policy doesn't do *exactly* what you want to do, then you're 
> out of luck.
>
> The current (pre Sequenced Collections) LinkedHashMap is a good 
> example of this. It's suitable for a least-recently-inserted 
> expiration policy; there's a method removeEldestEntry() that programs 
> can use to implement a simple policy, size-based or otherwise. 
> (Unfortunately they have to subclass-and-override, but whatever.) The 
> problem is that it allows removal of only one element -- the eldest 
> (first) element.
>
> If you want to change the policy of insertion order of an LHM, you 
> have only one alternative: access order. Enabling this has some weird 
> side effects though. For example, get() now rearranges the order of 
> entries in the map, and is thus a structural modification -- which 
> means that it spoils any iterators over the map's views.
>
> These are both fairly common cases, which is probably why they were 
> added. But they're not very flexible, and if you want to do something 
> slightly different, you're on your own -- and it's pretty hard to 
> implement your own policy, because LHM lacks a bunch of essential 
> operations.
>
> Where the Sequenced Collections proposal helps is that instead of 
> adding more policies, it adds the missing primitive operations. You 
> can add/get/remove at either end, and you can reposition mappings to 
> either end. If you have some different recent-usage policy or some 
> unusual cache eviction policy that I've never heard of, you can use 
> the primitives to implement it yourself. That's much better than 
> trying to bake a few more specific cases into LinkedHashMap or other 
> collections.
>
>> Is there a discussion about making the SynchronizedCollection family 
>> of classes public? 
>
> No. Synchronizing on every collection operation is the wrong level of 
> abstraction. Typical collection usage involves too much external 
> iteration and too much check-then-act logic. Callers would have to 
> wrap those in synchronized blocks, and in general they don't know when 
> that's necessary. Certain transaction-style operations (like 
> Map::computeIfAbsent) can be made to work, but those are all pretty 
> low level.
>
> s'marks
>
>
>
> On 9/21/22 9:32 AM, Ernie Rael wrote:
>>  > I don't see why you think a general collection...
>>
>> I thought the Subject would be sufficient to indicate that I was not 
>> talking about collections in general. I should have been more precise 
>> with my words; guess I was just excited by a bi-directional ordered set.
>>
>> The MRU _example_ is useful; the current collections handle it poorly 
>> and Sequenced Collections is ideal. Caches with an eviction policy 
>> are common; I suspect caches will be a common use for SequencedSet 
>> family. Note there are fixed sized Collections and 
>> SequencedCollection borrows heavily from that family. Perhaps this 
>> issue should be considered in the context of adding an **Eviction 
>> Policy** to appropriate collections.
>>
>> MRU is a Collection; for example, I pass an MRU to a persistence 
>> mechanism that takes a collection. Saying "all methods offered by 
>> `Collection` should [not] even be part of an `MRU` interface" is 
>> innacurate, especially when considered in the context of a 
>> SequencedCollection.
>>
>> -ernie
>>
>> PS - Loosely related is extending a Collection and providing a 
>> synchronized version. Is there a discussion about making the 
>> SynchronizedCollection family of classes public?
>>
>>
>> On 9/21/22 4:22 AM, John Hendrikx wrote:
>>> I don't see why you think a general collection, that is in 99.9% of 
>>> the cases not used to implement an MRU, should burden every call to 
>>> #add with a check to see if it isn't exceeding its maximum size or 
>>> to see if a maximum size has been set.
>>>
>>> This is much better done by composition, as I don't think all 
>>> methods offered by `Collection` should even be part of an `MRU` 
>>> interface.
>>>
>>> --John
>>>
>>> On 20/09/2022 21:08, Ernie Rael wrote:
>>>> (There may be a better place to send this, let me know where)
>>>>
>>>> Suggesting an option to limit the size of the collection, e.g 
>>>> "setMaxSize(int)", default of zero means no limit.
>>>>
>>>> I put together "interface MRU<E> extends Collection<E>" some months 
>>>> ago, it has two implementations based on LinkedList and 
>>>> LinkedHashSet. The code can be seen at 
>>>> https://sourceforge.net/p/jvi/raelity-lib/ci/default/tree/lib/src/main/java/com/raelity/lib/ 
>>>>
>>>>
>>>> A SequencedCollection, as outlined in the JEP draft 2020/09/01, 
>>>> would be almost perfect to implement MRU; I've run into most of the 
>>>> problems/issues discussed in the JEP draft.
>>>>
>>>> The MRU is a cache, as I use it; it typically has a max size for 
>>>> the collection. Handling this natively in the collection would be 
>>>> ideal; if an add operation would overflow, remove the item at the 
>>>> other end. Note that addAll() is used when initializing from 
>>>> backing store.
>>>>
>>>> FYI, I use a "Supplier<Integer>" to the constructor to provide 
>>>> maxSize, but a property makes much more sense. I'll make that 
>>>> change in MRU for sanity, and get rid of the trim() method. 
>>>> setMaxSize can do the trim.
>>>>
>>>> -ernie
>>>>
>>>
>>
>



More information about the core-libs-dev mailing list