Sequenced Collections

Ernie Rael errael at raelity.com
Wed Oct 5 04:38:58 UTC 2022


Summary of key points (maybe the mail was TL;DR)

  * SequencedCollection methods addFirst,addLast are the only methods in
    Collection hierarchy (AFAIK) that might modify the collection and do
    not return/signal if the collection was modified. Seems like
    offerFirst,offerLast are more consistent with Collections and still
    avoid method proliferation.
  * With LinkedHashSet, seems like listIterator is missing. Rather than
    making that part of SequencedCollection, maybe an "interface
    ListIterable { ListIterator listIterator(int index); }". In addition
    to modification, bi-directional might also be optional, to support
    it's use with list equals.

-ernie

On 9/26/22 11:31 AM, Ernie Rael wrote:
> A SequencedCollection listIterator() is missed; it's useful for 
> List.equals, or to implement a List in general. LinkedHashSet seems 
> incomplete without it. Even something like "ListIterator 
> Collections.forwardListIterator(int index, iterator)", if compatible 
> with equals() of the various list types, would be handy. Doesn't 
> forward only work for sublist?. Only traversal, optional modification. 
> Modification is tricky; an add might have an implicit remove for a 
> Set. If not part of SequencedCollection, maybe "interface 
> ListIteratorProvider { ListIterator listIterator(int index); }" to tag 
> as needed.
>
> This message is primarily about "definition of change and equals" and 
> "pluggable eviction policy"; they're discussed through a simple MRU 
> example implemented with a SequencedSet and a capacity limit. The 
> 'changed' concept comes from Collection<>.
>
> // MRU implementation recommendations:
> //      - implement List<> for equals
> //      - modifications only permitted through direct MRU methods
> public interface MRU<E> extends Collection<E> {
>     boolean addItem(E key);        // return changed
>     boolean removeItem(E key);     // remove(E) contract
>     boolean setCapacity(int size); // return changed
>     int getCapacity();
>     void clear();
>     boolean addAll(Collection< ? extends E> c);
>     public static <E> MRU<E> synchronizedMRU(MRU<E> mru) {...}
> }
>
> To check out SequencedCollection I sketched out an implementation of 
> MRU, see below. Currently I have two implementations based on 
> LinkedList and LinkedHashSet, each has some ugly stuff. The 
> implementation based on SequencedSet below is clean. All the MRU 
> methods are easily implemented without consideration of the actual 
> implementation, but AddAll() is problematic. removeItem() is included 
> to allow external EvictionPolicy.
>
> Is there a preferred direction for performance? I'm guessing the only 
> difference might be going through a "reversed()" view.
>
> Definition of "change" and "equals()" in a SequencedCollection
> ==============================================================
> The SequencedCollection uses "void addFirst(e)", there's no indication 
> if the collection is changed; this is the only place I'm aware of in 
> the Collections universe where a functional add operation does not 
> indicate changed when it may or may not change the collection (I 
> didn't look, just guessing). List and Set both have "changed" 
> consistent with their "equals()".
>
> Consistent with List equals, the MRU addItem() implementation returns 
> changed true if either sequence or content changed.
>
> The MRU implementation as shown may have an inaccurate return from 
> addAll(); it's possible that "changed == true && 
> before.equals(after)". Collection specifies the addAll() return. It's 
> expensive to produce an accurate result from addAll(). I suppose it 
> could be argued that the definition of changed is ambiguous in this 
> case. Document that AddAll() may return a false positive.
>
> Thinking out loud... In SequencedSet a method getChanged() returning 
> an EnumSet with SEQUENCE_CHANGED, CONTENTS_CHANGED bits. Could 
> automatically clear the bits on every operation or require an explicit 
> call to clear the bits. This wouldn't help the addAll() problem, but 
> supports a workaround for addFirst/addLast.
>
> pluggable Eviction Policy
> =========================
> In my earlier message, when I shifted to refer to an eviction policy, 
> I acknowledge that even a simple builtin policy was inappropriate. 
> While this thread may not be the best place to discuss it, I'll make 
> this comment here; can start another thread if there's reason.
>
> It's tricky to extend a Collection and implement an Eviction Policy 
> without support from the collection. For example, if there are several 
> ways to add to the collection, then in general each must be 
> intercepted and handled. It may be that simply overriding add(E) is 
> sufficient, but the source for the implementation must be examined for 
> each implementation and on every release.
>
> This suggests an "interface EvictionPolicy" (or EvictionPolicySupport) 
> which, at a minimum, is invoked whenever the collection's size 
> increases; could be invoked whenever the collection is modified. 
> Something like an optional setEvictionPolicy(ep), or solely by 
> implementation.
>
> MRU implementation example
> ==========================
>
> /** MRU SequenceSet implementation. Newest/mru at start of list */
> public abstract class AbstractMRU<E>
> extends AbstractSequentialList<E> implements MRU<E> {
>     protected final SequencedSet<E> seqset;
>     protected int capacity = Integer.MAX_VALUE;
>
>     public AbstractMRU(SequencedSet<E> c, int capacity) {
>         this.seqset = c;
>         setCapacity(capacity);
>     }
>
>     @Override public int size() { return seqset.size(); }
>     @Override public Iterator() { return seqset.iterator(); }
>     @Override public boolean addAll(Collection<? extends E> c) { 
> addAllInFront(c); }
>     @Override public void clear() { seqset.clear(); }
>     @Override public boolean addItem(E key) {
>         if(key == null)
>             throw new IllegalArgExcept();
>         if(Objects.equals(key, peekMRU())) return false; // no change
>         seqset.addFirst(key);
>         evict();
>         return true; // must have changed since not early return
>     }
>     @Override public boolean removeItem(E key) { return 
> seqset.remove(key); }
>
>     @Override boolean setCapacity(int capacity) { // return changed
>         if(capacity < 0)
>             throw new IllegalArgumentException("Size must be >= 0");
>         this.capacity = capacity;
>         return evict();
>     }
>     @Override public int getCapacity() { return capacity; }
>
>     protected E peekMRU() { return seqset.isEmpty ? null : 
> seqset.getFirst(); }
>     protected boolean evict() { // return true if size changed
>         int extra = size() - capacity;
>         if(extra <= 0)
>             return false;
>         while(--extra >= 0) seqset.removeLast();
>         return true;
>     }
>     protectd abstract boolean addAllInFront(Collection<? extends E> c);
> }
>
> /**
>  * Newest at the front of the list (opposite of LinkedHashset defaults).
>  * This is not dependent on how SeqSet is implemented,
>  * but just put it down here to get the mess out of the way.
>  */
> public LinkedHashSetMRU<E> extends AbstractMRU<E> {
>     public LinkedHashSetMRU(int numElements) {
> super(LinkedHashSet.newLinkedHashSet<E>(numElements), numElements);
>     }
>
>     protected boolean addAllInFront(Collection<? extends E> c) {
>         // TODO: check null item
>         boolean modified;
>         if(c instanceof SequencedCollection sq) {
>             for(e : sq.reversed())
>                 modified |= seqset.addFirst(e);
>         } else {
>             // lazy
>             for(e : new ArrayList(c).reversed())
>                 modified |= seqset.addFirst(e);
>         }
>         return evict() || modified;
>     }
>
>     // TODO:    implement ListIterator's iterator(index),
>     //          only forward iteration should be OK for equals.
>     // ListIterator listIterator(int index)
>     //     { return Collections.forwardListIterator(index, 
> seqset.iterator()); }
>
>     // Could handle equals for both List,Set.
>     // Just List equals seems OK. Use MRU.containsAll directly for Set 
> equals.
>     @Override public boolean equals(Object o) {
>         if (o == this)
>             return true;
>         if (!(o instanceof List))
>             return false;
>         Iterator<E> e1 = seqset.iterator();
>         Iterator<?> e2 = ((List<?>) o).iterator();
>         while (e1.hasNext() && e2.hasNext())
>             if(!Objects.equals(e1.next(), e2.next())
>                 return false;
>         return !(e1.hasNext() || e2.hasNext());
>     }
> }
>
>
> On 9/21/22 5:38 PM, Stuart Marks wrote:
>> Hi, yes, this is the right place to discuss Sequenced Collections. 
>> I'm glad you find it promising.
>>
>> Note that Sequenced Collections actually has very little 
>> implementation in it, aside from various reversed views of things. 
>> The actual data is still stored in existing concrete collections such 
>> as ArrayList, ArrayDeque, LinkedHashMap, and TreeMap.
>>
>> I think Sequenced Collections has the right set of abstractions in it 
>> as it stands, and I don't want to expand its scope by talking about 
>> additional concepts like size limits or eviction policy.
>>
>> However, those things are quite reasonable to discuss independently 
>> of the current Sequenced Collections proposal. Having a maximum size 
>> on a collection seems independent of sequencing. An eviction policy 
>> *might* be based on the sequence, but it might not; consider the 
>> various eviction policies available for a cache library such as 
>> Caffeine [1].
>>
>> [1] https://github.com/ben-manes/caffeine/wiki
>>
>> However, I'm somewhat skeptical of trying to build things like 
>> eviction policies directly into collections. It's tempting to add a 
>> simple thing like a size and just throw away things in some 
>> well-defined order whenever the size is exceeded. The problem is that 
>> if this policy doesn't do *exactly* what you want to do, then you're 
>> out of luck.
>>
>> The current (pre Sequenced Collections) LinkedHashMap is a good 
>> example of this. It's suitable for a least-recently-inserted 
>> expiration policy; there's a method removeEldestEntry() that programs 
>> can use to implement a simple policy, size-based or otherwise. 
>> (Unfortunately they have to subclass-and-override, but whatever.) The 
>> problem is that it allows removal of only one element -- the eldest 
>> (first) element.
>>
>> If you want to change the policy of insertion order of an LHM, you 
>> have only one alternative: access order. Enabling this has some weird 
>> side effects though. For example, get() now rearranges the order of 
>> entries in the map, and is thus a structural modification -- which 
>> means that it spoils any iterators over the map's views.
>>
>> These are both fairly common cases, which is probably why they were 
>> added. But they're not very flexible, and if you want to do something 
>> slightly different, you're on your own -- and it's pretty hard to 
>> implement your own policy, because LHM lacks a bunch of essential 
>> operations.
>>
>> Where the Sequenced Collections proposal helps is that instead of 
>> adding more policies, it adds the missing primitive operations. You 
>> can add/get/remove at either end, and you can reposition mappings to 
>> either end. If you have some different recent-usage policy or some 
>> unusual cache eviction policy that I've never heard of, you can use 
>> the primitives to implement it yourself. That's much better than 
>> trying to bake a few more specific cases into LinkedHashMap or other 
>> collections.
>>
>>> Is there a discussion about making the SynchronizedCollection family 
>>> of classes public? 
>>
>> No. Synchronizing on every collection operation is the wrong level of 
>> abstraction. Typical collection usage involves too much external 
>> iteration and too much check-then-act logic. Callers would have to 
>> wrap those in synchronized blocks, and in general they don't know 
>> when that's necessary. Certain transaction-style operations (like 
>> Map::computeIfAbsent) can be made to work, but those are all pretty 
>> low level.
>>
>> s'marks
>>
>>
>>
>> On 9/21/22 9:32 AM, Ernie Rael wrote:
>>>  > I don't see why you think a general collection...
>>>
>>> I thought the Subject would be sufficient to indicate that I was not 
>>> talking about collections in general. I should have been more 
>>> precise with my words; guess I was just excited by a bi-directional 
>>> ordered set.
>>>
>>> The MRU _example_ is useful; the current collections handle it 
>>> poorly and Sequenced Collections is ideal. Caches with an eviction 
>>> policy are common; I suspect caches will be a common use for 
>>> SequencedSet family. Note there are fixed sized Collections and 
>>> SequencedCollection borrows heavily from that family. Perhaps this 
>>> issue should be considered in the context of adding an **Eviction 
>>> Policy** to appropriate collections.
>>>
>>> MRU is a Collection; for example, I pass an MRU to a persistence 
>>> mechanism that takes a collection. Saying "all methods offered by 
>>> `Collection` should [not] even be part of an `MRU` interface" is 
>>> innacurate, especially when considered in the context of a 
>>> SequencedCollection.
>>>
>>> -ernie
>>>
>>> PS - Loosely related is extending a Collection and providing a 
>>> synchronized version. Is there a discussion about making the 
>>> SynchronizedCollection family of classes public?
>>>
>>>
>>> On 9/21/22 4:22 AM, John Hendrikx wrote:
>>>> I don't see why you think a general collection, that is in 99.9% of 
>>>> the cases not used to implement an MRU, should burden every call to 
>>>> #add with a check to see if it isn't exceeding its maximum size or 
>>>> to see if a maximum size has been set.
>>>>
>>>> This is much better done by composition, as I don't think all 
>>>> methods offered by `Collection` should even be part of an `MRU` 
>>>> interface.
>>>>
>>>> --John
>>>>
>>>> On 20/09/2022 21:08, Ernie Rael wrote:
>>>>> (There may be a better place to send this, let me know where)
>>>>>
>>>>> Suggesting an option to limit the size of the collection, e.g 
>>>>> "setMaxSize(int)", default of zero means no limit.
>>>>>
>>>>> I put together "interface MRU<E> extends Collection<E>" some 
>>>>> months ago, it has two implementations based on LinkedList and 
>>>>> LinkedHashSet. The code can be seen at 
>>>>> https://sourceforge.net/p/jvi/raelity-lib/ci/default/tree/lib/src/main/java/com/raelity/lib/ 
>>>>>
>>>>>
>>>>> A SequencedCollection, as outlined in the JEP draft 2020/09/01, 
>>>>> would be almost perfect to implement MRU; I've run into most of 
>>>>> the problems/issues discussed in the JEP draft.
>>>>>
>>>>> The MRU is a cache, as I use it; it typically has a max size for 
>>>>> the collection. Handling this natively in the collection would be 
>>>>> ideal; if an add operation would overflow, remove the item at the 
>>>>> other end. Note that addAll() is used when initializing from 
>>>>> backing store.
>>>>>
>>>>> FYI, I use a "Supplier<Integer>" to the constructor to provide 
>>>>> maxSize, but a property makes much more sense. I'll make that 
>>>>> change in MRU for sanity, and get rid of the trim() method. 
>>>>> setMaxSize can do the trim.
>>>>>
>>>>> -ernie
>>>>>
>>>>
>>>
>>
>
>



More information about the core-libs-dev mailing list