Sequenced Collections
Ernie Rael
errael at raelity.com
Wed Oct 5 04:38:58 UTC 2022
Summary of key points (maybe the mail was TL;DR)
* SequencedCollection methods addFirst,addLast are the only methods in
Collection hierarchy (AFAIK) that might modify the collection and do
not return/signal if the collection was modified. Seems like
offerFirst,offerLast are more consistent with Collections and still
avoid method proliferation.
* With LinkedHashSet, seems like listIterator is missing. Rather than
making that part of SequencedCollection, maybe an "interface
ListIterable { ListIterator listIterator(int index); }". In addition
to modification, bi-directional might also be optional, to support
it's use with list equals.
-ernie
On 9/26/22 11:31 AM, Ernie Rael wrote:
> A SequencedCollection listIterator() is missed; it's useful for
> List.equals, or to implement a List in general. LinkedHashSet seems
> incomplete without it. Even something like "ListIterator
> Collections.forwardListIterator(int index, iterator)", if compatible
> with equals() of the various list types, would be handy. Doesn't
> forward only work for sublist?. Only traversal, optional modification.
> Modification is tricky; an add might have an implicit remove for a
> Set. If not part of SequencedCollection, maybe "interface
> ListIteratorProvider { ListIterator listIterator(int index); }" to tag
> as needed.
>
> This message is primarily about "definition of change and equals" and
> "pluggable eviction policy"; they're discussed through a simple MRU
> example implemented with a SequencedSet and a capacity limit. The
> 'changed' concept comes from Collection<>.
>
> // MRU implementation recommendations:
> // - implement List<> for equals
> // - modifications only permitted through direct MRU methods
> public interface MRU<E> extends Collection<E> {
> boolean addItem(E key); // return changed
> boolean removeItem(E key); // remove(E) contract
> boolean setCapacity(int size); // return changed
> int getCapacity();
> void clear();
> boolean addAll(Collection< ? extends E> c);
> public static <E> MRU<E> synchronizedMRU(MRU<E> mru) {...}
> }
>
> To check out SequencedCollection I sketched out an implementation of
> MRU, see below. Currently I have two implementations based on
> LinkedList and LinkedHashSet, each has some ugly stuff. The
> implementation based on SequencedSet below is clean. All the MRU
> methods are easily implemented without consideration of the actual
> implementation, but AddAll() is problematic. removeItem() is included
> to allow external EvictionPolicy.
>
> Is there a preferred direction for performance? I'm guessing the only
> difference might be going through a "reversed()" view.
>
> Definition of "change" and "equals()" in a SequencedCollection
> ==============================================================
> The SequencedCollection uses "void addFirst(e)", there's no indication
> if the collection is changed; this is the only place I'm aware of in
> the Collections universe where a functional add operation does not
> indicate changed when it may or may not change the collection (I
> didn't look, just guessing). List and Set both have "changed"
> consistent with their "equals()".
>
> Consistent with List equals, the MRU addItem() implementation returns
> changed true if either sequence or content changed.
>
> The MRU implementation as shown may have an inaccurate return from
> addAll(); it's possible that "changed == true &&
> before.equals(after)". Collection specifies the addAll() return. It's
> expensive to produce an accurate result from addAll(). I suppose it
> could be argued that the definition of changed is ambiguous in this
> case. Document that AddAll() may return a false positive.
>
> Thinking out loud... In SequencedSet a method getChanged() returning
> an EnumSet with SEQUENCE_CHANGED, CONTENTS_CHANGED bits. Could
> automatically clear the bits on every operation or require an explicit
> call to clear the bits. This wouldn't help the addAll() problem, but
> supports a workaround for addFirst/addLast.
>
> pluggable Eviction Policy
> =========================
> In my earlier message, when I shifted to refer to an eviction policy,
> I acknowledge that even a simple builtin policy was inappropriate.
> While this thread may not be the best place to discuss it, I'll make
> this comment here; can start another thread if there's reason.
>
> It's tricky to extend a Collection and implement an Eviction Policy
> without support from the collection. For example, if there are several
> ways to add to the collection, then in general each must be
> intercepted and handled. It may be that simply overriding add(E) is
> sufficient, but the source for the implementation must be examined for
> each implementation and on every release.
>
> This suggests an "interface EvictionPolicy" (or EvictionPolicySupport)
> which, at a minimum, is invoked whenever the collection's size
> increases; could be invoked whenever the collection is modified.
> Something like an optional setEvictionPolicy(ep), or solely by
> implementation.
>
> MRU implementation example
> ==========================
>
> /** MRU SequenceSet implementation. Newest/mru at start of list */
> public abstract class AbstractMRU<E>
> extends AbstractSequentialList<E> implements MRU<E> {
> protected final SequencedSet<E> seqset;
> protected int capacity = Integer.MAX_VALUE;
>
> public AbstractMRU(SequencedSet<E> c, int capacity) {
> this.seqset = c;
> setCapacity(capacity);
> }
>
> @Override public int size() { return seqset.size(); }
> @Override public Iterator() { return seqset.iterator(); }
> @Override public boolean addAll(Collection<? extends E> c) {
> addAllInFront(c); }
> @Override public void clear() { seqset.clear(); }
> @Override public boolean addItem(E key) {
> if(key == null)
> throw new IllegalArgExcept();
> if(Objects.equals(key, peekMRU())) return false; // no change
> seqset.addFirst(key);
> evict();
> return true; // must have changed since not early return
> }
> @Override public boolean removeItem(E key) { return
> seqset.remove(key); }
>
> @Override boolean setCapacity(int capacity) { // return changed
> if(capacity < 0)
> throw new IllegalArgumentException("Size must be >= 0");
> this.capacity = capacity;
> return evict();
> }
> @Override public int getCapacity() { return capacity; }
>
> protected E peekMRU() { return seqset.isEmpty ? null :
> seqset.getFirst(); }
> protected boolean evict() { // return true if size changed
> int extra = size() - capacity;
> if(extra <= 0)
> return false;
> while(--extra >= 0) seqset.removeLast();
> return true;
> }
> protectd abstract boolean addAllInFront(Collection<? extends E> c);
> }
>
> /**
> * Newest at the front of the list (opposite of LinkedHashset defaults).
> * This is not dependent on how SeqSet is implemented,
> * but just put it down here to get the mess out of the way.
> */
> public LinkedHashSetMRU<E> extends AbstractMRU<E> {
> public LinkedHashSetMRU(int numElements) {
> super(LinkedHashSet.newLinkedHashSet<E>(numElements), numElements);
> }
>
> protected boolean addAllInFront(Collection<? extends E> c) {
> // TODO: check null item
> boolean modified;
> if(c instanceof SequencedCollection sq) {
> for(e : sq.reversed())
> modified |= seqset.addFirst(e);
> } else {
> // lazy
> for(e : new ArrayList(c).reversed())
> modified |= seqset.addFirst(e);
> }
> return evict() || modified;
> }
>
> // TODO: implement ListIterator's iterator(index),
> // only forward iteration should be OK for equals.
> // ListIterator listIterator(int index)
> // { return Collections.forwardListIterator(index,
> seqset.iterator()); }
>
> // Could handle equals for both List,Set.
> // Just List equals seems OK. Use MRU.containsAll directly for Set
> equals.
> @Override public boolean equals(Object o) {
> if (o == this)
> return true;
> if (!(o instanceof List))
> return false;
> Iterator<E> e1 = seqset.iterator();
> Iterator<?> e2 = ((List<?>) o).iterator();
> while (e1.hasNext() && e2.hasNext())
> if(!Objects.equals(e1.next(), e2.next())
> return false;
> return !(e1.hasNext() || e2.hasNext());
> }
> }
>
>
> On 9/21/22 5:38 PM, Stuart Marks wrote:
>> Hi, yes, this is the right place to discuss Sequenced Collections.
>> I'm glad you find it promising.
>>
>> Note that Sequenced Collections actually has very little
>> implementation in it, aside from various reversed views of things.
>> The actual data is still stored in existing concrete collections such
>> as ArrayList, ArrayDeque, LinkedHashMap, and TreeMap.
>>
>> I think Sequenced Collections has the right set of abstractions in it
>> as it stands, and I don't want to expand its scope by talking about
>> additional concepts like size limits or eviction policy.
>>
>> However, those things are quite reasonable to discuss independently
>> of the current Sequenced Collections proposal. Having a maximum size
>> on a collection seems independent of sequencing. An eviction policy
>> *might* be based on the sequence, but it might not; consider the
>> various eviction policies available for a cache library such as
>> Caffeine [1].
>>
>> [1] https://github.com/ben-manes/caffeine/wiki
>>
>> However, I'm somewhat skeptical of trying to build things like
>> eviction policies directly into collections. It's tempting to add a
>> simple thing like a size and just throw away things in some
>> well-defined order whenever the size is exceeded. The problem is that
>> if this policy doesn't do *exactly* what you want to do, then you're
>> out of luck.
>>
>> The current (pre Sequenced Collections) LinkedHashMap is a good
>> example of this. It's suitable for a least-recently-inserted
>> expiration policy; there's a method removeEldestEntry() that programs
>> can use to implement a simple policy, size-based or otherwise.
>> (Unfortunately they have to subclass-and-override, but whatever.) The
>> problem is that it allows removal of only one element -- the eldest
>> (first) element.
>>
>> If you want to change the policy of insertion order of an LHM, you
>> have only one alternative: access order. Enabling this has some weird
>> side effects though. For example, get() now rearranges the order of
>> entries in the map, and is thus a structural modification -- which
>> means that it spoils any iterators over the map's views.
>>
>> These are both fairly common cases, which is probably why they were
>> added. But they're not very flexible, and if you want to do something
>> slightly different, you're on your own -- and it's pretty hard to
>> implement your own policy, because LHM lacks a bunch of essential
>> operations.
>>
>> Where the Sequenced Collections proposal helps is that instead of
>> adding more policies, it adds the missing primitive operations. You
>> can add/get/remove at either end, and you can reposition mappings to
>> either end. If you have some different recent-usage policy or some
>> unusual cache eviction policy that I've never heard of, you can use
>> the primitives to implement it yourself. That's much better than
>> trying to bake a few more specific cases into LinkedHashMap or other
>> collections.
>>
>>> Is there a discussion about making the SynchronizedCollection family
>>> of classes public?
>>
>> No. Synchronizing on every collection operation is the wrong level of
>> abstraction. Typical collection usage involves too much external
>> iteration and too much check-then-act logic. Callers would have to
>> wrap those in synchronized blocks, and in general they don't know
>> when that's necessary. Certain transaction-style operations (like
>> Map::computeIfAbsent) can be made to work, but those are all pretty
>> low level.
>>
>> s'marks
>>
>>
>>
>> On 9/21/22 9:32 AM, Ernie Rael wrote:
>>> > I don't see why you think a general collection...
>>>
>>> I thought the Subject would be sufficient to indicate that I was not
>>> talking about collections in general. I should have been more
>>> precise with my words; guess I was just excited by a bi-directional
>>> ordered set.
>>>
>>> The MRU _example_ is useful; the current collections handle it
>>> poorly and Sequenced Collections is ideal. Caches with an eviction
>>> policy are common; I suspect caches will be a common use for
>>> SequencedSet family. Note there are fixed sized Collections and
>>> SequencedCollection borrows heavily from that family. Perhaps this
>>> issue should be considered in the context of adding an **Eviction
>>> Policy** to appropriate collections.
>>>
>>> MRU is a Collection; for example, I pass an MRU to a persistence
>>> mechanism that takes a collection. Saying "all methods offered by
>>> `Collection` should [not] even be part of an `MRU` interface" is
>>> innacurate, especially when considered in the context of a
>>> SequencedCollection.
>>>
>>> -ernie
>>>
>>> PS - Loosely related is extending a Collection and providing a
>>> synchronized version. Is there a discussion about making the
>>> SynchronizedCollection family of classes public?
>>>
>>>
>>> On 9/21/22 4:22 AM, John Hendrikx wrote:
>>>> I don't see why you think a general collection, that is in 99.9% of
>>>> the cases not used to implement an MRU, should burden every call to
>>>> #add with a check to see if it isn't exceeding its maximum size or
>>>> to see if a maximum size has been set.
>>>>
>>>> This is much better done by composition, as I don't think all
>>>> methods offered by `Collection` should even be part of an `MRU`
>>>> interface.
>>>>
>>>> --John
>>>>
>>>> On 20/09/2022 21:08, Ernie Rael wrote:
>>>>> (There may be a better place to send this, let me know where)
>>>>>
>>>>> Suggesting an option to limit the size of the collection, e.g
>>>>> "setMaxSize(int)", default of zero means no limit.
>>>>>
>>>>> I put together "interface MRU<E> extends Collection<E>" some
>>>>> months ago, it has two implementations based on LinkedList and
>>>>> LinkedHashSet. The code can be seen at
>>>>> https://sourceforge.net/p/jvi/raelity-lib/ci/default/tree/lib/src/main/java/com/raelity/lib/
>>>>>
>>>>>
>>>>> A SequencedCollection, as outlined in the JEP draft 2020/09/01,
>>>>> would be almost perfect to implement MRU; I've run into most of
>>>>> the problems/issues discussed in the JEP draft.
>>>>>
>>>>> The MRU is a cache, as I use it; it typically has a max size for
>>>>> the collection. Handling this natively in the collection would be
>>>>> ideal; if an add operation would overflow, remove the item at the
>>>>> other end. Note that addAll() is used when initializing from
>>>>> backing store.
>>>>>
>>>>> FYI, I use a "Supplier<Integer>" to the constructor to provide
>>>>> maxSize, but a property makes much more sense. I'll make that
>>>>> change in MRU for sanity, and get rid of the trim() method.
>>>>> setMaxSize can do the trim.
>>>>>
>>>>> -ernie
>>>>>
>>>>
>>>
>>
>
>
More information about the core-libs-dev
mailing list