Sequenced Collections
Stuart Marks
stuart.marks at oracle.com
Thu Sep 22 00:38:42 UTC 2022
Hi, yes, this is the right place to discuss Sequenced Collections. I'm glad you find
it promising.
Note that Sequenced Collections actually has very little implementation in it, aside
from various reversed views of things. The actual data is still stored in existing
concrete collections such as ArrayList, ArrayDeque, LinkedHashMap, and TreeMap.
I think Sequenced Collections has the right set of abstractions in it as it stands,
and I don't want to expand its scope by talking about additional concepts like size
limits or eviction policy.
However, those things are quite reasonable to discuss independently of the current
Sequenced Collections proposal. Having a maximum size on a collection seems
independent of sequencing. An eviction policy *might* be based on the sequence, but
it might not; consider the various eviction policies available for a cache library
such as Caffeine [1].
[1] https://github.com/ben-manes/caffeine/wiki
However, I'm somewhat skeptical of trying to build things like eviction policies
directly into collections. It's tempting to add a simple thing like a size and just
throw away things in some well-defined order whenever the size is exceeded. The
problem is that if this policy doesn't do *exactly* what you want to do, then you're
out of luck.
The current (pre Sequenced Collections) LinkedHashMap is a good example of this.
It's suitable for a least-recently-inserted expiration policy; there's a method
removeEldestEntry() that programs can use to implement a simple policy, size-based
or otherwise. (Unfortunately they have to subclass-and-override, but whatever.) The
problem is that it allows removal of only one element -- the eldest (first) element.
If you want to change the policy of insertion order of an LHM, you have only one
alternative: access order. Enabling this has some weird side effects though. For
example, get() now rearranges the order of entries in the map, and is thus a
structural modification -- which means that it spoils any iterators over the map's
views.
These are both fairly common cases, which is probably why they were added. But
they're not very flexible, and if you want to do something slightly different,
you're on your own -- and it's pretty hard to implement your own policy, because LHM
lacks a bunch of essential operations.
Where the Sequenced Collections proposal helps is that instead of adding more
policies, it adds the missing primitive operations. You can add/get/remove at either
end, and you can reposition mappings to either end. If you have some different
recent-usage policy or some unusual cache eviction policy that I've never heard of,
you can use the primitives to implement it yourself. That's much better than trying
to bake a few more specific cases into LinkedHashMap or other collections.
> Is there a discussion about making the SynchronizedCollection family of classes public?
No. Synchronizing on every collection operation is the wrong level of abstraction.
Typical collection usage involves too much external iteration and too much
check-then-act logic. Callers would have to wrap those in synchronized blocks, and
in general they don't know when that's necessary. Certain transaction-style
operations (like Map::computeIfAbsent) can be made to work, but those are all pretty
low level.
s'marks
On 9/21/22 9:32 AM, Ernie Rael wrote:
> > I don't see why you think a general collection...
>
> I thought the Subject would be sufficient to indicate that I was not talking about
> collections in general. I should have been more precise with my words; guess I was
> just excited by a bi-directional ordered set.
>
> The MRU _example_ is useful; the current collections handle it poorly and Sequenced
> Collections is ideal. Caches with an eviction policy are common; I suspect caches
> will be a common use for SequencedSet family. Note there are fixed sized Collections
> and SequencedCollection borrows heavily from that family. Perhaps this issue should
> be considered in the context of adding an **Eviction Policy** to appropriate
> collections.
>
> MRU is a Collection; for example, I pass an MRU to a persistence mechanism that
> takes a collection. Saying "all methods offered by `Collection` should [not] even be
> part of an `MRU` interface" is innacurate, especially when considered in the context
> of a SequencedCollection.
>
> -ernie
>
> PS - Loosely related is extending a Collection and providing a synchronized version.
> Is there a discussion about making the SynchronizedCollection family of classes public?
>
>
> On 9/21/22 4:22 AM, John Hendrikx wrote:
>> I don't see why you think a general collection, that is in 99.9% of the cases not
>> used to implement an MRU, should burden every call to #add with a check to see if
>> it isn't exceeding its maximum size or to see if a maximum size has been set.
>>
>> This is much better done by composition, as I don't think all methods offered by
>> `Collection` should even be part of an `MRU` interface.
>>
>> --John
>>
>> On 20/09/2022 21:08, Ernie Rael wrote:
>>> (There may be a better place to send this, let me know where)
>>>
>>> Suggesting an option to limit the size of the collection, e.g "setMaxSize(int)",
>>> default of zero means no limit.
>>>
>>> I put together "interface MRU<E> extends Collection<E>" some months ago, it has
>>> two implementations based on LinkedList and LinkedHashSet. The code can be seen
>>> at
>>> https://sourceforge.net/p/jvi/raelity-lib/ci/default/tree/lib/src/main/java/com/raelity/lib/
>>>
>>>
>>> A SequencedCollection, as outlined in the JEP draft 2020/09/01, would be almost
>>> perfect to implement MRU; I've run into most of the problems/issues discussed in
>>> the JEP draft.
>>>
>>> The MRU is a cache, as I use it; it typically has a max size for the collection.
>>> Handling this natively in the collection would be ideal; if an add operation
>>> would overflow, remove the item at the other end. Note that addAll() is used when
>>> initializing from backing store.
>>>
>>> FYI, I use a "Supplier<Integer>" to the constructor to provide maxSize, but a
>>> property makes much more sense. I'll make that change in MRU for sanity, and get
>>> rid of the trim() method. setMaxSize can do the trim.
>>>
>>> -ernie
>>>
>>
>
More information about the core-libs-dev
mailing list