[External] : Re: Provide thread-safe and concurrent sequenced collections with insertion order?
Stuart Marks
stuart.marks at oracle.com
Tue Nov 14 19:43:30 UTC 2023
Ah, so you want a concurrent cache! This is why it's important to talk about use
cases. Turns out there are a lot of subtle issues here, and there has been a lot of
work in this area outside the JDK. For example, a bit of searching turned up this
Stack Overflow question [1]. The answers aren't particularly valuable, but there is
this comment [2] from Ben Manes, who knows something about this topic:
> ConcurrentLinkedHashMap [3] beget Guava's Cache [4] which beget Caffeine. [5] [6]
[1]: https://stackoverflow.com/questions/40239485/concurrent-lru-cache-implementation
[2]:
https://stackoverflow.com/questions/40239485/concurrent-lru-cache-implementation#comment67809840_40239485
[3]: https://github.com/ben-manes/concurrentlinkedhashmap/wiki/Design
[4]:
https://github.com/ben-manes/concurrentlinkedhashmap/blob/wiki/ConcurrentCachingAtGoogle.pdf
[5]: http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html
[6]: https://github.com/ben-manes/caffeine
It looks like "ConcurrentLinkedHashMap" is a concurrent hash map that maintains
ordering but that doesn't actually maintain a linked list of entries in the same way
as LinkedHashMap. I haven't looked closely at the implementation, but I'd guess that
it maintains ordering using a weakly consistent side data structure.
Weak consistency seems to be the flavor of most concurrent data structures; you give
up non-essential consistency in order to gain concurrency. For example,
ConcurrentHashMap's views' iterators are "weakly consistent" which roughly means
that an element reported by the iterator was a member at one time, but elements
might not be reported by the iterator if they were added or removed sometime during
the iteration. For an LRU cache, weaker semantics might be that you don't care about
the exact ordering of recency of usage. (Indeed, ordering of events is somewhat
ill-defined in a concurrent system.) Instead, you probably want things that were
used fairly recently to stay in the cache longer, while things that haven't been
used recently should tend to expire earlier.
I'd suggest looking at some of these caching libraries. Or if you're keen to roll
your own, read some of the design documents. Which might convince you that you don't
want to roll your own. :-)
s'marks
On 11/10/23 3:26 AM, Sebastian Fischer wrote:
> Hi Stuart.
>
> Thanks, I understand now how the limited thread safety of "synchronized" wrappers
> has led to not adding new ones.
>
> My use case is an LRU-cache that is accessed by a large number of (virtual) threads.
>
> In my initial implementation I constructed a LinkedHashMap passing true for the
> accessOrder parameter, implemented removeEldestEntry appropriately and then
> wrapped it using Collections.sequencedMap. Thanks to this interface, I did not
> need a specialized wrapper supporting putLast or pollFirstEntry to manage the
> capacity myself. So, wrapping a LinkedHashMap seemed like a viable approach at first.
>
> But, the map is being used by many threads, and a synchronized wrapper is governed
> by a single exclusion lock. I improved throughput using a concurrent map
> supporting concurrent reads and (some) writes. Here is a simplified example using
> a sequenced collection instead of a map.
>
> public record RecentlyAccessed<T>(int capacity, SequencedCollection<T> elements) {
> public RecentlyAccessed(int capacity) {
> this(capacity, new ConcurrentLinkedDeque<>());
> }
>
> public void add(T elem) {
> elements.addLast(elem);
> while (capacity < elements.size()) {
> elements.removeFirst();
> }
> }
> }
>
> As the add method is not synchronized, the real capacity might deviate from the
> intended capacity. Additionally, ConcurrentLinkedDeque can contain duplicate
> elements. I could implement repositioning by removing all occurrences of elem
> before calling addLast but that would introduce an intermediate state where elem
> is not present (even if it was present before) due to the lack of synchronization.
>
> In my current implementation of the cache I use a ConcurrentHashMap with an
> additional ConcurrentLinkedDeque holding keys in access order telling me which
> entries to remove from the map. But this approach is not really thread-safe due to
> missing synchronization and incorrect due to duplicate keys in the deque.
>
> I was hoping to use a ConcurrentLinkedHashSet (or really ConcurrentLinkedHashMap,
> but both seem useful), where repositioning would presumably be implemented in a
> thread-safe way. Ideally, the option to override removeEldestEntry would allow it
> to also maintain the intended capacity in a thread-safe way.
>
> I'm not familiar enough with the implementation of concurrent collections to judge
> whether my hope is justified. But essentially, my use case would be an LRU-cache
> that does not use a single exclusion lock to improve concurrent access by a large
> number of (virtual) threads.
>
> Sebastian
>
>
>
> On Fri, Nov 10, 2023 at 1:03 AM Stuart Marks <stuart.marks at oracle.com> wrote:
>
> Hi Sebastian,
>
> Regarding the lack of "synchronized" wrappers for Sequenced Collections: the main
> issue is that they provide only a limited sense of "thread safety" and as such
> don't
> add much actual value. Specifically, the synchronized wrappers hold a lock only
> around invocations of individual methods. This omits a large class of common,
> compound operations on collections, such as check-then-act operations and bulk
> operations that involve iteration or streaming. These operations aren't
> thread-safe
> with synchronized wrappers. To perform these operations safely from multiple
> threads, the client needs to do its own locking around the compound operation.
> This
> mostly defeats the purpose of the synchronized wrappers.
>
> If you need to use something like LinkedHashMap/LinkedHashSet from multiple
> threads,
> my recommendation is to write your own class that exposes exactly the set of
> operations you need and that does proper locking for those operations. It can
> then
> delegate the appropriate operations to an internal collection.
>
> Regarding concurrent collections: these are mainly intended for operations to
> proceed concurrently on different parts of the collection, such as updating
> different mappings in a ConcurrentHashMap simultaneously, or having
> producers/consumers operating on different ends of a Queue. I note that
> historically, the concurrent collections haven't included concurrent versions of
> LinkedHashSet/Map or general-purpose List implementations. (There is
> CopyOnWriteArrayList/Set, but they're highly specialized and can be prohibitively
> expensive for use cases that involve a lot of updates.)
>
> While it looks like things are missing from the concurrent collections, I'd
> have to
> ask what use cases would be satisfied by adding something like a "concurrent
> sequenced set" (or map) that wouldn't be fulfilled with a customized wrapper
> around
> LinkedHashSet/Map.
>
> s'marks
>
>
> On 11/2/23 4:56 AM, Sebastian Fischer wrote:
> > Hello.
> >
> > I am new to this list so might have missed previous discussion about this but
> > could not find what I was looking for in the archives.
> >
> > The Sequenced collections JEP adds the following static methods to
> > java.util.Collections.
> >
> > - unmodifiableSequencedCollection
> > - unmodifiableSequencedSet
> > - unmodifiableSequencedMap
> >
> > However, the set of static methods returning thread-safe versions of their
> > arguments (like synchronizedCollection, synchronizedSet, and
> synchronizedMap) has
> > not been extended.
> >
> > As a consequence, I don't see a straightforward way to create thread-safe
> > sequenced sets or maps where encounter-order corresponds to insertion order
> (like
> > LinkedHashSet or LinkedHashMap) and access them as SequencedSet or SequencedMap.
> >
> > When using the available methods mentioned above, the result is not a
> sequenced type.
> >
> > For predefined sets and maps where encounter order corresponds to natural
> order,
> > there are the following methods in java.util.Collections.
> >
> > - synchronizedNavigableSet
> > - synchronizedNavigableMap
> > - synchronizedSortedSet
> > - synchronizedSortedMap
> >
> > However, these methods cannot be used with LinkedHashSet or LinkedHashMap.
> >
> > The following methods would be useful in order to create thread-safe sequenced
> > collections where encounter order is insertion order, but they are currently
> not
> > provided.
> >
> > - synchronizedSequencedCollection
> > - synchronizedSequencedSet
> > - synchronizedSequencedMap
> >
> > What are the reasons for or against adding these methods?
> >
> > There are also no implementations of concurrent sequenced sets and maps where
> > encounter order is insertion order in java.util.concurrent. All of the provided
> > concurrent sequenced sets and maps are based on natural order, and
> consequently do
> > not support addFirst, and so on.
> >
> > Are there plans to add concurrent sequenced collections that support the whole
> > interface of sequenced collections?
> >
> > Kind regards,
> > Sebastian
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/core-libs-dev/attachments/20231114/9ed65411/attachment-0001.htm>
More information about the core-libs-dev
mailing list