Re: Provide thread-safe and concurrent sequenced collections with insertion order?
Sebastian Fischer
mail at sebfisch.de
Wed Nov 15 09:16:49 UTC 2023
Thank you for those references! The docs by Ben Manes look like a good
starting point to learn more about implementing a concurrent cache.
To elaborate on my motivation: I work as an educator explaining concepts
behind Java extensions in a corporate context. LRU caches seemed like an
interesting choice as an example demonstrating the new interfaces for
sequenced collections together with virtual threads. But my main interest
(that's why I asked here) is in understanding design considerations for the
Java core libraries so I can explain them to others.
When preparing the example, I stumbled upon what I perceived as gaps in the
core libraries:
- Collections.synchronizedSequencedSet and similar wrappers are not
provided although Collections.synchronizedSortedSet (and others) are
- There is no sequenced collection with external ordering in
java.util.concurrent
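To make the first gap concrete, here is a small sketch (assuming Java 21+): wrapping a LinkedHashSet with the existing synchronizedSet returns a plain Set, so the sequenced operations are no longer accessible through the wrapper.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.SequencedSet;
import java.util.Set;

public class SequencedWrapperGap {
    public static void main(String[] args) {
        SequencedSet<String> ordered = new LinkedHashSet<>();
        ordered.addLast("a");

        // The existing wrapper returns a plain Set; the sequenced
        // operations (addLast, getFirst, reversed, ...) are lost,
        // because the wrapper class does not implement SequencedSet.
        Set<String> synced = Collections.synchronizedSet(ordered);

        System.out.println(synced instanceof SequencedSet); // false
    }
}
```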
Here is a summary of what I infer from our interaction:
- Synchronized wrappers are of limited use. The benefit of adding them for
sequenced collections is unclear, despite already providing them for other
interfaces extending Set and Map.
- The implementation of concurrent versions of sequenced collections with
external ordering is non-trivial and the benefit of adding them to the core
libraries is unclear. For specific use cases like concurrent caches there
are external libraries.
- Although the mentioned gaps are apparent, it is unlikely that they will
be filled before clarifying how that could be useful.
Assuming that's a fair summary, I think I understand better now why the
mentioned gaps are there. So, thanks again for your input!
On Tue, Nov 14, 2023 at 8:43 PM Stuart Marks <stuart.marks at oracle.com>
wrote:
> Ah, so you want a concurrent cache! This is why it's important to talk
> about use cases. Turns out there are a lot of subtle issues here, and there
> has been a lot of work in this area outside the JDK. For example, a bit of
> searching turned up this Stack Overflow question [1]. The answers aren't
> particularly valuable, but there is this comment [2] from Ben Manes, who
> knows something about this topic:
>
> ConcurrentLinkedHashMap [3] begat Guava's Cache [4], which begat Caffeine.
> [5] [6]
>
> [1]:
> https://stackoverflow.com/questions/40239485/concurrent-lru-cache-implementation
>
> [2]:
> https://stackoverflow.com/questions/40239485/concurrent-lru-cache-implementation#comment67809840_40239485
>
> [3]: https://github.com/ben-manes/concurrentlinkedhashmap/wiki/Design
>
> [4]:
> https://github.com/ben-manes/concurrentlinkedhashmap/blob/wiki/ConcurrentCachingAtGoogle.pdf
>
> [5]:
> http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html
>
> [6]: https://github.com/ben-manes/caffeine
>
> It looks like "ConcurrentLinkedHashMap" is a concurrent hash map that
> maintains ordering but that doesn't actually maintain a linked list of
> entries in the same way as LinkedHashMap. I haven't looked closely at the
> implementation, but I'd guess that it maintains ordering using a weakly
> consistent side data structure.
>
> Weak consistency seems to be the flavor of most concurrent data
> structures; you give up non-essential consistency in order to gain
> concurrency. For example, ConcurrentHashMap's views' iterators are "weakly
> consistent" which roughly means that an element reported by the iterator
> was a member at one time, but elements might not be reported by the
> iterator if they were added or removed sometime during the iteration. For
> an LRU cache, weaker semantics might be that you don't care about the exact
> ordering of recency of usage. (Indeed, ordering of events is somewhat
> ill-defined in a concurrent system.) Instead, you probably want things that
> were used fairly recently to stay in the cache longer, while things that
> haven't been used recently should tend to expire earlier.
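> A quick illustration of that weak consistency (my sketch, not from the
> ConcurrentHashMap docs): mutating the map in the middle of an iteration is
> allowed and never throws, but whether the iterator reports the new entry is
> unspecified.

```java
import java.util.concurrent.ConcurrentHashMap;

public class WeaklyConsistent {
    public static void main(String[] args) {
        var map = new ConcurrentHashMap<String, Integer>();
        map.put("a", 1);
        map.put("b", 2);

        // Mutating during iteration is safe with a weakly consistent
        // iterator: no ConcurrentModificationException is thrown, but the
        // iterator may or may not visit the entry added mid-iteration.
        for (String key : map.keySet()) {
            map.putIfAbsent("c", 3);
        }

        System.out.println(map.size()); // 3
    }
}
```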
>
> I'd suggest looking at some of these caching libraries. Or if you're keen
> to roll your own, read some of the design documents. Which might convince
> you that you don't want to roll your own. :-)
>
> s'marks
>
> On 11/10/23 3:26 AM, Sebastian Fischer wrote:
>
> Hi Stuart.
>
> Thanks, I understand now how the limited thread safety of "synchronized"
> wrappers has led to not adding new ones.
>
> My use case is an LRU-cache that is accessed by a large number of
> (virtual) threads.
>
> In my initial implementation I constructed a LinkedHashMap passing true
> for the accessOrder parameter, implemented removeEldestEntry appropriately
> and then wrapped it using Collections.sequencedMap. Thanks to this
> interface, I did not need a specialized wrapper supporting putLast or
> pollFirstEntry to manage the capacity myself. So, wrapping a LinkedHashMap
> seemed like a viable approach at first.
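> For reference, the single-threaded construction I described looks roughly
> like this sketch (on Java 21+, LinkedHashMap itself implements SequencedMap,
> so the sketch uses it directly; the capacity and key/value types are
> illustrative):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.SequencedMap;

public class LruSketch {
    // accessOrder = true: iteration order runs from least- to
    // most-recently used; removeEldestEntry bounds the capacity.
    static SequencedMap<String, Integer> newLru(int capacity) {
        return new LinkedHashMap<String, Integer>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Integer> eldest) {
                return size() > capacity;
            }
        };
    }

    public static void main(String[] args) {
        SequencedMap<String, Integer> lru = newLru(3);
        lru.put("a", 1);
        lru.put("b", 2);
        lru.put("c", 3);
        lru.get("a");    // touch "a": it moves to the most-recent end
        lru.put("d", 4); // evicts "b", the least recently used entry

        System.out.println(lru.keySet()); // [c, a, d]
    }
}
```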
>
> But, the map is being used by many threads, and a synchronized wrapper is
> governed by a single exclusion lock. I improved throughput using a
> concurrent map supporting concurrent reads and (some) writes. Here is a
> simplified example using a sequenced collection instead of a map.
>
> public record RecentlyAccessed<T>(int capacity, SequencedCollection<T> elements) {
>     public RecentlyAccessed(int capacity) {
>         this(capacity, new ConcurrentLinkedDeque<>());
>     }
>
>     public void add(T elem) {
>         elements.addLast(elem);
>         while (capacity < elements.size()) {
>             elements.removeFirst();
>         }
>     }
> }
>
> As the add method is not synchronized, the real capacity might deviate
> from the intended capacity. Additionally, ConcurrentLinkedDeque can contain
> duplicate elements. I could implement repositioning by removing all
> occurrences of elem before calling addLast but that would introduce an
> intermediate state where elem is not present (even if it was present
> before) due to the lack of synchronization.
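> The duplicate problem is easy to reproduce (my sketch):

```java
import java.util.concurrent.ConcurrentLinkedDeque;

public class DequeDuplicates {
    public static void main(String[] args) {
        var recency = new ConcurrentLinkedDeque<String>();
        recency.addLast("k");
        recency.addLast("k"); // a deque is not a set: "k" is now present twice

        System.out.println(recency.size()); // 2

        // Naive repositioning: remove all occurrences, then re-append.
        // Between these two calls "k" is absent from the deque entirely,
        // which is the unwanted intermediate state described above.
        recency.removeIf("k"::equals);
        recency.addLast("k");
    }
}
```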
>
> In my current implementation of the cache I use a ConcurrentHashMap with
> an additional ConcurrentLinkedDeque holding keys in access order telling me
> which entries to remove from the map. But this approach is not really
> thread-safe due to missing synchronization and incorrect due to duplicate
> keys in the deque.
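> In code, that current approach looks roughly like the following sketch (the
> names are mine, and the flaws mentioned above are deliberately left in and
> marked):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedDeque;

// Rough sketch of the two-structure cache described above. It is NOT
// correct: the deque can hold duplicate keys, and eviction races with
// concurrent get/put, so the capacity is only approximate.
public class FlawedLruCache<K, V> {
    private final int capacity;
    private final ConcurrentHashMap<K, V> entries = new ConcurrentHashMap<>();
    private final ConcurrentLinkedDeque<K> recency = new ConcurrentLinkedDeque<>();

    public FlawedLruCache(int capacity) {
        this.capacity = capacity;
    }

    public V get(K key) {
        V value = entries.get(key);
        if (value != null) {
            recency.addLast(key); // records the access; may duplicate the key
        }
        return value;
    }

    public void put(K key, V value) {
        entries.put(key, value);
        recency.addLast(key);
        while (entries.size() > capacity) {
            K eldest = recency.pollFirst();
            if (eldest != null) {
                entries.remove(eldest); // may evict a key that was just re-added
            }
        }
    }
}
```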
>
> I was hoping to use a ConcurrentLinkedHashSet (or really
> ConcurrentLinkedHashMap, but both seem useful), where repositioning would
> presumably be implemented in a thread-safe way. Ideally, the option to
> override removeEldestEntry would allow it to also maintain the intended
> capacity in a thread-safe way.
>
> I'm not familiar enough with the implementation of concurrent collections
> to judge whether my hope is justified. But essentially, my use case would
> be an LRU-cache that does not use a single exclusion lock to improve
> concurrent access by a large number of (virtual) threads.
>
> Sebastian
>
>
>
> On Fri, Nov 10, 2023 at 1:03 AM Stuart Marks <stuart.marks at oracle.com>
> wrote:
>
>> Hi Sebastian,
>>
>> Regarding the lack of "synchronized" wrappers for Sequenced Collections: the
>> main issue is that they provide only a limited sense of "thread safety" and
>> as such don't add much actual value. Specifically, the synchronized wrappers
>> hold a lock only around invocations of individual methods. This omits a
>> large class of common, compound operations on collections, such as
>> check-then-act operations and bulk operations that involve iteration or
>> streaming. These operations aren't thread-safe with synchronized wrappers.
>> To perform these operations safely from multiple threads, the client needs
>> to do its own locking around the compound operation. This mostly defeats the
>> purpose of the synchronized wrappers.
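>> A concrete example of the check-then-act problem (my sketch): each call
>> below is individually synchronized, but the compound operation is not, so
>> correct code must lock on the wrapper itself, as the synchronizedSet
>> documentation already requires for iteration.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

public class CompoundOps {
    public static void main(String[] args) {
        Set<String> synced = Collections.synchronizedSet(new LinkedHashSet<>());

        // Broken under concurrency: another thread can add "x" between
        // the contains() check and the add() call.
        if (!synced.contains("x")) {
            synced.add("x");
        }

        // Safe: client-side locking on the wrapper makes the compound
        // check-then-act atomic with respect to other wrapper calls.
        synchronized (synced) {
            if (!synced.contains("y")) {
                synced.add("y");
            }
        }

        System.out.println(synced); // [x, y]
    }
}
```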
>>
>> If you need to use something like LinkedHashMap/LinkedHashSet from multiple
>> threads, my recommendation is to write your own class that exposes exactly
>> the set of operations you need and that does proper locking for those
>> operations. It can then delegate the appropriate operations to an internal
>> collection.
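>> A minimal sketch of that suggestion (names and the chosen operations are
>> mine, purely for illustration): a coarse lock around an access-ordered
>> LinkedHashMap, exposing only what the caller needs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of a hand-rolled thread-safe LRU map: every exposed operation,
// including the compound eviction inside put, runs under the same lock,
// and the internal LinkedHashMap is never leaked to callers.
public class LockedLruMap<K, V> {
    private final LinkedHashMap<K, V> map;

    public LockedLruMap(int capacity) {
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity;
            }
        };
    }

    public synchronized V get(K key) {
        return map.get(key); // also repositions the entry (accessOrder = true)
    }

    public synchronized void put(K key, V value) {
        map.put(key, value); // may evict the eldest entry
    }

    public synchronized int size() {
        return map.size();
    }
}
```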
>>
>> Regarding concurrent collections: these are mainly intended for operations
>> to proceed concurrently on different parts of the collection, such as
>> updating different mappings in a ConcurrentHashMap simultaneously, or having
>> producers/consumers operating on different ends of a Queue. I note that
>> historically, the concurrent collections haven't included concurrent
>> versions of LinkedHashSet/Map or general-purpose List implementations.
>> (There is CopyOnWriteArrayList/Set, but they're highly specialized and can
>> be prohibitively expensive for use cases that involve a lot of updates.)
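>> For instance (my sketch), two threads updating different mappings in a
>> ConcurrentHashMap do not contend on a single collection-wide lock, whereas
>> with a synchronized wrapper every call would serialize:

```java
import java.util.concurrent.ConcurrentHashMap;

public class DisjointUpdates {
    public static void main(String[] args) throws InterruptedException {
        var counters = new ConcurrentHashMap<String, Integer>();

        // Each thread updates its own mapping; merge() is atomic per key.
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < 10_000; i++) counters.merge("a", 1, Integer::sum);
        });
        Thread t2 = new Thread(() -> {
            for (int i = 0; i < 10_000; i++) counters.merge("b", 1, Integer::sum);
        });
        t1.start();
        t2.start();
        t1.join();
        t2.join();

        // Both counters reach exactly 10000: no update is lost.
        System.out.println(counters.get("a") + " " + counters.get("b"));
    }
}
```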
>>
>> While it looks like things are missing from the concurrent collections, I'd
>> have to ask what use cases would be satisfied by adding something like a
>> "concurrent sequenced set" (or map) that wouldn't be fulfilled with a
>> customized wrapper around LinkedHashSet/Map.
>>
>> s'marks
>>
>>
>> On 11/2/23 4:56 AM, Sebastian Fischer wrote:
>> > Hello.
>> >
>> > I am new to this list so might have missed previous discussion about this
>> > but could not find what I was looking for in the archives.
>> >
>> > The Sequenced collections JEP adds the following static methods to
>> > java.util.Collections.
>> >
>> > - unmodifiableSequencedCollection
>> > - unmodifiableSequencedSet
>> > - unmodifiableSequencedMap
>> >
>> > However, the set of static methods returning thread-safe versions of their
>> > arguments (like synchronizedCollection, synchronizedSet, and
>> > synchronizedMap) has not been extended.
>> >
>> > As a consequence, I don't see a straightforward way to create thread-safe
>> > sequenced sets or maps where encounter order corresponds to insertion
>> > order (like LinkedHashSet or LinkedHashMap) and access them as
>> > SequencedSet or SequencedMap.
>> >
>> > When using the available methods mentioned above, the result is not a
>> > sequenced type.
>> >
>> > For predefined sets and maps where encounter order corresponds to natural
>> > order, there are the following methods in java.util.Collections.
>> >
>> > - synchronizedNavigableSet
>> > - synchronizedNavigableMap
>> > - synchronizedSortedSet
>> > - synchronizedSortedMap
>> >
>> > However, these methods cannot be used with LinkedHashSet or LinkedHashMap.
>> >
>> > The following methods would be useful in order to create thread-safe
>> > sequenced collections where encounter order is insertion order, but they
>> > are currently not provided.
>> >
>> > - synchronizedSequencedCollection
>> > - synchronizedSequencedSet
>> > - synchronizedSequencedMap
>> >
>> > What are the reasons for or against adding these methods?
>> >
>> > There are also no implementations of concurrent sequenced sets and maps
>> > where encounter order is insertion order in java.util.concurrent. All of
>> > the provided concurrent sequenced sets and maps are based on natural
>> > order, and consequently do not support addFirst, and so on.
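>> > For example (my sketch), ConcurrentSkipListSet is sequenced via
>> > SortedSet, so the read operations work, but addFirst is specified to
>> > throw because an explicitly positioned element could contradict the
>> > natural order:

```java
import java.util.concurrent.ConcurrentSkipListSet;

public class SortedSequenced {
    public static void main(String[] args) {
        var set = new ConcurrentSkipListSet<String>();
        set.add("b");
        set.add("a");

        // Sequenced reads are fine: the first element in natural order.
        System.out.println(set.getFirst()); // a

        try {
            set.addFirst("z"); // explicit positioning would break natural order
        } catch (UnsupportedOperationException expected) {
            System.out.println("addFirst unsupported");
        }
    }
}
```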
>> >
>> > Are there plans to add concurrent sequenced collections that support the
>> > whole interface of sequenced collections?
>> >
>> > Kind regards,
>> > Sebastian
>>
>