<html><head><meta http-equiv="content-type" content="text/html; charset=utf-8"></head><body dir="auto"><div dir="ltr"></div><div dir="ltr">Btw, these docs and code may be of interest since the problem domain is very similar: <a href="https://github.com/HdrHistogram/HdrHistogram#synchronization-and-concurrent-access">https://github.com/HdrHistogram/HdrHistogram#synchronization-and-concurrent-access</a></div><div dir="ltr"><br><blockquote type="cite">On Feb 21, 2023, at 11:06 AM, Ron Pressler <ron.pressler@oracle.com> wrote:<br><br></blockquote></div><blockquote type="cite"><div dir="ltr"><span>Use whatever works best for you; my suggestion of ConcurrentLinkedQueue was merely based on its semantics, not any benchmarks. Ultimately we’d like to have a construct designed specifically for this, perhaps one that employs striping, similar to LongAdder.</span><br><span></span><br><span>— Ron </span><br><span></span><br><blockquote type="cite"><span>On 21 Feb 2023, at 05:33, Dr Heinz M. Kabutz <heinz@javaspecialists.eu> wrote:</span><br></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><span>Regarding using a ConcurrentLinkedQueue as a cache - someone asked me about this last week. I hacked together a quick demo and was surprised that the ArrayBlockingQueue seemed to work best under high contention. 
The demo is something I threw together in a few minutes, so don't judge me too harshly :-) And it's a silly demo, because hopefully we wouldn't contend so heavily on the cache.</span><br></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><span>import java.util.*;</span><br></blockquote><blockquote type="cite"><span>import java.util.concurrent.*;</span><br></blockquote><blockquote type="cite"><span>import java.util.function.*;</span><br></blockquote><blockquote type="cite"><span>import java.util.stream.*;</span><br></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><span>public class PoolDemo {</span><br></blockquote><blockquote type="cite"><span> public static void main(String... args) {</span><br></blockquote><blockquote type="cite"><span> for (int i = 0; i < 10; i++) {</span><br></blockquote><blockquote type="cite"><span> test();</span><br></blockquote><blockquote type="cite"><span> }</span><br></blockquote><blockquote type="cite"><span> }</span><br></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><span> private static void test() {</span><br></blockquote><blockquote type="cite"><span> test(new Stack<>(), Stack::push, Stack::pop);</span><br></blockquote><blockquote type="cite"><span> test(new ConcurrentLinkedDeque<>(), Deque::push, Deque::pop);</span><br></blockquote><blockquote type="cite"><span> test(new ConcurrentLinkedDeque<>(), Queue::add, Queue::remove);</span><br></blockquote><blockquote type="cite"><span> test(new ConcurrentLinkedQueue<>(), Queue::add, Queue::remove);</span><br></blockquote><blockquote type="cite"><span> test(new LinkedBlockingDeque<>(), Queue::add, Queue::remove);</span><br></blockquote><blockquote type="cite"><span> test(new LinkedBlockingQueue<>(), Queue::add, Queue::remove);</span><br></blockquote><blockquote type="cite"><span> test(new ArrayBlockingQueue<>(Runtime.getRuntime().availableProcessors() * 4), 
Queue::add, Queue::remove);</span><br></blockquote><blockquote type="cite"><span> test(new LinkedTransferQueue<>(), Queue::add, Queue::remove);</span><br></blockquote><blockquote type="cite"><span> System.out.println();</span><br></blockquote><blockquote type="cite"><span> }</span><br></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><span> private static <T extends Collection<Integer>> void test(</span><br></blockquote><blockquote type="cite"><span> T collection, BiConsumer<T, Integer> push, ToIntFunction<T> pop) {</span><br></blockquote><blockquote type="cite"><span> System.out.print(collection.getClass().getSimpleName());</span><br></blockquote><blockquote type="cite"><span> List<Integer> expectedFinalValues = new ArrayList<>();</span><br></blockquote><blockquote type="cite"><span> for (int i = 0; i < Runtime.getRuntime().availableProcessors() * 2; i++) {</span><br></blockquote><blockquote type="cite"><span> push.accept(collection, i);</span><br></blockquote><blockquote type="cite"><span> expectedFinalValues.add(i);</span><br></blockquote><blockquote type="cite"><span> }</span><br></blockquote><blockquote type="cite"><span> long time = System.nanoTime();</span><br></blockquote><blockquote type="cite"><span> try {</span><br></blockquote><blockquote type="cite"><span> IntStream.range(0, 10_000_000)</span><br></blockquote><blockquote type="cite"><span> .parallel()</span><br></blockquote><blockquote type="cite"><span> .forEach(i -> {</span><br></blockquote><blockquote type="cite"><span> int value = pop.applyAsInt(collection);</span><br></blockquote><blockquote type="cite"><span> push.accept(collection, value);</span><br></blockquote><blockquote type="cite"><span> });</span><br></blockquote><blockquote type="cite"><span> } finally {</span><br></blockquote><blockquote type="cite"><span> time = System.nanoTime() - time;</span><br></blockquote><blockquote type="cite"><span> System.out.printf(" time = %dms%n", (time / 
1_000_000));</span><br></blockquote><blockquote type="cite"><span> }</span><br></blockquote><blockquote type="cite"><span> List<Integer> finalValues = collection.stream().sorted().toList();</span><br></blockquote><blockquote type="cite"><span> if (!expectedFinalValues.equals(finalValues))</span><br></blockquote><blockquote type="cite"><span> throw new AssertionError();</span><br></blockquote><blockquote type="cite"><span> }</span><br></blockquote><blockquote type="cite"><span>}</span><br></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><span>For example</span><br></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><span>Stack time = 1196ms</span><br></blockquote><blockquote type="cite"><span>ConcurrentLinkedDeque time = 2306ms</span><br></blockquote><blockquote type="cite"><span>ConcurrentLinkedDeque time = 1539ms</span><br></blockquote><blockquote type="cite"><span>ConcurrentLinkedQueue time = 1394ms</span><br></blockquote><blockquote type="cite"><span>LinkedBlockingDeque time = 850ms</span><br></blockquote><blockquote type="cite"><span>LinkedBlockingQueue time = 1258ms</span><br></blockquote><blockquote type="cite"><span>ArrayBlockingQueue time = 784ms</span><br></blockquote><blockquote type="cite"><span>LinkedTransferQueue time = 1161ms</span><br></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><span>However, when I run it sequentially, then ConcurrentLinkedQueue wins the race:</span><br></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><span>Stack time = 336ms</span><br></blockquote><blockquote type="cite"><span>ConcurrentLinkedDeque time = 532ms</span><br></blockquote><blockquote type="cite"><span>ConcurrentLinkedDeque time = 392ms</span><br></blockquote><blockquote type="cite"><span>ConcurrentLinkedQueue time = 281ms</span><br></blockquote><blockquote type="cite"><span>LinkedBlockingDeque time = 
413ms</span><br></blockquote><blockquote type="cite"><span>LinkedBlockingQueue time = 512ms</span><br></blockquote><blockquote type="cite"><span>ArrayBlockingQueue time = 388ms</span><br></blockquote><blockquote type="cite"><span>LinkedTransferQueue time = 299ms</span><br></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><span>(Don't take the results too seriously :-))</span><br></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><span>Regards</span><br></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><span>Heinz</span><br></blockquote><blockquote type="cite"><span>-- </span><br></blockquote><blockquote type="cite"><span>Dr Heinz M. Kabutz (PhD CompSci)</span><br></blockquote><blockquote type="cite"><span>Author of "The Java™ Specialists' Newsletter" - http://www.javaspecialists.eu Java Champion - http://www.javachampions.org JavaOne Rock Star Speaker</span><br></blockquote><blockquote type="cite"><span>Tel: +30 69 75 595 262</span><br></blockquote><blockquote type="cite"><span>Skype: kabutz</span><br></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><span>On 2023/02/21 01:33, Ron Pressler wrote:</span><br></blockquote><blockquote type="cite"><blockquote type="cite"><span>Hi.</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>The method to disable thread locals has been a source of confusion, and we’re likely to remove it. 
It was never intended as a mode that libraries must support, but as a way to enforce constraints in some very special situations. In any case, it has been consistently misunderstood as an ordinary mode that needs to be supported, and so it is likely going away.</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>However, given that if virtual threads are present at all you can assume there’s a very large number of them (as that’s why they’re used) — tens of thousands *at least* — you should ask yourself whether an individual buffer for each thread is really what you want. A small pool of buffers, similar in number to the number of cores — ~1000x smaller than the number of threads — might be a better way to go. You can start with a ConcurrentLinkedQueue to store the buffers, and have threads take buffers from, and return them to, that queue. If contention is a noticeable problem, you can do something more sophisticated with an array that is randomly accessed in some way and entries are CASed in and out.</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>— Ron</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>On 20 Feb 2023, at 23:15, Carl M <java@rkive.org> wrote:</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>While testing out Virtual Threads with project Loom, I encountered some challenges that I was 
hoping this mailing list could provide guidance on.</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span> I have a tracing library that uses ThreadLocals for recording events and timing info. The concurrency is structured so that each thread is the sole writer to its own trace buffer, but separate threads can come in and read that data asynchronously. I am using ThreadLocals to avoid contention between multiple tracing threads. Secondarily, I depend on threads exiting for automatic cleanup of the trace data per thread.</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span> Virtual threads present a hard-to-overcome challenge, because I can't find a way to tell if ThreadLocals are supported. One of the value propositions of my library is that it has a consistent and low overhead. Specifically, calling ThreadLocal.set() throws an UnsupportedOperationException in the event that they are not allowed. In the case of using virtual threads, the likelihood of this happening is much higher, since users are now able to create threads cheaply. I have explored several workarounds, but not being able to tell is one problem I can't seem to cleanly overcome. Some ideas that did not pan out:</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span> * Use a ConcurrentHashMap to implement my own "threadlocal"-like solution. Two problems come up: 1. It's easy to accidentally keep the thread alive, and 2. When ThreadLocals are supported, my library doesn't get the speedup from them.</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span> * Use an AtomicReferenceArray and hash into a fixed number of buckets. 
This avoids using the Thread as a key, and pays a minor cost of synchronizing on the bucket for recording trace data. In effect it's a poor man's ThreadLocal. However, if I get unlucky there will be contention on a bucket, which doesn't naturally shard itself like CHM does.</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span> * Do Nothing. This causes callers to allocate a ton of memory since ThreadLocal.initialValue() gets called over and over, leading to unpredictable tracer overhead. There is a small but noticeable amount of overhead for creating the initial value (like registering with the reader), so this ends up not being practical.</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span> * A hybrid of ThreadLocal when supported, with a fallback to CHM or ARA as mentioned above. This is the solution I came up with, where my ThreadLocal calls get() but has no initialValue() override. If the value is null, I attempt to set it. If there is an exception, I write the value to the CHM/ARA and then check there first for future get() calls. The problem with this is that the exception from set() causes an unacceptable amount of overhead for something that should have been very cheap. It isn't sufficient to check if the thread is virtual to see if TLs are supported, so I can't check the class name of the thread a priori. And, since multiple types of threads are calling into my library, I can't require callers to use TLs.</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span> I'm kind of at a loss as to how to efficiently fall back to a slower implementation when TLs aren't supported, since I can't tell if they are or not (e.g. I can't tell if the electric fence is on without touching it). 
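[Editor's note: as a rough, hypothetical sketch of the direction suggested earlier in the thread — a small pool sized to the core count, with entries CASed in and out of a randomly probed array — something like the following could serve as the ARA-style fallback. The class and method names and the 8 KiB buffer size are invented for illustration, not taken from either library discussed here.]

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical sketch: a small, fixed-size pool of buffers sized to the
// core count rather than one buffer per (virtual) thread. Threads probe
// random slots and CAS buffers in and out; a miss just allocates.
public class CasBufferPool {
    private static final int BUFFER_SIZE = 8192; // assumed size, for illustration
    private static final int PROBES = 3;         // random slots tried per operation

    private final AtomicReferenceArray<ByteBuffer> slots;

    public CasBufferPool(int size) {
        slots = new AtomicReferenceArray<>(size);
    }

    // Try a few random slots; CAS a buffer out, or allocate on a miss.
    public ByteBuffer acquire() {
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        for (int attempt = 0; attempt < PROBES; attempt++) {
            int i = rnd.nextInt(slots.length());
            ByteBuffer b = slots.get(i);
            if (b != null && slots.compareAndSet(i, b, null)) {
                b.clear();
                return b;
            }
        }
        return ByteBuffer.allocate(BUFFER_SIZE); // pool miss: pay the allocation
    }

    // Try to CAS the buffer into an empty slot; drop it if we stay unlucky.
    public void release(ByteBuffer b) {
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        for (int attempt = 0; attempt < PROBES; attempt++) {
            int i = rnd.nextInt(slots.length());
            if (slots.compareAndSet(i, null, b)) {
                return;
            }
        }
        // pool full (or unlucky probes): let the buffer be garbage collected
    }

    public static void main(String[] args) {
        CasBufferPool pool = new CasBufferPool(
                Runtime.getRuntime().availableProcessors());
        ByteBuffer b = pool.acquire();
        b.putInt(42);
        pool.release(b);
        ByteBuffer b2 = pool.acquire(); // may or may not be the same buffer
        System.out.println(b2.capacity());
    }
}
```

A pool miss simply allocates, so correctness never depends on the pool; it only amortizes allocation under load, and no exception is thrown on the path where thread locals are unavailable.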
Again, I'd prefer to keep the fast ThreadLocals if they are supported though.</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span> I'm looking for ideas (or just to register feedback) with this email, and have been otherwise very happy with the progress on project Loom.</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span> Carl</span><br></blockquote></blockquote></blockquote><span></span><br></div></blockquote></body></html>