Detecting Thread Local Support

Carl M java at rkive.org
Tue Feb 21 01:01:29 UTC 2023


Thanks for the response, I didn't realize it was only for a limited set of use cases.  From my POV, it definitely did seem like something library maintainers would need to support.

As for the number of threads, I haven't quite figured it out.  The library was originally written with the expectation of around 2,000 threads, each with maybe 32K entries.  With Loom, those assumptions from years ago no longer hold.  I also arrived at a ConcurrentLinkedDeque solution, which I used as a stack of trace buffers.  The idea was to shard by thread, and to assume the topmost trace buffer was the one most likely to be written to.  That said, it still ends up being slightly more expensive: since the storage is shared, the thread ID (or a WeakReference<Thread>) has to be written with every trace call.  With the thread local, this is unnecessary.  Sadly, my tracing library has slowly been increasing from ~4ns per trace event to around 6-7ns.
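The shared-stack design described above might look roughly like this (a sketch only; the class and field names are hypothetical, and the coarse per-buffer lock stands in for whatever finer-grained sharding the real library does):

```java
import java.util.concurrent.ConcurrentLinkedDeque;

// Illustrative sketch: a shared stack of trace buffers. Because storage
// is shared across threads, every event must carry the writer's thread
// ID, which is exactly the per-call cost a plain ThreadLocal avoids.
final class SharedTraceStack {
    static final class TraceBuffer {
        final long[] slots;   // (threadId, payload) pairs
        int pos;
        TraceBuffer(int pairs) { slots = new long[pairs * 2]; }

        synchronized boolean record(long threadId, long payload) {
            if (pos + 2 > slots.length) return false;  // buffer full
            slots[pos++] = threadId;
            slots[pos++] = payload;
            return true;
        }

        synchronized int recorded() { return pos / 2; }
    }

    private final ConcurrentLinkedDeque<TraceBuffer> stack = new ConcurrentLinkedDeque<>();

    void trace(long payload) {
        long tid = Thread.currentThread().getId();  // threadId() on JDK 19+
        TraceBuffer top = stack.peek();  // topmost buffer is the likely write target
        if (top == null || !top.record(tid, payload)) {
            TraceBuffer fresh = new TraceBuffer(1024);
            fresh.record(tid, payload);
            stack.push(fresh);
        }
    }

    int totalRecorded() {
        int n = 0;
        for (TraceBuffer b : stack) n += b.recorded();
        return n;
    }
}
```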

Is there documentation about the points at which a virtual thread can be rescheduled onto a different carrier thread?  My own preference would still be to trace per platform thread and indicate when a migration happens, rather than create 10K+ per-thread traces.  I'm guessing there are some ramifications to exposing both the platform thread and virtual thread locals separately, but my initial guess is that it'd be useful.

Carl

> On 02/20/2023 4:33 PM PST Ron Pressler <ron.pressler at oracle.com> wrote:
> 
> 
> Hi.
> 
> The method to disable thread locals has been a source of confusion, and we’re likely to remove it. It was never intended as a mode that libraries must support, but as a way to enforce constraints in a few very special situations. In any case, it has been consistently misunderstood as an ordinary mode that needs to be supported, and so it is likely going away.
> 
> However, given that if virtual threads are present at all you can assume there’s a very large number of them (as that’s why they’re used) — tens of thousands *at least* — you should ask yourself whether an individual buffer for each thread is really what you want. A small pool of buffers, similar in number to the number of cores — ~1000x smaller than the number of threads — might be a better way to go. You can start with a ConcurrentLinkedQueue to store the buffers, and have threads take and return buffers to that queue. If contention is a noticeable problem, you can do something more sophisticated with an array that is randomly accessed in some way and entries are CASed in and out.
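The pooled-buffer idea sketched in that paragraph could look like this (illustrative names; a sketch under the assumption that buffers are plain arrays and that dropping a buffer on the floor when a slot is occupied is acceptable):

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Sketch: a small pool of buffers, roughly one per core, that threads
// borrow from and return to.
final class BufferPool {
    private final ConcurrentLinkedQueue<long[]> pool = new ConcurrentLinkedQueue<>();
    private final int size;

    BufferPool(int buffers, int size) {
        this.size = size;
        for (int i = 0; i < buffers; i++) pool.add(new long[size]);
    }

    long[] take() {
        long[] b = pool.poll();
        return (b != null) ? b : new long[size];  // pool momentarily empty
    }

    void release(long[] b) { pool.offer(b); }
}

// Lower-contention variant: probe a fixed array at a random slot and CAS
// entries in and out, for the case where the queue itself contends.
final class CasBufferPool {
    private final AtomicReferenceArray<long[]> slots;
    private final int size;

    CasBufferPool(int buffers, int size) {
        this.size = size;
        slots = new AtomicReferenceArray<>(buffers);
        for (int i = 0; i < buffers; i++) slots.set(i, new long[size]);
    }

    long[] take() {
        int i = ThreadLocalRandom.current().nextInt(slots.length());
        long[] b = slots.getAndSet(i, null);      // claim the slot atomically
        return (b != null) ? b : new long[size];  // slot was empty; allocate
    }

    void release(long[] b) {
        int i = ThreadLocalRandom.current().nextInt(slots.length());
        slots.compareAndSet(i, null, b);          // drop the buffer if the slot is taken
    }
}
```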
> 
> — Ron
> 
> 
> > On 20 Feb 2023, at 23:15, Carl M <java at rkive.org> wrote:
> > 
> > While testing out Virtual Threads with project Loom, I encountered some challenges that I was hoping this mailing list could provide guidance on.
> > 
> > I have a tracing library that uses ThreadLocals for recording events and timing info. The concurrency is structured so that each thread is the sole writer to its own trace buffer, but separate threads can come in and read that data asynchronously. I am using ThreadLocals to avoid contention between multiple tracing threads. Secondarily, I depend on threads exiting for automatic cleanup of the trace data per thread.
> > 
> > Virtual threads present a hard-to-overcome challenge, because I can't find a way to tell whether ThreadLocals are supported. One of the value propositions of my library is that it has consistent and low overhead. Specifically, calling ThreadLocal.set() throws an UnsupportedOperationException in the event that thread locals are not allowed. With virtual threads, the likelihood of this happening is much higher, since users are now able to create threads cheaply. I have explored several workarounds, but not being able to tell in advance is the problem I can't seem to cleanly overcome. Some ideas that did not pan out:
> > 
> > * Use a ConcurrentHashMap to implement my own ThreadLocal-like solution. Two problems come up: 1) it's easy to accidentally keep the thread alive, and 2) when thread locals are supported, my library doesn't get the speedup from them.
> > 
> > * Use an AtomicReferenceArray and hash into a fixed number of buckets. This avoids using the Thread as a key, and pays a minor cost for synchronizing on the bucket when recording trace data. In effect it's a poor man's ThreadLocal. However, if I get unlucky there will be contention on a bucket, and it doesn't naturally shard itself the way CHM does.
> > 
> > * Do nothing. This causes callers to allocate a lot of memory, since ThreadLocal.initialValue() gets called constantly, leading to unpredictable tracer overhead. There is a small but noticeable cost to creating the initial value (like registering with the reader), so this ends up not being practical.
> > 
> > * A hybrid: ThreadLocal when supported, with a fallback to the CHM or ARA mentioned above. This is the solution I came up with, where my ThreadLocal calls get() but has no initialValue() override. If the value is null, I attempt to set it. If there is an exception, I write the value to the CHM/ARA and check there first for future get() calls. The problem is that the exception from set() causes an unacceptable amount of overhead for something that should have been very cheap. Checking whether the thread is virtual isn't sufficient to tell whether TLs are supported, so I can't check the thread's class a priori. And since multiple types of threads are calling into my library, I can't require callers to use TLs.
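The bucketed "poor man's ThreadLocal" from the second bullet could be sketched as follows (illustrative names; string buffers stand in for whatever the real trace records are):

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Sketch: hash the current thread into a fixed array of buckets and
// synchronize on the bucket while recording. Two hot threads that hash
// to the same bucket will contend, which CHM's internal sharding avoids.
final class BucketedTraces {
    private final AtomicReferenceArray<StringBuilder> buckets;

    BucketedTraces(int n) {
        buckets = new AtomicReferenceArray<>(n);
        for (int i = 0; i < n; i++) buckets.set(i, new StringBuilder());
    }

    void record(String event) {
        int i = Math.floorMod(Long.hashCode(Thread.currentThread().getId()),
                              buckets.length());
        StringBuilder b = buckets.get(i);
        synchronized (b) {  // minor cost; contended only on bucket collisions
            b.append(event).append('\n');
        }
    }

    int totalEvents() {
        int n = 0;
        for (int i = 0; i < buckets.length(); i++) {
            StringBuilder b = buckets.get(i);
            synchronized (b) {
                n += (int) b.chars().filter(ch -> ch == '\n').count();
            }
        }
        return n;
    }
}
```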
> > 
> > 
> > I'm kind of at a loss as to how to efficiently fall back to a slower implementation when TLs aren't supported, since I can't tell whether they are (i.e. I can't tell if the electric fence is on without touching it). Again, I'd prefer to keep the fast ThreadLocals when they are supported.
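The hybrid probe from the last bullet could be sketched like this (illustrative names; it assumes set() throws UnsupportedOperationException when thread locals are disallowed, as described above, and uses a plain CHM where a real implementation would want weak keys so the map doesn't keep threads alive):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Sketch: try the ThreadLocal fast path first; if set() throws, remember
// the value in a shared map keyed by thread. The one-time exception per
// unsupported thread is the overhead complained about above.
final class HybridLocal<T> {
    private final ThreadLocal<T> fast = new ThreadLocal<>();  // no initialValue() override
    private final Map<Thread, T> slow = new ConcurrentHashMap<>();

    T get(Supplier<T> init) {
        T v = fast.get();
        if (v != null) return v;                // fast path hit
        Thread t = Thread.currentThread();
        v = slow.get(t);                        // known slow-path thread?
        if (v != null) return v;
        v = init.get();
        try {
            fast.set(v);                        // cheap when thread locals work
        } catch (UnsupportedOperationException e) {
            slow.put(t, v);                     // fall back; exception paid once
        }
        return v;
    }
}
```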
> > 
> > 
> > I'm looking for ideas (or just to register feedback) with this email, and have been otherwise very happy with the progress on project Loom.
> > 
> > Carl

