(L) Prelim RFR: 8132510: Replace ThreadLocalStorage with compiler/language-based thread-local variables
David Holmes
david.holmes at oracle.com
Fri Nov 6 06:26:44 UTC 2015
Hi Jeremy,
Okay I have read:
https://sourceware.org/glibc/wiki/TLSandSignals
and the tree of mail postings referenced therefrom - great reading! :)
So basic problem: access to __thread variables is not async-signal-safe
Exacerbator to problem: first attempt to even read a __thread variable
can lead to allocation which is the real problem in relation to
async-signal-safety
I mention the exacerbator because pthread_getspecific and
pthread_setSpecific are also not async-signal-safe but we already use
them. However, pthread_getspecific is in fact (per email threads linked
above) effectively async-signal-safe, and further a call to
pthread_getspecific never results in a call to pthread_setspecific or an
allocation. Hence the pthread functions are almost, if not completely,
safe in practice with reasonable uses (ie only read from signal
handler). Which explain this code in existing Thread::current()
#ifdef PARANOID
// Signal handler should call ThreadLocalStorage::get_thread_slow()
Thread* t = ThreadLocalStorage::get_thread_slow();
assert(t != NULL && !t->is_inside_signal_handler(),
"Don't use Thread::current() inside signal handler");
#endif
So problem scenario is: use of __thread variable (that belongs to the
shared-library) in a signal handler.
Solution 0: don't do that. Seriously - like any other
async-signal-unsafe stuff we should not be using it in real signal
handlers. The crash handler is a different matter - we try all sorts
there because it might work and you can't die twice.
Otherwise: narrow the window of exposure.
1. We ensure we initialize thread_current (even with a dummy value) as
early as possible in the thread that loads libjvm. As we have no signal
handlers installed at that point that might use the same variable, we
can not hit the problem scenario.
2. We ensure we initialize thread_current in a new thread with all
signals blocked. This again avoids the problem scenario.
3. We initialize thread_current in an attaching thread as soon as
possible and we again first block all signals.
That still leaves the problem of an unattached native thread taking a
signal whilst in async-signal-unsafe code, and executing a signal
handler which in turns tries to access thread_current for the first
time. This signal handler need not be an actual JVM handler, but one
attached by other native code eg an agent. I'm not clear in the latter
case how reasonable it is for an agent's handler to try and do things
from an unattached thread - and we don't claim any JNI interfaces can,
or should, be called from a signal handler - but it is something you can
probably get away with today.
Let me also point out that we already effectively have this code in
Solaris already (at the ThreadLocalStorage class level). So if there is
something here that will prevent the current proposal we already have a
problem on Solaris. :(
Thoughts/comments/suggestions?
Thanks,
David
On 6/11/2015 1:09 PM, David Holmes wrote:
> Hi Jeremy,
>
> I was going to ask you to elaborate :)
>
> On 6/11/2015 12:24 PM, Jeremy Manson wrote:
>> I should probably elaborate on this. With glibc + ELF, the first time a
>> thread accesses a variable declared __thread, if that variable is in a
>> shared library (as opposed to the main executable), the system calls
>> malloc() to allocate the space for it. If that happens in a signal that
>> is being delivered during a call to malloc(), then you usually get a
>> crash.
>
> My understanding of the ELF ABI for thread-locals - which I read about
> in the Solaris 11.1 Linkers and libraries guide - does require use of
> the dynamic TLS model for any dynamically loaded shared object which
> defines a thread-local, but that is what we use as I understand it. The
> docs state:
>
> "A shared object containing only dynamic TLS can be loaded following
> process startup without limitations. The runtime linker extends the list
> of initialization records to include the initialization template of the
> new object. The new object is given an index of m = M + 1. The
> counter M is incremented by 1. However, the allocation of new TLS blocks
> is deferred until the blocks are actually referenced."
>
> Now I guess "extends the list" might be implemented using malloc ... but
> this will only occur in the main thread (the one started by the launcher
> to load the JVM and become the main thread), at the time libjvm is
> loaded - which will all be over before any agent etc can run and do
> anything. But "allocation ... is deferred" suggests we may have a
> problem until either the first call to Thread::current or the call to
> Thread::initialize_thread_current. If it is the former then that should
> occur well before any agent etc can be loaded. And I can easily inject
> an initial dummy call to initialize_thread_current(null) to force the
> TLS allocation.
>
>> This may bite you if AsyncGetCallTrace uses Thread::current(), and you
>> use system timers to do profiling. If a thread doing a malloc() prior
>> to the first time it accesses Thread::current(), and it gets delivered a
>> signal, it might die. This is especially likely for pure native threads
>> started by native code.
>>
>> I believe that this is a use case you support, so you might want to make
>> sure it is okay.
>
> For a VM embedded in a process, which already contains native threads,
> that will later attach to the VM, this may indeed be a problem. One
> would have hoped however that the implementation of TLS would be
> completely robust, at least for something as simple as getting a signal
> whilst in the allocator.
>
> I'm unclear how to test for or check for this kind of problem. Arguably
> there could be many things that are async-unsafe in this way.
>
> Need to think more about this and do some more research. Would also
> appreciate any insight from any glibc and/or ELF gurus.
>
> Thanks.
> David
>
>> Jeremy
>>
>> On Thu, Nov 5, 2015 at 5:58 PM, Jeremy Manson <jeremymanson at google.com
>> <mailto:jeremymanson at google.com>> wrote:
>>
>> Something that's bitten me with __thread: it isn't async-safe when
>> called from a shared object on Linux. Have you vetted to make sure
>> this doesn't make HS less async-safe?
>>
>> Jeremy
>>
>> On Sun, Nov 1, 2015 at 10:40 PM, David Holmes
>> <david.holmes at oracle.com <mailto:david.holmes at oracle.com>> wrote:
>>
>> bug: https://bugs.openjdk.java.net/browse/JDK-8132510
>>
>> Open webrev:
>> http://cr.openjdk.java.net/~dholmes/8132510/webrev.v2/
>>
>> A simple (in principle) but wide-ranging change which should
>> appeal to our Code Deletion Engineer's. We implement
>> Thread::current() using a compiler/language-based thread-local
>> variable eg:
>>
>>
>> static __thread Thread *_thr_current;
>>
>> inline Thread* Thread::current() {
>> return _thr_current;
>> }
>>
>> with an appropriate setter of course. By doing this we can
>> completely remove the platform-specific ThreadLocalStorage
>> implementations, and the associated os::thread_local_storage*
>> calls, plus all the uses of ThreadLocalStorage::thread() and
>> ThreadLocalStorage::get_thread_slow(). This extends the previous
>> work done on Solaris to implement ThreadLocalStorage::thread()
>> using compiler-based thread-locals.
>>
>> We can also consolidate nearly all the os_cpu versions of
>> MacroAssembler::get_thread on x86 into one cpu specific one ( a
>> special variant is still needed for 32-bit Windows).
>>
>> As a result of this change we have further potential cleanups:
>> - all the src/os/<os>/vm/thread_<os>.inline.hpp files are now
>> completely empty and could also be removed
>> - the MINIMIZE_RAM_USAGE define (which avoids use of the linux
>> sp-map "cache" on 32-bit) now has no affect and so could be
>> completely removed from the build system
>>
>> I plan to do the MINIMIZE_RAM_USAGE removal as a follow up CR,
>> but could add the removal of the "inline" files to this CR if
>> people think it worth removing them.
>>
>> I have one missing piece on Aarch64 - I need to change
>> MacroAssembler::get_thread to simply call Thread::current() as
>> on other platforms, but I don't know how to write that. I would
>> appreciate it if someone could give me the right code for that.
>>
>> I would also appreciate comments/testing by the AIX and PPC64
>> folk as well.
>>
>> A concern about memory-leaks had previously been raised, but
>> experiments using simple C code on linux 86 and Solaris showed
>> no issues. Also note that Aarch64 already uses this kind of
>> thread-local.
>>
>> Thanks,
>> David
>>
>>
>>
More information about the porters-dev
mailing list