(L) Prelim RFR: 8132510: Replace ThreadLocalStorage with compiler/language-based thread-local variables

Fri Nov 6 07:48:43 UTC 2015

On 6/11/2015 4:55 PM, Jeremy Manson wrote:
> FWIW, Google tried to convince the glibc maintainers to make this
> async-safe, but they weren't biting:
>
> https://sourceware.org/ml/libc-alpha/2014-01/msg00033.html

Yes I read all that. I wouldn't say they weren't biting, more of a 
disagreement on the right direction for the patch. glibc weren't 
prepared to take it directly as is, while google-folk didn't seem to 
think it worth their while to do whatever glibc folk wanted. The actual 
patch proposal just died out. :( Quite a pity.

> Most of the things you can do are going to be mitigation rather than a
> fix.  I did what you suggest to mitigate, and no one complained, until
> someone at Google started a sidecar C++ thread that did a boatload of
> malloc'ing.

Yes all mitigation. :(

> My workaround was far sneakier, and fixes the problem entirely, but you
> probably won't want to do it for code hygiene reasons.  I declare the
> __thread variable in the java launcher itself, and then export a
> function that provides a pointer to it.  In practice, in glibc, if it is
> in the main executable, ELF is smart enough to declare it as part of the
> stack, and is therefore async-safe.

But even that only works for our own launchers - not for embedded in the 
JVM. :(

> Fortunately, while this is a fun thing to think about, I don't think
> there are any async paths in the JVM that use Thread::current()
> (although I could be wrong - there is a comment in there saying not to
> call Thread::current() in a signal handler...).  I would check the call
> paths in AsyncGetCallTrace to make sure.

So two things ...

First, using Thread::current() in a signal context was disallowed, but 
the alternative was ThreadLocalStorage::get_thread_slow(). The former 
may not work in a signal context due to the caching mechanisms layered 
in on different platforms, while the latter used the platform TLS API 
which, even if not guaranteed, worked well enough in a signal context. 
With __thread we don't have even a pretend signal-safe alternative :(

Second, AsyncGetCallTrace violates the first rule by using 
JavaThread::current() in an assertion.

Also, the problem may not be limited to something like 
AsyncGetCallTrace. Though agents may get the current thread from the 
JNIEnv rather than invoking some JVM function that results in 
Thread::current() being used, I can't be sure of that.

Anyway more things to mull over on the weekedn. :)

Thanks,
David

> Jeremy
>
> On Thu, Nov 5, 2015 at 10:26 PM, David Holmes <david.holmes at oracle.com
> <mailto:david.holmes at oracle.com>> wrote:
>
>     Hi Jeremy,
>
>     Okay I have read:
>
>     https://sourceware.org/glibc/wiki/TLSandSignals
>
>     and the tree of mail postings referenced therefrom - great reading! :)
>
>     So basic problem: access to __thread variables is not async-signal-safe
>
>     Exacerbator to problem: first attempt to even read a __thread
>     variable can lead to allocation which is the real problem in
>     relation to async-signal-safety
>
>     I mention the exacerbator because pthread_getspecific and
>     pthread_setSpecific are also not async-signal-safe but we already
>     use them. However, pthread_getspecific is in fact (per email threads
>     linked above) effectively async-signal-safe, and further a call to
>     pthread_getspecific never results in a call to pthread_setspecific
>     or an allocation. Hence the pthread functions are almost, if not
>     completely, safe in practice with reasonable uses (ie only read from
>     signal handler). Which explain this code in existing Thread::current()
>
>     #ifdef PARANOID
>        // Signal handler should call ThreadLocalStorage::get_thread_slow()
>        Thread* t = ThreadLocalStorage::get_thread_slow();
>        assert(t != NULL && !t->is_inside_signal_handler(),
>               "Don't use Thread::current() inside signal handler");
>     #endif
>
>     So problem scenario is: use of __thread variable (that belongs to
>     the shared-library) in a signal handler.
>
>     Solution 0: don't do that. Seriously - like any other
>     async-signal-unsafe stuff we should not be using it in real signal
>     handlers. The crash handler is a different matter - we try all sorts
>     there because it might work and you can't die twice.
>
>     Otherwise: narrow the window of exposure.
>
>     1. We ensure we initialize thread_current (even with a dummy value)
>     as early as possible in the thread that loads libjvm. As we have no
>     signal handlers installed at that point that might use the same
>     variable, we can not hit the problem scenario.
>
>     2. We ensure we initialize thread_current in a new thread with all
>     signals blocked. This again avoids the problem scenario.
>
>     3. We initialize thread_current in an attaching thread as soon as
>     possible and we again first block all signals.
>
>     That still leaves the problem of an unattached native thread taking
>     a signal whilst in async-signal-unsafe code, and executing a signal
>     handler which in turns tries to access thread_current for the first
>     time. This signal handler need not be an actual JVM handler, but one
>     attached by other native code eg an agent. I'm not clear in the
>     latter case how reasonable it is for an agent's handler to try and
>     do things from an unattached thread - and we don't claim any JNI
>     interfaces can, or should, be called from a signal handler - but it
>     is something you can probably get away with today.
>
>     Let me also point out that we already effectively have this code in
>     Solaris already (at the ThreadLocalStorage class level). So if there
>     is something here that will prevent the current proposal we already
>     have a problem on Solaris. :(
>
>     Thoughts/comments/suggestions?
>
>     Thanks,
>     David
>
>
>     On 6/11/2015 1:09 PM, David Holmes wrote:
>
>         Hi Jeremy,
>
>         I was going to ask you to elaborate :)
>
>         On 6/11/2015 12:24 PM, Jeremy Manson wrote:
>
>             I should probably elaborate on this.  With glibc + ELF, the
>             first time a
>             thread accesses a variable declared __thread, if that
>             variable is in a
>             shared library (as opposed to the main executable), the
>             system calls
>             malloc() to allocate the space for it.  If that happens in a
>             signal that
>             is being delivered during a call to malloc(), then you
>             usually get a
>             crash.
>
>
>         My understanding of the ELF ABI for thread-locals - which I read
>         about
>         in the Solaris 11.1 Linkers and libraries guide - does require
>         use of
>         the dynamic TLS model for any dynamically loaded shared object which
>         defines a thread-local, but that is what we use as I understand
>         it. The
>         docs state:
>
>         "A shared object containing only dynamic TLS can be loaded following
>         process startup without limitations. The runtime linker extends
>         the list
>         of initialization records to include the initialization template
>         of the
>         new object. The new object is given an index of m = M + 1. The
>         counter M is incremented by 1. However, the allocation of new
>         TLS blocks
>         is deferred until the blocks are actually referenced."
>
>         Now I guess "extends the list" might be implemented using malloc
>         ... but
>         this will only occur in the main thread (the one started by the
>         launcher
>         to load the JVM and become the main thread), at the time libjvm is
>         loaded - which will all be over before any agent etc can run and do
>         anything. But "allocation ... is deferred" suggests we may have a
>         problem until either the first call to Thread::current or the
>         call to
>         Thread::initialize_thread_current. If it is the former then that
>         should
>         occur well before any agent etc can be loaded. And I can easily
>         inject
>         an initial dummy call to initialize_thread_current(null) to
>         force the
>         TLS allocation.
>
>             This may bite you if AsyncGetCallTrace uses
>             Thread::current(), and you
>             use system timers to do profiling.  If a thread doing a
>             malloc() prior
>             to the first time it accesses Thread::current(), and it gets
>             delivered a
>             signal, it might die.  This is especially likely for pure
>             native threads
>             started by native code.
>
>             I believe that this is a use case you support, so you might
>             want to make
>             sure it is okay.
>
>
>         For a VM embedded in a process, which already contains native
>         threads,
>         that will later attach to the VM, this may indeed be a problem. One
>         would have hoped however that the implementation of TLS would be
>         completely robust, at least for something as simple as getting a
>         signal
>         whilst in the allocator.
>
>         I'm unclear how to test for or check for this kind of problem.
>         Arguably
>         there could be many things that are async-unsafe in this way.
>
>         Need to think more about this and do some more research. Would also
>         appreciate any insight from any glibc and/or ELF gurus.
>
>         Thanks.
>         David
>
>             Jeremy
>
>             On Thu, Nov 5, 2015 at 5:58 PM, Jeremy Manson
>             <jeremymanson at google.com <mailto:jeremymanson at google.com>
>             <mailto:jeremymanson at google.com
>             <mailto:jeremymanson at google.com>>> wrote:
>
>                  Something that's bitten me with __thread: it isn't
>             async-safe when
>                  called from a shared object on Linux.  Have you vetted
>             to make sure
>                  this doesn't make HS less async-safe?
>
>                  Jeremy
>
>                  On Sun, Nov 1, 2015 at 10:40 PM, David Holmes
>                  <david.holmes at oracle.com
>             <mailto:david.holmes at oracle.com>
>             <mailto:david.holmes at oracle.com
>             <mailto:david.holmes at oracle.com>>> wrote:
>
>                      bug: https://bugs.openjdk.java.net/browse/JDK-8132510
>
>                      Open webrev:
>             http://cr.openjdk.java.net/~dholmes/8132510/webrev.v2/
>
>                      A simple (in principle) but wide-ranging change
>             which should
>                      appeal to our Code Deletion Engineer's. We implement
>                      Thread::current() using a compiler/language-based
>             thread-local
>                      variable eg:
>
>
>                        static __thread Thread *_thr_current;
>
>                        inline Thread* Thread::current() {
>                          return _thr_current;
>                        }
>
>                      with an appropriate setter of course. By doing this
>             we can
>                      completely remove the platform-specific
>             ThreadLocalStorage
>                      implementations, and the associated
>             os::thread_local_storage*
>                      calls, plus all the uses of
>             ThreadLocalStorage::thread() and
>                      ThreadLocalStorage::get_thread_slow(). This extends
>             the previous
>                      work done on Solaris to implement
>             ThreadLocalStorage::thread()
>                      using compiler-based thread-locals.
>
>                      We can also consolidate nearly all the os_cpu
>             versions of
>                      MacroAssembler::get_thread on x86 into one cpu
>             specific one ( a
>                      special variant is still needed for 32-bit Windows).
>
>                      As a result of this change we have further
>             potential cleanups:
>                      - all the src/os/<os>/vm/thread_<os>.inline.hpp
>             files are now
>                      completely empty and could also be removed
>                      - the MINIMIZE_RAM_USAGE define (which avoids use
>             of the linux
>                      sp-map "cache" on 32-bit) now has no affect and so
>             could be
>                      completely removed from the build system
>
>                      I plan to do the MINIMIZE_RAM_USAGE removal as a
>             follow up CR,
>                      but could add the removal of the "inline" files to
>             this CR if
>                      people think it worth removing them.
>
>                      I have one missing piece on Aarch64 - I need to change
>                      MacroAssembler::get_thread to simply call
>             Thread::current() as
>                      on other platforms, but I don't know how to write
>             that. I would
>                      appreciate it if someone could give me the right
>             code for that.
>
>                      I would also appreciate comments/testing by the AIX
>             and PPC64
>                      folk as well.
>
>                      A concern about memory-leaks had previously been
>             raised, but
>                      experiments using simple C code on linux 86 and
>             Solaris showed
>                      no issues. Also note that Aarch64 already uses this
>             kind of
>                      thread-local.
>
>                      Thanks,
>                      David
>
>
>
>