(L) Prelim RFR: 8132510: Replace ThreadLocalStorage with compiler/language-based thread-local variables

David Holmes david.holmes at oracle.com
Sun Nov 8 22:54:18 UTC 2015


On 7/11/2015 11:22 AM, Jeremy Manson wrote:
>
>
> On Thu, Nov 5, 2015 at 11:48 PM, David Holmes <david.holmes at oracle.com
> <mailto:david.holmes at oracle.com>> wrote:
>
>     On 6/11/2015 4:55 PM, Jeremy Manson wrote:
>
>         FWIW, Google tried to convince the glibc maintainers to make this
>         async-safe, but they weren't biting:
>
>         https://sourceware.org/ml/libc-alpha/2014-01/msg00033.html
>
>
>     Yes I read all that. I wouldn't say they weren't biting, more of a
>     disagreement on the right direction for the patch. glibc weren't
>     prepared to take it directly as is, while google-folk didn't seem to
>     think it worth their while to do whatever glibc folk wanted. The
>     actual patch proposal just died out. :( Quite a pity.
>
>         Most of the things you can do are going to be mitigation rather
>         than a
>         fix.  I did what you suggest to mitigate, and no one complained,
>         until
>         someone at Google started a sidecar C++ thread that did a
>         boatload of
>         malloc'ing.
>
>
>     Yes all mitigation. :(
>
>         My workaround was far sneakier, and fixes the problem entirely,
>         but you
>         probably won't want to do it for code hygiene reasons.  I
>         declare the
>         __thread variable in the java launcher itself, and then export a
>         function that provides a pointer to it.  In practice, in glibc,
>         if it is
>         in the main executable, ELF is smart enough to declare it as
>         part of the
>         stack, and is therefore async-safe.
>
>
>     But even that only works for our own launchers - not for embedded in
>     the JVM. :(
>
>
> Yup.  Fortunately, I can tell people at Google how to write launchers.
>
>         Fortunately, while this is a fun thing to think about, I don't think
>         there are any async paths in the JVM that use Thread::current()
>         (although I could be wrong - there is a comment in there saying
>         not to
>         call Thread::current() in a signal handler...).  I would check
>         the call
>         paths in AsyncGetCallTrace to make sure.
>
>
>     So two things ...
>
>     First, using Thread::current() in a signal context was disallowed,
>     but the alternative was ThreadLocalStorage::get_thread_slow(). The
>     former may not work in a signal context due to the caching
>     mechanisms layered in on different platforms, while the latter used
>     the platform TLS API which, even if not guaranteed, worked well
>     enough in a signal context. With __thread we don't have even a
>     pretend signal-safe alternative :(
>
>
> Right.
>
>     Second, AsyncGetCallTrace violates the first rule by using
>     JavaThread::current() in an assertion.
>
>
> While we're on the subject, the assertion in Method::bci_from is
> reachable from AsyncGetCallTrace and calls err_msg, which can malloc().
> I meant to file a bug about that.
>
>     Also, the problem may not be limited to something like
>     AsyncGetCallTrace. Though agents may get the current thread from the
>     JNIEnv rather than invoking some JVM function that results in
>     Thread::current() being used, I can't be sure of that.
>
>
> Which JVM functions that get the thread are supposed to be async-safe?
> There is no reason to think that any method that isn't explicitly marked
> async-safe is async safe, and most JNI methods I've tried to use from
> signal handlers die painfully if I try to use them from a signal handler.
>
> Generally, I don't think it is reasonable for a user to expect
> async-safety from an API that isn't expressly designed that way.  POSIX
> has a list of async-safe methods (signal(7)).

Right - no JNI or JVM TI functions are designated as async-signal-safe 
(the specs dont even mention signals).

Unfortunately my problem is more basic: pretty much the first thing the 
JVM signal handler does is get the current thread. So if the signal is 
handled on an unattached thread that happened to be doing a malloc then 
we're toast. :( Most of the signals the JVM expects to handle are not 
blocked by default, AFAICS, so any unattached thread could be selected.

David
-----


> FWIW, to use AsyncGetCallTrace, I get the JNIEnv in a ThreadStart hook
> from JVMTI and stash it in a __thread (and pull the trick I mentioned).
> Jeremy
>
>
>     Anyway more things to mull over on the weekedn. :)
>
>     Thanks,
>     David
>
>         Jeremy
>
>         On Thu, Nov 5, 2015 at 10:26 PM, David Holmes
>         <david.holmes at oracle.com <mailto:david.holmes at oracle.com>
>         <mailto:david.holmes at oracle.com
>         <mailto:david.holmes at oracle.com>>> wrote:
>
>              Hi Jeremy,
>
>              Okay I have read:
>
>         https://sourceware.org/glibc/wiki/TLSandSignals
>
>              and the tree of mail postings referenced therefrom - great
>         reading! :)
>
>              So basic problem: access to __thread variables is not
>         async-signal-safe
>
>              Exacerbator to problem: first attempt to even read a __thread
>              variable can lead to allocation which is the real problem in
>              relation to async-signal-safety
>
>              I mention the exacerbator because pthread_getspecific and
>              pthread_setSpecific are also not async-signal-safe but we
>         already
>              use them. However, pthread_getspecific is in fact (per
>         email threads
>              linked above) effectively async-signal-safe, and further a
>         call to
>              pthread_getspecific never results in a call to
>         pthread_setspecific
>              or an allocation. Hence the pthread functions are almost,
>         if not
>              completely, safe in practice with reasonable uses (ie only
>         read from
>              signal handler). Which explain this code in existing
>         Thread::current()
>
>              #ifdef PARANOID
>                 // Signal handler should call
>         ThreadLocalStorage::get_thread_slow()
>                 Thread* t = ThreadLocalStorage::get_thread_slow();
>                 assert(t != NULL && !t->is_inside_signal_handler(),
>                        "Don't use Thread::current() inside signal handler");
>              #endif
>
>              So problem scenario is: use of __thread variable (that
>         belongs to
>              the shared-library) in a signal handler.
>
>              Solution 0: don't do that. Seriously - like any other
>              async-signal-unsafe stuff we should not be using it in real
>         signal
>              handlers. The crash handler is a different matter - we try
>         all sorts
>              there because it might work and you can't die twice.
>
>              Otherwise: narrow the window of exposure.
>
>              1. We ensure we initialize thread_current (even with a
>         dummy value)
>              as early as possible in the thread that loads libjvm. As we
>         have no
>              signal handlers installed at that point that might use the same
>              variable, we can not hit the problem scenario.
>
>              2. We ensure we initialize thread_current in a new thread
>         with all
>              signals blocked. This again avoids the problem scenario.
>
>              3. We initialize thread_current in an attaching thread as
>         soon as
>              possible and we again first block all signals.
>
>              That still leaves the problem of an unattached native
>         thread taking
>              a signal whilst in async-signal-unsafe code, and executing
>         a signal
>              handler which in turns tries to access thread_current for
>         the first
>              time. This signal handler need not be an actual JVM
>         handler, but one
>              attached by other native code eg an agent. I'm not clear in the
>              latter case how reasonable it is for an agent's handler to
>         try and
>              do things from an unattached thread - and we don't claim
>         any JNI
>              interfaces can, or should, be called from a signal handler
>         - but it
>              is something you can probably get away with today.
>
>              Let me also point out that we already effectively have this
>         code in
>              Solaris already (at the ThreadLocalStorage class level). So
>         if there
>              is something here that will prevent the current proposal we
>         already
>              have a problem on Solaris. :(
>
>              Thoughts/comments/suggestions?
>
>              Thanks,
>              David
>
>
>              On 6/11/2015 1:09 PM, David Holmes wrote:
>
>                  Hi Jeremy,
>
>                  I was going to ask you to elaborate :)
>
>                  On 6/11/2015 12:24 PM, Jeremy Manson wrote:
>
>                      I should probably elaborate on this.  With glibc +
>         ELF, the
>                      first time a
>                      thread accesses a variable declared __thread, if that
>                      variable is in a
>                      shared library (as opposed to the main executable), the
>                      system calls
>                      malloc() to allocate the space for it.  If that
>         happens in a
>                      signal that
>                      is being delivered during a call to malloc(), then you
>                      usually get a
>                      crash.
>
>
>                  My understanding of the ELF ABI for thread-locals -
>         which I read
>                  about
>                  in the Solaris 11.1 Linkers and libraries guide - does
>         require
>                  use of
>                  the dynamic TLS model for any dynamically loaded shared
>         object which
>                  defines a thread-local, but that is what we use as I
>         understand
>                  it. The
>                  docs state:
>
>                  "A shared object containing only dynamic TLS can be
>         loaded following
>                  process startup without limitations. The runtime linker
>         extends
>                  the list
>                  of initialization records to include the initialization
>         template
>                  of the
>                  new object. The new object is given an index of m = M +
>         1. The
>                  counter M is incremented by 1. However, the allocation
>         of new
>                  TLS blocks
>                  is deferred until the blocks are actually referenced."
>
>                  Now I guess "extends the list" might be implemented
>         using malloc
>                  ... but
>                  this will only occur in the main thread (the one
>         started by the
>                  launcher
>                  to load the JVM and become the main thread), at the
>         time libjvm is
>                  loaded - which will all be over before any agent etc
>         can run and do
>                  anything. But "allocation ... is deferred" suggests we
>         may have a
>                  problem until either the first call to Thread::current
>         or the
>                  call to
>                  Thread::initialize_thread_current. If it is the former
>         then that
>                  should
>                  occur well before any agent etc can be loaded. And I
>         can easily
>                  inject
>                  an initial dummy call to initialize_thread_current(null) to
>                  force the
>                  TLS allocation.
>
>                      This may bite you if AsyncGetCallTrace uses
>                      Thread::current(), and you
>                      use system timers to do profiling.  If a thread doing a
>                      malloc() prior
>                      to the first time it accesses Thread::current(),
>         and it gets
>                      delivered a
>                      signal, it might die.  This is especially likely
>         for pure
>                      native threads
>                      started by native code.
>
>                      I believe that this is a use case you support, so
>         you might
>                      want to make
>                      sure it is okay.
>
>
>                  For a VM embedded in a process, which already contains
>         native
>                  threads,
>                  that will later attach to the VM, this may indeed be a
>         problem. One
>                  would have hoped however that the implementation of TLS
>         would be
>                  completely robust, at least for something as simple as
>         getting a
>                  signal
>                  whilst in the allocator.
>
>                  I'm unclear how to test for or check for this kind of
>         problem.
>                  Arguably
>                  there could be many things that are async-unsafe in
>         this way.
>
>                  Need to think more about this and do some more
>         research. Would also
>                  appreciate any insight from any glibc and/or ELF gurus.
>
>                  Thanks.
>                  David
>
>                      Jeremy
>
>                      On Thu, Nov 5, 2015 at 5:58 PM, Jeremy Manson
>                      <jeremymanson at google.com
>         <mailto:jeremymanson at google.com> <mailto:jeremymanson at google.com
>         <mailto:jeremymanson at google.com>>
>                      <mailto:jeremymanson at google.com
>         <mailto:jeremymanson at google.com>
>                      <mailto:jeremymanson at google.com
>         <mailto:jeremymanson at google.com>>>> wrote:
>
>                           Something that's bitten me with __thread: it isn't
>                      async-safe when
>                           called from a shared object on Linux.  Have
>         you vetted
>                      to make sure
>                           this doesn't make HS less async-safe?
>
>                           Jeremy
>
>                           On Sun, Nov 1, 2015 at 10:40 PM, David Holmes
>                           <david.holmes at oracle.com
>         <mailto:david.holmes at oracle.com>
>                      <mailto:david.holmes at oracle.com
>         <mailto:david.holmes at oracle.com>>
>                      <mailto:david.holmes at oracle.com
>         <mailto:david.holmes at oracle.com>
>
>                      <mailto:david.holmes at oracle.com
>         <mailto:david.holmes at oracle.com>>>> wrote:
>
>                               bug:
>         https://bugs.openjdk.java.net/browse/JDK-8132510
>
>                               Open webrev:
>         http://cr.openjdk.java.net/~dholmes/8132510/webrev.v2/
>
>                               A simple (in principle) but wide-ranging
>         change
>                      which should
>                               appeal to our Code Deletion Engineer's. We
>         implement
>                               Thread::current() using a
>         compiler/language-based
>                      thread-local
>                               variable eg:
>
>
>                                 static __thread Thread *_thr_current;
>
>                                 inline Thread* Thread::current() {
>                                   return _thr_current;
>                                 }
>
>                               with an appropriate setter of course. By
>         doing this
>                      we can
>                               completely remove the platform-specific
>                      ThreadLocalStorage
>                               implementations, and the associated
>                      os::thread_local_storage*
>                               calls, plus all the uses of
>                      ThreadLocalStorage::thread() and
>                               ThreadLocalStorage::get_thread_slow().
>         This extends
>                      the previous
>                               work done on Solaris to implement
>                      ThreadLocalStorage::thread()
>                               using compiler-based thread-locals.
>
>                               We can also consolidate nearly all the os_cpu
>                      versions of
>                               MacroAssembler::get_thread on x86 into one cpu
>                      specific one ( a
>                               special variant is still needed for 32-bit
>         Windows).
>
>                               As a result of this change we have further
>                      potential cleanups:
>                               - all the
>         src/os/<os>/vm/thread_<os>.inline.hpp
>                      files are now
>                               completely empty and could also be removed
>                               - the MINIMIZE_RAM_USAGE define (which
>         avoids use
>                      of the linux
>                               sp-map "cache" on 32-bit) now has no
>         affect and so
>                      could be
>                               completely removed from the build system
>
>                               I plan to do the MINIMIZE_RAM_USAGE
>         removal as a
>                      follow up CR,
>                               but could add the removal of the "inline"
>         files to
>                      this CR if
>                               people think it worth removing them.
>
>                               I have one missing piece on Aarch64 - I
>         need to change
>                               MacroAssembler::get_thread to simply call
>                      Thread::current() as
>                               on other platforms, but I don't know how
>         to write
>                      that. I would
>                               appreciate it if someone could give me the
>         right
>                      code for that.
>
>                               I would also appreciate comments/testing
>         by the AIX
>                      and PPC64
>                               folk as well.
>
>                               A concern about memory-leaks had
>         previously been
>                      raised, but
>                               experiments using simple C code on linux
>         86 and
>                      Solaris showed
>                               no issues. Also note that Aarch64 already
>         uses this
>                      kind of
>                               thread-local.
>
>                               Thanks,
>                               David
>
>
>
>
>


More information about the porters-dev mailing list