(L) Prelim RFR: 8132510: Replace ThreadLocalStorage with compiler/language-based thread-local variables
David Holmes
david.holmes at oracle.com
Fri Nov 6 07:48:43 UTC 2015
On 6/11/2015 4:55 PM, Jeremy Manson wrote:
> FWIW, Google tried to convince the glibc maintainers to make this
> async-safe, but they weren't biting:
>
> https://sourceware.org/ml/libc-alpha/2014-01/msg00033.html
Yes I read all that. I wouldn't say they weren't biting, more of a
disagreement on the right direction for the patch. glibc weren't
prepared to take it directly as is, while google-folk didn't seem to
think it worth their while to do whatever glibc folk wanted. The actual
patch proposal just died out. :( Quite a pity.
> Most of the things you can do are going to be mitigation rather than a
> fix. I did what you suggest to mitigate, and no one complained, until
> someone at Google started a sidecar C++ thread that did a boatload of
> malloc'ing.
Yes all mitigation. :(
> My workaround was far sneakier, and fixes the problem entirely, but you
> probably won't want to do it for code hygiene reasons. I declare the
> __thread variable in the java launcher itself, and then export a
> function that provides a pointer to it. In practice, in glibc, if it is
> in the main executable, ELF is smart enough to declare it as part of the
> stack, and is therefore async-safe.
But even that only works for our own launchers - not for embedded in the
JVM. :(
> Fortunately, while this is a fun thing to think about, I don't think
> there are any async paths in the JVM that use Thread::current()
> (although I could be wrong - there is a comment in there saying not to
> call Thread::current() in a signal handler...). I would check the call
> paths in AsyncGetCallTrace to make sure.
So two things ...
First, using Thread::current() in a signal context was disallowed, but
the alternative was ThreadLocalStorage::get_thread_slow(). The former
may not work in a signal context due to the caching mechanisms layered
in on different platforms, while the latter used the platform TLS API
which, even if not guaranteed, worked well enough in a signal context.
With __thread we don't have even a pretend signal-safe alternative :(
Second, AsyncGetCallTrace violates the first rule by using
JavaThread::current() in an assertion.
Also, the problem may not be limited to something like
AsyncGetCallTrace. Though agents may get the current thread from the
JNIEnv rather than invoking some JVM function that results in
Thread::current() being used, I can't be sure of that.
Anyway more things to mull over on the weekedn. :)
Thanks,
David
> Jeremy
>
> On Thu, Nov 5, 2015 at 10:26 PM, David Holmes <david.holmes at oracle.com
> <mailto:david.holmes at oracle.com>> wrote:
>
> Hi Jeremy,
>
> Okay I have read:
>
> https://sourceware.org/glibc/wiki/TLSandSignals
>
> and the tree of mail postings referenced therefrom - great reading! :)
>
> So basic problem: access to __thread variables is not async-signal-safe
>
> Exacerbator to problem: first attempt to even read a __thread
> variable can lead to allocation which is the real problem in
> relation to async-signal-safety
>
> I mention the exacerbator because pthread_getspecific and
> pthread_setSpecific are also not async-signal-safe but we already
> use them. However, pthread_getspecific is in fact (per email threads
> linked above) effectively async-signal-safe, and further a call to
> pthread_getspecific never results in a call to pthread_setspecific
> or an allocation. Hence the pthread functions are almost, if not
> completely, safe in practice with reasonable uses (ie only read from
> signal handler). Which explain this code in existing Thread::current()
>
> #ifdef PARANOID
> // Signal handler should call ThreadLocalStorage::get_thread_slow()
> Thread* t = ThreadLocalStorage::get_thread_slow();
> assert(t != NULL && !t->is_inside_signal_handler(),
> "Don't use Thread::current() inside signal handler");
> #endif
>
> So problem scenario is: use of __thread variable (that belongs to
> the shared-library) in a signal handler.
>
> Solution 0: don't do that. Seriously - like any other
> async-signal-unsafe stuff we should not be using it in real signal
> handlers. The crash handler is a different matter - we try all sorts
> there because it might work and you can't die twice.
>
> Otherwise: narrow the window of exposure.
>
> 1. We ensure we initialize thread_current (even with a dummy value)
> as early as possible in the thread that loads libjvm. As we have no
> signal handlers installed at that point that might use the same
> variable, we can not hit the problem scenario.
>
> 2. We ensure we initialize thread_current in a new thread with all
> signals blocked. This again avoids the problem scenario.
>
> 3. We initialize thread_current in an attaching thread as soon as
> possible and we again first block all signals.
>
> That still leaves the problem of an unattached native thread taking
> a signal whilst in async-signal-unsafe code, and executing a signal
> handler which in turns tries to access thread_current for the first
> time. This signal handler need not be an actual JVM handler, but one
> attached by other native code eg an agent. I'm not clear in the
> latter case how reasonable it is for an agent's handler to try and
> do things from an unattached thread - and we don't claim any JNI
> interfaces can, or should, be called from a signal handler - but it
> is something you can probably get away with today.
>
> Let me also point out that we already effectively have this code in
> Solaris already (at the ThreadLocalStorage class level). So if there
> is something here that will prevent the current proposal we already
> have a problem on Solaris. :(
>
> Thoughts/comments/suggestions?
>
> Thanks,
> David
>
>
> On 6/11/2015 1:09 PM, David Holmes wrote:
>
> Hi Jeremy,
>
> I was going to ask you to elaborate :)
>
> On 6/11/2015 12:24 PM, Jeremy Manson wrote:
>
> I should probably elaborate on this. With glibc + ELF, the
> first time a
> thread accesses a variable declared __thread, if that
> variable is in a
> shared library (as opposed to the main executable), the
> system calls
> malloc() to allocate the space for it. If that happens in a
> signal that
> is being delivered during a call to malloc(), then you
> usually get a
> crash.
>
>
> My understanding of the ELF ABI for thread-locals - which I read
> about
> in the Solaris 11.1 Linkers and libraries guide - does require
> use of
> the dynamic TLS model for any dynamically loaded shared object which
> defines a thread-local, but that is what we use as I understand
> it. The
> docs state:
>
> "A shared object containing only dynamic TLS can be loaded following
> process startup without limitations. The runtime linker extends
> the list
> of initialization records to include the initialization template
> of the
> new object. The new object is given an index of m = M + 1. The
> counter M is incremented by 1. However, the allocation of new
> TLS blocks
> is deferred until the blocks are actually referenced."
>
> Now I guess "extends the list" might be implemented using malloc
> ... but
> this will only occur in the main thread (the one started by the
> launcher
> to load the JVM and become the main thread), at the time libjvm is
> loaded - which will all be over before any agent etc can run and do
> anything. But "allocation ... is deferred" suggests we may have a
> problem until either the first call to Thread::current or the
> call to
> Thread::initialize_thread_current. If it is the former then that
> should
> occur well before any agent etc can be loaded. And I can easily
> inject
> an initial dummy call to initialize_thread_current(null) to
> force the
> TLS allocation.
>
> This may bite you if AsyncGetCallTrace uses
> Thread::current(), and you
> use system timers to do profiling. If a thread doing a
> malloc() prior
> to the first time it accesses Thread::current(), and it gets
> delivered a
> signal, it might die. This is especially likely for pure
> native threads
> started by native code.
>
> I believe that this is a use case you support, so you might
> want to make
> sure it is okay.
>
>
> For a VM embedded in a process, which already contains native
> threads,
> that will later attach to the VM, this may indeed be a problem. One
> would have hoped however that the implementation of TLS would be
> completely robust, at least for something as simple as getting a
> signal
> whilst in the allocator.
>
> I'm unclear how to test for or check for this kind of problem.
> Arguably
> there could be many things that are async-unsafe in this way.
>
> Need to think more about this and do some more research. Would also
> appreciate any insight from any glibc and/or ELF gurus.
>
> Thanks.
> David
>
> Jeremy
>
> On Thu, Nov 5, 2015 at 5:58 PM, Jeremy Manson
> <jeremymanson at google.com <mailto:jeremymanson at google.com>
> <mailto:jeremymanson at google.com
> <mailto:jeremymanson at google.com>>> wrote:
>
> Something that's bitten me with __thread: it isn't
> async-safe when
> called from a shared object on Linux. Have you vetted
> to make sure
> this doesn't make HS less async-safe?
>
> Jeremy
>
> On Sun, Nov 1, 2015 at 10:40 PM, David Holmes
> <david.holmes at oracle.com
> <mailto:david.holmes at oracle.com>
> <mailto:david.holmes at oracle.com
> <mailto:david.holmes at oracle.com>>> wrote:
>
> bug: https://bugs.openjdk.java.net/browse/JDK-8132510
>
> Open webrev:
> http://cr.openjdk.java.net/~dholmes/8132510/webrev.v2/
>
> A simple (in principle) but wide-ranging change
> which should
> appeal to our Code Deletion Engineer's. We implement
> Thread::current() using a compiler/language-based
> thread-local
> variable eg:
>
>
> static __thread Thread *_thr_current;
>
> inline Thread* Thread::current() {
> return _thr_current;
> }
>
> with an appropriate setter of course. By doing this
> we can
> completely remove the platform-specific
> ThreadLocalStorage
> implementations, and the associated
> os::thread_local_storage*
> calls, plus all the uses of
> ThreadLocalStorage::thread() and
> ThreadLocalStorage::get_thread_slow(). This extends
> the previous
> work done on Solaris to implement
> ThreadLocalStorage::thread()
> using compiler-based thread-locals.
>
> We can also consolidate nearly all the os_cpu
> versions of
> MacroAssembler::get_thread on x86 into one cpu
> specific one ( a
> special variant is still needed for 32-bit Windows).
>
> As a result of this change we have further
> potential cleanups:
> - all the src/os/<os>/vm/thread_<os>.inline.hpp
> files are now
> completely empty and could also be removed
> - the MINIMIZE_RAM_USAGE define (which avoids use
> of the linux
> sp-map "cache" on 32-bit) now has no affect and so
> could be
> completely removed from the build system
>
> I plan to do the MINIMIZE_RAM_USAGE removal as a
> follow up CR,
> but could add the removal of the "inline" files to
> this CR if
> people think it worth removing them.
>
> I have one missing piece on Aarch64 - I need to change
> MacroAssembler::get_thread to simply call
> Thread::current() as
> on other platforms, but I don't know how to write
> that. I would
> appreciate it if someone could give me the right
> code for that.
>
> I would also appreciate comments/testing by the AIX
> and PPC64
> folk as well.
>
> A concern about memory-leaks had previously been
> raised, but
> experiments using simple C code on linux 86 and
> Solaris showed
> no issues. Also note that Aarch64 already uses this
> kind of
> thread-local.
>
> Thanks,
> David
>
>
>
>
More information about the porters-dev
mailing list