(L) Prelim RFR: 8132510: Replace ThreadLocalStorage with compiler/language-based thread-local variables
Jeremy Manson
jeremymanson at google.com
Sat Nov 7 01:22:24 UTC 2015
On Thu, Nov 5, 2015 at 11:48 PM, David Holmes <david.holmes at oracle.com>
wrote:
> On 6/11/2015 4:55 PM, Jeremy Manson wrote:
>
>> FWIW, Google tried to convince the glibc maintainers to make this
>> async-safe, but they weren't biting:
>>
>> https://sourceware.org/ml/libc-alpha/2014-01/msg00033.html
>>
>
> Yes I read all that. I wouldn't say they weren't biting, more of a
> disagreement on the right direction for the patch. glibc weren't prepared
> to take it directly as is, while google-folk didn't seem to think it worth
> their while to do whatever glibc folk wanted. The actual patch proposal
> just died out. :( Quite a pity.
>
> Most of the things you can do are going to be mitigation rather than a
>> fix. I did what you suggest to mitigate, and no one complained, until
>> someone at Google started a sidecar C++ thread that did a boatload of
>> malloc'ing.
>>
>
> Yes all mitigation. :(
>
> My workaround was far sneakier, and fixes the problem entirely, but you
>> probably won't want to do it for code hygiene reasons. I declare the
>> __thread variable in the java launcher itself, and then export a
>> function that provides a pointer to it. In practice, in glibc, if it is
>> in the main executable, ELF is smart enough to declare it as part of the
>> stack, and is therefore async-safe.
>>
>
> But even that only works for our own launchers - not for embedded in the
> JVM. :(
Yup. Fortunately, I can tell people at Google how to write launchers.
> Fortunately, while this is a fun thing to think about, I don't think
>> there are any async paths in the JVM that use Thread::current()
>> (although I could be wrong - there is a comment in there saying not to
>> call Thread::current() in a signal handler...). I would check the call
>> paths in AsyncGetCallTrace to make sure.
>>
>
> So two things ...
>
> First, using Thread::current() in a signal context was disallowed, but the
> alternative was ThreadLocalStorage::get_thread_slow(). The former may not
> work in a signal context due to the caching mechanisms layered in on
> different platforms, while the latter used the platform TLS API which, even
> if not guaranteed, worked well enough in a signal context. With __thread we
> don't have even a pretend signal-safe alternative :(
>
Right.
> Second, AsyncGetCallTrace violates the first rule by using
> JavaThread::current() in an assertion.
>
While we're on the subject, the assertion in Method::bci_from is reachable
from AsyncGetCallTrace and calls err_msg, which can malloc(). I meant to
file a bug about that.
Also, the problem may not be limited to something like AsyncGetCallTrace.
> Though agents may get the current thread from the JNIEnv rather than
> invoking some JVM function that results in Thread::current() being used, I
> can't be sure of that.
>
Which JVM functions that get the thread are supposed to be async-safe?
There is no reason to think that any method that isn't explicitly marked
async-safe is async safe, and most JNI methods I've tried to use from
signal handlers die painfully if I try to use them from a signal handler.
Generally, I don't think it is reasonable for a user to expect async-safety
from an API that isn't expressly designed that way. POSIX has a list of
async-safe methods (signal(7)).
FWIW, to use AsyncGetCallTrace, I get the JNIEnv in a ThreadStart hook from
JVMTI and stash it in a __thread (and pull the trick I mentioned).
Jeremy
> Anyway more things to mull over on the weekedn. :)
>
> Thanks,
> David
>
> Jeremy
>>
>> On Thu, Nov 5, 2015 at 10:26 PM, David Holmes <david.holmes at oracle.com
>> <mailto:david.holmes at oracle.com>> wrote:
>>
>> Hi Jeremy,
>>
>> Okay I have read:
>>
>> https://sourceware.org/glibc/wiki/TLSandSignals
>>
>> and the tree of mail postings referenced therefrom - great reading! :)
>>
>> So basic problem: access to __thread variables is not
>> async-signal-safe
>>
>> Exacerbator to problem: first attempt to even read a __thread
>> variable can lead to allocation which is the real problem in
>> relation to async-signal-safety
>>
>> I mention the exacerbator because pthread_getspecific and
>> pthread_setSpecific are also not async-signal-safe but we already
>> use them. However, pthread_getspecific is in fact (per email threads
>> linked above) effectively async-signal-safe, and further a call to
>> pthread_getspecific never results in a call to pthread_setspecific
>> or an allocation. Hence the pthread functions are almost, if not
>> completely, safe in practice with reasonable uses (ie only read from
>> signal handler). Which explain this code in existing Thread::current()
>>
>> #ifdef PARANOID
>> // Signal handler should call ThreadLocalStorage::get_thread_slow()
>> Thread* t = ThreadLocalStorage::get_thread_slow();
>> assert(t != NULL && !t->is_inside_signal_handler(),
>> "Don't use Thread::current() inside signal handler");
>> #endif
>>
>> So problem scenario is: use of __thread variable (that belongs to
>> the shared-library) in a signal handler.
>>
>> Solution 0: don't do that. Seriously - like any other
>> async-signal-unsafe stuff we should not be using it in real signal
>> handlers. The crash handler is a different matter - we try all sorts
>> there because it might work and you can't die twice.
>>
>> Otherwise: narrow the window of exposure.
>>
>> 1. We ensure we initialize thread_current (even with a dummy value)
>> as early as possible in the thread that loads libjvm. As we have no
>> signal handlers installed at that point that might use the same
>> variable, we can not hit the problem scenario.
>>
>> 2. We ensure we initialize thread_current in a new thread with all
>> signals blocked. This again avoids the problem scenario.
>>
>> 3. We initialize thread_current in an attaching thread as soon as
>> possible and we again first block all signals.
>>
>> That still leaves the problem of an unattached native thread taking
>> a signal whilst in async-signal-unsafe code, and executing a signal
>> handler which in turns tries to access thread_current for the first
>> time. This signal handler need not be an actual JVM handler, but one
>> attached by other native code eg an agent. I'm not clear in the
>> latter case how reasonable it is for an agent's handler to try and
>> do things from an unattached thread - and we don't claim any JNI
>> interfaces can, or should, be called from a signal handler - but it
>> is something you can probably get away with today.
>>
>> Let me also point out that we already effectively have this code in
>> Solaris already (at the ThreadLocalStorage class level). So if there
>> is something here that will prevent the current proposal we already
>> have a problem on Solaris. :(
>>
>> Thoughts/comments/suggestions?
>>
>> Thanks,
>> David
>>
>>
>> On 6/11/2015 1:09 PM, David Holmes wrote:
>>
>> Hi Jeremy,
>>
>> I was going to ask you to elaborate :)
>>
>> On 6/11/2015 12:24 PM, Jeremy Manson wrote:
>>
>> I should probably elaborate on this. With glibc + ELF, the
>> first time a
>> thread accesses a variable declared __thread, if that
>> variable is in a
>> shared library (as opposed to the main executable), the
>> system calls
>> malloc() to allocate the space for it. If that happens in a
>> signal that
>> is being delivered during a call to malloc(), then you
>> usually get a
>> crash.
>>
>>
>> My understanding of the ELF ABI for thread-locals - which I read
>> about
>> in the Solaris 11.1 Linkers and libraries guide - does require
>> use of
>> the dynamic TLS model for any dynamically loaded shared object
>> which
>> defines a thread-local, but that is what we use as I understand
>> it. The
>> docs state:
>>
>> "A shared object containing only dynamic TLS can be loaded
>> following
>> process startup without limitations. The runtime linker extends
>> the list
>> of initialization records to include the initialization template
>> of the
>> new object. The new object is given an index of m = M + 1. The
>> counter M is incremented by 1. However, the allocation of new
>> TLS blocks
>> is deferred until the blocks are actually referenced."
>>
>> Now I guess "extends the list" might be implemented using malloc
>> ... but
>> this will only occur in the main thread (the one started by the
>> launcher
>> to load the JVM and become the main thread), at the time libjvm is
>> loaded - which will all be over before any agent etc can run and
>> do
>> anything. But "allocation ... is deferred" suggests we may have a
>> problem until either the first call to Thread::current or the
>> call to
>> Thread::initialize_thread_current. If it is the former then that
>> should
>> occur well before any agent etc can be loaded. And I can easily
>> inject
>> an initial dummy call to initialize_thread_current(null) to
>> force the
>> TLS allocation.
>>
>> This may bite you if AsyncGetCallTrace uses
>> Thread::current(), and you
>> use system timers to do profiling. If a thread doing a
>> malloc() prior
>> to the first time it accesses Thread::current(), and it gets
>> delivered a
>> signal, it might die. This is especially likely for pure
>> native threads
>> started by native code.
>>
>> I believe that this is a use case you support, so you might
>> want to make
>> sure it is okay.
>>
>>
>> For a VM embedded in a process, which already contains native
>> threads,
>> that will later attach to the VM, this may indeed be a problem.
>> One
>> would have hoped however that the implementation of TLS would be
>> completely robust, at least for something as simple as getting a
>> signal
>> whilst in the allocator.
>>
>> I'm unclear how to test for or check for this kind of problem.
>> Arguably
>> there could be many things that are async-unsafe in this way.
>>
>> Need to think more about this and do some more research. Would
>> also
>> appreciate any insight from any glibc and/or ELF gurus.
>>
>> Thanks.
>> David
>>
>> Jeremy
>>
>> On Thu, Nov 5, 2015 at 5:58 PM, Jeremy Manson
>> <jeremymanson at google.com <mailto:jeremymanson at google.com>
>> <mailto:jeremymanson at google.com
>> <mailto:jeremymanson at google.com>>> wrote:
>>
>> Something that's bitten me with __thread: it isn't
>> async-safe when
>> called from a shared object on Linux. Have you vetted
>> to make sure
>> this doesn't make HS less async-safe?
>>
>> Jeremy
>>
>> On Sun, Nov 1, 2015 at 10:40 PM, David Holmes
>> <david.holmes at oracle.com
>> <mailto:david.holmes at oracle.com>
>> <mailto:david.holmes at oracle.com
>>
>> <mailto:david.holmes at oracle.com>>> wrote:
>>
>> bug:
>> https://bugs.openjdk.java.net/browse/JDK-8132510
>>
>> Open webrev:
>> http://cr.openjdk.java.net/~dholmes/8132510/webrev.v2/
>>
>> A simple (in principle) but wide-ranging change
>> which should
>> appeal to our Code Deletion Engineer's. We implement
>> Thread::current() using a compiler/language-based
>> thread-local
>> variable eg:
>>
>>
>> static __thread Thread *_thr_current;
>>
>> inline Thread* Thread::current() {
>> return _thr_current;
>> }
>>
>> with an appropriate setter of course. By doing this
>> we can
>> completely remove the platform-specific
>> ThreadLocalStorage
>> implementations, and the associated
>> os::thread_local_storage*
>> calls, plus all the uses of
>> ThreadLocalStorage::thread() and
>> ThreadLocalStorage::get_thread_slow(). This extends
>> the previous
>> work done on Solaris to implement
>> ThreadLocalStorage::thread()
>> using compiler-based thread-locals.
>>
>> We can also consolidate nearly all the os_cpu
>> versions of
>> MacroAssembler::get_thread on x86 into one cpu
>> specific one ( a
>> special variant is still needed for 32-bit Windows).
>>
>> As a result of this change we have further
>> potential cleanups:
>> - all the src/os/<os>/vm/thread_<os>.inline.hpp
>> files are now
>> completely empty and could also be removed
>> - the MINIMIZE_RAM_USAGE define (which avoids use
>> of the linux
>> sp-map "cache" on 32-bit) now has no affect and so
>> could be
>> completely removed from the build system
>>
>> I plan to do the MINIMIZE_RAM_USAGE removal as a
>> follow up CR,
>> but could add the removal of the "inline" files to
>> this CR if
>> people think it worth removing them.
>>
>> I have one missing piece on Aarch64 - I need to
>> change
>> MacroAssembler::get_thread to simply call
>> Thread::current() as
>> on other platforms, but I don't know how to write
>> that. I would
>> appreciate it if someone could give me the right
>> code for that.
>>
>> I would also appreciate comments/testing by the AIX
>> and PPC64
>> folk as well.
>>
>> A concern about memory-leaks had previously been
>> raised, but
>> experiments using simple C code on linux 86 and
>> Solaris showed
>> no issues. Also note that Aarch64 already uses this
>> kind of
>> thread-local.
>>
>> Thanks,
>> David
>>
>>
>>
>>
>>
More information about the hotspot-dev
mailing list