(L) Prelim RFR: 8132510: Replace ThreadLocalStorage with compiler/language-based thread-local variables
David Holmes
david.holmes at oracle.com
Mon Nov 9 23:55:19 UTC 2015
On 9/11/2015 8:54 AM, David Holmes wrote:
> On 7/11/2015 11:22 AM, Jeremy Manson wrote:
>>
>>
>> On Thu, Nov 5, 2015 at 11:48 PM, David Holmes <david.holmes at oracle.com
>> <mailto:david.holmes at oracle.com>> wrote:
>>
>> On 6/11/2015 4:55 PM, Jeremy Manson wrote:
>>
>> FWIW, Google tried to convince the glibc maintainers to make this
>> async-safe, but they weren't biting:
>>
>> https://sourceware.org/ml/libc-alpha/2014-01/msg00033.html
>>
>>
>> Yes I read all that. I wouldn't say they weren't biting, more of a
>> disagreement on the right direction for the patch. glibc weren't
>> prepared to take it directly as is, while google-folk didn't seem to
>> think it worth their while to do whatever glibc folk wanted. The
>> actual patch proposal just died out. :( Quite a pity.
>>
>> Most of the things you can do are going to be mitigation rather
>> than a
>> fix. I did what you suggest to mitigate, and no one complained,
>> until
>> someone at Google started a sidecar C++ thread that did a
>> boatload of
>> malloc'ing.
>>
>>
>> Yes all mitigation. :(
>>
>> My workaround was far sneakier, and fixes the problem entirely,
>> but you
>> probably won't want to do it for code hygiene reasons. I
>> declare the
>> __thread variable in the java launcher itself, and then export a
>> function that provides a pointer to it. In practice, in glibc,
>> if it is
>> in the main executable, ELF is smart enough to declare it as
>> part of the
>> stack, and is therefore async-safe.
>>
>>
>> But even that only works for our own launchers - not for embedded in
>> the JVM. :(
>>
>>
>> Yup. Fortunately, I can tell people at Google how to write launchers.
>>
>> Fortunately, while this is a fun thing to think about, I don't
>> think
>> there are any async paths in the JVM that use Thread::current()
>> (although I could be wrong - there is a comment in there saying
>> not to
>> call Thread::current() in a signal handler...). I would check
>> the call
>> paths in AsyncGetCallTrace to make sure.
>>
>>
>> So two things ...
>>
>> First, using Thread::current() in a signal context was disallowed,
>> but the alternative was ThreadLocalStorage::get_thread_slow(). The
>> former may not work in a signal context due to the caching
>> mechanisms layered in on different platforms, while the latter used
>> the platform TLS API which, even if not guaranteed, worked well
>> enough in a signal context. With __thread we don't have even a
>> pretend signal-safe alternative :(
>>
>>
>> Right.
>>
>> Second, AsyncGetCallTrace violates the first rule by using
>> JavaThread::current() in an assertion.
>>
>>
>> While we're on the subject, the assertion in Method::bci_from is
>> reachable from AsyncGetCallTrace and calls err_msg, which can malloc().
>> I meant to file a bug about that.
>>
>> Also, the problem may not be limited to something like
>> AsyncGetCallTrace. Though agents may get the current thread from the
>> JNIEnv rather than invoking some JVM function that results in
>> Thread::current() being used, I can't be sure of that.
>>
>>
>> Which JVM functions that get the thread are supposed to be async-safe?
>> There is no reason to think that any method that isn't explicitly marked
>> async-safe is async safe, and most JNI methods I've tried to use from
>> signal handlers die painfully if I try to use them from a signal handler.
>>
>> Generally, I don't think it is reasonable for a user to expect
>> async-safety from an API that isn't expressly designed that way. POSIX
>> has a list of async-safe methods (signal(7)).
>
> Right - no JNI or JVM TI functions are designated as async-signal-safe
> (the specs dont even mention signals).
>
> Unfortunately my problem is more basic: pretty much the first thing the
> JVM signal handler does is get the current thread. So if the signal is
> handled on an unattached thread that happened to be doing a malloc then
> we're toast. :( Most of the signals the JVM expects to handle are not
> blocked by default, AFAICS, so any unattached thread could be selected.
Just to keep my thinking straight on this, the problem only exists for
threads that existed before the JVM was loaded. All threads allocated
after that will have space for all the TLS variables allocated directly.
So the problem scenario is:
- external process with existing threads loads the JVM
- existing thread is executing critical library function eg malloc, when
it takes a process-directed signal.
- JVM signal handler runs and accesses _thr_current which triggers
dynamic TLS allocation
David
-----
> David
> -----
>
>
>> FWIW, to use AsyncGetCallTrace, I get the JNIEnv in a ThreadStart hook
>> from JVMTI and stash it in a __thread (and pull the trick I mentioned).
>> Jeremy
>>
>>
>> Anyway more things to mull over on the weekedn. :)
>>
>> Thanks,
>> David
>>
>> Jeremy
>>
>> On Thu, Nov 5, 2015 at 10:26 PM, David Holmes
>> <david.holmes at oracle.com <mailto:david.holmes at oracle.com>
>> <mailto:david.holmes at oracle.com
>> <mailto:david.holmes at oracle.com>>> wrote:
>>
>> Hi Jeremy,
>>
>> Okay I have read:
>>
>> https://sourceware.org/glibc/wiki/TLSandSignals
>>
>> and the tree of mail postings referenced therefrom - great
>> reading! :)
>>
>> So basic problem: access to __thread variables is not
>> async-signal-safe
>>
>> Exacerbator to problem: first attempt to even read a
>> __thread
>> variable can lead to allocation which is the real problem in
>> relation to async-signal-safety
>>
>> I mention the exacerbator because pthread_getspecific and
>> pthread_setSpecific are also not async-signal-safe but we
>> already
>> use them. However, pthread_getspecific is in fact (per
>> email threads
>> linked above) effectively async-signal-safe, and further a
>> call to
>> pthread_getspecific never results in a call to
>> pthread_setspecific
>> or an allocation. Hence the pthread functions are almost,
>> if not
>> completely, safe in practice with reasonable uses (ie only
>> read from
>> signal handler). Which explain this code in existing
>> Thread::current()
>>
>> #ifdef PARANOID
>> // Signal handler should call
>> ThreadLocalStorage::get_thread_slow()
>> Thread* t = ThreadLocalStorage::get_thread_slow();
>> assert(t != NULL && !t->is_inside_signal_handler(),
>> "Don't use Thread::current() inside signal
>> handler");
>> #endif
>>
>> So problem scenario is: use of __thread variable (that
>> belongs to
>> the shared-library) in a signal handler.
>>
>> Solution 0: don't do that. Seriously - like any other
>> async-signal-unsafe stuff we should not be using it in real
>> signal
>> handlers. The crash handler is a different matter - we try
>> all sorts
>> there because it might work and you can't die twice.
>>
>> Otherwise: narrow the window of exposure.
>>
>> 1. We ensure we initialize thread_current (even with a
>> dummy value)
>> as early as possible in the thread that loads libjvm. As we
>> have no
>> signal handlers installed at that point that might use
>> the same
>> variable, we can not hit the problem scenario.
>>
>> 2. We ensure we initialize thread_current in a new thread
>> with all
>> signals blocked. This again avoids the problem scenario.
>>
>> 3. We initialize thread_current in an attaching thread as
>> soon as
>> possible and we again first block all signals.
>>
>> That still leaves the problem of an unattached native
>> thread taking
>> a signal whilst in async-signal-unsafe code, and executing
>> a signal
>> handler which in turns tries to access thread_current for
>> the first
>> time. This signal handler need not be an actual JVM
>> handler, but one
>> attached by other native code eg an agent. I'm not clear
>> in the
>> latter case how reasonable it is for an agent's handler to
>> try and
>> do things from an unattached thread - and we don't claim
>> any JNI
>> interfaces can, or should, be called from a signal handler
>> - but it
>> is something you can probably get away with today.
>>
>> Let me also point out that we already effectively have this
>> code in
>> Solaris already (at the ThreadLocalStorage class level). So
>> if there
>> is something here that will prevent the current proposal we
>> already
>> have a problem on Solaris. :(
>>
>> Thoughts/comments/suggestions?
>>
>> Thanks,
>> David
>>
>>
>> On 6/11/2015 1:09 PM, David Holmes wrote:
>>
>> Hi Jeremy,
>>
>> I was going to ask you to elaborate :)
>>
>> On 6/11/2015 12:24 PM, Jeremy Manson wrote:
>>
>> I should probably elaborate on this. With glibc +
>> ELF, the
>> first time a
>> thread accesses a variable declared __thread, if
>> that
>> variable is in a
>> shared library (as opposed to the main
>> executable), the
>> system calls
>> malloc() to allocate the space for it. If that
>> happens in a
>> signal that
>> is being delivered during a call to malloc(),
>> then you
>> usually get a
>> crash.
>>
>>
>> My understanding of the ELF ABI for thread-locals -
>> which I read
>> about
>> in the Solaris 11.1 Linkers and libraries guide - does
>> require
>> use of
>> the dynamic TLS model for any dynamically loaded shared
>> object which
>> defines a thread-local, but that is what we use as I
>> understand
>> it. The
>> docs state:
>>
>> "A shared object containing only dynamic TLS can be
>> loaded following
>> process startup without limitations. The runtime linker
>> extends
>> the list
>> of initialization records to include the initialization
>> template
>> of the
>> new object. The new object is given an index of m = M +
>> 1. The
>> counter M is incremented by 1. However, the allocation
>> of new
>> TLS blocks
>> is deferred until the blocks are actually referenced."
>>
>> Now I guess "extends the list" might be implemented
>> using malloc
>> ... but
>> this will only occur in the main thread (the one
>> started by the
>> launcher
>> to load the JVM and become the main thread), at the
>> time libjvm is
>> loaded - which will all be over before any agent etc
>> can run and do
>> anything. But "allocation ... is deferred" suggests we
>> may have a
>> problem until either the first call to Thread::current
>> or the
>> call to
>> Thread::initialize_thread_current. If it is the former
>> then that
>> should
>> occur well before any agent etc can be loaded. And I
>> can easily
>> inject
>> an initial dummy call to
>> initialize_thread_current(null) to
>> force the
>> TLS allocation.
>>
>> This may bite you if AsyncGetCallTrace uses
>> Thread::current(), and you
>> use system timers to do profiling. If a thread
>> doing a
>> malloc() prior
>> to the first time it accesses Thread::current(),
>> and it gets
>> delivered a
>> signal, it might die. This is especially likely
>> for pure
>> native threads
>> started by native code.
>>
>> I believe that this is a use case you support, so
>> you might
>> want to make
>> sure it is okay.
>>
>>
>> For a VM embedded in a process, which already contains
>> native
>> threads,
>> that will later attach to the VM, this may indeed be a
>> problem. One
>> would have hoped however that the implementation of TLS
>> would be
>> completely robust, at least for something as simple as
>> getting a
>> signal
>> whilst in the allocator.
>>
>> I'm unclear how to test for or check for this kind of
>> problem.
>> Arguably
>> there could be many things that are async-unsafe in
>> this way.
>>
>> Need to think more about this and do some more
>> research. Would also
>> appreciate any insight from any glibc and/or ELF gurus.
>>
>> Thanks.
>> David
>>
>> Jeremy
>>
>> On Thu, Nov 5, 2015 at 5:58 PM, Jeremy Manson
>> <jeremymanson at google.com
>> <mailto:jeremymanson at google.com> <mailto:jeremymanson at google.com
>> <mailto:jeremymanson at google.com>>
>> <mailto:jeremymanson at google.com
>> <mailto:jeremymanson at google.com>
>> <mailto:jeremymanson at google.com
>> <mailto:jeremymanson at google.com>>>> wrote:
>>
>> Something that's bitten me with __thread: it
>> isn't
>> async-safe when
>> called from a shared object on Linux. Have
>> you vetted
>> to make sure
>> this doesn't make HS less async-safe?
>>
>> Jeremy
>>
>> On Sun, Nov 1, 2015 at 10:40 PM, David Holmes
>> <david.holmes at oracle.com
>> <mailto:david.holmes at oracle.com>
>> <mailto:david.holmes at oracle.com
>> <mailto:david.holmes at oracle.com>>
>> <mailto:david.holmes at oracle.com
>> <mailto:david.holmes at oracle.com>
>>
>> <mailto:david.holmes at oracle.com
>> <mailto:david.holmes at oracle.com>>>> wrote:
>>
>> bug:
>> https://bugs.openjdk.java.net/browse/JDK-8132510
>>
>> Open webrev:
>> http://cr.openjdk.java.net/~dholmes/8132510/webrev.v2/
>>
>> A simple (in principle) but wide-ranging
>> change
>> which should
>> appeal to our Code Deletion Engineer's. We
>> implement
>> Thread::current() using a
>> compiler/language-based
>> thread-local
>> variable eg:
>>
>>
>> static __thread Thread *_thr_current;
>>
>> inline Thread* Thread::current() {
>> return _thr_current;
>> }
>>
>> with an appropriate setter of course. By
>> doing this
>> we can
>> completely remove the platform-specific
>> ThreadLocalStorage
>> implementations, and the associated
>> os::thread_local_storage*
>> calls, plus all the uses of
>> ThreadLocalStorage::thread() and
>> ThreadLocalStorage::get_thread_slow().
>> This extends
>> the previous
>> work done on Solaris to implement
>> ThreadLocalStorage::thread()
>> using compiler-based thread-locals.
>>
>> We can also consolidate nearly all the
>> os_cpu
>> versions of
>> MacroAssembler::get_thread on x86 into
>> one cpu
>> specific one ( a
>> special variant is still needed for 32-bit
>> Windows).
>>
>> As a result of this change we have further
>> potential cleanups:
>> - all the
>> src/os/<os>/vm/thread_<os>.inline.hpp
>> files are now
>> completely empty and could also be removed
>> - the MINIMIZE_RAM_USAGE define (which
>> avoids use
>> of the linux
>> sp-map "cache" on 32-bit) now has no
>> affect and so
>> could be
>> completely removed from the build system
>>
>> I plan to do the MINIMIZE_RAM_USAGE
>> removal as a
>> follow up CR,
>> but could add the removal of the "inline"
>> files to
>> this CR if
>> people think it worth removing them.
>>
>> I have one missing piece on Aarch64 - I
>> need to change
>> MacroAssembler::get_thread to simply call
>> Thread::current() as
>> on other platforms, but I don't know how
>> to write
>> that. I would
>> appreciate it if someone could give me the
>> right
>> code for that.
>>
>> I would also appreciate comments/testing
>> by the AIX
>> and PPC64
>> folk as well.
>>
>> A concern about memory-leaks had
>> previously been
>> raised, but
>> experiments using simple C code on linux
>> 86 and
>> Solaris showed
>> no issues. Also note that Aarch64 already
>> uses this
>> kind of
>> thread-local.
>>
>> Thanks,
>> David
>>
>>
>>
>>
>>
More information about the porters-dev
mailing list