(L) Prelim RFR: 8132510: Replace ThreadLocalStorage with compiler/language-based thread-local variables
David Holmes
david.holmes at oracle.com
Sun Nov 8 22:54:18 UTC 2015
On 7/11/2015 11:22 AM, Jeremy Manson wrote:
>
>
> On Thu, Nov 5, 2015 at 11:48 PM, David Holmes <david.holmes at oracle.com
> <mailto:david.holmes at oracle.com>> wrote:
>
> On 6/11/2015 4:55 PM, Jeremy Manson wrote:
>
> FWIW, Google tried to convince the glibc maintainers to make this
> async-safe, but they weren't biting:
>
> https://sourceware.org/ml/libc-alpha/2014-01/msg00033.html
>
>
> Yes I read all that. I wouldn't say they weren't biting, more of a
> disagreement on the right direction for the patch. glibc weren't
> prepared to take it directly as is, while google-folk didn't seem to
> think it worth their while to do whatever glibc folk wanted. The
> actual patch proposal just died out. :( Quite a pity.
>
> Most of the things you can do are going to be mitigation rather
> than a
> fix. I did what you suggest to mitigate, and no one complained,
> until
> someone at Google started a sidecar C++ thread that did a
> boatload of
> malloc'ing.
>
>
> Yes all mitigation. :(
>
> My workaround was far sneakier, and fixes the problem entirely,
> but you
> probably won't want to do it for code hygiene reasons. I
> declare the
> __thread variable in the java launcher itself, and then export a
> function that provides a pointer to it. In practice, in glibc,
> if it is
> in the main executable, ELF is smart enough to declare it as
> part of the
> stack, and is therefore async-safe.
>
>
> But even that only works for our own launchers - not for embedded in
> the JVM. :(
>
>
> Yup. Fortunately, I can tell people at Google how to write launchers.
>
> Fortunately, while this is a fun thing to think about, I don't think
> there are any async paths in the JVM that use Thread::current()
> (although I could be wrong - there is a comment in there saying
> not to
> call Thread::current() in a signal handler...). I would check
> the call
> paths in AsyncGetCallTrace to make sure.
>
>
> So two things ...
>
> First, using Thread::current() in a signal context was disallowed,
> but the alternative was ThreadLocalStorage::get_thread_slow(). The
> former may not work in a signal context due to the caching
> mechanisms layered in on different platforms, while the latter used
> the platform TLS API which, even if not guaranteed, worked well
> enough in a signal context. With __thread we don't have even a
> pretend signal-safe alternative :(
>
>
> Right.
>
> Second, AsyncGetCallTrace violates the first rule by using
> JavaThread::current() in an assertion.
>
>
> While we're on the subject, the assertion in Method::bci_from is
> reachable from AsyncGetCallTrace and calls err_msg, which can malloc().
> I meant to file a bug about that.
>
> Also, the problem may not be limited to something like
> AsyncGetCallTrace. Though agents may get the current thread from the
> JNIEnv rather than invoking some JVM function that results in
> Thread::current() being used, I can't be sure of that.
>
>
> Which JVM functions that get the thread are supposed to be async-safe?
> There is no reason to think that any method that isn't explicitly marked
> async-safe is async safe, and most JNI methods I've tried to use from
> signal handlers die painfully if I try to use them from a signal handler.
>
> Generally, I don't think it is reasonable for a user to expect
> async-safety from an API that isn't expressly designed that way. POSIX
> has a list of async-safe methods (signal(7)).
Right - no JNI or JVM TI functions are designated as async-signal-safe
(the specs dont even mention signals).
Unfortunately my problem is more basic: pretty much the first thing the
JVM signal handler does is get the current thread. So if the signal is
handled on an unattached thread that happened to be doing a malloc then
we're toast. :( Most of the signals the JVM expects to handle are not
blocked by default, AFAICS, so any unattached thread could be selected.
David
-----
> FWIW, to use AsyncGetCallTrace, I get the JNIEnv in a ThreadStart hook
> from JVMTI and stash it in a __thread (and pull the trick I mentioned).
> Jeremy
>
>
> Anyway more things to mull over on the weekedn. :)
>
> Thanks,
> David
>
> Jeremy
>
> On Thu, Nov 5, 2015 at 10:26 PM, David Holmes
> <david.holmes at oracle.com <mailto:david.holmes at oracle.com>
> <mailto:david.holmes at oracle.com
> <mailto:david.holmes at oracle.com>>> wrote:
>
> Hi Jeremy,
>
> Okay I have read:
>
> https://sourceware.org/glibc/wiki/TLSandSignals
>
> and the tree of mail postings referenced therefrom - great
> reading! :)
>
> So basic problem: access to __thread variables is not
> async-signal-safe
>
> Exacerbator to problem: first attempt to even read a __thread
> variable can lead to allocation which is the real problem in
> relation to async-signal-safety
>
> I mention the exacerbator because pthread_getspecific and
> pthread_setSpecific are also not async-signal-safe but we
> already
> use them. However, pthread_getspecific is in fact (per
> email threads
> linked above) effectively async-signal-safe, and further a
> call to
> pthread_getspecific never results in a call to
> pthread_setspecific
> or an allocation. Hence the pthread functions are almost,
> if not
> completely, safe in practice with reasonable uses (ie only
> read from
> signal handler). Which explain this code in existing
> Thread::current()
>
> #ifdef PARANOID
> // Signal handler should call
> ThreadLocalStorage::get_thread_slow()
> Thread* t = ThreadLocalStorage::get_thread_slow();
> assert(t != NULL && !t->is_inside_signal_handler(),
> "Don't use Thread::current() inside signal handler");
> #endif
>
> So problem scenario is: use of __thread variable (that
> belongs to
> the shared-library) in a signal handler.
>
> Solution 0: don't do that. Seriously - like any other
> async-signal-unsafe stuff we should not be using it in real
> signal
> handlers. The crash handler is a different matter - we try
> all sorts
> there because it might work and you can't die twice.
>
> Otherwise: narrow the window of exposure.
>
> 1. We ensure we initialize thread_current (even with a
> dummy value)
> as early as possible in the thread that loads libjvm. As we
> have no
> signal handlers installed at that point that might use the same
> variable, we can not hit the problem scenario.
>
> 2. We ensure we initialize thread_current in a new thread
> with all
> signals blocked. This again avoids the problem scenario.
>
> 3. We initialize thread_current in an attaching thread as
> soon as
> possible and we again first block all signals.
>
> That still leaves the problem of an unattached native
> thread taking
> a signal whilst in async-signal-unsafe code, and executing
> a signal
> handler which in turns tries to access thread_current for
> the first
> time. This signal handler need not be an actual JVM
> handler, but one
> attached by other native code eg an agent. I'm not clear in the
> latter case how reasonable it is for an agent's handler to
> try and
> do things from an unattached thread - and we don't claim
> any JNI
> interfaces can, or should, be called from a signal handler
> - but it
> is something you can probably get away with today.
>
> Let me also point out that we already effectively have this
> code in
> Solaris already (at the ThreadLocalStorage class level). So
> if there
> is something here that will prevent the current proposal we
> already
> have a problem on Solaris. :(
>
> Thoughts/comments/suggestions?
>
> Thanks,
> David
>
>
> On 6/11/2015 1:09 PM, David Holmes wrote:
>
> Hi Jeremy,
>
> I was going to ask you to elaborate :)
>
> On 6/11/2015 12:24 PM, Jeremy Manson wrote:
>
> I should probably elaborate on this. With glibc +
> ELF, the
> first time a
> thread accesses a variable declared __thread, if that
> variable is in a
> shared library (as opposed to the main executable), the
> system calls
> malloc() to allocate the space for it. If that
> happens in a
> signal that
> is being delivered during a call to malloc(), then you
> usually get a
> crash.
>
>
> My understanding of the ELF ABI for thread-locals -
> which I read
> about
> in the Solaris 11.1 Linkers and libraries guide - does
> require
> use of
> the dynamic TLS model for any dynamically loaded shared
> object which
> defines a thread-local, but that is what we use as I
> understand
> it. The
> docs state:
>
> "A shared object containing only dynamic TLS can be
> loaded following
> process startup without limitations. The runtime linker
> extends
> the list
> of initialization records to include the initialization
> template
> of the
> new object. The new object is given an index of m = M +
> 1. The
> counter M is incremented by 1. However, the allocation
> of new
> TLS blocks
> is deferred until the blocks are actually referenced."
>
> Now I guess "extends the list" might be implemented
> using malloc
> ... but
> this will only occur in the main thread (the one
> started by the
> launcher
> to load the JVM and become the main thread), at the
> time libjvm is
> loaded - which will all be over before any agent etc
> can run and do
> anything. But "allocation ... is deferred" suggests we
> may have a
> problem until either the first call to Thread::current
> or the
> call to
> Thread::initialize_thread_current. If it is the former
> then that
> should
> occur well before any agent etc can be loaded. And I
> can easily
> inject
> an initial dummy call to initialize_thread_current(null) to
> force the
> TLS allocation.
>
> This may bite you if AsyncGetCallTrace uses
> Thread::current(), and you
> use system timers to do profiling. If a thread doing a
> malloc() prior
> to the first time it accesses Thread::current(),
> and it gets
> delivered a
> signal, it might die. This is especially likely
> for pure
> native threads
> started by native code.
>
> I believe that this is a use case you support, so
> you might
> want to make
> sure it is okay.
>
>
> For a VM embedded in a process, which already contains
> native
> threads,
> that will later attach to the VM, this may indeed be a
> problem. One
> would have hoped however that the implementation of TLS
> would be
> completely robust, at least for something as simple as
> getting a
> signal
> whilst in the allocator.
>
> I'm unclear how to test for or check for this kind of
> problem.
> Arguably
> there could be many things that are async-unsafe in
> this way.
>
> Need to think more about this and do some more
> research. Would also
> appreciate any insight from any glibc and/or ELF gurus.
>
> Thanks.
> David
>
> Jeremy
>
> On Thu, Nov 5, 2015 at 5:58 PM, Jeremy Manson
> <jeremymanson at google.com
> <mailto:jeremymanson at google.com> <mailto:jeremymanson at google.com
> <mailto:jeremymanson at google.com>>
> <mailto:jeremymanson at google.com
> <mailto:jeremymanson at google.com>
> <mailto:jeremymanson at google.com
> <mailto:jeremymanson at google.com>>>> wrote:
>
> Something that's bitten me with __thread: it isn't
> async-safe when
> called from a shared object on Linux. Have
> you vetted
> to make sure
> this doesn't make HS less async-safe?
>
> Jeremy
>
> On Sun, Nov 1, 2015 at 10:40 PM, David Holmes
> <david.holmes at oracle.com
> <mailto:david.holmes at oracle.com>
> <mailto:david.holmes at oracle.com
> <mailto:david.holmes at oracle.com>>
> <mailto:david.holmes at oracle.com
> <mailto:david.holmes at oracle.com>
>
> <mailto:david.holmes at oracle.com
> <mailto:david.holmes at oracle.com>>>> wrote:
>
> bug:
> https://bugs.openjdk.java.net/browse/JDK-8132510
>
> Open webrev:
> http://cr.openjdk.java.net/~dholmes/8132510/webrev.v2/
>
> A simple (in principle) but wide-ranging
> change
> which should
> appeal to our Code Deletion Engineer's. We
> implement
> Thread::current() using a
> compiler/language-based
> thread-local
> variable eg:
>
>
> static __thread Thread *_thr_current;
>
> inline Thread* Thread::current() {
> return _thr_current;
> }
>
> with an appropriate setter of course. By
> doing this
> we can
> completely remove the platform-specific
> ThreadLocalStorage
> implementations, and the associated
> os::thread_local_storage*
> calls, plus all the uses of
> ThreadLocalStorage::thread() and
> ThreadLocalStorage::get_thread_slow().
> This extends
> the previous
> work done on Solaris to implement
> ThreadLocalStorage::thread()
> using compiler-based thread-locals.
>
> We can also consolidate nearly all the os_cpu
> versions of
> MacroAssembler::get_thread on x86 into one cpu
> specific one ( a
> special variant is still needed for 32-bit
> Windows).
>
> As a result of this change we have further
> potential cleanups:
> - all the
> src/os/<os>/vm/thread_<os>.inline.hpp
> files are now
> completely empty and could also be removed
> - the MINIMIZE_RAM_USAGE define (which
> avoids use
> of the linux
> sp-map "cache" on 32-bit) now has no
> affect and so
> could be
> completely removed from the build system
>
> I plan to do the MINIMIZE_RAM_USAGE
> removal as a
> follow up CR,
> but could add the removal of the "inline"
> files to
> this CR if
> people think it worth removing them.
>
> I have one missing piece on Aarch64 - I
> need to change
> MacroAssembler::get_thread to simply call
> Thread::current() as
> on other platforms, but I don't know how
> to write
> that. I would
> appreciate it if someone could give me the
> right
> code for that.
>
> I would also appreciate comments/testing
> by the AIX
> and PPC64
> folk as well.
>
> A concern about memory-leaks had
> previously been
> raised, but
> experiments using simple C code on linux
> 86 and
> Solaris showed
> no issues. Also note that Aarch64 already
> uses this
> kind of
> thread-local.
>
> Thanks,
> David
>
>
>
>
>
More information about the porters-dev
mailing list