RFR: 8130212: Thread::current() might access freed memory on Solaris
David Holmes
david.holmes at oracle.com
Tue Aug 4 10:45:52 UTC 2015
On 4/08/2015 7:32 PM, Thomas Stüfe wrote:
> Hi David,
>
> On Mon, Aug 3, 2015 at 10:22 PM, David Holmes <david.holmes at oracle.com> wrote:
>
> Hi Thomas,
>
> On 4/08/2015 1:38 AM, Thomas Stüfe wrote:
>
> Hi David,
>
> we added compiler-level TLS (__thread) for Linux a while ago to do
> exactly what you are doing, but then had to remove it again. Bugs in
> the glibc caused memory leaks - basically it looked like glibc was
> not cleaning up TLS-related structures. The leak was small but it
> added up over time.
>
> I know you implemented this for Solaris. Just thought I'd give you a
> warning, maybe this is something to keep in mind.
>
>
> Thanks for the heads-up! Linux et al are next on the list. I'll put
> together a simple thread creation test and see if the memory use
> changes over time.
>
>
> Sounds good. Unfortunately this was a gcc/glibc bug, so you may not
> see it on every Linux system.
>
> I think this may have been the bug:
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=12650
Many thanks! That explains why my simple test showed no issues - I need
to implement it in the VM and see what happens. <sigh>
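
For concreteness, a minimal sketch of such a thread-churn test (illustrative
only, not the VM test itself; sizes and names are arbitrary): create and join
short-lived threads that touch a __thread variable, and watch the process RSS
over time to see whether the TLS bookkeeping leaks.

    #include <pthread.h>
    #include <cstdio>

    // Each short-lived thread touches a compiler-level TLS variable so
    // the TLS set-up/tear-down paths run on every create/join cycle.
    static __thread int tls_counter;

    static void* worker(void*) {
      tls_counter = 42;                    // force the TLS slot to be used
      return NULL;
    }

    int main() {
      for (long i = 0; i < 10000000L; i++) {
        pthread_t t;
        if (pthread_create(&t, NULL, worker, NULL) != 0) return 1;
        pthread_join(t, NULL);
        if (i % 100000 == 0) {
          printf("iteration %ld\n", i);    // sample RSS externally here
        }
      }
      return 0;
    }
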
David
> Kind Regards, Thomas
>
> David
>
> Thanks, Thomas
>
>
> On Wed, Jul 29, 2015 at 7:46 AM, David Holmes
> <david.holmes at oracle.com> wrote:
>
> Summary: replace complex custom code for maintaining
> ThreadLocalStorage with compiler-supported thread-local variables
> (Solaris only)
>
> This is a non-public bug, so let me explain with some background, the
> bug, and then the fix - which involves deleting lots of complex code
> and adding some very simple code. :)
>
> webrev: http://cr.openjdk.java.net/~dholmes/8130212/webrev/
>
> In various parts of the runtime, and in compiler-generated code, we
> need to get a reference to the (VM-level) Thread* of the currently
> executing thread. This is what Thread::current() returns. For
> performance reasons we also have a fast path on 64-bit where the
> Thread* is stashed away in a register (g7 on SPARC, r15 on x64).
>
> So Thread::current() is actually a slow-path mechanism, and it
> delegates to ThreadLocalStorage::thread().
>
> On some systems ThreadLocalStorage::thread() uses a caching mechanism
> to try to speed up access to the current thread. Otherwise it calls
> into yet another "slow" path which uses the available platform
> thread-specific-storage APIs.
>
> Compiled code also has a slow-path get_thread() method which uses
> assembly code to invoke the same platform thread-specific-storage
> APIs (in some cases - on SPARC it simply calls
> ThreadLocalStorage::thread()).
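>
> To make the old structure concrete, a rough sketch of that layering
> (the pd_* helpers below are assumed names for illustration, not the
> actual ones):
>
>     // Sketch only - not the HotSpot sources.
>     inline Thread* Thread::current() {
>       // 64-bit builds normally read the Thread* straight from a
>       // register (g7 on SPARC, r15 on x64); this is the C++ slow path.
>       return ThreadLocalStorage::thread();
>     }
>
>     inline Thread* ThreadLocalStorage::thread() {
>       Thread* t = pd_cached_thread();         // assumed: per-platform cache
>       return (t != NULL) ? t : pd_thread_from_tsd();  // assumed: platform TSD API
>     }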
>
> On Solaris 64-bit (which is all we support today) there is a simple
> 1-level thread cache, which is an array of Thread*. If a thread
> doesn't find itself in the slot for the hash of its id, it inserts
> itself there. As a thread terminates it clears out its
> ThreadLocalStorage values, including any cached reference.
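>
> The insert and clear sides of that cache behave roughly like this
> sketch (names other than _get_thread_cache are assumed):
>
>     // Illustrative sketch, not the real code.
>     static Thread* _get_thread_cache[CACHE_SIZE];  // one Thread* per hash slot
>
>     // Cache miss: the thread we just looked up the slow way installs itself.
>     void cache_insert(int ix, Thread* thread) {
>       _get_thread_cache[ix] = thread;
>     }
>
>     // Thread termination: clear any cached reference to the dying thread.
>     void cache_clear(int ix, Thread* thread) {
>       if (_get_thread_cache[ix] == thread) {
>         _get_thread_cache[ix] = NULL;
>       }
>     }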
>
> The bug is that we have potential for a read-after-free error due to
> this code:
>
>   46   uintptr_t raw = pd_raw_thread_id();
>   47   int ix = pd_cache_index(raw); // hashes id
>   48   Thread* candidate = ThreadLocalStorage::_get_thread_cache[ix];
>   49   if (candidate->self_raw_id() == raw) {
>   50     // hit
>   51     return candidate;
>   52   } else {
>   53     return ThreadLocalStorage::get_thread_via_cache_slowly(raw, ix);
>   54   }
>
> The problem is that the value read as candidate could be a thread
> that (after line 48) terminated and was freed. But line #49 then
> reads the raw id of that thread - a read-after-free, which is a
> "Bad Thing (TM)".
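>
> Spelled out as one failing interleaving (thread names here are just
> labels):
>
>     //  A: Thread* candidate = _get_thread_cache[ix];  // holds a pointer to B
>     //  B: terminates, clears its TLS values, its Thread is freed
>     //  A: candidate->self_raw_id()                    // reads freed memory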
>
> There's no simple fix for the caching code - you would need a
> completely different approach (or synchronization that would nullify
> the whole point of the cache).
>
> Now all this ThreadLocalStorage code is pretty old and was put in
> place to deal with inadequacies of the system-provided
> thread-specific-storage APIs. In fact on Solaris we even bypass the
> public API (thr_getspecific/thr_setspecific) when we can, and
> implement our own version using lower-level APIs available in the
> T1/T2 threading libraries!
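>
> For reference, the public Solaris thread-specific-storage API that
> gets bypassed looks roughly like this standalone sketch (not VM code;
> the wrapper names are made up):
>
>     #include <thread.h>
>
>     static thread_key_t current_thread_key;
>
>     void tls_init() {
>       thr_keycreate(&current_thread_key, NULL);    // once, early in start-up
>     }
>
>     void set_current(void* thread) {
>       thr_setspecific(current_thread_key, thread); // on thread attach
>     }
>
>     void* current() {
>       void* value = NULL;
>       thr_getspecific(current_thread_key, &value); // the "slow" lookup
>       return value;
>     }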
>
> In mid-2015 things have changed considerably and we have reliable and
> performant support for thread-local variables at the C++ language
> level. So the way to maintain the current thread is simply using:
>
>     // Declaration of thread-local variable
>     static __thread Thread* _thr_current;
>
>     inline Thread* ThreadLocalStorage::thread() {
>       return _thr_current;
>     }
>
>     inline void ThreadLocalStorage::set_thread(Thread* thread) {
>       _thr_current = thread;
>     }
>
> And all the complex ThreadLocalStorage code, caching and all, simply
> vanishes!
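>
> The life-cycle side is equally trivial - a sketch of where the setter
> would be called (the hook names here are illustrative, not necessarily
> those in the webrev):
>
>     // On thread start or attach: publish this thread in its own TLS.
>     void on_thread_attach(Thread* thread) {
>       ThreadLocalStorage::set_thread(thread);
>     }
>
>     // On thread exit or detach: nothing is cached anywhere, just clear it.
>     void on_thread_detach() {
>       ThreadLocalStorage::set_thread(NULL);
>     }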
>
> For my next trick I plan to try to remove the ThreadLocalStorage
> class completely by using language-based thread-locals on all
> platforms. But for now this is just Solaris, and so we still need the
> ThreadLocalStorage API. However, a lot of that API is no longer
> needed on Solaris, so I have excluded it there in the shared code
> (#ifndef SOLARIS). But to avoid changing other shared-code call sites
> of ThreadLocalStorage I've kept part of the API, with trivial
> implementations, on Solaris.
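>
> In outline the shared declaration then looks something like this
> sketch (the idea only, not the actual webrev contents):
>
>     class ThreadLocalStorage : AllStatic {
>      public:
>       // Kept everywhere; trivial on Solaris, backed by __thread.
>       static Thread* thread();
>       static void    set_thread(Thread* thread);
>     #ifndef SOLARIS
>       // Legacy caching and platform-specific machinery, still used by
>       // the other platforms for now.
>       static Thread* get_thread_slow();
>     #endif
>     };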
>
> Testing: JPRT
> All hotspot regression tests
>
> I'm happy to run more tests but the nice thing about such low-level
> code is that if it is broken, it is always broken :) Every use of
> Thread::current or MacroAssembler::get_thread now hits this code.
>
> Performance: I've run a basic set of benchmarks that is readily
> available to me on our performance testing system. The best way to
> describe the result is neutral. There are some slight wins, and some
> slight losses, with most showing no statistical difference. And even
> the "wins" and "losses" are within the natural variations of the
> benchmarks. So a lot of complex code has been replaced by simple code
> and we haven't lost any observable performance - which seems like a
> win to me.
>
> Also product mode x64 libjvm.so has shrunk by 921KB - which is a
> little surprising but very nice.
>
> Thanks,
> David