RFR: 8130212: Thread::current() might access freed memory on Solaris

Thomas Stüfe thomas.stuefe at gmail.com
Tue Aug 4 09:32:11 UTC 2015


Hi David,

On Mon, Aug 3, 2015 at 10:22 PM, David Holmes <david.holmes at oracle.com>
wrote:

> Hi Thomas,
>
> On 4/08/2015 1:38 AM, Thomas Stüfe wrote:
>
>> Hi David,
>>
>> we added compiler-level TLS (__thread) for Linux a while ago to do
>> exactly what you are doing, but then had to remove it again. Bugs in
>> glibc caused memory leaks - basically it looked like glibc was not
>> cleaning up TLS-related structures. The leak was small but it added up
>> over time.
>>
>> I know you implemented this for Solaris. Just thought I'd give you a
>> warning - maybe this is something to keep in mind.
>>
>
> Thanks for the heads-up! Linux et al are next on the list. I'll put
> together a simple thread creation test and see if the memory use changes
> over time.
>
>
Sounds good. Unfortunately this was a gcc/glibc bug, so you may not see
it on every Linux system.

I think this may have been the bug:

https://sourceware.org/bugzilla/show_bug.cgi?id=12650

Kind Regards, Thomas

> David
>
>> Thanks, Thomas
>>
>>
>> On Wed, Jul 29, 2015 at 7:46 AM, David Holmes <david.holmes at oracle.com
>> <mailto:david.holmes at oracle.com>> wrote:
>>
>>     Summary: replace complex custom code for maintaining
>>     ThreadLocalStorage with compiler-supported thread-local variables
>>     (Solaris only)
>>
>>     This is a non-public bug so let me explain with some background, the
>>     bug, and then the fix - which involves lots of complex-code deletion
>>     and addition of some very simple code. :)
>>
>>     webrev: http://cr.openjdk.java.net/~dholmes/8130212/webrev/
>>
>>     In various parts of the runtime and in compiler generated code we
>>     need to get a reference to the (VM-level) Thread* of the currently
>>     executing thread. This is what Thread::current() returns. For
>>     performance reasons we also have a fast-path on 64-bit where the
>>     Thread* is stashed away in a register (g7 on sparc, r15 on x64).
>>
>>     So Thread::current() is actually a slow-path mechanism and it
>>     delegates to ThreadLocalStorage::thread().
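>>
>>     In outline (a simplified sketch, not the exact source) that slow
>>     path is just:
>>
>>       inline Thread* Thread::current() {
>>         Thread* thread = ThreadLocalStorage::thread();
>>         assert(thread != NULL, "just checking");
>>         return thread;
>>       }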
>>
>>     On some systems ThreadLocalStorage::thread utilizes a caching
>>     mechanism to try and speed up access to the current thread.
>>     Otherwise it calls into yet another "slow" path which uses the
>>     available platform thread-specific-storage APIs.
>>
>>     Compiled code also has a slow-path get_thread() method which uses
>>     assembly code to invoke the same platform thread-specific-storage
>>     APIs (in some cases - on sparc it simply calls
>>     ThreadLocalStorage::thread()).
>>
>>     On Solaris 64-bit (which is all we support today) there is a simple
>>     1-level thread cache which is an array of Thread*. If a thread
>>     doesn't find itself in the slot for the hash of its id it inserts
>>     itself there. As a thread terminates it clears out its
>>     ThreadLocalStorage values including any cached reference.
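>>
>>     Roughly (names and details approximate, not the exact source):
>>
>>       // One Thread* slot per hash bucket, read and written without
>>       // any locking.
>>       static Thread* _get_thread_cache[CACHE_SIZE];
>>
>>       Thread* ThreadLocalStorage::get_thread_via_cache_slowly(
>>           uintptr_t raw, int ix) {
>>         Thread* t = get_thread_slow();  // real TSS lookup
>>         if (t != NULL) {
>>           _get_thread_cache[ix] = t;    // insert self on a miss
>>         }
>>         return t;
>>       }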
>>
>>     The bug is that we have potential for a read-after-free error due to
>>     this code:
>>
>>        46   uintptr_t raw = pd_raw_thread_id();
>>        47   int ix = pd_cache_index(raw);  // hashes id
>>        48   Thread* candidate = ThreadLocalStorage::_get_thread_cache[ix];
>>        49   if (candidate->self_raw_id() == raw) {
>>        50     // hit
>>        51     return candidate;
>>        52   } else {
>>        53     return ThreadLocalStorage::get_thread_via_cache_slowly(raw, ix);
>>        54   }
>>
>>     The problem is that the thread read into candidate at line 48
>>     could terminate and be freed before line 49 executes. Line 49
>>     then reads the raw id of that thread - a read-after-free, which
>>     is a "Bad Thing (TM)".
>>
>>     There's no simple fix for the caching code - you would need a
>>     completely different approach (or synchronization that would nullify
>>     the whole point of the cache).
>>
>>     Now all this ThreadLocalStorage code is pretty old and was put in
>>     place to deal with inadequacies of the system-provided
>>     thread-specific-storage API. In fact on Solaris we even by-pass the
>>     public API (thr_getspecific/thr_setspecific) when we can and
>>     implement our own version using lower-level APIs available in the
>>     T1/T2 threading libraries!
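>>
>>     For reference, the portable slow path that all this machinery
>>     wraps is just a thread-specific-data lookup; a POSIX-flavoured
>>     sketch (key creation elided, helper name purely illustrative):
>>
>>       #include <pthread.h>
>>
>>       static pthread_key_t _thread_key;  // created once at VM startup
>>
>>       static Thread* thread_via_tsd() {
>>         return (Thread*) pthread_getspecific(_thread_key);
>>       }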
>>
>>     In mid-2015 things have changed considerably and we have reliable
>>     and performant support for thread-local variables at the C++
>>     language level. So the way to maintain the current thread is simply
>>     using:
>>
>>       // Declaration of thread-local variable
>>       static __thread Thread* _thr_current;
>>
>>       inline Thread* ThreadLocalStorage::thread()  {
>>         return _thr_current;
>>       }
>>
>>       inline void ThreadLocalStorage::set_thread(Thread* thread) {
>>         _thr_current = thread;
>>       }
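>>
>>     As a standalone illustration of the __thread semantics (a toy
>>     program, not HotSpot code) - each thread gets its own
>>     zero-initialized copy, with no key management at all:
>>
>>       #include <pthread.h>
>>       #include <stdio.h>
>>
>>       static __thread int _tls_value = 0;
>>
>>       static void* worker(void* arg) {
>>         _tls_value = 42;  // writes only this thread's copy
>>         printf("worker sees %d\n", _tls_value);
>>         return NULL;
>>       }
>>
>>       int main() {
>>         pthread_t t;
>>         pthread_create(&t, NULL, worker, NULL);
>>         pthread_join(t, NULL);
>>         printf("main still sees %d\n", _tls_value);  // prints 0
>>         return 0;
>>       }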
>>
>>     And all the complex ThreadLocalStorage code, caching and all,
>>     simply vanishes!
>>
>>     For my next trick I plan to try and remove the ThreadLocalStorage
>>     class completely by using language-based thread-locals on all
>>     platforms. But for now this is just Solaris and so we still need the
>>     ThreadLocalStorage API. However, much of that API is no longer
>>     needed on Solaris, so I have excluded it in the shared code
>>     (#ifndef SOLARIS). To avoid changing other shared-code callsites
>>     of ThreadLocalStorage, I've kept the rest of the API with trivial
>>     implementations on Solaris.
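>>
>>     The retained entry points reduce to trivial stubs on Solaris,
>>     along these lines (illustrative, not the exact declarations):
>>
>>       // No cache to set up or tear down any more.
>>       void ThreadLocalStorage::init() { }
>>       bool ThreadLocalStorage::is_initialized() { return true; }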
>>
>>     Testing: JPRT
>>               All hotspot regression tests
>>
>>     I'm happy to run more tests but the nice thing about such low-level
>>     code is that if it is broken, it is always broken :) Every use of
>>     Thread::current or MacroAssembler::get_thread now hits this code.
>>
>>     Performance: I've run a basic set of benchmarks that is readily
>>     available to me on our performance testing system. The best way to
>>     describe the result is neutral. There are some slight wins, and some
>>     slight losses, with most showing no statistical difference. And even
>>     the "wins" and "losses" are within the natural variations of the
>>     benchmarks. So a lot of complex code has been replaced by simple
>>     code and we haven't lost any observable performance - which seems
>>     like a win to me.
>>
>>     Also product mode x64 libjvm.so has shrunk by 921KB - which is a
>>     little surprising but very nice.
>>
>>     Thanks,
>>     David
>>
>>
>>

