RFR: 8130212: Thread::current() might access freed memory on Solaris

David Holmes david.holmes at oracle.com
Wed Jul 29 05:46:11 UTC 2015


Summary: replace complex custom code for maintaining ThreadLocalStorage 
with compiler supported thread-local variables (Solaris only)

This is a non-public bug so let me explain with some background, the 
bug, and then the fix - which involves lots of complex-code deletion and 
addition of some very simple code. :)

webrev: http://cr.openjdk.java.net/~dholmes/8130212/webrev/

In various parts of the runtime and in compiler generated code we need 
to get a reference to the (VM-level) Thread* of the currently executing 
thread. This is what Thread::current() returns. For performance reasons 
we also have a fast-path on 64-bit where the Thread* is stashed away in 
a register (g7 on sparc, r15 on x64).

So Thread::current() is actually a slow-path mechanism and it delegates 
to ThreadLocalStorage::thread().

On some systems ThreadLocalStorage::thread utilizes a caching mechanism 
to try and speed up access to the current thread. Otherwise it calls 
into yet another "slow" path which uses the available platform 
thread-specific-storage APIs.

Compiled code also has a slow-path get_thread() method which uses 
assembly code to invoke the same platform thread-specific-storage APIs 
(in some cases - on sparc it simply calls ThreadLocalStorage::thread()).

On Solaris 64-bit (which is all we support today) there is a simple 
1-level thread cache which is an array of Thread*. If a thread doesn't 
find itself in the slot for the hash of its id it inserts itself there. 
As a thread terminates it clears out its ThreadLocalStorage values 
including any cached reference.

The bug is that we have potential for a read-after-free error due to 
this code:

   46   uintptr_t raw = pd_raw_thread_id();
   47   int ix = pd_cache_index(raw);  // hashes id
   48   Thread* candidate = ThreadLocalStorage::_get_thread_cache[ix];
   49   if (candidate->self_raw_id() == raw) {
   50     // hit
   51     return candidate;
   52   } else {
   53     return ThreadLocalStorage::get_thread_via_cache_slowly(raw, ix);
   54   }

The problem is that the value read as candidate could be a thread that 
(after line 48) terminated and was freed. But line #49 then reads the 
raw id of that thread, which is then a read-after-free - which is a "Bad 
Thing (TM)".

There's no simple fix for the caching code - you would need a completely 
different approach (or synchronization that would nullify the whole 
point of the cache).

Now all this ThreadLocalStorage code is pretty old and was put in place 
to deal with inadequacies of the system provided thread-specific-storage 
API. In fact on Solaris we even by-pass the public API 
(thr_getspecific/thr_setspecific) when we can and implement our own 
version using lower-level APIs available in the T1/T2 threading libraries!

In mid-2015 things have changed considerably and we have reliable and 
performant support for thread-local variables at the C+ language-level. 
So the way to maintain the current thread is simply using:

  // Declaration of thread-local variable
  static __thread Thread * _thr_current

  inline Thread* ThreadLocalStorage::thread()  {
    return _thr_current;
  }

  inline void ThreadLocalStorage::set_thread(Thread* thread) {
    _thr_current = thread;
  }

And all the complex ThreadLocalStorage code with caching etc all vanishes!

For my next trick I plan to try and remove the ThreadLocalStorage class 
completely by using language-based thread-locals on all platforms. But 
for now this is just Solaris and so we still need the ThreadLocalStorage 
API. However a lot of that API is not needed any more on Solaris so I 
have excluded it from there in the shared code (ifndef SOLARIS). But to 
avoid changing other shared-code callsites of ThreadLocalStorage I've 
kept part of the API with trivial implementations on Solaris.

Testing: JPRT
          All hotspot regression tests

I'm happy to run more tests but the nice thing about such low-level code 
is that if it is broken, it is always broken :) Every use of 
Thread::current or MacroAssembler::get_thread now hits this code.

Performance: I've run a basic set of benchmarks that is readily 
available to me on our performance testing system. The best way to 
describe the result is neutral. There are some slight wins, and some 
slight losses, with most showing no statistical difference. And even the 
"wins" and "losses" are within the natural variations of the benchmarks. 
So a lot of complex code has been replaced by simple code and we haven't 
lost any observable performance - which seems like a win to me.

Also product mode x64 libjvm.so has shrunk by 921KB - which is a little 
surprising but very nice.

Thanks,
David


More information about the hotspot-runtime-dev mailing list