RFR: 8130212: Thread::current() might access freed memory on Solaris
David Holmes
david.holmes at oracle.com
Wed Jul 29 05:46:11 UTC 2015
Summary: replace complex custom code for maintaining ThreadLocalStorage
with compiler supported thread-local variables (Solaris only)
This is a non-public bug so let me explain with some background, the
bug, and then the fix - which involves lots of complex-code deletion and
addition of some very simple code. :)
webrev: http://cr.openjdk.java.net/~dholmes/8130212/webrev/
In various parts of the runtime and in compiler generated code we need
to get a reference to the (VM-level) Thread* of the currently executing
thread. This is what Thread::current() returns. For performance reasons
we also have a fast-path on 64-bit where the Thread* is stashed away in
a register (g7 on sparc, r15 on x64).
So Thread::current() is actually a slow-path mechanism and it delegates
to ThreadLocalStorage::thread().
On some systems ThreadLocalStorage::thread utilizes a caching mechanism
to try and speed up access to the current thread. Otherwise it calls
into yet another "slow" path which uses the available platform
thread-specific-storage APIs.
Compiled code also has a slow-path get_thread() method which uses
assembly code to invoke the same platform thread-specific-storage APIs
(in some cases - on sparc it simply calls ThreadLocalStorage::thread()).
On Solaris 64-bit (which is all we support today) there is a simple
1-level thread cache which is an array of Thread*. If a thread doesn't
find itself in the slot for the hash of its id it inserts itself there.
As a thread terminates it clears out its ThreadLocalStorage values
including any cached reference.
The bug is that we have potential for a read-after-free error due to
this code:
46 uintptr_t raw = pd_raw_thread_id();
47 int ix = pd_cache_index(raw); // hashes id
48 Thread* candidate = ThreadLocalStorage::_get_thread_cache[ix];
49 if (candidate->self_raw_id() == raw) {
50 // hit
51 return candidate;
52 } else {
53 return ThreadLocalStorage::get_thread_via_cache_slowly(raw, ix);
54 }
The problem is that the value read as candidate could be a thread that
(after line 48) terminated and was freed. But line #49 then reads the
raw id of that thread, which is then a read-after-free - which is a "Bad
Thing (TM)".
There's no simple fix for the caching code - you would need a completely
different approach (or synchronization that would nullify the whole
point of the cache).
Now all this ThreadLocalStorage code is pretty old and was put in place
to deal with inadequacies of the system provided thread-specific-storage
API. In fact on Solaris we even by-pass the public API
(thr_getspecific/thr_setspecific) when we can and implement our own
version using lower-level APIs available in the T1/T2 threading libraries!
In mid-2015 things have changed considerably and we have reliable and
performant support for thread-local variables at the C+ language-level.
So the way to maintain the current thread is simply using:
// Declaration of thread-local variable
static __thread Thread * _thr_current
inline Thread* ThreadLocalStorage::thread() {
return _thr_current;
}
inline void ThreadLocalStorage::set_thread(Thread* thread) {
_thr_current = thread;
}
And all the complex ThreadLocalStorage code with caching etc all vanishes!
For my next trick I plan to try and remove the ThreadLocalStorage class
completely by using language-based thread-locals on all platforms. But
for now this is just Solaris and so we still need the ThreadLocalStorage
API. However a lot of that API is not needed any more on Solaris so I
have excluded it from there in the shared code (ifndef SOLARIS). But to
avoid changing other shared-code callsites of ThreadLocalStorage I've
kept part of the API with trivial implementations on Solaris.
Testing: JPRT
All hotspot regression tests
I'm happy to run more tests but the nice thing about such low-level code
is that if it is broken, it is always broken :) Every use of
Thread::current or MacroAssembler::get_thread now hits this code.
Performance: I've run a basic set of benchmarks that is readily
available to me on our performance testing system. The best way to
describe the result is neutral. There are some slight wins, and some
slight losses, with most showing no statistical difference. And even the
"wins" and "losses" are within the natural variations of the benchmarks.
So a lot of complex code has been replaced by simple code and we haven't
lost any observable performance - which seems like a win to me.
Also product mode x64 libjvm.so has shrunk by 921KB - which is a little
surprising but very nice.
Thanks,
David
More information about the hotspot-runtime-dev
mailing list