RFR: 8130212: Thread::current() might access freed memory on Solaris

David Holmes david.holmes at oracle.com
Wed Jul 29 20:40:21 UTC 2015


On 29/07/2015 3:56 PM, David Holmes wrote:
> Moved to hotspot-dev so the compiler folk also see this for the
> MacroAssembler changes.

Really sending to hotspot-dev this time :(

David

> David
>
> On 29/07/2015 3:53 PM, David Holmes wrote:
>> I forgot to credit Dave Dice with the suggestion to modernize this code.
>>
>> David
>>
>> On 29/07/2015 3:46 PM, David Holmes wrote:
>>> Summary: replace complex custom code for maintaining ThreadLocalStorage
>>> with compiler supported thread-local variables (Solaris only)
>>>
>>> This is a non-public bug so let me explain with some background, the
>>> bug, and then the fix - which involves lots of complex-code deletion and
>>> addition of some very simple code. :)
>>>
>>> webrev: http://cr.openjdk.java.net/~dholmes/8130212/webrev/
>>>
>>> In various parts of the runtime and in compiler generated code we need
>>> to get a reference to the (VM-level) Thread* of the currently executing
>>> thread. This is what Thread::current() returns. For performance reasons
>>> we also have a fast-path on 64-bit where the Thread* is stashed away in
>>> a register (g7 on sparc, r15 on x64).
>>>
>>> So Thread::current() is actually a slow-path mechanism and it delegates
>>> to ThreadLocalStorage::thread().
>>>
>>> On some systems ThreadLocalStorage::thread utilizes a caching mechanism
>>> to try and speed up access to the current thread. Otherwise it calls
>>> into yet another "slow" path which uses the available platform
>>> thread-specific-storage APIs.
>>>
>>> Compiled code also has a slow-path get_thread() method which uses
>>> assembly code to invoke the same platform thread-specific-storage APIs
>>> (in some cases - on sparc it simply calls ThreadLocalStorage::thread()).
>>>
>>> On Solaris 64-bit (which is all we support today) there is a simple
>>> 1-level thread cache which is an array of Thread*. If a thread doesn't
>>> find itself in the slot for the hash of its id it inserts itself there.
>>> As a thread terminates it clears out its ThreadLocalStorage values
>>> including any cached reference.
>>>
>>> The bug is that we have potential for a read-after-free error due to
>>> this code:
>>>
>>>    46   uintptr_t raw = pd_raw_thread_id();
>>>    47   int ix = pd_cache_index(raw);  // hashes id
>>>    48   Thread* candidate = ThreadLocalStorage::_get_thread_cache[ix];
>>>    49   if (candidate->self_raw_id() == raw) {
>>>    50     // hit
>>>    51     return candidate;
>>>    52   } else {
>>>    53     return ThreadLocalStorage::get_thread_via_cache_slowly(raw,
>>> ix);
>>>    54   }
>>>
>>> The problem is that the value read as candidate could be a thread that
>>> (after line 48) terminated and was freed. But line #49 then reads the
>>> raw id of that thread, which is then a read-after-free - which is a "Bad
>>> Thing (TM)".
>>>
>>> There's no simple fix for the caching code - you would need a completely
>>> different approach (or synchronization that would nullify the whole
>>> point of the cache).
>>>
>>> Now all this ThreadLocalStorage code is pretty old and was put in place
>>> to deal with inadequacies of the system provided thread-specific-storage
>>> API. In fact on Solaris we even by-pass the public API
>>> (thr_getspecific/thr_setspecific) when we can and implement our own
>>> version using lower-level APIs available in the T1/T2 threading
>>> libraries!
>>>
>>> In mid-2015 things have changed considerably and we have reliable and
>>> performant support for thread-local variables at the C+ language-level.
>>> So the way to maintain the current thread is simply using:
>>>
>>>   // Declaration of thread-local variable
>>>   static __thread Thread * _thr_current
>>>
>>>   inline Thread* ThreadLocalStorage::thread()  {
>>>     return _thr_current;
>>>   }
>>>
>>>   inline void ThreadLocalStorage::set_thread(Thread* thread) {
>>>     _thr_current = thread;
>>>   }
>>>
>>> And all the complex ThreadLocalStorage code with caching etc all
>>> vanishes!
>>>
>>> For my next trick I plan to try and remove the ThreadLocalStorage class
>>> completely by using language-based thread-locals on all platforms. But
>>> for now this is just Solaris and so we still need the ThreadLocalStorage
>>> API. However a lot of that API is not needed any more on Solaris so I
>>> have excluded it from there in the shared code (ifndef SOLARIS). But to
>>> avoid changing other shared-code callsites of ThreadLocalStorage I've
>>> kept part of the API with trivial implementations on Solaris.
>>>
>>> Testing: JPRT
>>>           All hotspot regression tests
>>>
>>> I'm happy to run more tests but the nice thing about such low-level code
>>> is that if it is broken, it is always broken :) Every use of
>>> Thread::current or MacroAssembler::get_thread now hits this code.
>>>
>>> Performance: I've run a basic set of benchmarks that is readily
>>> available to me on our performance testing system. The best way to
>>> describe the result is neutral. There are some slight wins, and some
>>> slight losses, with most showing no statistical difference. And even the
>>> "wins" and "losses" are within the natural variations of the benchmarks.
>>> So a lot of complex code has been replaced by simple code and we haven't
>>> lost any observable performance - which seems like a win to me.
>>>
>>> Also product mode x64 libjvm.so has shrunk by 921KB - which is a little
>>> surprising but very nice.
>>>
>>> Thanks,
>>> David


More information about the hotspot-dev mailing list