RFR: 8130212: Thread::current() might access freed memory on Solaris

Vladimir Kozlov vladimir.kozlov at oracle.com
Wed Jul 29 23:45:47 UTC 2015


David,

I don't see changes in macroAssembler_sparc.cpp for get_thread().

Did you look what assembler is generated for thread() method?:

inline Thread* ThreadLocalStorage::thread()  {
   return _thr_current;
}

It would be nice to avoid runtime call in assembler if we know address 
of _thr_current.

Thanks,
Vladimir

On 7/29/15 3:38 PM, Christian Thalinger wrote:
>
>> On Jul 29, 2015, at 3:25 PM, David Holmes <david.holmes at oracle.com> wrote:
>>
>> <replying on hotspot-dev>
>>
>> On 30/07/2015 1:28 AM, Christian Thalinger wrote:
>>> Now that threadLS_solaris_sparc.{cpp,hpp} and threadLS_solaris_x86.{cpp,hop} look exactly the same it would be nice to merge them into threadLS_solaris.{cpp,hpp}.
>>
>> In the next phase:
>>
>> https://bugs.openjdk.java.net/browse/JDK-8132510
>>
>> these files will disappear completely. Can this wait till then?
>
> Yes.
>
>>
>> Thanks,
>> David
>>
>>>> On Jul 28, 2015, at 10:56 PM, David Holmes <david.holmes at oracle.com> wrote:
>>>>
>>>> Moved to hotspot-dev so the compiler folk also see this for the MacroAssembler changes.
>>>>
>>>> David
>>>>
>>>> On 29/07/2015 3:53 PM, David Holmes wrote:
>>>>> I forgot to credit Dave Dice with the suggestion to modernize this code.
>>>>>
>>>>> David
>>>>>
>>>>> On 29/07/2015 3:46 PM, David Holmes wrote:
>>>>>> Summary: replace complex custom code for maintaining ThreadLocalStorage
>>>>>> with compiler supported thread-local variables (Solaris only)
>>>>>>
>>>>>> This is a non-public bug so let me explain with some background, the
>>>>>> bug, and then the fix - which involves lots of complex-code deletion and
>>>>>> addition of some very simple code. :)
>>>>>>
>>>>>> webrev: http://cr.openjdk.java.net/~dholmes/8130212/webrev/
>>>>>>
>>>>>> In various parts of the runtime and in compiler generated code we need
>>>>>> to get a reference to the (VM-level) Thread* of the currently executing
>>>>>> thread. This is what Thread::current() returns. For performance reasons
>>>>>> we also have a fast-path on 64-bit where the Thread* is stashed away in
>>>>>> a register (g7 on sparc, r15 on x64).
>>>>>>
>>>>>> So Thread::current() is actually a slow-path mechanism and it delegates
>>>>>> to ThreadLocalStorage::thread().
>>>>>>
>>>>>> On some systems ThreadLocalStorage::thread utilizes a caching mechanism
>>>>>> to try and speed up access to the current thread. Otherwise it calls
>>>>>> into yet another "slow" path which uses the available platform
>>>>>> thread-specific-storage APIs.
>>>>>>
>>>>>> Compiled code also has a slow-path get_thread() method which uses
>>>>>> assembly code to invoke the same platform thread-specific-storage APIs
>>>>>> (in some cases - on sparc it simply calls ThreadLocalStorage::thread()).
>>>>>>
>>>>>> On Solaris 64-bit (which is all we support today) there is a simple
>>>>>> 1-level thread cache which is an array of Thread*. If a thread doesn't
>>>>>> find itself in the slot for the hash of its id it inserts itself there.
>>>>>> As a thread terminates it clears out its ThreadLocalStorage values
>>>>>> including any cached reference.
>>>>>>
>>>>>> The bug is that we have potential for a read-after-free error due to
>>>>>> this code:
>>>>>>
>>>>>>    46   uintptr_t raw = pd_raw_thread_id();
>>>>>>    47   int ix = pd_cache_index(raw);  // hashes id
>>>>>>    48   Thread* candidate = ThreadLocalStorage::_get_thread_cache[ix];
>>>>>>    49   if (candidate->self_raw_id() == raw) {
>>>>>>    50     // hit
>>>>>>    51     return candidate;
>>>>>>    52   } else {
>>>>>>    53     return ThreadLocalStorage::get_thread_via_cache_slowly(raw,
>>>>>> ix);
>>>>>>    54   }
>>>>>>
>>>>>> The problem is that the value read as candidate could be a thread that
>>>>>> (after line 48) terminated and was freed. But line #49 then reads the
>>>>>> raw id of that thread, which is then a read-after-free - which is a "Bad
>>>>>> Thing (TM)".
>>>>>>
>>>>>> There's no simple fix for the caching code - you would need a completely
>>>>>> different approach (or synchronization that would nullify the whole
>>>>>> point of the cache).
>>>>>>
>>>>>> Now all this ThreadLocalStorage code is pretty old and was put in place
>>>>>> to deal with inadequacies of the system provided thread-specific-storage
>>>>>> API. In fact on Solaris we even by-pass the public API
>>>>>> (thr_getspecific/thr_setspecific) when we can and implement our own
>>>>>> version using lower-level APIs available in the T1/T2 threading
>>>>>> libraries!
>>>>>>
>>>>>> In mid-2015 things have changed considerably and we have reliable and
>>>>>> performant support for thread-local variables at the C+ language-level.
>>>>>> So the way to maintain the current thread is simply using:
>>>>>>
>>>>>>   // Declaration of thread-local variable
>>>>>>   static __thread Thread * _thr_current
>>>>>>
>>>>>>   inline Thread* ThreadLocalStorage::thread()  {
>>>>>>     return _thr_current;
>>>>>>   }
>>>>>>
>>>>>>   inline void ThreadLocalStorage::set_thread(Thread* thread) {
>>>>>>     _thr_current = thread;
>>>>>>   }
>>>>>>
>>>>>> And all the complex ThreadLocalStorage code with caching etc all
>>>>>> vanishes!
>>>>>>
>>>>>> For my next trick I plan to try and remove the ThreadLocalStorage class
>>>>>> completely by using language-based thread-locals on all platforms. But
>>>>>> for now this is just Solaris and so we still need the ThreadLocalStorage
>>>>>> API. However a lot of that API is not needed any more on Solaris so I
>>>>>> have excluded it from there in the shared code (ifndef SOLARIS). But to
>>>>>> avoid changing other shared-code callsites of ThreadLocalStorage I've
>>>>>> kept part of the API with trivial implementations on Solaris.
>>>>>>
>>>>>> Testing: JPRT
>>>>>>           All hotspot regression tests
>>>>>>
>>>>>> I'm happy to run more tests but the nice thing about such low-level code
>>>>>> is that if it is broken, it is always broken :) Every use of
>>>>>> Thread::current or MacroAssembler::get_thread now hits this code.
>>>>>>
>>>>>> Performance: I've run a basic set of benchmarks that is readily
>>>>>> available to me on our performance testing system. The best way to
>>>>>> describe the result is neutral. There are some slight wins, and some
>>>>>> slight losses, with most showing no statistical difference. And even the
>>>>>> "wins" and "losses" are within the natural variations of the benchmarks.
>>>>>> So a lot of complex code has been replaced by simple code and we haven't
>>>>>> lost any observable performance - which seems like a win to me.
>>>>>>
>>>>>> Also product mode x64 libjvm.so has shrunk by 921KB - which is a little
>>>>>> surprising but very nice.
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>
>


More information about the hotspot-dev mailing list