RFR: 8130212: Thread::current() might access freed memory on Solaris
David Holmes
david.holmes at oracle.com
Thu Jul 30 04:40:05 UTC 2015
Hi Vladimir,
On 30/07/2015 9:45 AM, Vladimir Kozlov wrote:
> David,
>
> I don't see changes in macroAssembler_sparc.cpp for get_thread().
There's no need it already simply calls ThreadLocalStorage::thread (with
an indirection for debug builds)
> Did you look what assembler is generated for thread() method?:
>
> inline Thread* ThreadLocalStorage::thread() {
> return _thr_current;
> }
In the object file it generates code of the form (for x86):
leaq +0x0(%rip),%rdi
followed by a call that is translated as a call to _tls_get_addr. But
there is a lot of link-edit time manipulation of the code. I don't know
how to find the exact runtime code.
>
> It would be nice to avoid runtime call in assembler if we know address
> of _thr_current.
But the address itself would be a per-thread variable that we'd need to
get from somewhere ??
Thanks,
David
>
> Thanks,
> Vladimir
>
> On 7/29/15 3:38 PM, Christian Thalinger wrote:
>>
>>> On Jul 29, 2015, at 3:25 PM, David Holmes <david.holmes at oracle.com>
>>> wrote:
>>>
>>> <replying on hotspot-dev>
>>>
>>> On 30/07/2015 1:28 AM, Christian Thalinger wrote:
>>>> Now that threadLS_solaris_sparc.{cpp,hpp} and
>>>> threadLS_solaris_x86.{cpp,hop} look exactly the same it would be
>>>> nice to merge them into threadLS_solaris.{cpp,hpp}.
>>>
>>> In the next phase:
>>>
>>> https://bugs.openjdk.java.net/browse/JDK-8132510
>>>
>>> these files will disappear completely. Can this wait till then?
>>
>> Yes.
>>
>>>
>>> Thanks,
>>> David
>>>
>>>>> On Jul 28, 2015, at 10:56 PM, David Holmes
>>>>> <david.holmes at oracle.com> wrote:
>>>>>
>>>>> Moved to hotspot-dev so the compiler folk also see this for the
>>>>> MacroAssembler changes.
>>>>>
>>>>> David
>>>>>
>>>>> On 29/07/2015 3:53 PM, David Holmes wrote:
>>>>>> I forgot to credit Dave Dice with the suggestion to modernize this
>>>>>> code.
>>>>>>
>>>>>> David
>>>>>>
>>>>>> On 29/07/2015 3:46 PM, David Holmes wrote:
>>>>>>> Summary: replace complex custom code for maintaining
>>>>>>> ThreadLocalStorage
>>>>>>> with compiler supported thread-local variables (Solaris only)
>>>>>>>
>>>>>>> This is a non-public bug so let me explain with some background, the
>>>>>>> bug, and then the fix - which involves lots of complex-code
>>>>>>> deletion and
>>>>>>> addition of some very simple code. :)
>>>>>>>
>>>>>>> webrev: http://cr.openjdk.java.net/~dholmes/8130212/webrev/
>>>>>>>
>>>>>>> In various parts of the runtime and in compiler generated code we
>>>>>>> need
>>>>>>> to get a reference to the (VM-level) Thread* of the currently
>>>>>>> executing
>>>>>>> thread. This is what Thread::current() returns. For performance
>>>>>>> reasons
>>>>>>> we also have a fast-path on 64-bit where the Thread* is stashed
>>>>>>> away in
>>>>>>> a register (g7 on sparc, r15 on x64).
>>>>>>>
>>>>>>> So Thread::current() is actually a slow-path mechanism and it
>>>>>>> delegates
>>>>>>> to ThreadLocalStorage::thread().
>>>>>>>
>>>>>>> On some systems ThreadLocalStorage::thread utilizes a caching
>>>>>>> mechanism
>>>>>>> to try and speed up access to the current thread. Otherwise it calls
>>>>>>> into yet another "slow" path which uses the available platform
>>>>>>> thread-specific-storage APIs.
>>>>>>>
>>>>>>> Compiled code also has a slow-path get_thread() method which uses
>>>>>>> assembly code to invoke the same platform thread-specific-storage
>>>>>>> APIs
>>>>>>> (in some cases - on sparc it simply calls
>>>>>>> ThreadLocalStorage::thread()).
>>>>>>>
>>>>>>> On Solaris 64-bit (which is all we support today) there is a simple
>>>>>>> 1-level thread cache which is an array of Thread*. If a thread
>>>>>>> doesn't
>>>>>>> find itself in the slot for the hash of its id it inserts itself
>>>>>>> there.
>>>>>>> As a thread terminates it clears out its ThreadLocalStorage values
>>>>>>> including any cached reference.
>>>>>>>
>>>>>>> The bug is that we have potential for a read-after-free error due to
>>>>>>> this code:
>>>>>>>
>>>>>>> 46 uintptr_t raw = pd_raw_thread_id();
>>>>>>> 47 int ix = pd_cache_index(raw); // hashes id
>>>>>>> 48 Thread* candidate =
>>>>>>> ThreadLocalStorage::_get_thread_cache[ix];
>>>>>>> 49 if (candidate->self_raw_id() == raw) {
>>>>>>> 50 // hit
>>>>>>> 51 return candidate;
>>>>>>> 52 } else {
>>>>>>> 53 return
>>>>>>> ThreadLocalStorage::get_thread_via_cache_slowly(raw,
>>>>>>> ix);
>>>>>>> 54 }
>>>>>>>
>>>>>>> The problem is that the value read as candidate could be a thread
>>>>>>> that
>>>>>>> (after line 48) terminated and was freed. But line #49 then reads
>>>>>>> the
>>>>>>> raw id of that thread, which is then a read-after-free - which is
>>>>>>> a "Bad
>>>>>>> Thing (TM)".
>>>>>>>
>>>>>>> There's no simple fix for the caching code - you would need a
>>>>>>> completely
>>>>>>> different approach (or synchronization that would nullify the whole
>>>>>>> point of the cache).
>>>>>>>
>>>>>>> Now all this ThreadLocalStorage code is pretty old and was put in
>>>>>>> place
>>>>>>> to deal with inadequacies of the system provided
>>>>>>> thread-specific-storage
>>>>>>> API. In fact on Solaris we even by-pass the public API
>>>>>>> (thr_getspecific/thr_setspecific) when we can and implement our own
>>>>>>> version using lower-level APIs available in the T1/T2 threading
>>>>>>> libraries!
>>>>>>>
>>>>>>> In mid-2015 things have changed considerably and we have reliable
>>>>>>> and
>>>>>>> performant support for thread-local variables at the C+
>>>>>>> language-level.
>>>>>>> So the way to maintain the current thread is simply using:
>>>>>>>
>>>>>>> // Declaration of thread-local variable
>>>>>>> static __thread Thread * _thr_current
>>>>>>>
>>>>>>> inline Thread* ThreadLocalStorage::thread() {
>>>>>>> return _thr_current;
>>>>>>> }
>>>>>>>
>>>>>>> inline void ThreadLocalStorage::set_thread(Thread* thread) {
>>>>>>> _thr_current = thread;
>>>>>>> }
>>>>>>>
>>>>>>> And all the complex ThreadLocalStorage code with caching etc all
>>>>>>> vanishes!
>>>>>>>
>>>>>>> For my next trick I plan to try and remove the ThreadLocalStorage
>>>>>>> class
>>>>>>> completely by using language-based thread-locals on all
>>>>>>> platforms. But
>>>>>>> for now this is just Solaris and so we still need the
>>>>>>> ThreadLocalStorage
>>>>>>> API. However a lot of that API is not needed any more on Solaris
>>>>>>> so I
>>>>>>> have excluded it from there in the shared code (ifndef SOLARIS).
>>>>>>> But to
>>>>>>> avoid changing other shared-code callsites of ThreadLocalStorage
>>>>>>> I've
>>>>>>> kept part of the API with trivial implementations on Solaris.
>>>>>>>
>>>>>>> Testing: JPRT
>>>>>>> All hotspot regression tests
>>>>>>>
>>>>>>> I'm happy to run more tests but the nice thing about such
>>>>>>> low-level code
>>>>>>> is that if it is broken, it is always broken :) Every use of
>>>>>>> Thread::current or MacroAssembler::get_thread now hits this code.
>>>>>>>
>>>>>>> Performance: I've run a basic set of benchmarks that is readily
>>>>>>> available to me on our performance testing system. The best way to
>>>>>>> describe the result is neutral. There are some slight wins, and some
>>>>>>> slight losses, with most showing no statistical difference. And
>>>>>>> even the
>>>>>>> "wins" and "losses" are within the natural variations of the
>>>>>>> benchmarks.
>>>>>>> So a lot of complex code has been replaced by simple code and we
>>>>>>> haven't
>>>>>>> lost any observable performance - which seems like a win to me.
>>>>>>>
>>>>>>> Also product mode x64 libjvm.so has shrunk by 921KB - which is a
>>>>>>> little
>>>>>>> surprising but very nice.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>
>>
More information about the hotspot-dev
mailing list