RFR: 8130212: Thread::current() might access freed memory on Solaris

Vladimir Kozlov vladimir.kozlov at oracle.com
Thu Jul 30 18:30:00 UTC 2015


On 7/29/15 9:40 PM, David Holmes wrote:
> Hi Vladimir,
>
> On 30/07/2015 9:45 AM, Vladimir Kozlov wrote:
>> David,
>>
>> I don't see changes in macroAssembler_sparc.cpp for get_thread().
>
> There's no need it already simply calls ThreadLocalStorage::thread (with an indirection for debug builds)

You are right.

>
>> Did you look what assembler is generated for thread() method?:
>>
>> inline Thread* ThreadLocalStorage::thread()  {
>>    return _thr_current;
>> }
>
> In the object file it generates code of the form (for x86):
>
>   leaq   +0x0(%rip),%rdi
>
> followed by a call that is translated as a call to _tls_get_addr. But there is a lot of link-edit time manipulation of
> the code. I don't know how to find the exact runtime code.

Yes, I realized it will be difficult to reproduce what C++ compiler and liker will do.

>
>>
>> It would be nice to avoid runtime call in assembler if we know address
>> of _thr_current.
>
> But the address itself would be a per-thread variable that we'd need to get from somewhere ??

Yes, yes, it is true. And we want to avoid any tables which led to this problem.
Okay, since it is slow path we can do call.

Last thing. Can you preserve separate versions for 32- and 64-bits?
Sometimes I have to build 32-bit Sol-x86 for debugging with dbx.

Thanks,
Vladimir

>
> Thanks,
> David
>
>>
>> Thanks,
>> Vladimir
>>
>> On 7/29/15 3:38 PM, Christian Thalinger wrote:
>>>
>>>> On Jul 29, 2015, at 3:25 PM, David Holmes <david.holmes at oracle.com>
>>>> wrote:
>>>>
>>>> <replying on hotspot-dev>
>>>>
>>>> On 30/07/2015 1:28 AM, Christian Thalinger wrote:
>>>>> Now that threadLS_solaris_sparc.{cpp,hpp} and
>>>>> threadLS_solaris_x86.{cpp,hop} look exactly the same it would be
>>>>> nice to merge them into threadLS_solaris.{cpp,hpp}.
>>>>
>>>> In the next phase:
>>>>
>>>> https://bugs.openjdk.java.net/browse/JDK-8132510
>>>>
>>>> these files will disappear completely. Can this wait till then?
>>>
>>> Yes.
>>>
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>>>> On Jul 28, 2015, at 10:56 PM, David Holmes
>>>>>> <david.holmes at oracle.com> wrote:
>>>>>>
>>>>>> Moved to hotspot-dev so the compiler folk also see this for the
>>>>>> MacroAssembler changes.
>>>>>>
>>>>>> David
>>>>>>
>>>>>> On 29/07/2015 3:53 PM, David Holmes wrote:
>>>>>>> I forgot to credit Dave Dice with the suggestion to modernize this
>>>>>>> code.
>>>>>>>
>>>>>>> David
>>>>>>>
>>>>>>> On 29/07/2015 3:46 PM, David Holmes wrote:
>>>>>>>> Summary: replace complex custom code for maintaining
>>>>>>>> ThreadLocalStorage
>>>>>>>> with compiler supported thread-local variables (Solaris only)
>>>>>>>>
>>>>>>>> This is a non-public bug so let me explain with some background, the
>>>>>>>> bug, and then the fix - which involves lots of complex-code
>>>>>>>> deletion and
>>>>>>>> addition of some very simple code. :)
>>>>>>>>
>>>>>>>> webrev: http://cr.openjdk.java.net/~dholmes/8130212/webrev/
>>>>>>>>
>>>>>>>> In various parts of the runtime and in compiler generated code we
>>>>>>>> need
>>>>>>>> to get a reference to the (VM-level) Thread* of the currently
>>>>>>>> executing
>>>>>>>> thread. This is what Thread::current() returns. For performance
>>>>>>>> reasons
>>>>>>>> we also have a fast-path on 64-bit where the Thread* is stashed
>>>>>>>> away in
>>>>>>>> a register (g7 on sparc, r15 on x64).
>>>>>>>>
>>>>>>>> So Thread::current() is actually a slow-path mechanism and it
>>>>>>>> delegates
>>>>>>>> to ThreadLocalStorage::thread().
>>>>>>>>
>>>>>>>> On some systems ThreadLocalStorage::thread utilizes a caching
>>>>>>>> mechanism
>>>>>>>> to try and speed up access to the current thread. Otherwise it calls
>>>>>>>> into yet another "slow" path which uses the available platform
>>>>>>>> thread-specific-storage APIs.
>>>>>>>>
>>>>>>>> Compiled code also has a slow-path get_thread() method which uses
>>>>>>>> assembly code to invoke the same platform thread-specific-storage
>>>>>>>> APIs
>>>>>>>> (in some cases - on sparc it simply calls
>>>>>>>> ThreadLocalStorage::thread()).
>>>>>>>>
>>>>>>>> On Solaris 64-bit (which is all we support today) there is a simple
>>>>>>>> 1-level thread cache which is an array of Thread*. If a thread
>>>>>>>> doesn't
>>>>>>>> find itself in the slot for the hash of its id it inserts itself
>>>>>>>> there.
>>>>>>>> As a thread terminates it clears out its ThreadLocalStorage values
>>>>>>>> including any cached reference.
>>>>>>>>
>>>>>>>> The bug is that we have potential for a read-after-free error due to
>>>>>>>> this code:
>>>>>>>>
>>>>>>>>    46   uintptr_t raw = pd_raw_thread_id();
>>>>>>>>    47   int ix = pd_cache_index(raw);  // hashes id
>>>>>>>>    48   Thread* candidate =
>>>>>>>> ThreadLocalStorage::_get_thread_cache[ix];
>>>>>>>>    49   if (candidate->self_raw_id() == raw) {
>>>>>>>>    50     // hit
>>>>>>>>    51     return candidate;
>>>>>>>>    52   } else {
>>>>>>>>    53     return
>>>>>>>> ThreadLocalStorage::get_thread_via_cache_slowly(raw,
>>>>>>>> ix);
>>>>>>>>    54   }
>>>>>>>>
>>>>>>>> The problem is that the value read as candidate could be a thread
>>>>>>>> that
>>>>>>>> (after line 48) terminated and was freed. But line #49 then reads
>>>>>>>> the
>>>>>>>> raw id of that thread, which is then a read-after-free - which is
>>>>>>>> a "Bad
>>>>>>>> Thing (TM)".
>>>>>>>>
>>>>>>>> There's no simple fix for the caching code - you would need a
>>>>>>>> completely
>>>>>>>> different approach (or synchronization that would nullify the whole
>>>>>>>> point of the cache).
>>>>>>>>
>>>>>>>> Now all this ThreadLocalStorage code is pretty old and was put in
>>>>>>>> place
>>>>>>>> to deal with inadequacies of the system provided
>>>>>>>> thread-specific-storage
>>>>>>>> API. In fact on Solaris we even by-pass the public API
>>>>>>>> (thr_getspecific/thr_setspecific) when we can and implement our own
>>>>>>>> version using lower-level APIs available in the T1/T2 threading
>>>>>>>> libraries!
>>>>>>>>
>>>>>>>> In mid-2015 things have changed considerably and we have reliable
>>>>>>>> and
>>>>>>>> performant support for thread-local variables at the C+
>>>>>>>> language-level.
>>>>>>>> So the way to maintain the current thread is simply using:
>>>>>>>>
>>>>>>>>   // Declaration of thread-local variable
>>>>>>>>   static __thread Thread * _thr_current
>>>>>>>>
>>>>>>>>   inline Thread* ThreadLocalStorage::thread()  {
>>>>>>>>     return _thr_current;
>>>>>>>>   }
>>>>>>>>
>>>>>>>>   inline void ThreadLocalStorage::set_thread(Thread* thread) {
>>>>>>>>     _thr_current = thread;
>>>>>>>>   }
>>>>>>>>
>>>>>>>> And all the complex ThreadLocalStorage code with caching etc all
>>>>>>>> vanishes!
>>>>>>>>
>>>>>>>> For my next trick I plan to try and remove the ThreadLocalStorage
>>>>>>>> class
>>>>>>>> completely by using language-based thread-locals on all
>>>>>>>> platforms. But
>>>>>>>> for now this is just Solaris and so we still need the
>>>>>>>> ThreadLocalStorage
>>>>>>>> API. However a lot of that API is not needed any more on Solaris
>>>>>>>> so I
>>>>>>>> have excluded it from there in the shared code (ifndef SOLARIS).
>>>>>>>> But to
>>>>>>>> avoid changing other shared-code callsites of ThreadLocalStorage
>>>>>>>> I've
>>>>>>>> kept part of the API with trivial implementations on Solaris.
>>>>>>>>
>>>>>>>> Testing: JPRT
>>>>>>>>           All hotspot regression tests
>>>>>>>>
>>>>>>>> I'm happy to run more tests but the nice thing about such
>>>>>>>> low-level code
>>>>>>>> is that if it is broken, it is always broken :) Every use of
>>>>>>>> Thread::current or MacroAssembler::get_thread now hits this code.
>>>>>>>>
>>>>>>>> Performance: I've run a basic set of benchmarks that is readily
>>>>>>>> available to me on our performance testing system. The best way to
>>>>>>>> describe the result is neutral. There are some slight wins, and some
>>>>>>>> slight losses, with most showing no statistical difference. And
>>>>>>>> even the
>>>>>>>> "wins" and "losses" are within the natural variations of the
>>>>>>>> benchmarks.
>>>>>>>> So a lot of complex code has been replaced by simple code and we
>>>>>>>> haven't
>>>>>>>> lost any observable performance - which seems like a win to me.
>>>>>>>>
>>>>>>>> Also product mode x64 libjvm.so has shrunk by 921KB - which is a
>>>>>>>> little
>>>>>>>> surprising but very nice.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> David
>>>>>
>>>


More information about the hotspot-dev mailing list