RFR: 8130212: Thread::current() might access freed memory on Solaris

David Holmes david.holmes at oracle.com
Thu Jul 30 21:59:25 UTC 2015


Hi Vladimir,

On 31/07/2015 4:30 AM, Vladimir Kozlov wrote:
> On 7/29/15 9:40 PM, David Holmes wrote:
>> Hi Vladimir,
>>
>> On 30/07/2015 9:45 AM, Vladimir Kozlov wrote:
>>> David,
>>>
>>> I don't see changes in macroAssembler_sparc.cpp for get_thread().
>>
>> There's no need it already simply calls ThreadLocalStorage::thread
>> (with an indirection for debug builds)
>
> You are right.
>
>>
>>> Did you look what assembler is generated for thread() method?:
>>>
>>> inline Thread* ThreadLocalStorage::thread()  {
>>>    return _thr_current;
>>> }
>>
>> In the object file it generates code of the form (for x86):
>>
>>   leaq   +0x0(%rip),%rdi
>>
>> followed by a call that is translated as a call to _tls_get_addr. But
>> there is a lot of link-edit time manipulation of
>> the code. I don't know how to find the exact runtime code.
>
> Yes, I realized it will be difficult to reproduce what C++ compiler and
> liker will do.
>
>>
>>>
>>> It would be nice to avoid runtime call in assembler if we know address
>>> of _thr_current.
>>
>> But the address itself would be a per-thread variable that we'd need
>> to get from somewhere ??
>
> Yes, yes, it is true. And we want to avoid any tables which led to this
> problem.
> Okay, since it is slow path we can do call.
>
> Last thing. Can you preserve separate versions for 32- and 64-bits?
> Sometimes I have to build 32-bit Sol-x86 for debugging with dbx.

I was under the impression that 32-bit Solaris support had been flagged 
for deletion.

I can add a 32-bit version of the change but am not sure how to test it. 
Can we still easily build 32-bit Solaris or do I need to hack the build 
systems somehow?

Thanks,
David

> Thanks,
> Vladimir
>
>>
>> Thanks,
>> David
>>
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 7/29/15 3:38 PM, Christian Thalinger wrote:
>>>>
>>>>> On Jul 29, 2015, at 3:25 PM, David Holmes <david.holmes at oracle.com>
>>>>> wrote:
>>>>>
>>>>> <replying on hotspot-dev>
>>>>>
>>>>> On 30/07/2015 1:28 AM, Christian Thalinger wrote:
>>>>>> Now that threadLS_solaris_sparc.{cpp,hpp} and
>>>>>> threadLS_solaris_x86.{cpp,hop} look exactly the same it would be
>>>>>> nice to merge them into threadLS_solaris.{cpp,hpp}.
>>>>>
>>>>> In the next phase:
>>>>>
>>>>> https://bugs.openjdk.java.net/browse/JDK-8132510
>>>>>
>>>>> these files will disappear completely. Can this wait till then?
>>>>
>>>> Yes.
>>>>
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>>>> On Jul 28, 2015, at 10:56 PM, David Holmes
>>>>>>> <david.holmes at oracle.com> wrote:
>>>>>>>
>>>>>>> Moved to hotspot-dev so the compiler folk also see this for the
>>>>>>> MacroAssembler changes.
>>>>>>>
>>>>>>> David
>>>>>>>
>>>>>>> On 29/07/2015 3:53 PM, David Holmes wrote:
>>>>>>>> I forgot to credit Dave Dice with the suggestion to modernize this
>>>>>>>> code.
>>>>>>>>
>>>>>>>> David
>>>>>>>>
>>>>>>>> On 29/07/2015 3:46 PM, David Holmes wrote:
>>>>>>>>> Summary: replace complex custom code for maintaining
>>>>>>>>> ThreadLocalStorage
>>>>>>>>> with compiler supported thread-local variables (Solaris only)
>>>>>>>>>
>>>>>>>>> This is a non-public bug so let me explain with some
>>>>>>>>> background, the
>>>>>>>>> bug, and then the fix - which involves lots of complex-code
>>>>>>>>> deletion and
>>>>>>>>> addition of some very simple code. :)
>>>>>>>>>
>>>>>>>>> webrev: http://cr.openjdk.java.net/~dholmes/8130212/webrev/
>>>>>>>>>
>>>>>>>>> In various parts of the runtime and in compiler generated code we
>>>>>>>>> need
>>>>>>>>> to get a reference to the (VM-level) Thread* of the currently
>>>>>>>>> executing
>>>>>>>>> thread. This is what Thread::current() returns. For performance
>>>>>>>>> reasons
>>>>>>>>> we also have a fast-path on 64-bit where the Thread* is stashed
>>>>>>>>> away in
>>>>>>>>> a register (g7 on sparc, r15 on x64).
>>>>>>>>>
>>>>>>>>> So Thread::current() is actually a slow-path mechanism and it
>>>>>>>>> delegates
>>>>>>>>> to ThreadLocalStorage::thread().
>>>>>>>>>
>>>>>>>>> On some systems ThreadLocalStorage::thread utilizes a caching
>>>>>>>>> mechanism
>>>>>>>>> to try and speed up access to the current thread. Otherwise it
>>>>>>>>> calls
>>>>>>>>> into yet another "slow" path which uses the available platform
>>>>>>>>> thread-specific-storage APIs.
>>>>>>>>>
>>>>>>>>> Compiled code also has a slow-path get_thread() method which uses
>>>>>>>>> assembly code to invoke the same platform thread-specific-storage
>>>>>>>>> APIs
>>>>>>>>> (in some cases - on sparc it simply calls
>>>>>>>>> ThreadLocalStorage::thread()).
>>>>>>>>>
>>>>>>>>> On Solaris 64-bit (which is all we support today) there is a
>>>>>>>>> simple
>>>>>>>>> 1-level thread cache which is an array of Thread*. If a thread
>>>>>>>>> doesn't
>>>>>>>>> find itself in the slot for the hash of its id it inserts itself
>>>>>>>>> there.
>>>>>>>>> As a thread terminates it clears out its ThreadLocalStorage values
>>>>>>>>> including any cached reference.
>>>>>>>>>
>>>>>>>>> The bug is that we have potential for a read-after-free error
>>>>>>>>> due to
>>>>>>>>> this code:
>>>>>>>>>
>>>>>>>>>    46   uintptr_t raw = pd_raw_thread_id();
>>>>>>>>>    47   int ix = pd_cache_index(raw);  // hashes id
>>>>>>>>>    48   Thread* candidate =
>>>>>>>>> ThreadLocalStorage::_get_thread_cache[ix];
>>>>>>>>>    49   if (candidate->self_raw_id() == raw) {
>>>>>>>>>    50     // hit
>>>>>>>>>    51     return candidate;
>>>>>>>>>    52   } else {
>>>>>>>>>    53     return
>>>>>>>>> ThreadLocalStorage::get_thread_via_cache_slowly(raw,
>>>>>>>>> ix);
>>>>>>>>>    54   }
>>>>>>>>>
>>>>>>>>> The problem is that the value read as candidate could be a thread
>>>>>>>>> that
>>>>>>>>> (after line 48) terminated and was freed. But line #49 then reads
>>>>>>>>> the
>>>>>>>>> raw id of that thread, which is then a read-after-free - which is
>>>>>>>>> a "Bad
>>>>>>>>> Thing (TM)".
>>>>>>>>>
>>>>>>>>> There's no simple fix for the caching code - you would need a
>>>>>>>>> completely
>>>>>>>>> different approach (or synchronization that would nullify the
>>>>>>>>> whole
>>>>>>>>> point of the cache).
>>>>>>>>>
>>>>>>>>> Now all this ThreadLocalStorage code is pretty old and was put in
>>>>>>>>> place
>>>>>>>>> to deal with inadequacies of the system provided
>>>>>>>>> thread-specific-storage
>>>>>>>>> API. In fact on Solaris we even by-pass the public API
>>>>>>>>> (thr_getspecific/thr_setspecific) when we can and implement our
>>>>>>>>> own
>>>>>>>>> version using lower-level APIs available in the T1/T2 threading
>>>>>>>>> libraries!
>>>>>>>>>
>>>>>>>>> In mid-2015 things have changed considerably and we have reliable
>>>>>>>>> and
>>>>>>>>> performant support for thread-local variables at the C+
>>>>>>>>> language-level.
>>>>>>>>> So the way to maintain the current thread is simply using:
>>>>>>>>>
>>>>>>>>>   // Declaration of thread-local variable
>>>>>>>>>   static __thread Thread * _thr_current
>>>>>>>>>
>>>>>>>>>   inline Thread* ThreadLocalStorage::thread()  {
>>>>>>>>>     return _thr_current;
>>>>>>>>>   }
>>>>>>>>>
>>>>>>>>>   inline void ThreadLocalStorage::set_thread(Thread* thread) {
>>>>>>>>>     _thr_current = thread;
>>>>>>>>>   }
>>>>>>>>>
>>>>>>>>> And all the complex ThreadLocalStorage code with caching etc all
>>>>>>>>> vanishes!
>>>>>>>>>
>>>>>>>>> For my next trick I plan to try and remove the ThreadLocalStorage
>>>>>>>>> class
>>>>>>>>> completely by using language-based thread-locals on all
>>>>>>>>> platforms. But
>>>>>>>>> for now this is just Solaris and so we still need the
>>>>>>>>> ThreadLocalStorage
>>>>>>>>> API. However a lot of that API is not needed any more on Solaris
>>>>>>>>> so I
>>>>>>>>> have excluded it from there in the shared code (ifndef SOLARIS).
>>>>>>>>> But to
>>>>>>>>> avoid changing other shared-code callsites of ThreadLocalStorage
>>>>>>>>> I've
>>>>>>>>> kept part of the API with trivial implementations on Solaris.
>>>>>>>>>
>>>>>>>>> Testing: JPRT
>>>>>>>>>           All hotspot regression tests
>>>>>>>>>
>>>>>>>>> I'm happy to run more tests but the nice thing about such
>>>>>>>>> low-level code
>>>>>>>>> is that if it is broken, it is always broken :) Every use of
>>>>>>>>> Thread::current or MacroAssembler::get_thread now hits this code.
>>>>>>>>>
>>>>>>>>> Performance: I've run a basic set of benchmarks that is readily
>>>>>>>>> available to me on our performance testing system. The best way to
>>>>>>>>> describe the result is neutral. There are some slight wins, and
>>>>>>>>> some
>>>>>>>>> slight losses, with most showing no statistical difference. And
>>>>>>>>> even the
>>>>>>>>> "wins" and "losses" are within the natural variations of the
>>>>>>>>> benchmarks.
>>>>>>>>> So a lot of complex code has been replaced by simple code and we
>>>>>>>>> haven't
>>>>>>>>> lost any observable performance - which seems like a win to me.
>>>>>>>>>
>>>>>>>>> Also product mode x64 libjvm.so has shrunk by 921KB - which is a
>>>>>>>>> little
>>>>>>>>> surprising but very nice.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> David
>>>>>>
>>>>


More information about the hotspot-dev mailing list