RFR: 8130212: Thread::current() might access freed memory on Solaris

Vladimir Kozlov vladimir.kozlov at oracle.com
Fri Jul 31 02:12:02 UTC 2015


 > I can add a 32-bit version of the change but am not sure how to test it.
 > Can we still easily build 32-bit Solaris or do I need to hack the build
 > systems somehow?

You have to build first 32-bit 8u60 (using 7u80, for example, as boot 
jdk). And then use it as boot jdk for jdk9 32-bit build. Both builds 
have to be configured with --with-target-bits=32.

And there are no way to do that in JPRT as I know. Only manually :(

If it is hard for you I will manage (will prepare patch later).

So I am fine with current code if you don't want to spend more time on it.

Thanks,
Vladimir

On 7/30/15 2:59 PM, David Holmes wrote:
> Hi Vladimir,
>
> On 31/07/2015 4:30 AM, Vladimir Kozlov wrote:
>> On 7/29/15 9:40 PM, David Holmes wrote:
>>> Hi Vladimir,
>>>
>>> On 30/07/2015 9:45 AM, Vladimir Kozlov wrote:
>>>> David,
>>>>
>>>> I don't see changes in macroAssembler_sparc.cpp for get_thread().
>>>
>>> There's no need it already simply calls ThreadLocalStorage::thread
>>> (with an indirection for debug builds)
>>
>> You are right.
>>
>>>
>>>> Did you look what assembler is generated for thread() method?:
>>>>
>>>> inline Thread* ThreadLocalStorage::thread()  {
>>>>    return _thr_current;
>>>> }
>>>
>>> In the object file it generates code of the form (for x86):
>>>
>>>   leaq   +0x0(%rip),%rdi
>>>
>>> followed by a call that is translated as a call to _tls_get_addr. But
>>> there is a lot of link-edit time manipulation of
>>> the code. I don't know how to find the exact runtime code.
>>
>> Yes, I realized it will be difficult to reproduce what C++ compiler and
>> liker will do.
>>
>>>
>>>>
>>>> It would be nice to avoid runtime call in assembler if we know address
>>>> of _thr_current.
>>>
>>> But the address itself would be a per-thread variable that we'd need
>>> to get from somewhere ??
>>
>> Yes, yes, it is true. And we want to avoid any tables which led to this
>> problem.
>> Okay, since it is slow path we can do call.
>>
>> Last thing. Can you preserve separate versions for 32- and 64-bits?
>> Sometimes I have to build 32-bit Sol-x86 for debugging with dbx.
>
> I was under the impression that 32-bit Solaris support had been flagged
> for deletion.
>
> I can add a 32-bit version of the change but am not sure how to test it.
> Can we still easily build 32-bit Solaris or do I need to hack the build
> systems somehow?
>
> Thanks,
> David
>
>> Thanks,
>> Vladimir
>>
>>>
>>> Thanks,
>>> David
>>>
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> On 7/29/15 3:38 PM, Christian Thalinger wrote:
>>>>>
>>>>>> On Jul 29, 2015, at 3:25 PM, David Holmes <david.holmes at oracle.com>
>>>>>> wrote:
>>>>>>
>>>>>> <replying on hotspot-dev>
>>>>>>
>>>>>> On 30/07/2015 1:28 AM, Christian Thalinger wrote:
>>>>>>> Now that threadLS_solaris_sparc.{cpp,hpp} and
>>>>>>> threadLS_solaris_x86.{cpp,hop} look exactly the same it would be
>>>>>>> nice to merge them into threadLS_solaris.{cpp,hpp}.
>>>>>>
>>>>>> In the next phase:
>>>>>>
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8132510
>>>>>>
>>>>>> these files will disappear completely. Can this wait till then?
>>>>>
>>>>> Yes.
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>>
>>>>>>>> On Jul 28, 2015, at 10:56 PM, David Holmes
>>>>>>>> <david.holmes at oracle.com> wrote:
>>>>>>>>
>>>>>>>> Moved to hotspot-dev so the compiler folk also see this for the
>>>>>>>> MacroAssembler changes.
>>>>>>>>
>>>>>>>> David
>>>>>>>>
>>>>>>>> On 29/07/2015 3:53 PM, David Holmes wrote:
>>>>>>>>> I forgot to credit Dave Dice with the suggestion to modernize this
>>>>>>>>> code.
>>>>>>>>>
>>>>>>>>> David
>>>>>>>>>
>>>>>>>>> On 29/07/2015 3:46 PM, David Holmes wrote:
>>>>>>>>>> Summary: replace complex custom code for maintaining
>>>>>>>>>> ThreadLocalStorage
>>>>>>>>>> with compiler supported thread-local variables (Solaris only)
>>>>>>>>>>
>>>>>>>>>> This is a non-public bug so let me explain with some
>>>>>>>>>> background, the
>>>>>>>>>> bug, and then the fix - which involves lots of complex-code
>>>>>>>>>> deletion and
>>>>>>>>>> addition of some very simple code. :)
>>>>>>>>>>
>>>>>>>>>> webrev: http://cr.openjdk.java.net/~dholmes/8130212/webrev/
>>>>>>>>>>
>>>>>>>>>> In various parts of the runtime and in compiler generated code we
>>>>>>>>>> need
>>>>>>>>>> to get a reference to the (VM-level) Thread* of the currently
>>>>>>>>>> executing
>>>>>>>>>> thread. This is what Thread::current() returns. For performance
>>>>>>>>>> reasons
>>>>>>>>>> we also have a fast-path on 64-bit where the Thread* is stashed
>>>>>>>>>> away in
>>>>>>>>>> a register (g7 on sparc, r15 on x64).
>>>>>>>>>>
>>>>>>>>>> So Thread::current() is actually a slow-path mechanism and it
>>>>>>>>>> delegates
>>>>>>>>>> to ThreadLocalStorage::thread().
>>>>>>>>>>
>>>>>>>>>> On some systems ThreadLocalStorage::thread utilizes a caching
>>>>>>>>>> mechanism
>>>>>>>>>> to try and speed up access to the current thread. Otherwise it
>>>>>>>>>> calls
>>>>>>>>>> into yet another "slow" path which uses the available platform
>>>>>>>>>> thread-specific-storage APIs.
>>>>>>>>>>
>>>>>>>>>> Compiled code also has a slow-path get_thread() method which uses
>>>>>>>>>> assembly code to invoke the same platform thread-specific-storage
>>>>>>>>>> APIs
>>>>>>>>>> (in some cases - on sparc it simply calls
>>>>>>>>>> ThreadLocalStorage::thread()).
>>>>>>>>>>
>>>>>>>>>> On Solaris 64-bit (which is all we support today) there is a
>>>>>>>>>> simple
>>>>>>>>>> 1-level thread cache which is an array of Thread*. If a thread
>>>>>>>>>> doesn't
>>>>>>>>>> find itself in the slot for the hash of its id it inserts itself
>>>>>>>>>> there.
>>>>>>>>>> As a thread terminates it clears out its ThreadLocalStorage
>>>>>>>>>> values
>>>>>>>>>> including any cached reference.
>>>>>>>>>>
>>>>>>>>>> The bug is that we have potential for a read-after-free error
>>>>>>>>>> due to
>>>>>>>>>> this code:
>>>>>>>>>>
>>>>>>>>>>    46   uintptr_t raw = pd_raw_thread_id();
>>>>>>>>>>    47   int ix = pd_cache_index(raw);  // hashes id
>>>>>>>>>>    48   Thread* candidate =
>>>>>>>>>> ThreadLocalStorage::_get_thread_cache[ix];
>>>>>>>>>>    49   if (candidate->self_raw_id() == raw) {
>>>>>>>>>>    50     // hit
>>>>>>>>>>    51     return candidate;
>>>>>>>>>>    52   } else {
>>>>>>>>>>    53     return
>>>>>>>>>> ThreadLocalStorage::get_thread_via_cache_slowly(raw,
>>>>>>>>>> ix);
>>>>>>>>>>    54   }
>>>>>>>>>>
>>>>>>>>>> The problem is that the value read as candidate could be a thread
>>>>>>>>>> that
>>>>>>>>>> (after line 48) terminated and was freed. But line #49 then reads
>>>>>>>>>> the
>>>>>>>>>> raw id of that thread, which is then a read-after-free - which is
>>>>>>>>>> a "Bad
>>>>>>>>>> Thing (TM)".
>>>>>>>>>>
>>>>>>>>>> There's no simple fix for the caching code - you would need a
>>>>>>>>>> completely
>>>>>>>>>> different approach (or synchronization that would nullify the
>>>>>>>>>> whole
>>>>>>>>>> point of the cache).
>>>>>>>>>>
>>>>>>>>>> Now all this ThreadLocalStorage code is pretty old and was put in
>>>>>>>>>> place
>>>>>>>>>> to deal with inadequacies of the system provided
>>>>>>>>>> thread-specific-storage
>>>>>>>>>> API. In fact on Solaris we even by-pass the public API
>>>>>>>>>> (thr_getspecific/thr_setspecific) when we can and implement our
>>>>>>>>>> own
>>>>>>>>>> version using lower-level APIs available in the T1/T2 threading
>>>>>>>>>> libraries!
>>>>>>>>>>
>>>>>>>>>> In mid-2015 things have changed considerably and we have reliable
>>>>>>>>>> and
>>>>>>>>>> performant support for thread-local variables at the C+
>>>>>>>>>> language-level.
>>>>>>>>>> So the way to maintain the current thread is simply using:
>>>>>>>>>>
>>>>>>>>>>   // Declaration of thread-local variable
>>>>>>>>>>   static __thread Thread * _thr_current
>>>>>>>>>>
>>>>>>>>>>   inline Thread* ThreadLocalStorage::thread()  {
>>>>>>>>>>     return _thr_current;
>>>>>>>>>>   }
>>>>>>>>>>
>>>>>>>>>>   inline void ThreadLocalStorage::set_thread(Thread* thread) {
>>>>>>>>>>     _thr_current = thread;
>>>>>>>>>>   }
>>>>>>>>>>
>>>>>>>>>> And all the complex ThreadLocalStorage code with caching etc all
>>>>>>>>>> vanishes!
>>>>>>>>>>
>>>>>>>>>> For my next trick I plan to try and remove the ThreadLocalStorage
>>>>>>>>>> class
>>>>>>>>>> completely by using language-based thread-locals on all
>>>>>>>>>> platforms. But
>>>>>>>>>> for now this is just Solaris and so we still need the
>>>>>>>>>> ThreadLocalStorage
>>>>>>>>>> API. However a lot of that API is not needed any more on Solaris
>>>>>>>>>> so I
>>>>>>>>>> have excluded it from there in the shared code (ifndef SOLARIS).
>>>>>>>>>> But to
>>>>>>>>>> avoid changing other shared-code callsites of ThreadLocalStorage
>>>>>>>>>> I've
>>>>>>>>>> kept part of the API with trivial implementations on Solaris.
>>>>>>>>>>
>>>>>>>>>> Testing: JPRT
>>>>>>>>>>           All hotspot regression tests
>>>>>>>>>>
>>>>>>>>>> I'm happy to run more tests but the nice thing about such
>>>>>>>>>> low-level code
>>>>>>>>>> is that if it is broken, it is always broken :) Every use of
>>>>>>>>>> Thread::current or MacroAssembler::get_thread now hits this code.
>>>>>>>>>>
>>>>>>>>>> Performance: I've run a basic set of benchmarks that is readily
>>>>>>>>>> available to me on our performance testing system. The best
>>>>>>>>>> way to
>>>>>>>>>> describe the result is neutral. There are some slight wins, and
>>>>>>>>>> some
>>>>>>>>>> slight losses, with most showing no statistical difference. And
>>>>>>>>>> even the
>>>>>>>>>> "wins" and "losses" are within the natural variations of the
>>>>>>>>>> benchmarks.
>>>>>>>>>> So a lot of complex code has been replaced by simple code and we
>>>>>>>>>> haven't
>>>>>>>>>> lost any observable performance - which seems like a win to me.
>>>>>>>>>>
>>>>>>>>>> Also product mode x64 libjvm.so has shrunk by 921KB - which is a
>>>>>>>>>> little
>>>>>>>>>> surprising but very nice.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> David
>>>>>>>
>>>>>


More information about the hotspot-dev mailing list