RFR: 8132510: Replace ThreadLocalStorage with compiler/language-based thread-local variables

Bertrand Delsart bertrand.delsart at oracle.com
Tue Nov 24 10:58:14 UTC 2015


On 24/11/2015 03:19, David Holmes wrote:
> Sorry about that:
>
> http://cr.openjdk.java.net/~dholmes/8132510/webrev.v7/

Approved.

Regards,

Bertrand (not a Reviewer).

>
> David
>
> On 24/11/2015 11:17 AM, David Holmes wrote:
>> Updated webrev:
>>
>> cr.openjdk.java.net/~dholmes/8132510/webrev.v7/
>>
>> Only change is in macroAssembler_x86.cpp
>>
>> Thanks,
>> David
>>
>> On 24/11/2015 7:42 AM, David Holmes wrote:
>>> On 23/11/2015 11:46 PM, Bertrand Delsart wrote:
>>>> On 23/11/2015 13:27, David Holmes wrote:
>>>>> Hi Bertrand,
>>>>>
>>>>> On 23/11/2015 8:14 PM, Bertrand Delsart wrote:
>>>>>> Hi David,
>>>>>>
>>>>>> Overall looks good.
>>>>>> Thanks for the #ifndef USE_LIBRARY_BASED_TLS_ONLY :-)
>>>>>>
>>>>>> One doubt, in case this has not been discussed before.
>>>>>>
>>>>>> I'm still catching up on x86_64 (after years of ARM assembly :-))
>>>>>> but it
>>>>>> seems that there are some stack alignment constraints on runtime
>>>>>> calls,
>>>>>> at least for some x86_64 ABIs.
>>>>>>
>>>>>> Some of the x86 MacroAssembler::get_thread implementations had
>>>>>> code to
>>>>>> align the stack before calling pthread_getspecific. See for instance
>>>>>> http://cr.openjdk.java.net/~dholmes/8132510/webrev.v6/src/os_cpu/linux_x86/vm/assembler_linux_x86.cpp.udiff.html
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> Sorry I'm not that familiar with the MacroAssembler - is it this odd
>>>>> fragment:
>>>>>
>>>>> -   push(r10);
>>>>> -   // XXX
>>>>> -   mov(r10, rsp);
>>>>> -   andq(rsp, -16);
>>>>> -   push(r10);
>>>>>
>>>>> I'm not at all clear what that is doing - and if it somehow changes
>>>>> the
>>>>> alignment wouldn't something need to be fixed up when popping the
>>>>> previous values ??
>>>>>
>>>>> To be honest I'm not even sure what an "unaligned stack" means.
>>>>
>>>> On some platforms, SP may have to be aligned on a 16 bytes boundary
>>>> when
>>>> calling another method.
>>>
>>> Okay - that seems to be an x64 ABI requirement based on other code in
>>> MacroAssembler. The reason it is missing in the Solaris code is because
>>> I already removed it in the earlier changes that were done on Solaris:
>>>
>>> http://hg.openjdk.java.net/jdk9/hs-rt/hotspot/rev/d5b328043c10
>>>
>>> and nobody picked it up. :(
>>>
>>> That said, the absence of this code has not caused any problems so
>>> presumably we are normally aligned.
>>>
>>>> After having pushed what needed to be saved, the code above rounds SP
>>>> down and saves the old value. It will then also push r11, which results
>>>> in the expected alignment.
>>>>
>>>> The conterpart, after the VM call, is the:
>>>>   pop(r11);
>>>>   pop(rsp);
>>>
>>> I had assumed this extra manipulation was related to pushing the arg for
>>> the call.
>>>
>>>>>> This alignment is no longer performed in the new (shared)
>>>>>> implementation
>>>>>> http://cr.openjdk.java.net/~dholmes/8132510/webrev.v6/src/cpu/x86/vm/macroAssembler_x86.cpp.udiff.html
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Now, Solaris was not performing the alignment and Windows has a
>>>>>> separate
>>>>>> path for x86_64. Did we really need the alignment for linux x86_64
>>>>>> and
>>>>>> bsd_x86_64 ? Might it be needed for other ports ?
>>>>>>
>>>>>> IMHO, it might be safer to align the stack by default, knowing it
>>>>>> should
>>>>>> not be expensive since we call get_thread rarely for x86_64
>>>>>> (result is
>>>>>> cached in r15). I'll let you see whether it is worth adding an
>>>>>> ifdef so
>>>>>> as to explicitly deactivate the alignment on some platforms
>>>>>> (solaris_x86_64 ?)
>>>>>
>>>>> I don't have enough knowledge to even begin to comment on this. I will
>>>>> have to rely on our x86_64 macroassembler experts to explain the old
>>>>> code and what the requirements are.
>>>>
>>>> OK. This means that the alignment was not removed purposefully.
>>>>
>>>> In that case, you must either keep the per platform code x86_64 (since
>>>> it was not identical for all platforms) or use the safest version, with
>>>> the additional
>>>>
>>>>     // XXX
>>>>     mov(r10, rsp);
>>>>     andq(rsp, -16);
>>>>     push(r10);
>>>>
>>>> before the push(r11) and with the
>>>>
>>>>     pop(rsp);
>>>>
>>>> after the pop(r11). It should work on all x86_64 platforms.
>>>
>>> Right, I will add this back to the code - and replace the meaningless //
>>> XXX with an actual comment!
>>>
>>> Many thanks,
>>> David
>>> -----
>>>
>>>> FYI, there is another way to do the alignment, based on the fact that
>>>> we are at least aligned on 8 bytes (see
>>>> MacroAssembler::call_VM_leaf_base). I assume that this second
>>>> version is
>>>> more efficient (particularly thanks to specilative branches) but it
>>>> might be safer/simpler to continue using andq in get_thread.
>>>>
>>>> Bertrand.
>>>>
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Bertrand.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 23/11/2015 08:03, David Holmes wrote:
>>>>>>> After all the preliminary discussions here are final proposed
>>>>>>> changes:
>>>>>>>
>>>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8132510
>>>>>>>
>>>>>>> Open webrev: http://cr.openjdk.java.net/~dholmes/8132510/webrev.v6/
>>>>>>>
>>>>>>> A simple (in principle) but wide-ranging change which should
>>>>>>> appeal to
>>>>>>> our Code Deletion Engineer's. We implement Thread::current() using a
>>>>>>> compiler/language-based thread-local variable eg:
>>>>>>>
>>>>>>>   static __thread Thread *_thr_current;
>>>>>>>
>>>>>>>   inline Thread* Thread::current() {
>>>>>>>     return _thr_current;
>>>>>>>   }
>>>>>>>
>>>>>>> with an appropriate setter of course. By doing this we can
>>>>>>> completely
>>>>>>> remove the os_cpu-specific ThreadLocalStorage implementations, and
>>>>>>> the
>>>>>>> associated os::thread_local_storage* calls.
>>>>>>>
>>>>>>> As __thread is not async-signal-safe (either in theory or
>>>>>>> practice) we
>>>>>>> need a fallback library-based solution as used today. For this we
>>>>>>> use a
>>>>>>> very simple ThreadLocalStorage class and an implementation thereof
>>>>>>> for
>>>>>>> POSIX (covers AIX, BSD, Linux, OSX and Solaris) using
>>>>>>> pthread_get/setspecific; and one for Windows using its TLS library.
>>>>>>> While these library routines are not guaranteed async-signal-safe,
>>>>>>> they
>>>>>>> seem to be in practice and are what we have been using all along.
>>>>>>>
>>>>>>> We also allow for use of only the library-based TLS for platforms
>>>>>>> where
>>>>>>> compiler-based thread locals are not supported (as will be needed in
>>>>>>> the
>>>>>>> Mobile project). This is enabled at build time by defining
>>>>>>> USE_LIBRARY_BASED_TLS_ONLY.
>>>>>>>
>>>>>>> Thanks to Andrew Haley for providing the Aarch64 code; and for
>>>>>>> Thomas
>>>>>>> Stuefe for testing PPC and AIX.
>>>>>>>
>>>>>>> Testing:
>>>>>>>   - JPRT (core platforms)
>>>>>>>   - Jtreg tests (linux & solaris)
>>>>>>>   - vm.runtime (core platforms)
>>>>>>>
>>>>>>> Performance:
>>>>>>>   - still TBD - this is proving to be extremely difficult. If anyone
>>>>>>> has
>>>>>>> any simple to run microbenchmarks to suggest I'd give them a try
>>>>>>> as a
>>>>>>> sanity check. But I lack access to hardware for running serious
>>>>>>> benchmarking.
>>>>>>>
>>>>>>> Footprint:
>>>>>>> - varies by platform and the VM (server, client, minimal)
>>>>>>> - worst-case: ~1% increase for server VM and minimal VM
>>>>>>> - best-case:  0.4% decrease for client VM
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>>>
>>>>>>
>>>>
>>>>


-- 
Bertrand Delsart,                     Grenoble Engineering Center
Oracle,         180 av. de l'Europe,          ZIRST de Montbonnot
38330 Montbonnot Saint Martin,                             FRANCE
bertrand.delsart at oracle.com             Phone : +33 4 76 18 81 23

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NOTICE: This email message is for the sole use of the intended
recipient(s) and may contain confidential and privileged
information. Any unauthorized review, use, disclosure or
distribution is prohibited. If you are not the intended recipient,
please contact the sender by reply email and destroy all copies of
the original message.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


More information about the hotspot-dev mailing list