RFR: 8132510: Replace ThreadLocalStorage with compiler/language-based thread-local variables
David Holmes
david.holmes at oracle.com
Mon Nov 23 12:38:13 UTC 2015
On 23/11/2015 10:27 PM, David Holmes wrote:
> Hi Bertrand,
>
> On 23/11/2015 8:14 PM, Bertrand Delsart wrote:
>> Hi David,
>>
>> Overall looks good.
>> Thanks for the #ifndef USE_LIBRARY_BASED_TLS_ONLY :-)
>>
>> One doubt, in case this has not been discussed before.
>>
>> I'm still catching up on x86_64 (after years of ARM assembly :-)) but it
>> seems that there are some stack alignment constraints on runtime calls,
>> at least for some x86_64 ABIs.
>>
>> Some of the x86 MacroAssembler::get_thread implementations had code to
>> align the stack before calling pthread_getspecific. See for instance
>> http://cr.openjdk.java.net/~dholmes/8132510/webrev.v6/src/os_cpu/linux_x86/vm/assembler_linux_x86.cpp.udiff.html
>>
>
> Sorry I'm not that familiar with the MacroAssembler - is it this odd
> fragment:
>
> - push(r10);
> - // XXX
> - mov(r10, rsp);
> - andq(rsp, -16);
> - push(r10);
>
> I'm not at all clear what that is doing - and if it somehow changes the
> alignment wouldn't something need to be fixed up when popping the
> previous values ??
>
> To be honest I'm not even sure what an "unaligned stack" means.
Maybe I know more than I thought - or not :). The original code calls
pthread_getspecific directly and has to pass the key/index as a
parameter - that has to be properly aligned. The new code does not pass
any parameters, but just calls Thread::current.
David
-----
>>
>> This alignment is no longer performed in the new (shared) implementation
>> http://cr.openjdk.java.net/~dholmes/8132510/webrev.v6/src/cpu/x86/vm/macroAssembler_x86.cpp.udiff.html
>>
>>
>>
>> Now, Solaris was not performing the alignment and Windows has a separate
>> path for x86_64. Did we really need the alignment for linux x86_64 and
>> bsd_x86_64 ? Might it be needed for other ports ?
>>
>> IMHO, it might be safer to align the stack by default, knowing it should
>> not be expensive since we call get_thread rarely for x86_64 (result is
>> cached in r15). I'll let you see whether it is worth adding an ifdef so
>> as to explicitly deactivate the alignment on some platforms
>> (solaris_x86_64 ?)
>
> I don't have enough knowledge to even begin to comment on this. I will
> have to rely on our x86_64 macroassembler experts to explain the old
> code and what the requirements are.
>
> Thanks,
> David
>
>> Regards,
>>
>> Bertrand.
>>
>>
>>
>> On 23/11/2015 08:03, David Holmes wrote:
>>> After all the preliminary discussions here are final proposed changes:
>>>
>>> bug: https://bugs.openjdk.java.net/browse/JDK-8132510
>>>
>>> Open webrev: http://cr.openjdk.java.net/~dholmes/8132510/webrev.v6/
>>>
>>> A simple (in principle) but wide-ranging change which should appeal to
>>> our Code Deletion Engineer's. We implement Thread::current() using a
>>> compiler/language-based thread-local variable eg:
>>>
>>> static __thread Thread *_thr_current;
>>>
>>> inline Thread* Thread::current() {
>>> return _thr_current;
>>> }
>>>
>>> with an appropriate setter of course. By doing this we can completely
>>> remove the os_cpu-specific ThreadLocalStorage implementations, and the
>>> associated os::thread_local_storage* calls.
>>>
>>> As __thread is not async-signal-safe (either in theory or practice) we
>>> need a fallback library-based solution as used today. For this we use a
>>> very simple ThreadLocalStorage class and an implementation thereof for
>>> POSIX (covers AIX, BSD, Linux, OSX and Solaris) using
>>> pthread_get/setspecific; and one for Windows using its TLS library.
>>> While these library routines are not guaranteed async-signal-safe, they
>>> seem to be in practice and are what we have been using all along.
>>>
>>> We also allow for use of only the library-based TLS for platforms where
>>> compiler-based thread locals are not supported (as will be needed in the
>>> Mobile project). This is enabled at build time by defining
>>> USE_LIBRARY_BASED_TLS_ONLY.
>>>
>>> Thanks to Andrew Haley for providing the Aarch64 code; and for Thomas
>>> Stuefe for testing PPC and AIX.
>>>
>>> Testing:
>>> - JPRT (core platforms)
>>> - Jtreg tests (linux & solaris)
>>> - vm.runtime (core platforms)
>>>
>>> Performance:
>>> - still TBD - this is proving to be extremely difficult. If anyone has
>>> any simple to run microbenchmarks to suggest I'd give them a try as a
>>> sanity check. But I lack access to hardware for running serious
>>> benchmarking.
>>>
>>> Footprint:
>>> - varies by platform and the VM (server, client, minimal)
>>> - worst-case: ~1% increase for server VM and minimal VM
>>> - best-case: 0.4% decrease for client VM
>>>
>>> Thanks,
>>> David
>>
>>
More information about the hotspot-dev
mailing list