RFR: 8132510: Replace ThreadLocalStorage with compiler/language-based thread-local variables

Thomas Stüfe thomas.stuefe at gmail.com
Wed Nov 25 10:46:24 UTC 2015


On Wed, Nov 25, 2015 at 10:22 AM, Bertrand Delsart <
bertrand.delsart at oracle.com> wrote:

> Good finding Thomas,
>
> On 25/11/2015 09:40, Thomas Stüfe wrote:
>
>> Hi David, Bertrand,
>>
>> about
>>
>> http://cr.openjdk.java.net/~dholmes/8132510/webrev.v8/src/cpu/x86/vm/macroAssembler_x86.cpp.frames.html
>>
>> I am not sure this is correct for Windows x64.
>>
>
> Unless I'm misreading it, the new code looks a lot like the old one
> (including the alignement part). Hence this does not look like a regression
> caused by David's change.
>
>
Yes, this seems like an old bug.

I also see that this is fixed in our codebase since 2010, exactly the way
you propose (with MacroAssembler
<http://ld8443:8080/source/s?defs=MacroAssembler&project=dev-hotspot>::
call_VM_leaf_base
<http://ld8443:8080/source/s?defs=call_VM_leaf_base&project=dev-hotspot>).
So, it is already used since five years. I would propose opening a separate
bug for this one and we contribute our fix.

Regards, Thomas


>
>> https://msdn.microsoft.com/en-us/library/ew5tede7(v=vs.90).aspx
>>
>> windows x64 Abi requires the register parameter area right above the
>> return address on the stack. It is not guaranteed to be preserved during
>> the function call:
>>
>> "Even if the called function has fewer than 4 parameters, these 4 stack
>> locations are effectively owned by the called function, and may be used
>> by the called function for other purposes besides saving parameter
>> register values. Thus the caller may not save information in this region
>> of stack across a function call."
>>
>> So, restoring parameters from it may not work if Thread::current() uses
>> this area for any reason. In this case, r9, rsp, r10, r11 may be
>> overwritten by the callee.
>>
>> I also think Windows x64 ABI is different enough from Linux that this
>> function could live somewhere in <os>_<cpu>.
>>
>
> This is a valid point. However, note that this is similar to the issue we
> have for instance in 'shared' MacroAssembler::call_VM_leaf_base, which was
> not put in <os>_<cpu> files.
>
> In fact, the right fix may be to use call_VM_leaf instead of call :-)
>
> (alternatively we could use "#ifdef _WIN64" or check
> "frame::arg_reg_save_area_bytes != 0", which may be even cleaner and more
> portable)
>
> All these solutions are fine for me.
>
>

> Since we are near to feature freeze, this should probably be done in a
> separate CR (unless David has enough time to test it).
>
> Regards,
>
> Bertrand.
>
>
>> Kind Regards, Thomas
>>
>>
>>
>>
>> On Mon, Nov 23, 2015 at 10:42 PM, David Holmes <david.holmes at oracle.com
>> <mailto:david.holmes at oracle.com>> wrote:
>>
>>     On 23/11/2015 11:46 PM, Bertrand Delsart wrote:
>>
>>         On 23/11/2015 13:27, David Holmes wrote:
>>
>>             Hi Bertrand,
>>
>>             On 23/11/2015 8:14 PM, Bertrand Delsart wrote:
>>
>>                 Hi David,
>>
>>                 Overall looks good.
>>                 Thanks for the #ifndef USE_LIBRARY_BASED_TLS_ONLY :-)
>>
>>                 One doubt, in case this has not been discussed before.
>>
>>                 I'm still catching up on x86_64 (after years of ARM
>>                 assembly :-)) but it
>>                 seems that there are some stack alignment constraints on
>>                 runtime calls,
>>                 at least for some x86_64 ABIs.
>>
>>                 Some of the x86 MacroAssembler::get_thread
>>                 implementations had code to
>>                 align the stack before calling pthread_getspecific. See
>>                 for instance
>>
>> http://cr.openjdk.java.net/~dholmes/8132510/webrev.v6/src/os_cpu/linux_x86/vm/assembler_linux_x86.cpp.udiff.html
>>
>>
>>
>>             Sorry I'm not that familiar with the MacroAssembler - is it
>>             this odd
>>             fragment:
>>
>>             -   push(r10);
>>             -   // XXX
>>             -   mov(r10, rsp);
>>             -   andq(rsp, -16);
>>             -   push(r10);
>>
>>             I'm not at all clear what that is doing - and if it somehow
>>             changes the
>>             alignment wouldn't something need to be fixed up when
>>             popping the
>>             previous values ??
>>
>>             To be honest I'm not even sure what an "unaligned stack"
>> means.
>>
>>
>>         On some platforms, SP may have to be aligned on a 16 bytes
>>         boundary when
>>         calling another method.
>>
>>
>>     Okay - that seems to be an x64 ABI requirement based on other code
>>     in MacroAssembler. The reason it is missing in the Solaris code is
>>     because I already removed it in the earlier changes that were done
>>     on Solaris:
>>
>>     http://hg.openjdk.java.net/jdk9/hs-rt/hotspot/rev/d5b328043c10
>>
>>     and nobody picked it up. :(
>>
>>     That said, the absence of this code has not caused any problems so
>>     presumably we are normally aligned.
>>
>>         After having pushed what needed to be saved, the code above
>>         rounds SP
>>         down and saves the old value. It will then also push r11, which
>>         results
>>         in the expected alignment.
>>
>>         The conterpart, after the VM call, is the:
>>            pop(r11);
>>            pop(rsp);
>>
>>
>>     I had assumed this extra manipulation was related to pushing the arg
>>     for the call.
>>
>>                 This alignment is no longer performed in the new
>>                 (shared) implementation
>>
>> http://cr.openjdk.java.net/~dholmes/8132510/webrev.v6/src/cpu/x86/vm/macroAssembler_x86.cpp.udiff.html
>>
>>
>>
>>
>>                 Now, Solaris was not performing the alignment and
>>                 Windows has a separate
>>                 path for x86_64. Did we really need the alignment for
>>                 linux x86_64 and
>>                 bsd_x86_64 ? Might it be needed for other ports ?
>>
>>                 IMHO, it might be safer to align the stack by default,
>>                 knowing it should
>>                 not be expensive since we call get_thread rarely for
>>                 x86_64 (result is
>>                 cached in r15). I'll let you see whether it is worth
>>                 adding an ifdef so
>>                 as to explicitly deactivate the alignment on some
>> platforms
>>                 (solaris_x86_64 ?)
>>
>>
>>             I don't have enough knowledge to even begin to comment on
>>             this. I will
>>             have to rely on our x86_64 macroassembler experts to explain
>>             the old
>>             code and what the requirements are.
>>
>>
>>         OK. This means that the alignment was not removed purposefully.
>>
>>         In that case, you must either keep the per platform code x86_64
>>         (since
>>         it was not identical for all platforms) or use the safest
>>         version, with
>>         the additional
>>
>>              // XXX
>>              mov(r10, rsp);
>>              andq(rsp, -16);
>>              push(r10);
>>
>>         before the push(r11) and with the
>>
>>              pop(rsp);
>>
>>         after the pop(r11). It should work on all x86_64 platforms.
>>
>>
>>     Right, I will add this back to the code - and replace the
>>     meaningless // XXX with an actual comment!
>>
>>     Many thanks,
>>     David
>>     -----
>>
>>
>>         FYI, there is another way to do the alignment, based on the fact
>>         that
>>         we are at least aligned on 8 bytes (see
>>         MacroAssembler::call_VM_leaf_base). I assume that this second
>>         version is
>>         more efficient (particularly thanks to specilative branches) but
>> it
>>         might be safer/simpler to continue using andq in get_thread.
>>
>>         Bertrand.
>>
>>
>>             Thanks,
>>             David
>>
>>                 Regards,
>>
>>                 Bertrand.
>>
>>
>>
>>                 On 23/11/2015 08:03, David Holmes wrote:
>>
>>                     After all the preliminary discussions here are final
>>                     proposed changes:
>>
>>                     bug: https://bugs.openjdk.java.net/browse/JDK-8132510
>>
>>                     Open webrev:
>>
>> http://cr.openjdk.java.net/~dholmes/8132510/webrev.v6/
>>
>>                     A simple (in principle) but wide-ranging change
>>                     which should appeal to
>>                     our Code Deletion Engineer's. We implement
>>                     Thread::current() using a
>>                     compiler/language-based thread-local variable eg:
>>
>>                        static __thread Thread *_thr_current;
>>
>>                        inline Thread* Thread::current() {
>>                          return _thr_current;
>>                        }
>>
>>                     with an appropriate setter of course. By doing this
>>                     we can completely
>>                     remove the os_cpu-specific ThreadLocalStorage
>>                     implementations, and the
>>                     associated os::thread_local_storage* calls.
>>
>>                     As __thread is not async-signal-safe (either in
>>                     theory or practice) we
>>                     need a fallback library-based solution as used
>>                     today. For this we use a
>>                     very simple ThreadLocalStorage class and an
>>                     implementation thereof for
>>                     POSIX (covers AIX, BSD, Linux, OSX and Solaris) using
>>                     pthread_get/setspecific; and one for Windows using
>>                     its TLS library.
>>                     While these library routines are not guaranteed
>>                     async-signal-safe, they
>>                     seem to be in practice and are what we have been
>>                     using all along.
>>
>>                     We also allow for use of only the library-based TLS
>>                     for platforms where
>>                     compiler-based thread locals are not supported (as
>>                     will be needed in
>>                     the
>>                     Mobile project). This is enabled at build time by
>>                     defining
>>                     USE_LIBRARY_BASED_TLS_ONLY.
>>
>>                     Thanks to Andrew Haley for providing the Aarch64
>>                     code; and for Thomas
>>                     Stuefe for testing PPC and AIX.
>>
>>                     Testing:
>>                        - JPRT (core platforms)
>>                        - Jtreg tests (linux & solaris)
>>                        - vm.runtime (core platforms)
>>
>>                     Performance:
>>                        - still TBD - this is proving to be extremely
>>                     difficult. If anyone
>>                     has
>>                     any simple to run microbenchmarks to suggest I'd
>>                     give them a try as a
>>                     sanity check. But I lack access to hardware for
>>                     running serious
>>                     benchmarking.
>>
>>                     Footprint:
>>                     - varies by platform and the VM (server, client,
>>                     minimal)
>>                     - worst-case: ~1% increase for server VM and minimal
>> VM
>>                     - best-case:  0.4% decrease for client VM
>>
>>                     Thanks,
>>                     David
>>
>>
>>
>>
>>
>>
>>
>
> --
> Bertrand Delsart,                     Grenoble Engineering Center
> Oracle,         180 av. de l'Europe,          ZIRST de Montbonnot
> 38330 Montbonnot Saint Martin,                             FRANCE
> bertrand.delsart at oracle.com             Phone : +33 4 76 18 81 23
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> NOTICE: This email message is for the sole use of the intended
> recipient(s) and may contain confidential and privileged
> information. Any unauthorized review, use, disclosure or
> distribution is prohibited. If you are not the intended recipient,
> please contact the sender by reply email and destroy all copies of
> the original message.
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>


More information about the hotspot-dev mailing list