RFR: 8132510: Replace ThreadLocalStorage with compiler/language-based thread-local variables

David Holmes david.holmes at oracle.com
Wed Nov 25 13:10:33 UTC 2015


On 25/11/2015 8:46 PM, Thomas Stüfe wrote:
>
> On Wed, Nov 25, 2015 at 10:22 AM, Bertrand Delsart
> <bertrand.delsart at oracle.com <mailto:bertrand.delsart at oracle.com>> wrote:
>
>     Good finding Thomas,
>
>     On 25/11/2015 09:40, Thomas Stüfe wrote:
>
>         Hi David, Bertrand,
>
>         about
>         http://cr.openjdk.java.net/~dholmes/8132510/webrev.v8/src/cpu/x86/vm/macroAssembler_x86.cpp.frames.html
>
>         I am not sure this is correct for Windows x64.
>
>
>     Unless I'm misreading it, the new code looks a lot like the old one
>     (including the alignement part). Hence this does not look like a
>     regression caused by David's change.
>
>
> Yes, this seems like an old bug.
>
> I also see that this is fixed in our codebase since 2010, exactly the
> way you propose (with MacroAssembler
> <http://ld8443:8080/source/s?defs=MacroAssembler&project=dev-hotspot>::call_VM_leaf_base
> <http://ld8443:8080/source/s?defs=call_VM_leaf_base&project=dev-hotspot>).
> So, it is already used since five years. I would propose opening a
> separate bug for this one and we contribute our fix.

That sounds good to me too. I also suspect additional code will need to 
be checked as this stack alignment is performed in a few places in 
./cpu/x86/vm/macroAssembler_x86.cpp with no OS specific differences 
being considered.

Thanks,
David

> Regards, Thomas
>
>
>         https://msdn.microsoft.com/en-us/library/ew5tede7(v=vs.90).aspx
>
>         windows x64 Abi requires the register parameter area right above the
>         return address on the stack. It is not guaranteed to be
>         preserved during
>         the function call:
>
>         "Even if the called function has fewer than 4 parameters, these
>         4 stack
>         locations are effectively owned by the called function, and may
>         be used
>         by the called function for other purposes besides saving parameter
>         register values. Thus the caller may not save information in
>         this region
>         of stack across a function call."
>
>         So, restoring parameters from it may not work if
>         Thread::current() uses
>         this area for any reason. In this case, r9, rsp, r10, r11 may be
>         overwritten by the callee.
>
>         I also think Windows x64 ABI is different enough from Linux that
>         this
>         function could live somewhere in <os>_<cpu>.
>
>
>     This is a valid point. However, note that this is similar to the
>     issue we have for instance in 'shared'
>     MacroAssembler::call_VM_leaf_base, which was not put in <os>_<cpu>
>     files.
>
>     In fact, the right fix may be to use call_VM_leaf instead of call :-)
>
>     (alternatively we could use "#ifdef _WIN64" or check
>     "frame::arg_reg_save_area_bytes != 0", which may be even cleaner and
>     more portable)
>
>     All these solutions are fine for me.
>
>
>     Since we are near to feature freeze, this should probably be done in
>     a separate CR (unless David has enough time to test it).
>
>     Regards,
>
>     Bertrand.
>
>
>         Kind Regards, Thomas
>
>
>
>
>         On Mon, Nov 23, 2015 at 10:42 PM, David Holmes
>         <david.holmes at oracle.com <mailto:david.holmes at oracle.com>
>         <mailto:david.holmes at oracle.com
>         <mailto:david.holmes at oracle.com>>> wrote:
>
>              On 23/11/2015 11:46 PM, Bertrand Delsart wrote:
>
>                  On 23/11/2015 13:27, David Holmes wrote:
>
>                      Hi Bertrand,
>
>                      On 23/11/2015 8:14 PM, Bertrand Delsart wrote:
>
>                          Hi David,
>
>                          Overall looks good.
>                          Thanks for the #ifndef
>         USE_LIBRARY_BASED_TLS_ONLY :-)
>
>                          One doubt, in case this has not been discussed
>         before.
>
>                          I'm still catching up on x86_64 (after years of ARM
>                          assembly :-)) but it
>                          seems that there are some stack alignment
>         constraints on
>                          runtime calls,
>                          at least for some x86_64 ABIs.
>
>                          Some of the x86 MacroAssembler::get_thread
>                          implementations had code to
>                          align the stack before calling
>         pthread_getspecific. See
>                          for instance
>         http://cr.openjdk.java.net/~dholmes/8132510/webrev.v6/src/os_cpu/linux_x86/vm/assembler_linux_x86.cpp.udiff.html
>
>
>
>                      Sorry I'm not that familiar with the MacroAssembler
>         - is it
>                      this odd
>                      fragment:
>
>                      -   push(r10);
>                      -   // XXX
>                      -   mov(r10, rsp);
>                      -   andq(rsp, -16);
>                      -   push(r10);
>
>                      I'm not at all clear what that is doing - and if it
>         somehow
>                      changes the
>                      alignment wouldn't something need to be fixed up when
>                      popping the
>                      previous values ??
>
>                      To be honest I'm not even sure what an "unaligned
>         stack" means.
>
>
>                  On some platforms, SP may have to be aligned on a 16 bytes
>                  boundary when
>                  calling another method.
>
>
>              Okay - that seems to be an x64 ABI requirement based on
>         other code
>              in MacroAssembler. The reason it is missing in the Solaris
>         code is
>              because I already removed it in the earlier changes that
>         were done
>              on Solaris:
>
>         http://hg.openjdk.java.net/jdk9/hs-rt/hotspot/rev/d5b328043c10
>
>              and nobody picked it up. :(
>
>              That said, the absence of this code has not caused any
>         problems so
>              presumably we are normally aligned.
>
>                  After having pushed what needed to be saved, the code above
>                  rounds SP
>                  down and saves the old value. It will then also push
>         r11, which
>                  results
>                  in the expected alignment.
>
>                  The conterpart, after the VM call, is the:
>                     pop(r11);
>                     pop(rsp);
>
>
>              I had assumed this extra manipulation was related to
>         pushing the arg
>              for the call.
>
>                          This alignment is no longer performed in the new
>                          (shared) implementation
>         http://cr.openjdk.java.net/~dholmes/8132510/webrev.v6/src/cpu/x86/vm/macroAssembler_x86.cpp.udiff.html
>
>
>
>
>                          Now, Solaris was not performing the alignment and
>                          Windows has a separate
>                          path for x86_64. Did we really need the
>         alignment for
>                          linux x86_64 and
>                          bsd_x86_64 ? Might it be needed for other ports ?
>
>                          IMHO, it might be safer to align the stack by
>         default,
>                          knowing it should
>                          not be expensive since we call get_thread
>         rarely for
>                          x86_64 (result is
>                          cached in r15). I'll let you see whether it is
>         worth
>                          adding an ifdef so
>                          as to explicitly deactivate the alignment on
>         some platforms
>                          (solaris_x86_64 ?)
>
>
>                      I don't have enough knowledge to even begin to
>         comment on
>                      this. I will
>                      have to rely on our x86_64 macroassembler experts
>         to explain
>                      the old
>                      code and what the requirements are.
>
>
>                  OK. This means that the alignment was not removed
>         purposefully.
>
>                  In that case, you must either keep the per platform
>         code x86_64
>                  (since
>                  it was not identical for all platforms) or use the safest
>                  version, with
>                  the additional
>
>                       // XXX
>                       mov(r10, rsp);
>                       andq(rsp, -16);
>                       push(r10);
>
>                  before the push(r11) and with the
>
>                       pop(rsp);
>
>                  after the pop(r11). It should work on all x86_64 platforms.
>
>
>              Right, I will add this back to the code - and replace the
>              meaningless // XXX with an actual comment!
>
>              Many thanks,
>              David
>              -----
>
>
>                  FYI, there is another way to do the alignment, based on
>         the fact
>                  that
>                  we are at least aligned on 8 bytes (see
>                  MacroAssembler::call_VM_leaf_base). I assume that this
>         second
>                  version is
>                  more efficient (particularly thanks to specilative
>         branches) but it
>                  might be safer/simpler to continue using andq in
>         get_thread.
>
>                  Bertrand.
>
>
>                      Thanks,
>                      David
>
>                          Regards,
>
>                          Bertrand.
>
>
>
>                          On 23/11/2015 08:03, David Holmes wrote:
>
>                              After all the preliminary discussions here
>         are final
>                              proposed changes:
>
>                              bug:
>         https://bugs.openjdk.java.net/browse/JDK-8132510
>
>                              Open webrev:
>         http://cr.openjdk.java.net/~dholmes/8132510/webrev.v6/
>
>                              A simple (in principle) but wide-ranging change
>                              which should appeal to
>                              our Code Deletion Engineer's. We implement
>                              Thread::current() using a
>                              compiler/language-based thread-local
>         variable eg:
>
>                                 static __thread Thread *_thr_current;
>
>                                 inline Thread* Thread::current() {
>                                   return _thr_current;
>                                 }
>
>                              with an appropriate setter of course. By
>         doing this
>                              we can completely
>                              remove the os_cpu-specific ThreadLocalStorage
>                              implementations, and the
>                              associated os::thread_local_storage* calls.
>
>                              As __thread is not async-signal-safe (either in
>                              theory or practice) we
>                              need a fallback library-based solution as used
>                              today. For this we use a
>                              very simple ThreadLocalStorage class and an
>                              implementation thereof for
>                              POSIX (covers AIX, BSD, Linux, OSX and
>         Solaris) using
>                              pthread_get/setspecific; and one for
>         Windows using
>                              its TLS library.
>                              While these library routines are not guaranteed
>                              async-signal-safe, they
>                              seem to be in practice and are what we have
>         been
>                              using all along.
>
>                              We also allow for use of only the
>         library-based TLS
>                              for platforms where
>                              compiler-based thread locals are not
>         supported (as
>                              will be needed in
>                              the
>                              Mobile project). This is enabled at build
>         time by
>                              defining
>                              USE_LIBRARY_BASED_TLS_ONLY.
>
>                              Thanks to Andrew Haley for providing the
>         Aarch64
>                              code; and for Thomas
>                              Stuefe for testing PPC and AIX.
>
>                              Testing:
>                                 - JPRT (core platforms)
>                                 - Jtreg tests (linux & solaris)
>                                 - vm.runtime (core platforms)
>
>                              Performance:
>                                 - still TBD - this is proving to be
>         extremely
>                              difficult. If anyone
>                              has
>                              any simple to run microbenchmarks to
>         suggest I'd
>                              give them a try as a
>                              sanity check. But I lack access to hardware for
>                              running serious
>                              benchmarking.
>
>                              Footprint:
>                              - varies by platform and the VM (server,
>         client,
>                              minimal)
>                              - worst-case: ~1% increase for server VM
>         and minimal VM
>                              - best-case:  0.4% decrease for client VM
>
>                              Thanks,
>                              David
>
>
>
>
>
>
>
>
>     --
>     Bertrand Delsart,                     Grenoble Engineering Center
>     Oracle,         180 av. de l'Europe,          ZIRST de Montbonnot
>     38330 Montbonnot Saint Martin,                             FRANCE
>     bertrand.delsart at oracle.com <mailto:bertrand.delsart at oracle.com>
>               Phone : +33 4 76 18 81 23
>
>     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>     NOTICE: This email message is for the sole use of the intended
>     recipient(s) and may contain confidential and privileged
>     information. Any unauthorized review, use, disclosure or
>     distribution is prohibited. If you are not the intended recipient,
>     please contact the sender by reply email and destroy all copies of
>     the original message.
>     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>


More information about the hotspot-dev mailing list