RFR: 8132510: Replace ThreadLocalStorage with compiler/language-based thread-local variables

David Holmes david.holmes at oracle.com
Wed Nov 25 22:52:16 UTC 2015


On 26/11/2015 2:53 AM, Thomas Stüfe wrote:
> I created a bug to track this:
>
> https://bugs.openjdk.java.net/browse/JDK-8144059

Thanks. I admit I still don't really grok the exact problem here. It is 
the alignment code that is the problem right?

79 mov(r10, rsp);
80 andq(rsp, -16);
81 push(r10);

?

Thanks,
David

> Because Davids change is not submitted yet, the bug refers to the old
> coding, but stays relevant also after this fix.
>
> Kind Regards, Thomas
>
> On Wed, Nov 25, 2015 at 3:52 PM, Daniel D. Daugherty
> <daniel.daugherty at oracle.com <mailto:daniel.daugherty at oracle.com>> wrote:
>
>     On 11/25/15 6:10 AM, David Holmes wrote:
>
>         On 25/11/2015 8:46 PM, Thomas Stüfe wrote:
>
>
>             On Wed, Nov 25, 2015 at 10:22 AM, Bertrand Delsart
>             <bertrand.delsart at oracle.com
>             <mailto:bertrand.delsart at oracle.com>
>             <mailto:bertrand.delsart at oracle.com
>             <mailto:bertrand.delsart at oracle.com>>> wrote:
>
>                  Good finding Thomas,
>
>                  On 25/11/2015 09:40, Thomas Stüfe wrote:
>
>                      Hi David, Bertrand,
>
>                      about
>             http://cr.openjdk.java.net/~dholmes/8132510/webrev.v8/src/cpu/x86/vm/macroAssembler_x86.cpp.frames.html
>
>                      I am not sure this is correct for Windows x64.
>
>
>                  Unless I'm misreading it, the new code looks a lot like
>             the old one
>                  (including the alignement part). Hence this does not
>             look like a
>                  regression caused by David's change.
>
>
>             Yes, this seems like an old bug.
>
>             I also see that this is fixed in our codebase since 2010,
>             exactly the
>             way you propose (with MacroAssembler
>             <http://ld8443:8080/source/s?defs=MacroAssembler&project=dev-hotspot>::call_VM_leaf_base
>
>             <http://ld8443:8080/source/s?defs=call_VM_leaf_base&project=dev-hotspot>).
>
>             So, it is already used since five years. I would propose
>             opening a
>             separate bug for this one and we contribute our fix.
>
>
>         That sounds good to me too. I also suspect additional code will
>         need to be checked as this stack alignment is performed in a few
>         places in ./cpu/x86/vm/macroAssembler_x86.cpp with no OS
>         specific differences being considered.
>
>
>     Looks like MacroAssembler::call_VM_leaf_base() does the "right thing"
>     for stack alignment in the X64 case. Perhaps we need to audit both
>     stack alignment _and_ uses of MacroAssembler::call() just to make
>     sure...
>
>     Did I mention that all the assembly code stuff gives me a headache? :-)
>
>     Dan
>
>
>
>
>         Thanks,
>         David
>
>             Regards, Thomas
>
>
>             https://msdn.microsoft.com/en-us/library/ew5tede7(v=vs.90).aspx
>
>                      windows x64 Abi requires the register parameter
>             area right above the
>                      return address on the stack. It is not guaranteed to be
>                      preserved during
>                      the function call:
>
>                      "Even if the called function has fewer than 4
>             parameters, these
>                      4 stack
>                      locations are effectively owned by the called
>             function, and may
>                      be used
>                      by the called function for other purposes besides
>             saving parameter
>                      register values. Thus the caller may not save
>             information in
>                      this region
>                      of stack across a function call."
>
>                      So, restoring parameters from it may not work if
>                      Thread::current() uses
>                      this area for any reason. In this case, r9, rsp,
>             r10, r11 may be
>                      overwritten by the callee.
>
>                      I also think Windows x64 ABI is different enough
>             from Linux that
>                      this
>                      function could live somewhere in <os>_<cpu>.
>
>
>                  This is a valid point. However, note that this is
>             similar to the
>                  issue we have for instance in 'shared'
>                  MacroAssembler::call_VM_leaf_base, which was not put in
>             <os>_<cpu>
>                  files.
>
>                  In fact, the right fix may be to use call_VM_leaf
>             instead of call :-)
>
>                  (alternatively we could use "#ifdef _WIN64" or check
>                  "frame::arg_reg_save_area_bytes != 0", which may be
>             even cleaner and
>                  more portable)
>
>                  All these solutions are fine for me.
>
>
>                  Since we are near to feature freeze, this should
>             probably be done in
>                  a separate CR (unless David has enough time to test it).
>
>                  Regards,
>
>                  Bertrand.
>
>
>                      Kind Regards, Thomas
>
>
>
>
>                      On Mon, Nov 23, 2015 at 10:42 PM, David Holmes
>                      <david.holmes at oracle.com
>             <mailto:david.holmes at oracle.com>
>             <mailto:david.holmes at oracle.com
>             <mailto:david.holmes at oracle.com>>
>                      <mailto:david.holmes at oracle.com
>             <mailto:david.holmes at oracle.com>
>                      <mailto:david.holmes at oracle.com
>             <mailto:david.holmes at oracle.com>>>> wrote:
>
>                           On 23/11/2015 11:46 PM, Bertrand Delsart wrote:
>
>                               On 23/11/2015 13:27, David Holmes wrote:
>
>                                   Hi Bertrand,
>
>                                   On 23/11/2015 8:14 PM, Bertrand
>             Delsart wrote:
>
>                                       Hi David,
>
>                                       Overall looks good.
>                                       Thanks for the #ifndef
>                      USE_LIBRARY_BASED_TLS_ONLY :-)
>
>                                       One doubt, in case this has not
>             been discussed
>                      before.
>
>                                       I'm still catching up on x86_64
>             (after years of ARM
>                                       assembly :-)) but it
>                                       seems that there are some stack
>             alignment
>                      constraints on
>                                       runtime calls,
>                                       at least for some x86_64 ABIs.
>
>                                       Some of the x86
>             MacroAssembler::get_thread
>                                       implementations had code to
>                                       align the stack before calling
>                      pthread_getspecific. See
>                                       for instance
>             http://cr.openjdk.java.net/~dholmes/8132510/webrev.v6/src/os_cpu/linux_x86/vm/assembler_linux_x86.cpp.udiff.html
>
>
>
>                                   Sorry I'm not that familiar with the
>             MacroAssembler
>                      - is it
>                                   this odd
>                                   fragment:
>
>                                   -   push(r10);
>                                   -   // XXX
>                                   -   mov(r10, rsp);
>                                   -   andq(rsp, -16);
>                                   -   push(r10);
>
>                                   I'm not at all clear what that is
>             doing - and if it
>                      somehow
>                                   changes the
>                                   alignment wouldn't something need to
>             be fixed up when
>                                   popping the
>                                   previous values ??
>
>                                   To be honest I'm not even sure what an
>             "unaligned
>                      stack" means.
>
>
>                               On some platforms, SP may have to be
>             aligned on a 16 bytes
>                               boundary when
>                               calling another method.
>
>
>                           Okay - that seems to be an x64 ABI requirement
>             based on
>                      other code
>                           in MacroAssembler. The reason it is missing in
>             the Solaris
>                      code is
>                           because I already removed it in the earlier
>             changes that
>                      were done
>                           on Solaris:
>
>             http://hg.openjdk.java.net/jdk9/hs-rt/hotspot/rev/d5b328043c10
>
>                           and nobody picked it up. :(
>
>                           That said, the absence of this code has not
>             caused any
>                      problems so
>                           presumably we are normally aligned.
>
>                               After having pushed what needed to be
>             saved, the code above
>                               rounds SP
>                               down and saves the old value. It will then
>             also push
>                      r11, which
>                               results
>                               in the expected alignment.
>
>                               The conterpart, after the VM call, is the:
>                                  pop(r11);
>                                  pop(rsp);
>
>
>                           I had assumed this extra manipulation was
>             related to
>                      pushing the arg
>                           for the call.
>
>                                       This alignment is no longer
>             performed in the new
>                                       (shared) implementation
>             http://cr.openjdk.java.net/~dholmes/8132510/webrev.v6/src/cpu/x86/vm/macroAssembler_x86.cpp.udiff.html
>
>
>
>
>                                       Now, Solaris was not performing
>             the alignment and
>                                       Windows has a separate
>                                       path for x86_64. Did we really
>             need the
>                      alignment for
>                                       linux x86_64 and
>                                       bsd_x86_64 ? Might it be needed
>             for other ports ?
>
>                                       IMHO, it might be safer to align
>             the stack by
>                      default,
>                                       knowing it should
>                                       not be expensive since we call
>             get_thread
>                      rarely for
>                                       x86_64 (result is
>                                       cached in r15). I'll let you see
>             whether it is
>                      worth
>                                       adding an ifdef so
>                                       as to explicitly deactivate the
>             alignment on
>                      some platforms
>                                       (solaris_x86_64 ?)
>
>
>                                   I don't have enough knowledge to even
>             begin to
>                      comment on
>                                   this. I will
>                                   have to rely on our x86_64
>             macroassembler experts
>                      to explain
>                                   the old
>                                   code and what the requirements are.
>
>
>                               OK. This means that the alignment was not
>             removed
>                      purposefully.
>
>                               In that case, you must either keep the per
>             platform
>                      code x86_64
>                               (since
>                               it was not identical for all platforms) or
>             use the safest
>                               version, with
>                               the additional
>
>                                    // XXX
>                                    mov(r10, rsp);
>                                    andq(rsp, -16);
>                                    push(r10);
>
>                               before the push(r11) and with the
>
>                                    pop(rsp);
>
>                               after the pop(r11). It should work on all
>             x86_64 platforms.
>
>
>                           Right, I will add this back to the code - and
>             replace the
>                           meaningless // XXX with an actual comment!
>
>                           Many thanks,
>                           David
>                           -----
>
>
>                               FYI, there is another way to do the
>             alignment, based on
>                      the fact
>                               that
>                               we are at least aligned on 8 bytes (see
>                               MacroAssembler::call_VM_leaf_base). I
>             assume that this
>                      second
>                               version is
>                               more efficient (particularly thanks to
>             specilative
>                      branches) but it
>                               might be safer/simpler to continue using
>             andq in
>                      get_thread.
>
>                               Bertrand.
>
>
>                                   Thanks,
>                                   David
>
>                                       Regards,
>
>                                       Bertrand.
>
>
>
>                                       On 23/11/2015 08:03, David Holmes
>             wrote:
>
>                                           After all the preliminary
>             discussions here
>                      are final
>                                           proposed changes:
>
>                                           bug:
>             https://bugs.openjdk.java.net/browse/JDK-8132510
>
>                                           Open webrev:
>             http://cr.openjdk.java.net/~dholmes/8132510/webrev.v6/
>
>                                           A simple (in principle) but
>             wide-ranging change
>                                           which should appeal to
>                                           our Code Deletion Engineer's.
>             We implement
>                                           Thread::current() using a
>                                           compiler/language-based
>             thread-local
>                      variable eg:
>
>                                              static __thread Thread
>             *_thr_current;
>
>                                              inline Thread*
>             Thread::current() {
>                                                return _thr_current;
>                                              }
>
>                                           with an appropriate setter of
>             course. By
>                      doing this
>                                           we can completely
>                                           remove the os_cpu-specific
>             ThreadLocalStorage
>                                           implementations, and the
>                                           associated
>             os::thread_local_storage* calls.
>
>                                           As __thread is not
>             async-signal-safe (either in
>                                           theory or practice) we
>                                           need a fallback library-based
>             solution as used
>                                           today. For this we use a
>                                           very simple ThreadLocalStorage
>             class and an
>                                           implementation thereof for
>                                           POSIX (covers AIX, BSD, Linux,
>             OSX and
>                      Solaris) using
>                                           pthread_get/setspecific; and
>             one for
>                      Windows using
>                                           its TLS library.
>                                           While these library routines
>             are not guaranteed
>                                           async-signal-safe, they
>                                           seem to be in practice and are
>             what we have
>                      been
>                                           using all along.
>
>                                           We also allow for use of only the
>                      library-based TLS
>                                           for platforms where
>                                           compiler-based thread locals
>             are not
>                      supported (as
>                                           will be needed in
>                                           the
>                                           Mobile project). This is
>             enabled at build
>                      time by
>                                           defining
>                                           USE_LIBRARY_BASED_TLS_ONLY.
>
>                                           Thanks to Andrew Haley for
>             providing the
>                      Aarch64
>                                           code; and for Thomas
>                                           Stuefe for testing PPC and AIX.
>
>                                           Testing:
>                                              - JPRT (core platforms)
>                                              - Jtreg tests (linux & solaris)
>                                              - vm.runtime (core platforms)
>
>                                           Performance:
>                                              - still TBD - this is
>             proving to be
>                      extremely
>                                           difficult. If anyone
>                                           has
>                                           any simple to run
>             microbenchmarks to
>                      suggest I'd
>                                           give them a try as a
>                                           sanity check. But I lack
>             access to hardware for
>                                           running serious
>                                           benchmarking.
>
>                                           Footprint:
>                                           - varies by platform and the
>             VM (server,
>                      client,
>                                           minimal)
>                                           - worst-case: ~1% increase for
>             server VM
>                      and minimal VM
>                                           - best-case:  0.4% decrease
>             for client VM
>
>                                           Thanks,
>                                           David
>
>
>
>
>
>
>
>
>                  --
>                  Bertrand Delsart,                     Grenoble
>             Engineering Center
>                  Oracle,         180 av. de l'Europe,          ZIRST de
>             Montbonnot
>                  38330 Montbonnot Saint Martin, FRANCE
>             bertrand.delsart at oracle.com
>             <mailto:bertrand.delsart at oracle.com>
>             <mailto:bertrand.delsart at oracle.com
>             <mailto:bertrand.delsart at oracle.com>>
>                            Phone : +33 4 76 18 81 23
>
>             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>                  NOTICE: This email message is for the sole use of the
>             intended
>                  recipient(s) and may contain confidential and privileged
>                  information. Any unauthorized review, use, disclosure or
>                  distribution is prohibited. If you are not the intended
>             recipient,
>                  please contact the sender by reply email and destroy
>             all copies of
>                  the original message.
>             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>
>
>


More information about the hotspot-dev mailing list