RFR: 8132510: Replace ThreadLocalStorage with compiler/language-based thread-local variables
Bertrand Delsart
bertrand.delsart at oracle.com
Fri Nov 27 08:48:53 UTC 2015
OK for me.
Thanks,
Bertrand.
On 27/11/2015 08:26, David Holmes wrote:
> Hi Thomas,
>
> On 26/11/2015 7:56 PM, Thomas Stüfe wrote:
>>
>> On Thu, Nov 26, 2015 at 12:51 AM, David Holmes <david.holmes at oracle.com
>> <mailto:david.holmes at oracle.com>> wrote:
>>
>> On 26/11/2015 9:08 AM, Daniel D. Daugherty wrote:
>>
>> If I understand the problem... :-)
>>
>> The Win-X64 ABI requires that some extra space be allocated on
>> the stack in the caller for "homing" four registers...
>>
>> This code uses MacroAssembler::call() which is a lower level
>> call() function that assumes you know the ABI...
>>
>> 9765 void MacroAssembler::get_thread(Register thread) {
>> <snip>
>> 9784 call(RuntimeAddress(CAST_FROM_FN_PTR(address,
>> Thread::current)));
>>
>> So the failing here was nothing to do with register saving per-se,
>> nor the alignment code, but "simply" not also saving the extra space
>> prior to the call!
>>
>> Yes. These 32 byte are a scratch area for the callee, which may be used
>> for parameter registers or anything else. We may just have been lucky so
>> far that TlsGetValue() did not use this area. As Bertrand already said,
>> now we call Thread::current(), which does magic tls accesses under the
>> hood, so this must be tested again.
>
> Right. All testing so far has been successful but ...
>
>> However I think that a simple addition to your current fix could simply
>> be subtracting 32 from rsp before the call:
>>
>> subq(rsp, 32)
>> call(RuntimeAddress(CAST_FROM_FN_PTR(address, Thread::current)));
>> addq(rsp, 32)
>
> ... I decided to do what was previously suggested and use:
>
> call_VM_leaf_base(CAST_FROM_FN_PTR(address, Thread::current), 0);
>
> as it takes care of everything (stack alignment if needed and preserving
> space for the register args on win64).
>
>> Also, we could probably skip saving/restoring rdi and rsi on Windows:
>
> Maybe - but we don't seem to consider that anywhere else ie:
>
> void MacroAssembler::push_callee_saved_registers() {
> push(rsi);
> push(rdi);
> push(rdx);
> push(rcx);
> }
>
> so I don't want to be the trail-blazer here. :)
>
> Updated webrev:
>
> http://cr.openjdk.java.net/~dholmes/8132510/webrev.v10
>
> Only change to macroAssembler_x86.cpp. Re-tested on Windows 32-bit and
> 64-bit.
>
> Thanks,
> David
>
>> https://msdn.microsoft.com/en-us/library/6t169e9c.aspx
>>
>> "The registers RAX, RCX, RDX, R8, R9, R10, R11 are considered volatile
>> and must be considered destroyed on function calls (unless otherwise
>> safety-provable by analysis such as whole program optimization). The
>> registers RBX, RBP, RDI, RSI, RSP, R12, R13, R14, and R15 are considered
>> nonvolatile and must be saved and restored by a function that uses them."
>
>
>
>> Best Regards, Thomas
>>
>>
>> If you switch to MacroAssembler::call_VM_leaf_base() it handles
>> more of the gory details for you than MacroAssembler::call():
>>
>>
>> Wow! Not exactly a clearly-defined name for this. AFAICS there's
>> nothing VM or leaf specific about this. Also the MacroAssembler is
>> supposed to be a platform-agnostic higher-level abstraction that is
>> easy to use - but in this case every use of the plain call on
>> Windows-x64 is potentially broken if the programmer was not aware of
>> this ABI constraint which only applies to Windows (and we're in
>> OS-neutral files!).
>>
>> This seems fundamentally broken to me.
>>
>> Thanks for the info.
>>
>> David
>>
>>
>> src/cpu/x86/vm/macroAssembler_x86.cpp:
>>
>> 524 void MacroAssembler::call_VM_leaf_base(address
>> entry_point, int
>> num_args
>> ) {
>> 525 Label L, E;
>> 526
>> 527 #ifdef _WIN64
>> 528 // Windows always allocates space for it's
>> register args
>> 529 assert(num_args <= 4, "only register arguments
>> supported");
>> 530 subq(rsp, frame::arg_reg_save_area_bytes);
>> 531 #endif
>> 532
>> 533 // Align stack if necessary
>> 534 testl(rsp, 15);
>> 535 jcc(Assembler::zero, L);
>> 536
>> 537 subq(rsp, 8);
>> 538 {
>> 539 call(RuntimeAddress(entry_point));
>> 540 }
>>
>> So call_VM_leaf_base() allocates the "homing" space for register
>> args for Win-X64 on L530 and it handles any stack alignment
>> before
>> calling MacroAssembler::call().
>>
>> So if you switch to call_VM_leaf_base() you can drop
>> the stack alignment code:
>>
>> 9777 // ensure stack alignment
>> 9778 mov(r10, rsp);
>> 9779 andq(rsp, -16);
>> 9780 push(r10);
>>
>> and just switch to:
>>
>> 9777 push(rsp);
>>
>> And you won't be exposed to potential saved register
>> overwrite by the "homing" of any registers in the call
>> Thread::current().
>>
>> I _think_ I have that all correct... :-)
>>
>> Dan
>>
>>
>> On 11/25/15 3:52 PM, David Holmes wrote:
>>
>> On 26/11/2015 2:53 AM, Thomas Stüfe wrote:
>>
>> I created a bug to track this:
>>
>> https://bugs.openjdk.java.net/browse/JDK-8144059
>>
>>
>> Thanks. I admit I still don't really grok the exact problem
>> here. It
>> is the alignment code that is the problem right?
>>
>> 79 mov(r10, rsp);
>> 80 andq(rsp, -16);
>> 81 push(r10);
>>
>> ?
>>
>> Thanks,
>> David
>>
>> Because Davids change is not submitted yet, the bug
>> refers to the old
>> coding, but stays relevant also after this fix.
>>
>> Kind Regards, Thomas
>>
>> On Wed, Nov 25, 2015 at 3:52 PM, Daniel D. Daugherty
>> <daniel.daugherty at oracle.com
>> <mailto:daniel.daugherty at oracle.com>
>> <mailto:daniel.daugherty at oracle.com
>> <mailto:daniel.daugherty at oracle.com>>>
>> wrote:
>>
>> On 11/25/15 6:10 AM, David Holmes wrote:
>>
>> On 25/11/2015 8:46 PM, Thomas Stüfe wrote:
>>
>>
>> On Wed, Nov 25, 2015 at 10:22 AM, Bertrand
>> Delsart
>> <bertrand.delsart at oracle.com
>> <mailto:bertrand.delsart at oracle.com>
>> <mailto:bertrand.delsart at oracle.com
>> <mailto:bertrand.delsart at oracle.com>>
>> <mailto:bertrand.delsart at oracle.com
>> <mailto:bertrand.delsart at oracle.com>
>> <mailto:bertrand.delsart at oracle.com
>> <mailto:bertrand.delsart at oracle.com>>>> wrote:
>>
>> Good finding Thomas,
>>
>> On 25/11/2015 09:40, Thomas Stüfe
>> wrote:
>>
>> Hi David, Bertrand,
>>
>> about
>>
>> http://cr.openjdk.java.net/~dholmes/8132510/webrev.v8/src/cpu/x86/vm/macroAssembler_x86.cpp.frames.html
>>
>>
>>
>> I am not sure this is correct for
>> Windows x64.
>>
>>
>> Unless I'm misreading it, the new code
>> looks a lot like
>> the old one
>> (including the alignement part). Hence
>> this does not
>> look like a
>> regression caused by David's change.
>>
>>
>> Yes, this seems like an old bug.
>>
>> I also see that this is fixed in our
>> codebase since 2010,
>> exactly the
>> way you propose (with MacroAssembler
>>
>> <http://ld8443:8080/source/s?defs=MacroAssembler&project=dev-hotspot>::call_VM_leaf_base
>>
>>
>>
>>
>> <http://ld8443:8080/source/s?defs=call_VM_leaf_base&project=dev-hotspot>).
>>
>>
>>
>> So, it is already used since five years. I
>> would propose
>> opening a
>> separate bug for this one and we contribute
>> our fix.
>>
>>
>> That sounds good to me too. I also suspect
>> additional code will
>> need to be checked as this stack alignment is
>> performed in a few
>> places in ./cpu/x86/vm/macroAssembler_x86.cpp
>> with no OS
>> specific differences being considered.
>>
>>
>> Looks like MacroAssembler::call_VM_leaf_base() does
>> the "right
>> thing"
>> for stack alignment in the X64 case. Perhaps we
>> need to audit both
>> stack alignment _and_ uses of
>> MacroAssembler::call() just to make
>> sure...
>>
>> Did I mention that all the assembly code stuff
>> gives me a
>> headache? :-)
>>
>> Dan
>>
>>
>>
>>
>> Thanks,
>> David
>>
>> Regards, Thomas
>>
>>
>>
>> https://msdn.microsoft.com/en-us/library/ew5tede7(v=vs.90).aspx
>>
>> windows x64 Abi requires the
>> register parameter
>> area right above the
>> return address on the stack. It
>> is not
>> guaranteed to be
>> preserved during
>> the function call:
>>
>> "Even if the called function has
>> fewer than 4
>> parameters, these
>> 4 stack
>> locations are effectively owned by
>> the called
>> function, and may
>> be used
>> by the called function for other
>> purposes besides
>> saving parameter
>> register values. Thus the caller
>> may not save
>> information in
>> this region
>> of stack across a function call."
>>
>> So, restoring parameters from it
>> may not work if
>> Thread::current() uses
>> this area for any reason. In this
>> case, r9, rsp,
>> r10, r11 may be
>> overwritten by the callee.
>>
>> I also think Windows x64 ABI is
>> different enough
>> from Linux that
>> this
>> function could live somewhere in
>> <os>_<cpu>.
>>
>>
>> This is a valid point. However, note
>> that this is
>> similar to the
>> issue we have for instance in 'shared'
>> MacroAssembler::call_VM_leaf_base,
>> which was not put in
>> <os>_<cpu>
>> files.
>>
>> In fact, the right fix may be to use
>> call_VM_leaf
>> instead of call :-)
>>
>> (alternatively we could use "#ifdef
>> _WIN64" or check
>> "frame::arg_reg_save_area_bytes != 0",
>> which may be
>> even cleaner and
>> more portable)
>>
>> All these solutions are fine for me.
>>
>>
>> Since we are near to feature freeze,
>> this should
>> probably be done in
>> a separate CR (unless David has enough
>> time to test
>> it).
>>
>> Regards,
>>
>> Bertrand.
>>
>>
>> Kind Regards, Thomas
>>
>>
>>
>>
>> On Mon, Nov 23, 2015 at 10:42 PM,
>> David Holmes
>> <david.holmes at oracle.com
>> <mailto:david.holmes at oracle.com>
>> <mailto:david.holmes at oracle.com
>> <mailto:david.holmes at oracle.com>>
>> <mailto:david.holmes at oracle.com
>> <mailto:david.holmes at oracle.com>
>> <mailto:david.holmes at oracle.com
>> <mailto:david.holmes at oracle.com>>>
>> <mailto:david.holmes at oracle.com
>> <mailto:david.holmes at oracle.com>
>> <mailto:david.holmes at oracle.com
>> <mailto:david.holmes at oracle.com>>
>> <mailto:david.holmes at oracle.com
>> <mailto:david.holmes at oracle.com>
>> <mailto:david.holmes at oracle.com
>> <mailto:david.holmes at oracle.com>>>>> wrote:
>>
>> On 23/11/2015 11:46 PM,
>> Bertrand Delsart
>> wrote:
>>
>> On 23/11/2015 13:27,
>> David Holmes wrote:
>>
>> Hi Bertrand,
>>
>> On 23/11/2015 8:14
>> PM, Bertrand
>> Delsart wrote:
>>
>> Hi David,
>>
>> Overall looks
>> good.
>> Thanks for the
>> #ifndef
>> USE_LIBRARY_BASED_TLS_ONLY :-)
>>
>> One doubt, in
>> case this has not
>> been discussed
>> before.
>>
>> I'm still
>> catching up on x86_64
>> (after years of ARM
>> assembly :-))
>> but it
>> seems that there
>> are some stack
>> alignment
>> constraints on
>> runtime calls,
>> at least for some
>> x86_64 ABIs.
>>
>> Some of the x86
>> MacroAssembler::get_thread
>> implementations
>> had code to
>> align the stack
>> before calling
>> pthread_getspecific. See
>> for instance
>>
>> http://cr.openjdk.java.net/~dholmes/8132510/webrev.v6/src/os_cpu/linux_x86/vm/assembler_linux_x86.cpp.udiff.html
>>
>>
>>
>>
>>
>> Sorry I'm not that
>> familiar with the
>> MacroAssembler
>> - is it
>> this odd
>> fragment:
>>
>> - push(r10);
>> - // XXX
>> - mov(r10, rsp);
>> - andq(rsp, -16);
>> - push(r10);
>>
>> I'm not at all clear
>> what that is
>> doing - and if it
>> somehow
>> changes the
>> alignment wouldn't
>> something need to
>> be fixed up when
>> popping the
>> previous values ??
>>
>> To be honest I'm not
>> even sure what an
>> "unaligned
>> stack" means.
>>
>>
>> On some platforms, SP may
>> have to be
>> aligned on a 16 bytes
>> boundary when
>> calling another method.
>>
>>
>> Okay - that seems to be an
>> x64 ABI requirement
>> based on
>> other code
>> in MacroAssembler. The reason
>> it is missing in
>> the Solaris
>> code is
>> because I already removed it
>> in the earlier
>> changes that
>> were done
>> on Solaris:
>>
>>
>> http://hg.openjdk.java.net/jdk9/hs-rt/hotspot/rev/d5b328043c10
>>
>> and nobody picked it up. :(
>>
>> That said, the absence of
>> this code has not
>> caused any
>> problems so
>> presumably we are normally
>> aligned.
>>
>> After having pushed what
>> needed to be
>> saved, the code above
>> rounds SP
>> down and saves the old
>> value. It will then
>> also push
>> r11, which
>> results
>> in the expected alignment.
>>
>> The conterpart, after the
>> VM call, is the:
>> pop(r11);
>> pop(rsp);
>>
>>
>> I had assumed this extra
>> manipulation was
>> related to
>> pushing the arg
>> for the call.
>>
>> This alignment is
>> no longer
>> performed in the new
>> (shared)
>> implementation
>>
>> http://cr.openjdk.java.net/~dholmes/8132510/webrev.v6/src/cpu/x86/vm/macroAssembler_x86.cpp.udiff.html
>>
>>
>>
>>
>>
>>
>> Now, Solaris was
>> not performing
>> the alignment and
>> Windows has a
>> separate
>> path for x86_64.
>> Did we really
>> need the
>> alignment for
>> linux x86_64 and
>> bsd_x86_64 ?
>> Might it be needed
>> for other ports ?
>>
>> IMHO, it might be
>> safer to align
>> the stack by
>> default,
>> knowing it should
>> not be expensive
>> since we call
>> get_thread
>> rarely for
>> x86_64 (result is
>> cached in r15).
>> I'll let you see
>> whether it is
>> worth
>> adding an ifdef so
>> as to explicitly
>> deactivate the
>> alignment on
>> some platforms
>> (solaris_x86_64 ?)
>>
>>
>> I don't have enough
>> knowledge to even
>> begin to
>> comment on
>> this. I will
>> have to rely on our
>> x86_64
>> macroassembler experts
>> to explain
>> the old
>> code and what the
>> requirements are.
>>
>>
>> OK. This means that the
>> alignment was not
>> removed
>> purposefully.
>>
>> In that case, you must
>> either keep the per
>> platform
>> code x86_64
>> (since
>> it was not identical for
>> all platforms) or
>> use the safest
>> version, with
>> the additional
>>
>> // XXX
>> mov(r10, rsp);
>> andq(rsp, -16);
>> push(r10);
>>
>> before the push(r11) and
>> with the
>>
>> pop(rsp);
>>
>> after the pop(r11). It
>> should work on all
>> x86_64 platforms.
>>
>>
>> Right, I will add this back
>> to the code - and
>> replace the
>> meaningless // XXX with an
>> actual comment!
>>
>> Many thanks,
>> David
>> -----
>>
>>
>> FYI, there is another way
>> to do the
>> alignment, based on
>> the fact
>> that
>> we are at least aligned
>> on 8 bytes (see
>> MacroAssembler::call_VM_leaf_base). I
>> assume that this
>> second
>> version is
>> more efficient
>> (particularly thanks to
>> specilative
>> branches) but it
>> might be safer/simpler to
>> continue using
>> andq in
>> get_thread.
>>
>> Bertrand.
>>
>>
>> Thanks,
>> David
>>
>> Regards,
>>
>> Bertrand.
>>
>>
>>
>> On 23/11/2015
>> 08:03, David Holmes
>> wrote:
>>
>> After all the
>> preliminary
>> discussions here
>> are final
>> proposed
>> changes:
>>
>> bug:
>> https://bugs.openjdk.java.net/browse/JDK-8132510
>>
>> Open webrev:
>> http://cr.openjdk.java.net/~dholmes/8132510/webrev.v6/
>>
>> A simple (in
>> principle) but
>> wide-ranging change
>> which should
>> appeal to
>> our Code
>> Deletion Engineer's.
>> We implement
>>
>> Thread::current() using a
>> compiler/language-based
>> thread-local
>> variable eg:
>>
>> static
>> __thread Thread
>> *_thr_current;
>>
>> inline
>> Thread*
>> Thread::current() {
>> return
>> _thr_current;
>> }
>>
>> with an
>> appropriate setter of
>> course. By
>> doing this
>> we can
>> completely
>> remove the
>> os_cpu-specific
>> ThreadLocalStorage
>>
>> implementations, and the
>> associated
>> os::thread_local_storage* calls.
>>
>> As __thread
>> is not
>> async-signal-safe (either in
>> theory or
>> practice) we
>> need a
>> fallback library-based
>> solution as used
>> today. For
>> this we use a
>> very simple
>> ThreadLocalStorage
>> class and an
>>
>> implementation thereof for
>> POSIX (covers
>> AIX, BSD, Linux,
>> OSX and
>> Solaris) using
>> pthread_get/setspecific; and
>> one for
>> Windows using
>> its TLS
>> library.
>> While these
>> library routines
>> are not guaranteed
>>
>> async-signal-safe, they
>> seem to be in
>> practice and are
>> what we have
>> been
>> using all
>> along.
>>
>> We also allow
>> for use of
>> only the
>> library-based TLS
>> for platforms
>> where
>>
>> compiler-based thread locals
>> are not
>> supported (as
>> will be
>> needed in
>> the
>> Mobile
>> project). This is
>> enabled at build
>> time by
>> defining
>> USE_LIBRARY_BASED_TLS_ONLY.
>>
>> Thanks to
>> Andrew Haley for
>> providing the
>> Aarch64
>> code; and for
>> Thomas
>> Stuefe for
>> testing PPC and
>> AIX.
>>
>> Testing:
>> - JPRT
>> (core platforms)
>> - Jtreg
>> tests (linux &
>> solaris)
>> -
>> vm.runtime (core
>> platforms)
>>
>> Performance:
>> - still
>> TBD - this is
>> proving to be
>> extremely
>> difficult. If
>> anyone
>> has
>> any simple
>> to run
>> microbenchmarks to
>> suggest I'd
>> give them a
>> try as a
>> sanity check.
>> But I lack
>> access to hardware for
>> running
>> serious
>> benchmarking.
>>
>> Footprint:
>> - varies by
>> platform and the
>> VM (server,
>> client,
>> minimal)
>> - worst-case:
>> ~1% increase for
>> server VM
>> and minimal VM
>> - best-case:
>> 0.4% decrease
>> for client VM
>>
>> Thanks,
>> David
>>
>>
>>
>>
>>
>>
>>
>>
>> --
>> Bertrand Delsart,
>> Grenoble
>> Engineering Center
>> Oracle, 180 av. de l'Europe,
>> ZIRST de
>> Montbonnot
>> 38330 Montbonnot Saint Martin, FRANCE
>> bertrand.delsart at oracle.com
>> <mailto:bertrand.delsart at oracle.com>
>> <mailto:bertrand.delsart at oracle.com
>> <mailto:bertrand.delsart at oracle.com>>
>> <mailto:bertrand.delsart at oracle.com
>> <mailto:bertrand.delsart at oracle.com>
>> <mailto:bertrand.delsart at oracle.com
>> <mailto:bertrand.delsart at oracle.com>>>
>> Phone : +33 4 76 18 81 23
>>
>>
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> NOTICE: This email message is for the
>> sole use of the
>> intended
>> recipient(s) and may contain
>> confidential and
>> privileged
>> information. Any unauthorized
>> review, use,
>> disclosure or
>> distribution is prohibited. If you are
>> not the intended
>> recipient,
>> please contact the sender by reply
>> email and destroy
>> all copies of
>> the original message.
>>
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>
>>
>>
>>
>>
>>
--
Bertrand Delsart, Grenoble Engineering Center
Oracle, 180 av. de l'Europe, ZIRST de Montbonnot
38330 Montbonnot Saint Martin, FRANCE
bertrand.delsart at oracle.com Phone : +33 4 76 18 81 23
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NOTICE: This email message is for the sole use of the intended
recipient(s) and may contain confidential and privileged
information. Any unauthorized review, use, disclosure or
distribution is prohibited. If you are not the intended recipient,
please contact the sender by reply email and destroy all copies of
the original message.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
More information about the hotspot-dev
mailing list