RFR: 8132510: Replace ThreadLocalStorage with compiler/language-based thread-local variables
David Holmes
david.holmes at oracle.com
Fri Nov 27 07:26:25 UTC 2015
Hi Thomas,
On 26/11/2015 7:56 PM, Thomas Stüfe wrote:
>
> On Thu, Nov 26, 2015 at 12:51 AM, David Holmes <david.holmes at oracle.com
> <mailto:david.holmes at oracle.com>> wrote:
>
> On 26/11/2015 9:08 AM, Daniel D. Daugherty wrote:
>
> If I understand the problem... :-)
>
> The Win-X64 ABI requires that some extra space be allocated on
> the stack in the caller for "homing" four registers...
>
> This code uses MacroAssembler::call() which is a lower level
> call() function that assumes you know the ABI...
>
> 9765 void MacroAssembler::get_thread(Register thread) {
> <snip>
> 9784 call(RuntimeAddress(CAST_FROM_FN_PTR(address,
> Thread::current)));
>
> So the failing here was nothing to do with register saving per-se,
> nor the alignment code, but "simply" not also saving the extra space
> prior to the call!
>
> Yes. These 32 byte are a scratch area for the callee, which may be used
> for parameter registers or anything else. We may just have been lucky so
> far that TlsGetValue() did not use this area. As Bertrand already said,
> now we call Thread::current(), which does magic tls accesses under the
> hood, so this must be tested again.
Right. All testing so far has been successful but ...
> However I think that a simple addition to your current fix could simply
> be subtracting 32 from rsp before the call:
>
> subq(rsp, 32)
> call(RuntimeAddress(CAST_FROM_FN_PTR(address, Thread::current)));
> addq(rsp, 32)
... I decided to do what was previously suggested and use:
call_VM_leaf_base(CAST_FROM_FN_PTR(address, Thread::current), 0);
as it takes care of everything (stack alignment if needed and preserving
space for the register args on win64).
> Also, we could probably skip saving/restoring rdi and rsi on Windows:
Maybe - but we don't seem to consider that anywhere else ie:
void MacroAssembler::push_callee_saved_registers() {
push(rsi);
push(rdi);
push(rdx);
push(rcx);
}
so I don't want to be the trail-blazer here. :)
Updated webrev:
http://cr.openjdk.java.net/~dholmes/8132510/webrev.v10
Only change to macroAssembler_x86.cpp. Re-tested on Windows 32-bit and
64-bit.
Thanks,
David
> https://msdn.microsoft.com/en-us/library/6t169e9c.aspx
>
> "The registers RAX, RCX, RDX, R8, R9, R10, R11 are considered volatile
> and must be considered destroyed on function calls (unless otherwise
> safety-provable by analysis such as whole program optimization). The
> registers RBX, RBP, RDI, RSI, RSP, R12, R13, R14, and R15 are considered
> nonvolatile and must be saved and restored by a function that uses them."
> Best Regards, Thomas
>
>
> If you switch to MacroAssembler::call_VM_leaf_base() it handles
> more of the gory details for you than MacroAssembler::call():
>
>
> Wow! Not exactly a clearly-defined name for this. AFAICS there's
> nothing VM or leaf specific about this. Also the MacroAssembler is
> supposed to be a platform-agnostic higher-level abstraction that is
> easy to use - but in this case every use of the plain call on
> Windows-x64 is potentially broken if the programmer was not aware of
> this ABI constraint which only applies to Windows (and we're in
> OS-neutral files!).
>
> This seems fundamentally broken to me.
>
> Thanks for the info.
>
> David
>
>
> src/cpu/x86/vm/macroAssembler_x86.cpp:
>
> 524 void MacroAssembler::call_VM_leaf_base(address
> entry_point, int
> num_args
> ) {
> 525 Label L, E;
> 526
> 527 #ifdef _WIN64
> 528 // Windows always allocates space for it's register args
> 529 assert(num_args <= 4, "only register arguments
> supported");
> 530 subq(rsp, frame::arg_reg_save_area_bytes);
> 531 #endif
> 532
> 533 // Align stack if necessary
> 534 testl(rsp, 15);
> 535 jcc(Assembler::zero, L);
> 536
> 537 subq(rsp, 8);
> 538 {
> 539 call(RuntimeAddress(entry_point));
> 540 }
>
> So call_VM_leaf_base() allocates the "homing" space for register
> args for Win-X64 on L530 and it handles any stack alignment before
> calling MacroAssembler::call().
>
> So if you switch to call_VM_leaf_base() you can drop
> the stack alignment code:
>
> 9777 // ensure stack alignment
> 9778 mov(r10, rsp);
> 9779 andq(rsp, -16);
> 9780 push(r10);
>
> and just switch to:
>
> 9777 push(rsp);
>
> And you won't be exposed to potential saved register
> overwrite by the "homing" of any registers in the call
> Thread::current().
>
> I _think_ I have that all correct... :-)
>
> Dan
>
>
> On 11/25/15 3:52 PM, David Holmes wrote:
>
> On 26/11/2015 2:53 AM, Thomas Stüfe wrote:
>
> I created a bug to track this:
>
> https://bugs.openjdk.java.net/browse/JDK-8144059
>
>
> Thanks. I admit I still don't really grok the exact problem
> here. It
> is the alignment code that is the problem right?
>
> 79 mov(r10, rsp);
> 80 andq(rsp, -16);
> 81 push(r10);
>
> ?
>
> Thanks,
> David
>
> Because Davids change is not submitted yet, the bug
> refers to the old
> coding, but stays relevant also after this fix.
>
> Kind Regards, Thomas
>
> On Wed, Nov 25, 2015 at 3:52 PM, Daniel D. Daugherty
> <daniel.daugherty at oracle.com
> <mailto:daniel.daugherty at oracle.com>
> <mailto:daniel.daugherty at oracle.com
> <mailto:daniel.daugherty at oracle.com>>>
> wrote:
>
> On 11/25/15 6:10 AM, David Holmes wrote:
>
> On 25/11/2015 8:46 PM, Thomas Stüfe wrote:
>
>
> On Wed, Nov 25, 2015 at 10:22 AM, Bertrand
> Delsart
> <bertrand.delsart at oracle.com
> <mailto:bertrand.delsart at oracle.com>
> <mailto:bertrand.delsart at oracle.com
> <mailto:bertrand.delsart at oracle.com>>
> <mailto:bertrand.delsart at oracle.com
> <mailto:bertrand.delsart at oracle.com>
> <mailto:bertrand.delsart at oracle.com
> <mailto:bertrand.delsart at oracle.com>>>> wrote:
>
> Good finding Thomas,
>
> On 25/11/2015 09:40, Thomas Stüfe wrote:
>
> Hi David, Bertrand,
>
> about
> http://cr.openjdk.java.net/~dholmes/8132510/webrev.v8/src/cpu/x86/vm/macroAssembler_x86.cpp.frames.html
>
>
> I am not sure this is correct for
> Windows x64.
>
>
> Unless I'm misreading it, the new code
> looks a lot like
> the old one
> (including the alignement part). Hence
> this does not
> look like a
> regression caused by David's change.
>
>
> Yes, this seems like an old bug.
>
> I also see that this is fixed in our
> codebase since 2010,
> exactly the
> way you propose (with MacroAssembler
> <http://ld8443:8080/source/s?defs=MacroAssembler&project=dev-hotspot>::call_VM_leaf_base
>
>
> <http://ld8443:8080/source/s?defs=call_VM_leaf_base&project=dev-hotspot>).
>
>
> So, it is already used since five years. I
> would propose
> opening a
> separate bug for this one and we contribute
> our fix.
>
>
> That sounds good to me too. I also suspect
> additional code will
> need to be checked as this stack alignment is
> performed in a few
> places in ./cpu/x86/vm/macroAssembler_x86.cpp
> with no OS
> specific differences being considered.
>
>
> Looks like MacroAssembler::call_VM_leaf_base() does
> the "right
> thing"
> for stack alignment in the X64 case. Perhaps we
> need to audit both
> stack alignment _and_ uses of
> MacroAssembler::call() just to make
> sure...
>
> Did I mention that all the assembly code stuff
> gives me a
> headache? :-)
>
> Dan
>
>
>
>
> Thanks,
> David
>
> Regards, Thomas
>
>
> https://msdn.microsoft.com/en-us/library/ew5tede7(v=vs.90).aspx
>
> windows x64 Abi requires the
> register parameter
> area right above the
> return address on the stack. It is not
> guaranteed to be
> preserved during
> the function call:
>
> "Even if the called function has
> fewer than 4
> parameters, these
> 4 stack
> locations are effectively owned by
> the called
> function, and may
> be used
> by the called function for other
> purposes besides
> saving parameter
> register values. Thus the caller
> may not save
> information in
> this region
> of stack across a function call."
>
> So, restoring parameters from it
> may not work if
> Thread::current() uses
> this area for any reason. In this
> case, r9, rsp,
> r10, r11 may be
> overwritten by the callee.
>
> I also think Windows x64 ABI is
> different enough
> from Linux that
> this
> function could live somewhere in
> <os>_<cpu>.
>
>
> This is a valid point. However, note
> that this is
> similar to the
> issue we have for instance in 'shared'
> MacroAssembler::call_VM_leaf_base,
> which was not put in
> <os>_<cpu>
> files.
>
> In fact, the right fix may be to use
> call_VM_leaf
> instead of call :-)
>
> (alternatively we could use "#ifdef
> _WIN64" or check
> "frame::arg_reg_save_area_bytes != 0",
> which may be
> even cleaner and
> more portable)
>
> All these solutions are fine for me.
>
>
> Since we are near to feature freeze,
> this should
> probably be done in
> a separate CR (unless David has enough
> time to test
> it).
>
> Regards,
>
> Bertrand.
>
>
> Kind Regards, Thomas
>
>
>
>
> On Mon, Nov 23, 2015 at 10:42 PM,
> David Holmes
> <david.holmes at oracle.com
> <mailto:david.holmes at oracle.com>
> <mailto:david.holmes at oracle.com
> <mailto:david.holmes at oracle.com>>
> <mailto:david.holmes at oracle.com
> <mailto:david.holmes at oracle.com>
> <mailto:david.holmes at oracle.com
> <mailto:david.holmes at oracle.com>>>
> <mailto:david.holmes at oracle.com
> <mailto:david.holmes at oracle.com>
> <mailto:david.holmes at oracle.com
> <mailto:david.holmes at oracle.com>>
> <mailto:david.holmes at oracle.com
> <mailto:david.holmes at oracle.com>
> <mailto:david.holmes at oracle.com
> <mailto:david.holmes at oracle.com>>>>> wrote:
>
> On 23/11/2015 11:46 PM,
> Bertrand Delsart
> wrote:
>
> On 23/11/2015 13:27,
> David Holmes wrote:
>
> Hi Bertrand,
>
> On 23/11/2015 8:14
> PM, Bertrand
> Delsart wrote:
>
> Hi David,
>
> Overall looks good.
> Thanks for the
> #ifndef
> USE_LIBRARY_BASED_TLS_ONLY :-)
>
> One doubt, in
> case this has not
> been discussed
> before.
>
> I'm still
> catching up on x86_64
> (after years of ARM
> assembly :-)) but it
> seems that there
> are some stack
> alignment
> constraints on
> runtime calls,
> at least for some
> x86_64 ABIs.
>
> Some of the x86
> MacroAssembler::get_thread
> implementations
> had code to
> align the stack
> before calling
> pthread_getspecific. See
> for instance
> http://cr.openjdk.java.net/~dholmes/8132510/webrev.v6/src/os_cpu/linux_x86/vm/assembler_linux_x86.cpp.udiff.html
>
>
>
>
> Sorry I'm not that
> familiar with the
> MacroAssembler
> - is it
> this odd
> fragment:
>
> - push(r10);
> - // XXX
> - mov(r10, rsp);
> - andq(rsp, -16);
> - push(r10);
>
> I'm not at all clear
> what that is
> doing - and if it
> somehow
> changes the
> alignment wouldn't
> something need to
> be fixed up when
> popping the
> previous values ??
>
> To be honest I'm not
> even sure what an
> "unaligned
> stack" means.
>
>
> On some platforms, SP may
> have to be
> aligned on a 16 bytes
> boundary when
> calling another method.
>
>
> Okay - that seems to be an
> x64 ABI requirement
> based on
> other code
> in MacroAssembler. The reason
> it is missing in
> the Solaris
> code is
> because I already removed it
> in the earlier
> changes that
> were done
> on Solaris:
>
> http://hg.openjdk.java.net/jdk9/hs-rt/hotspot/rev/d5b328043c10
>
> and nobody picked it up. :(
>
> That said, the absence of
> this code has not
> caused any
> problems so
> presumably we are normally
> aligned.
>
> After having pushed what
> needed to be
> saved, the code above
> rounds SP
> down and saves the old
> value. It will then
> also push
> r11, which
> results
> in the expected alignment.
>
> The conterpart, after the
> VM call, is the:
> pop(r11);
> pop(rsp);
>
>
> I had assumed this extra
> manipulation was
> related to
> pushing the arg
> for the call.
>
> This alignment is
> no longer
> performed in the new
> (shared)
> implementation
> http://cr.openjdk.java.net/~dholmes/8132510/webrev.v6/src/cpu/x86/vm/macroAssembler_x86.cpp.udiff.html
>
>
>
>
>
> Now, Solaris was
> not performing
> the alignment and
> Windows has a
> separate
> path for x86_64.
> Did we really
> need the
> alignment for
> linux x86_64 and
> bsd_x86_64 ?
> Might it be needed
> for other ports ?
>
> IMHO, it might be
> safer to align
> the stack by
> default,
> knowing it should
> not be expensive
> since we call
> get_thread
> rarely for
> x86_64 (result is
> cached in r15).
> I'll let you see
> whether it is
> worth
> adding an ifdef so
> as to explicitly
> deactivate the
> alignment on
> some platforms
> (solaris_x86_64 ?)
>
>
> I don't have enough
> knowledge to even
> begin to
> comment on
> this. I will
> have to rely on our
> x86_64
> macroassembler experts
> to explain
> the old
> code and what the
> requirements are.
>
>
> OK. This means that the
> alignment was not
> removed
> purposefully.
>
> In that case, you must
> either keep the per
> platform
> code x86_64
> (since
> it was not identical for
> all platforms) or
> use the safest
> version, with
> the additional
>
> // XXX
> mov(r10, rsp);
> andq(rsp, -16);
> push(r10);
>
> before the push(r11) and
> with the
>
> pop(rsp);
>
> after the pop(r11). It
> should work on all
> x86_64 platforms.
>
>
> Right, I will add this back
> to the code - and
> replace the
> meaningless // XXX with an
> actual comment!
>
> Many thanks,
> David
> -----
>
>
> FYI, there is another way
> to do the
> alignment, based on
> the fact
> that
> we are at least aligned
> on 8 bytes (see
> MacroAssembler::call_VM_leaf_base). I
> assume that this
> second
> version is
> more efficient
> (particularly thanks to
> specilative
> branches) but it
> might be safer/simpler to
> continue using
> andq in
> get_thread.
>
> Bertrand.
>
>
> Thanks,
> David
>
> Regards,
>
> Bertrand.
>
>
>
> On 23/11/2015
> 08:03, David Holmes
> wrote:
>
> After all the
> preliminary
> discussions here
> are final
> proposed changes:
>
> bug:
> https://bugs.openjdk.java.net/browse/JDK-8132510
>
> Open webrev:
> http://cr.openjdk.java.net/~dholmes/8132510/webrev.v6/
>
> A simple (in
> principle) but
> wide-ranging change
> which should
> appeal to
> our Code
> Deletion Engineer's.
> We implement
>
> Thread::current() using a
> compiler/language-based
> thread-local
> variable eg:
>
> static
> __thread Thread
> *_thr_current;
>
> inline Thread*
> Thread::current() {
> return
> _thr_current;
> }
>
> with an
> appropriate setter of
> course. By
> doing this
> we can completely
> remove the
> os_cpu-specific
> ThreadLocalStorage
>
> implementations, and the
> associated
> os::thread_local_storage* calls.
>
> As __thread
> is not
> async-signal-safe (either in
> theory or
> practice) we
> need a
> fallback library-based
> solution as used
> today. For
> this we use a
> very simple
> ThreadLocalStorage
> class and an
>
> implementation thereof for
> POSIX (covers
> AIX, BSD, Linux,
> OSX and
> Solaris) using
> pthread_get/setspecific; and
> one for
> Windows using
> its TLS library.
> While these
> library routines
> are not guaranteed
>
> async-signal-safe, they
> seem to be in
> practice and are
> what we have
> been
> using all along.
>
> We also allow
> for use of
> only the
> library-based TLS
> for platforms
> where
>
> compiler-based thread locals
> are not
> supported (as
> will be needed in
> the
> Mobile
> project). This is
> enabled at build
> time by
> defining
> USE_LIBRARY_BASED_TLS_ONLY.
>
> Thanks to
> Andrew Haley for
> providing the
> Aarch64
> code; and for
> Thomas
> Stuefe for
> testing PPC and
> AIX.
>
> Testing:
> - JPRT
> (core platforms)
> - Jtreg
> tests (linux &
> solaris)
> -
> vm.runtime (core
> platforms)
>
> Performance:
> - still
> TBD - this is
> proving to be
> extremely
> difficult. If
> anyone
> has
> any simple to run
> microbenchmarks to
> suggest I'd
> give them a
> try as a
> sanity check.
> But I lack
> access to hardware for
> running serious
> benchmarking.
>
> Footprint:
> - varies by
> platform and the
> VM (server,
> client,
> minimal)
> - worst-case:
> ~1% increase for
> server VM
> and minimal VM
> - best-case:
> 0.4% decrease
> for client VM
>
> Thanks,
> David
>
>
>
>
>
>
>
>
> --
> Bertrand Delsart,
> Grenoble
> Engineering Center
> Oracle, 180 av. de l'Europe,
> ZIRST de
> Montbonnot
> 38330 Montbonnot Saint Martin, FRANCE
> bertrand.delsart at oracle.com
> <mailto:bertrand.delsart at oracle.com>
> <mailto:bertrand.delsart at oracle.com
> <mailto:bertrand.delsart at oracle.com>>
> <mailto:bertrand.delsart at oracle.com
> <mailto:bertrand.delsart at oracle.com>
> <mailto:bertrand.delsart at oracle.com
> <mailto:bertrand.delsart at oracle.com>>>
> Phone : +33 4 76 18 81 23
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> NOTICE: This email message is for the
> sole use of the
> intended
> recipient(s) and may contain
> confidential and
> privileged
> information. Any unauthorized review, use,
> disclosure or
> distribution is prohibited. If you are
> not the intended
> recipient,
> please contact the sender by reply
> email and destroy
> all copies of
> the original message.
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>
>
>
>
>
More information about the hotspot-dev
mailing list