RFR: 8132510: Replace ThreadLocalStorage with compiler/language-based thread-local variables
Daniel D. Daugherty
daniel.daugherty at oracle.com
Wed Nov 25 14:52:41 UTC 2015
On 11/25/15 6:10 AM, David Holmes wrote:
> On 25/11/2015 8:46 PM, Thomas Stüfe wrote:
>>
>> On Wed, Nov 25, 2015 at 10:22 AM, Bertrand Delsart
>> <bertrand.delsart at oracle.com <mailto:bertrand.delsart at oracle.com>>
>> wrote:
>>
>> Good finding Thomas,
>>
>> On 25/11/2015 09:40, Thomas Stüfe wrote:
>>
>> Hi David, Bertrand,
>>
>> about
>> http://cr.openjdk.java.net/~dholmes/8132510/webrev.v8/src/cpu/x86/vm/macroAssembler_x86.cpp.frames.html
>>
>> I am not sure this is correct for Windows x64.
>>
>>
>> Unless I'm misreading it, the new code looks a lot like the old one
>> (including the alignement part). Hence this does not look like a
>> regression caused by David's change.
>>
>>
>> Yes, this seems like an old bug.
>>
>> I also see that this is fixed in our codebase since 2010, exactly the
>> way you propose (with MacroAssembler
>> <http://ld8443:8080/source/s?defs=MacroAssembler&project=dev-hotspot>::call_VM_leaf_base
>>
>> <http://ld8443:8080/source/s?defs=call_VM_leaf_base&project=dev-hotspot>).
>>
>> So, it is already used since five years. I would propose opening a
>> separate bug for this one and we contribute our fix.
>
> That sounds good to me too. I also suspect additional code will need
> to be checked as this stack alignment is performed in a few places in
> ./cpu/x86/vm/macroAssembler_x86.cpp with no OS specific differences
> being considered.
Looks like MacroAssembler::call_VM_leaf_base() does the "right thing"
for stack alignment in the X64 case. Perhaps we need to audit both
stack alignment _and_ uses of MacroAssembler::call() just to make
sure...
Did I mention that all the assembly code stuff gives me a headache? :-)
Dan
>
> Thanks,
> David
>
>> Regards, Thomas
>>
>>
>> https://msdn.microsoft.com/en-us/library/ew5tede7(v=vs.90).aspx
>>
>> windows x64 Abi requires the register parameter area right
>> above the
>> return address on the stack. It is not guaranteed to be
>> preserved during
>> the function call:
>>
>> "Even if the called function has fewer than 4 parameters, these
>> 4 stack
>> locations are effectively owned by the called function, and may
>> be used
>> by the called function for other purposes besides saving
>> parameter
>> register values. Thus the caller may not save information in
>> this region
>> of stack across a function call."
>>
>> So, restoring parameters from it may not work if
>> Thread::current() uses
>> this area for any reason. In this case, r9, rsp, r10, r11 may be
>> overwritten by the callee.
>>
>> I also think Windows x64 ABI is different enough from Linux that
>> this
>> function could live somewhere in <os>_<cpu>.
>>
>>
>> This is a valid point. However, note that this is similar to the
>> issue we have for instance in 'shared'
>> MacroAssembler::call_VM_leaf_base, which was not put in <os>_<cpu>
>> files.
>>
>> In fact, the right fix may be to use call_VM_leaf instead of call
>> :-)
>>
>> (alternatively we could use "#ifdef _WIN64" or check
>> "frame::arg_reg_save_area_bytes != 0", which may be even cleaner and
>> more portable)
>>
>> All these solutions are fine for me.
>>
>>
>> Since we are near to feature freeze, this should probably be done in
>> a separate CR (unless David has enough time to test it).
>>
>> Regards,
>>
>> Bertrand.
>>
>>
>> Kind Regards, Thomas
>>
>>
>>
>>
>> On Mon, Nov 23, 2015 at 10:42 PM, David Holmes
>> <david.holmes at oracle.com <mailto:david.holmes at oracle.com>
>> <mailto:david.holmes at oracle.com
>> <mailto:david.holmes at oracle.com>>> wrote:
>>
>> On 23/11/2015 11:46 PM, Bertrand Delsart wrote:
>>
>> On 23/11/2015 13:27, David Holmes wrote:
>>
>> Hi Bertrand,
>>
>> On 23/11/2015 8:14 PM, Bertrand Delsart wrote:
>>
>> Hi David,
>>
>> Overall looks good.
>> Thanks for the #ifndef
>> USE_LIBRARY_BASED_TLS_ONLY :-)
>>
>> One doubt, in case this has not been discussed
>> before.
>>
>> I'm still catching up on x86_64 (after years
>> of ARM
>> assembly :-)) but it
>> seems that there are some stack alignment
>> constraints on
>> runtime calls,
>> at least for some x86_64 ABIs.
>>
>> Some of the x86 MacroAssembler::get_thread
>> implementations had code to
>> align the stack before calling
>> pthread_getspecific. See
>> for instance
>> http://cr.openjdk.java.net/~dholmes/8132510/webrev.v6/src/os_cpu/linux_x86/vm/assembler_linux_x86.cpp.udiff.html
>>
>>
>>
>> Sorry I'm not that familiar with the MacroAssembler
>> - is it
>> this odd
>> fragment:
>>
>> - push(r10);
>> - // XXX
>> - mov(r10, rsp);
>> - andq(rsp, -16);
>> - push(r10);
>>
>> I'm not at all clear what that is doing - and if it
>> somehow
>> changes the
>> alignment wouldn't something need to be fixed up
>> when
>> popping the
>> previous values ??
>>
>> To be honest I'm not even sure what an "unaligned
>> stack" means.
>>
>>
>> On some platforms, SP may have to be aligned on a 16
>> bytes
>> boundary when
>> calling another method.
>>
>>
>> Okay - that seems to be an x64 ABI requirement based on
>> other code
>> in MacroAssembler. The reason it is missing in the Solaris
>> code is
>> because I already removed it in the earlier changes that
>> were done
>> on Solaris:
>>
>> http://hg.openjdk.java.net/jdk9/hs-rt/hotspot/rev/d5b328043c10
>>
>> and nobody picked it up. :(
>>
>> That said, the absence of this code has not caused any
>> problems so
>> presumably we are normally aligned.
>>
>> After having pushed what needed to be saved, the
>> code above
>> rounds SP
>> down and saves the old value. It will then also push
>> r11, which
>> results
>> in the expected alignment.
>>
>> The conterpart, after the VM call, is the:
>> pop(r11);
>> pop(rsp);
>>
>>
>> I had assumed this extra manipulation was related to
>> pushing the arg
>> for the call.
>>
>> This alignment is no longer performed in the
>> new
>> (shared) implementation
>> http://cr.openjdk.java.net/~dholmes/8132510/webrev.v6/src/cpu/x86/vm/macroAssembler_x86.cpp.udiff.html
>>
>>
>>
>>
>> Now, Solaris was not performing the
>> alignment and
>> Windows has a separate
>> path for x86_64. Did we really need the
>> alignment for
>> linux x86_64 and
>> bsd_x86_64 ? Might it be needed for other
>> ports ?
>>
>> IMHO, it might be safer to align the stack by
>> default,
>> knowing it should
>> not be expensive since we call get_thread
>> rarely for
>> x86_64 (result is
>> cached in r15). I'll let you see whether it is
>> worth
>> adding an ifdef so
>> as to explicitly deactivate the alignment on
>> some platforms
>> (solaris_x86_64 ?)
>>
>>
>> I don't have enough knowledge to even begin to
>> comment on
>> this. I will
>> have to rely on our x86_64 macroassembler experts
>> to explain
>> the old
>> code and what the requirements are.
>>
>>
>> OK. This means that the alignment was not removed
>> purposefully.
>>
>> In that case, you must either keep the per platform
>> code x86_64
>> (since
>> it was not identical for all platforms) or use the
>> safest
>> version, with
>> the additional
>>
>> // XXX
>> mov(r10, rsp);
>> andq(rsp, -16);
>> push(r10);
>>
>> before the push(r11) and with the
>>
>> pop(rsp);
>>
>> after the pop(r11). It should work on all x86_64
>> platforms.
>>
>>
>> Right, I will add this back to the code - and replace the
>> meaningless // XXX with an actual comment!
>>
>> Many thanks,
>> David
>> -----
>>
>>
>> FYI, there is another way to do the alignment, based on
>> the fact
>> that
>> we are at least aligned on 8 bytes (see
>> MacroAssembler::call_VM_leaf_base). I assume that this
>> second
>> version is
>> more efficient (particularly thanks to specilative
>> branches) but it
>> might be safer/simpler to continue using andq in
>> get_thread.
>>
>> Bertrand.
>>
>>
>> Thanks,
>> David
>>
>> Regards,
>>
>> Bertrand.
>>
>>
>>
>> On 23/11/2015 08:03, David Holmes wrote:
>>
>> After all the preliminary discussions here
>> are final
>> proposed changes:
>>
>> bug:
>> https://bugs.openjdk.java.net/browse/JDK-8132510
>>
>> Open webrev:
>> http://cr.openjdk.java.net/~dholmes/8132510/webrev.v6/
>>
>> A simple (in principle) but wide-ranging
>> change
>> which should appeal to
>> our Code Deletion Engineer's. We implement
>> Thread::current() using a
>> compiler/language-based thread-local
>> variable eg:
>>
>> static __thread Thread *_thr_current;
>>
>> inline Thread* Thread::current() {
>> return _thr_current;
>> }
>>
>> with an appropriate setter of course. By
>> doing this
>> we can completely
>> remove the os_cpu-specific
>> ThreadLocalStorage
>> implementations, and the
>> associated os::thread_local_storage* calls.
>>
>> As __thread is not async-signal-safe
>> (either in
>> theory or practice) we
>> need a fallback library-based solution
>> as used
>> today. For this we use a
>> very simple ThreadLocalStorage class and an
>> implementation thereof for
>> POSIX (covers AIX, BSD, Linux, OSX and
>> Solaris) using
>> pthread_get/setspecific; and one for
>> Windows using
>> its TLS library.
>> While these library routines are not
>> guaranteed
>> async-signal-safe, they
>> seem to be in practice and are what we have
>> been
>> using all along.
>>
>> We also allow for use of only the
>> library-based TLS
>> for platforms where
>> compiler-based thread locals are not
>> supported (as
>> will be needed in
>> the
>> Mobile project). This is enabled at build
>> time by
>> defining
>> USE_LIBRARY_BASED_TLS_ONLY.
>>
>> Thanks to Andrew Haley for providing the
>> Aarch64
>> code; and for Thomas
>> Stuefe for testing PPC and AIX.
>>
>> Testing:
>> - JPRT (core platforms)
>> - Jtreg tests (linux & solaris)
>> - vm.runtime (core platforms)
>>
>> Performance:
>> - still TBD - this is proving to be
>> extremely
>> difficult. If anyone
>> has
>> any simple to run microbenchmarks to
>> suggest I'd
>> give them a try as a
>> sanity check. But I lack access to
>> hardware for
>> running serious
>> benchmarking.
>>
>> Footprint:
>> - varies by platform and the VM (server,
>> client,
>> minimal)
>> - worst-case: ~1% increase for server VM
>> and minimal VM
>> - best-case: 0.4% decrease for client VM
>>
>> Thanks,
>> David
>>
>>
>>
>>
>>
>>
>>
>>
>> --
>> Bertrand Delsart, Grenoble Engineering Center
>> Oracle, 180 av. de l'Europe, ZIRST de Montbonnot
>> 38330 Montbonnot Saint Martin, FRANCE
>> bertrand.delsart at oracle.com <mailto:bertrand.delsart at oracle.com>
>> Phone : +33 4 76 18 81 23
>>
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> NOTICE: This email message is for the sole use of the intended
>> recipient(s) and may contain confidential and privileged
>> information. Any unauthorized review, use, disclosure or
>> distribution is prohibited. If you are not the intended recipient,
>> please contact the sender by reply email and destroy all copies of
>> the original message.
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>
>>
More information about the hotspot-dev
mailing list