RFR: 8132510: Replace ThreadLocalStorage with compiler/language-based thread-local variables
Daniel D. Daugherty
daniel.daugherty at oracle.com
Wed Nov 25 01:34:37 UTC 2015
On 11/23/15 7:19 PM, David Holmes wrote:
> Sorry about that:
>
> http://cr.openjdk.java.net/~dholmes/8132510/webrev.v7/
Please double check the copyright updates before you push.
make/aix/makefiles/xlc.make
No comments.
src/cpu/aarch64/vm/macroAssembler_aarch64.cpp
No comments.
src/cpu/sparc/vm/macroAssembler_sparc.cpp
No comments.
src/cpu/sparc/vm/stubRoutines_sparc.cpp
No comments.
src/cpu/x86/vm/macroAssembler_x86.cpp
No comments.
src/os/aix/vm/os_aix.cpp
No comments.
src/os/aix/vm/os_aix.inline.hpp
No comments.
src/os/bsd/vm/os_bsd.cpp
No comments.
src/os/bsd/vm/os_bsd.inline.hpp
No comments.
src/os/linux/vm/os_linux.cpp
No comments.
src/os/linux/vm/os_linux.inline.hpp
No comments.
src/os/solaris/vm/os_solaris.cpp
No comments.
src/os/windows/vm/os_windows.cpp
L5998: os::os_exception_wrapper( (java_call_t)call_wrapper_dummy,
Wrong indent; should only be two spaces.
Please delete space after first '('.
L5999: NULL, NULL, NULL, NULL);
Indent will have to change to match any fixes on previous line.
src/os/windows/vm/os_windows.hpp
L127: static inline void set_thread_ptr_offset( int offset ) {
Please delete space after '(' and before ')'.
src/os_cpu/aix_ppc/vm/os_aix_ppc.cpp
No comments.
src/os_cpu/bsd_x86/vm/assembler_bsd_x86.cpp
No comments.
src/os_cpu/bsd_x86/vm/os_bsd_x86.cpp
No comments.
src/os_cpu/bsd_zero/vm/assembler_bsd_zero.cpp
No comments.
src/os_cpu/bsd_zero/vm/os_bsd_zero.cpp
No comments.
src/os_cpu/linux_aarch64/vm/assembler_linux_aarch64.cpp
No comments.
src/os_cpu/linux_aarch64/vm/os_linux_aarch64.cpp
No comments.
src/os_cpu/linux_aarch64/vm/thread_linux_aarch64.hpp
No comments.
src/os_cpu/linux_ppc/vm/os_linux_ppc.cpp
No comments.
src/os_cpu/linux_sparc/vm/os_linux_sparc.cpp
No comments.
src/os_cpu/linux_x86/vm/assembler_linux_x86.cpp
No comments.
src/os_cpu/linux_x86/vm/os_linux_x86.cpp
No comments.
src/os_cpu/linux_zero/vm/assembler_linux_zero.cpp
No comments.
src/os_cpu/linux_zero/vm/os_linux_zero.cpp
No comments.
src/os_cpu/solaris_sparc/vm/os_solaris_sparc.cpp
No comments.
src/os_cpu/solaris_x86/vm/assembler_solaris_x86.cpp
No comments.
src/os_cpu/solaris_x86/vm/os_solaris_x86.cpp
No comments.
src/os_cpu/windows_x86/vm/assembler_windows_x86.cpp
No comments.
src/os_cpu/windows_x86/vm/os_windows_x86.cpp
No comments.
src/share/vm/code/nmethod.cpp
No comments.
src/share/vm/gc/cms/concurrentMarkSweepThread.cpp
No comments.
src/share/vm/gc/g1/g1HotCardCache.hpp
No comments.
src/share/vm/gc/parallel/gcTaskThread.cpp
No comments.
Note: blank first line in this file (not your change)
src/share/vm/gc/shared/concurrentGCThread.cpp
No comments.
src/share/vm/gc/shared/workgroup.cpp
No comments.
src/share/vm/memory/allocation.cpp
No comments.
src/share/vm/memory/resourceArea.hpp
No comments.
src/share/vm/oops/oopsHierarchy.cpp
No comments.
src/share/vm/precompiled/precompiled.hpp
No comments.
src/share/vm/prims/jni.cpp
L4312: // If the thread has already been detached the operations
is a no-op
Typo: 'the operations is a' -> 'the operation is a'
src/share/vm/prims/jniCheck.cpp
No comments.
src/share/vm/prims/jvmtiEnter.xsl
No comments.
src/share/vm/prims/jvmtiExport.cpp
L377: JavaThread* current_thread = JavaThread::current();
Wrong indent; two less spaces
src/share/vm/prims/jvmtiUtil.hpp
No comments.
src/share/vm/runtime/interfaceSupport.cpp
No comments.
src/share/vm/runtime/interfaceSupport.hpp
No comments.
src/share/vm/runtime/java.cpp
No comments.
src/share/vm/runtime/mutex.cpp
No comments.
src/share/vm/runtime/mutexLocker.cpp
No comments.
src/share/vm/runtime/os.cpp
No comments.
src/share/vm/runtime/os.hpp
No comments.
src/share/vm/runtime/sharedRuntime.hpp
No comments.
src/share/vm/runtime/thread.cpp
No comments.
src/share/vm/runtime/thread.hpp
No comments.
src/share/vm/runtime/thread.inline.hpp
No comments.
src/share/vm/runtime/threadLocalStorage.hpp
L37: // Platforms without compiler-based TLS (ie __thread
storage-class modifier)
Typo: 'ie' -> 'i.e.'
src/share/vm/runtime/vmThread.cpp
No comments.
src/share/vm/runtime/vm_operations.cpp
(old) L396: Thread * thr_cur = ThreadLocalStorage::get_thread_slow();
(new) L396: Thread * thr_cur = Thread::current();
I think this should be 'current_or_null()' like L374.
src/share/vm/utilities/debug.cpp
No comments.
src/share/vm/utilities/events.cpp
No comments.
src/share/vm/utilities/events.hpp
No comments.
src/share/vm/utilities/globalDefinitions_gcc.hpp
No comments.
src/share/vm/utilities/globalDefinitions_sparcWorks.hpp
No comments.
src/share/vm/utilities/globalDefinitions_visCPP.hpp
No comments.
src/share/vm/utilities/globalDefinitions_xlc.hpp
No comments.
src/share/vm/utilities/ostream.cpp
No comments.
src/os/posix/vm/threadLocalStorage_posix.cpp
L38: ThreadLocalStorage::set_thread((Thread*) p);
Wrong indent; should be two spaces.
src/os/windows/vm/threadLocalStorage_windows.cpp
L54: assert(current != 0 || GetLastError() == ERROR_SUCCESS, '
'current != 0' -> 'current != NULL'
jcheck will complain about the blank at the end of this line
L62: assert(res, "TlsSetValue failed with error code: %lu",
GetLastError());
jcheck will complain about the blank at the end of this line
src/os_cpu/linux_aarch64/vm/threadLS_linux_aarch64.s
No comments.
No comments on the deleted files.
Dan
>
> David
>
> On 24/11/2015 11:17 AM, David Holmes wrote:
>> Updated webrev:
>>
>> cr.openjdk.java.net/~dholmes/8132510/webrev.v7/
>>
>> Only change is in macroAssembler_x86.cpp
>>
>> Thanks,
>> David
>>
>> On 24/11/2015 7:42 AM, David Holmes wrote:
>>> On 23/11/2015 11:46 PM, Bertrand Delsart wrote:
>>>> On 23/11/2015 13:27, David Holmes wrote:
>>>>> Hi Bertrand,
>>>>>
>>>>> On 23/11/2015 8:14 PM, Bertrand Delsart wrote:
>>>>>> Hi David,
>>>>>>
>>>>>> Overall looks good.
>>>>>> Thanks for the #ifndef USE_LIBRARY_BASED_TLS_ONLY :-)
>>>>>>
>>>>>> One doubt, in case this has not been discussed before.
>>>>>>
>>>>>> I'm still catching up on x86_64 (after years of ARM assembly :-))
>>>>>> but it
>>>>>> seems that there are some stack alignment constraints on runtime
>>>>>> calls,
>>>>>> at least for some x86_64 ABIs.
>>>>>>
>>>>>> Some of the x86 MacroAssembler::get_thread implementations had
>>>>>> code to
>>>>>> align the stack before calling pthread_getspecific. See for instance
>>>>>> http://cr.openjdk.java.net/~dholmes/8132510/webrev.v6/src/os_cpu/linux_x86/vm/assembler_linux_x86.cpp.udiff.html
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> Sorry I'm not that familiar with the MacroAssembler - is it this odd
>>>>> fragment:
>>>>>
>>>>> - push(r10);
>>>>> - // XXX
>>>>> - mov(r10, rsp);
>>>>> - andq(rsp, -16);
>>>>> - push(r10);
>>>>>
>>>>> I'm not at all clear what that is doing - and if it somehow
>>>>> changes the
>>>>> alignment wouldn't something need to be fixed up when popping the
>>>>> previous values ??
>>>>>
>>>>> To be honest I'm not even sure what an "unaligned stack" means.
>>>>
>>>> On some platforms, SP may have to be aligned on a 16 bytes boundary
>>>> when
>>>> calling another method.
>>>
>>> Okay - that seems to be an x64 ABI requirement based on other code in
>>> MacroAssembler. The reason it is missing in the Solaris code is because
>>> I already removed it in the earlier changes that were done on Solaris:
>>>
>>> http://hg.openjdk.java.net/jdk9/hs-rt/hotspot/rev/d5b328043c10
>>>
>>> and nobody picked it up. :(
>>>
>>> That said, the absence of this code has not caused any problems so
>>> presumably we are normally aligned.
>>>
>>>> After having pushed what needed to be saved, the code above rounds SP
>>>> down and saves the old value. It will then also push r11, which
>>>> results
>>>> in the expected alignment.
>>>>
>>>> The conterpart, after the VM call, is the:
>>>> pop(r11);
>>>> pop(rsp);
>>>
>>> I had assumed this extra manipulation was related to pushing the arg
>>> for
>>> the call.
>>>
>>>>>> This alignment is no longer performed in the new (shared)
>>>>>> implementation
>>>>>> http://cr.openjdk.java.net/~dholmes/8132510/webrev.v6/src/cpu/x86/vm/macroAssembler_x86.cpp.udiff.html
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Now, Solaris was not performing the alignment and Windows has a
>>>>>> separate
>>>>>> path for x86_64. Did we really need the alignment for linux
>>>>>> x86_64 and
>>>>>> bsd_x86_64 ? Might it be needed for other ports ?
>>>>>>
>>>>>> IMHO, it might be safer to align the stack by default, knowing it
>>>>>> should
>>>>>> not be expensive since we call get_thread rarely for x86_64
>>>>>> (result is
>>>>>> cached in r15). I'll let you see whether it is worth adding an
>>>>>> ifdef so
>>>>>> as to explicitly deactivate the alignment on some platforms
>>>>>> (solaris_x86_64 ?)
>>>>>
>>>>> I don't have enough knowledge to even begin to comment on this. I
>>>>> will
>>>>> have to rely on our x86_64 macroassembler experts to explain the old
>>>>> code and what the requirements are.
>>>>
>>>> OK. This means that the alignment was not removed purposefully.
>>>>
>>>> In that case, you must either keep the per platform code x86_64 (since
>>>> it was not identical for all platforms) or use the safest version,
>>>> with
>>>> the additional
>>>>
>>>> // XXX
>>>> mov(r10, rsp);
>>>> andq(rsp, -16);
>>>> push(r10);
>>>>
>>>> before the push(r11) and with the
>>>>
>>>> pop(rsp);
>>>>
>>>> after the pop(r11). It should work on all x86_64 platforms.
>>>
>>> Right, I will add this back to the code - and replace the
>>> meaningless //
>>> XXX with an actual comment!
>>>
>>> Many thanks,
>>> David
>>> -----
>>>
>>>> FYI, there is another way to do the alignment, based on the fact that
>>>> we are at least aligned on 8 bytes (see
>>>> MacroAssembler::call_VM_leaf_base). I assume that this second
>>>> version is
>>>> more efficient (particularly thanks to specilative branches) but it
>>>> might be safer/simpler to continue using andq in get_thread.
>>>>
>>>> Bertrand.
>>>>
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Bertrand.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 23/11/2015 08:03, David Holmes wrote:
>>>>>>> After all the preliminary discussions here are final proposed
>>>>>>> changes:
>>>>>>>
>>>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8132510
>>>>>>>
>>>>>>> Open webrev: http://cr.openjdk.java.net/~dholmes/8132510/webrev.v6/
>>>>>>>
>>>>>>> A simple (in principle) but wide-ranging change which should
>>>>>>> appeal to
>>>>>>> our Code Deletion Engineer's. We implement Thread::current()
>>>>>>> using a
>>>>>>> compiler/language-based thread-local variable eg:
>>>>>>>
>>>>>>> static __thread Thread *_thr_current;
>>>>>>>
>>>>>>> inline Thread* Thread::current() {
>>>>>>> return _thr_current;
>>>>>>> }
>>>>>>>
>>>>>>> with an appropriate setter of course. By doing this we can
>>>>>>> completely
>>>>>>> remove the os_cpu-specific ThreadLocalStorage implementations, and
>>>>>>> the
>>>>>>> associated os::thread_local_storage* calls.
>>>>>>>
>>>>>>> As __thread is not async-signal-safe (either in theory or
>>>>>>> practice) we
>>>>>>> need a fallback library-based solution as used today. For this we
>>>>>>> use a
>>>>>>> very simple ThreadLocalStorage class and an implementation thereof
>>>>>>> for
>>>>>>> POSIX (covers AIX, BSD, Linux, OSX and Solaris) using
>>>>>>> pthread_get/setspecific; and one for Windows using its TLS library.
>>>>>>> While these library routines are not guaranteed async-signal-safe,
>>>>>>> they
>>>>>>> seem to be in practice and are what we have been using all along.
>>>>>>>
>>>>>>> We also allow for use of only the library-based TLS for platforms
>>>>>>> where
>>>>>>> compiler-based thread locals are not supported (as will be
>>>>>>> needed in
>>>>>>> the
>>>>>>> Mobile project). This is enabled at build time by defining
>>>>>>> USE_LIBRARY_BASED_TLS_ONLY.
>>>>>>>
>>>>>>> Thanks to Andrew Haley for providing the Aarch64 code; and for
>>>>>>> Thomas
>>>>>>> Stuefe for testing PPC and AIX.
>>>>>>>
>>>>>>> Testing:
>>>>>>> - JPRT (core platforms)
>>>>>>> - Jtreg tests (linux & solaris)
>>>>>>> - vm.runtime (core platforms)
>>>>>>>
>>>>>>> Performance:
>>>>>>> - still TBD - this is proving to be extremely difficult. If
>>>>>>> anyone
>>>>>>> has
>>>>>>> any simple to run microbenchmarks to suggest I'd give them a try
>>>>>>> as a
>>>>>>> sanity check. But I lack access to hardware for running serious
>>>>>>> benchmarking.
>>>>>>>
>>>>>>> Footprint:
>>>>>>> - varies by platform and the VM (server, client, minimal)
>>>>>>> - worst-case: ~1% increase for server VM and minimal VM
>>>>>>> - best-case: 0.4% decrease for client VM
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>>>
>>>>>>
>>>>
>>>>
More information about the hotspot-dev
mailing list