RFR: 8132510: Replace ThreadLocalStorage with compiler/language-based thread-local variables

Daniel D. Daugherty daniel.daugherty at oracle.com
Wed Nov 25 01:34:37 UTC 2015


On 11/23/15 7:19 PM, David Holmes wrote:
> Sorry about that:
>
> http://cr.openjdk.java.net/~dholmes/8132510/webrev.v7/

Please double check the copyright updates before you push.


make/aix/makefiles/xlc.make
     No comments.

src/cpu/aarch64/vm/macroAssembler_aarch64.cpp
     No comments.

src/cpu/sparc/vm/macroAssembler_sparc.cpp
     No comments.

src/cpu/sparc/vm/stubRoutines_sparc.cpp
     No comments.

src/cpu/x86/vm/macroAssembler_x86.cpp
     No comments.

src/os/aix/vm/os_aix.cpp
     No comments.

src/os/aix/vm/os_aix.inline.hpp
     No comments.

src/os/bsd/vm/os_bsd.cpp
     No comments.

src/os/bsd/vm/os_bsd.inline.hpp
     No comments.

src/os/linux/vm/os_linux.cpp
     No comments.

src/os/linux/vm/os_linux.inline.hpp
     No comments.

src/os/solaris/vm/os_solaris.cpp
     No comments.

src/os/windows/vm/os_windows.cpp
     L5998:       os::os_exception_wrapper( (java_call_t)call_wrapper_dummy,
         Wrong indent; should only be two spaces.
         Please delete space after first '('.

     L5999:                                 NULL, NULL, NULL, NULL);
         Indent will have to change to match any fixes on previous line.

src/os/windows/vm/os_windows.hpp
     L127:   static inline void set_thread_ptr_offset( int offset ) {
         Please delete space after '(' and before ')'.

src/os_cpu/aix_ppc/vm/os_aix_ppc.cpp
     No comments.

src/os_cpu/bsd_x86/vm/assembler_bsd_x86.cpp
     No comments.

src/os_cpu/bsd_x86/vm/os_bsd_x86.cpp
     No comments.

src/os_cpu/bsd_zero/vm/assembler_bsd_zero.cpp
     No comments.

src/os_cpu/bsd_zero/vm/os_bsd_zero.cpp
     No comments.

src/os_cpu/linux_aarch64/vm/assembler_linux_aarch64.cpp
     No comments.

src/os_cpu/linux_aarch64/vm/os_linux_aarch64.cpp
     No comments.

src/os_cpu/linux_aarch64/vm/thread_linux_aarch64.hpp
     No comments.

src/os_cpu/linux_ppc/vm/os_linux_ppc.cpp
     No comments.

src/os_cpu/linux_sparc/vm/os_linux_sparc.cpp
     No comments.

src/os_cpu/linux_x86/vm/assembler_linux_x86.cpp
     No comments.

src/os_cpu/linux_x86/vm/os_linux_x86.cpp
     No comments.

src/os_cpu/linux_zero/vm/assembler_linux_zero.cpp
     No comments.

src/os_cpu/linux_zero/vm/os_linux_zero.cpp
     No comments.

src/os_cpu/solaris_sparc/vm/os_solaris_sparc.cpp
     No comments.

src/os_cpu/solaris_x86/vm/assembler_solaris_x86.cpp
     No comments.

src/os_cpu/solaris_x86/vm/os_solaris_x86.cpp
     No comments.

src/os_cpu/windows_x86/vm/assembler_windows_x86.cpp
     No comments.

src/os_cpu/windows_x86/vm/os_windows_x86.cpp
     No comments.

src/share/vm/code/nmethod.cpp
     No comments.

src/share/vm/gc/cms/concurrentMarkSweepThread.cpp
     No comments.

src/share/vm/gc/g1/g1HotCardCache.hpp
     No comments.

src/share/vm/gc/parallel/gcTaskThread.cpp
     No comments.

     Note: blank first line in this file (not your change)

src/share/vm/gc/shared/concurrentGCThread.cpp
     No comments.

src/share/vm/gc/shared/workgroup.cpp
     No comments.

src/share/vm/memory/allocation.cpp
     No comments.

src/share/vm/memory/resourceArea.hpp
     No comments.

src/share/vm/oops/oopsHierarchy.cpp
     No comments.

src/share/vm/precompiled/precompiled.hpp
     No comments.

src/share/vm/prims/jni.cpp
     L4312:   // If the thread has already been detached the operations 
is a no-op
         Typo: 'the operations is a' -> 'the operation is a'

src/share/vm/prims/jniCheck.cpp
     No comments.

src/share/vm/prims/jvmtiEnter.xsl
     No comments.

src/share/vm/prims/jvmtiExport.cpp
     L377:       JavaThread* current_thread = JavaThread::current();
         Wrong indent; two less spaces

src/share/vm/prims/jvmtiUtil.hpp
     No comments.

src/share/vm/runtime/interfaceSupport.cpp
     No comments.

src/share/vm/runtime/interfaceSupport.hpp
     No comments.

src/share/vm/runtime/java.cpp
     No comments.

src/share/vm/runtime/mutex.cpp
     No comments.

src/share/vm/runtime/mutexLocker.cpp
     No comments.

src/share/vm/runtime/os.cpp
     No comments.

src/share/vm/runtime/os.hpp
     No comments.

src/share/vm/runtime/sharedRuntime.hpp
     No comments.

src/share/vm/runtime/thread.cpp
     No comments.

src/share/vm/runtime/thread.hpp
     No comments.

src/share/vm/runtime/thread.inline.hpp
     No comments.

src/share/vm/runtime/threadLocalStorage.hpp
     L37: // Platforms without compiler-based TLS (ie __thread 
storage-class modifier)
         Typo: 'ie' -> 'i.e.'

src/share/vm/runtime/vmThread.cpp
     No comments.

src/share/vm/runtime/vm_operations.cpp

     (old) L396:   Thread * thr_cur = ThreadLocalStorage::get_thread_slow();
     (new) L396:   Thread * thr_cur = Thread::current();

         I think this should be 'current_or_null()' like L374.

src/share/vm/utilities/debug.cpp
     No comments.

src/share/vm/utilities/events.cpp
     No comments.

src/share/vm/utilities/events.hpp
     No comments.

src/share/vm/utilities/globalDefinitions_gcc.hpp
     No comments.

src/share/vm/utilities/globalDefinitions_sparcWorks.hpp
     No comments.

src/share/vm/utilities/globalDefinitions_visCPP.hpp
     No comments.

src/share/vm/utilities/globalDefinitions_xlc.hpp
     No comments.

src/share/vm/utilities/ostream.cpp
     No comments.

src/os/posix/vm/threadLocalStorage_posix.cpp
     L38:     ThreadLocalStorage::set_thread((Thread*) p);
         Wrong indent; should be two spaces.

src/os/windows/vm/threadLocalStorage_windows.cpp
     L54:   assert(current != 0 || GetLastError() == ERROR_SUCCESS, '
         'current != 0' -> 'current != NULL'
         jcheck will complain about the blank at the end of this line

     L62:   assert(res, "TlsSetValue failed with error code: %lu", 
GetLastError());
         jcheck will complain about the blank at the end of this line

src/os_cpu/linux_aarch64/vm/threadLS_linux_aarch64.s
     No comments.

No comments on the deleted files.


Dan


>
> David
>
> On 24/11/2015 11:17 AM, David Holmes wrote:
>> Updated webrev:
>>
>> cr.openjdk.java.net/~dholmes/8132510/webrev.v7/
>>
>> Only change is in macroAssembler_x86.cpp
>>
>> Thanks,
>> David
>>
>> On 24/11/2015 7:42 AM, David Holmes wrote:
>>> On 23/11/2015 11:46 PM, Bertrand Delsart wrote:
>>>> On 23/11/2015 13:27, David Holmes wrote:
>>>>> Hi Bertrand,
>>>>>
>>>>> On 23/11/2015 8:14 PM, Bertrand Delsart wrote:
>>>>>> Hi David,
>>>>>>
>>>>>> Overall looks good.
>>>>>> Thanks for the #ifndef USE_LIBRARY_BASED_TLS_ONLY :-)
>>>>>>
>>>>>> One doubt, in case this has not been discussed before.
>>>>>>
>>>>>> I'm still catching up on x86_64 (after years of ARM assembly :-))
>>>>>> but it
>>>>>> seems that there are some stack alignment constraints on runtime
>>>>>> calls,
>>>>>> at least for some x86_64 ABIs.
>>>>>>
>>>>>> Some of the x86 MacroAssembler::get_thread implementations had 
>>>>>> code to
>>>>>> align the stack before calling pthread_getspecific. See for instance
>>>>>> http://cr.openjdk.java.net/~dholmes/8132510/webrev.v6/src/os_cpu/linux_x86/vm/assembler_linux_x86.cpp.udiff.html 
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> Sorry I'm not that familiar with the MacroAssembler - is it this odd
>>>>> fragment:
>>>>>
>>>>> -   push(r10);
>>>>> -   // XXX
>>>>> -   mov(r10, rsp);
>>>>> -   andq(rsp, -16);
>>>>> -   push(r10);
>>>>>
>>>>> I'm not at all clear what that is doing - and if it somehow 
>>>>> changes the
>>>>> alignment wouldn't something need to be fixed up when popping the
>>>>> previous values ??
>>>>>
>>>>> To be honest I'm not even sure what an "unaligned stack" means.
>>>>
>>>> On some platforms, SP may have to be aligned on a 16 bytes boundary 
>>>> when
>>>> calling another method.
>>>
>>> Okay - that seems to be an x64 ABI requirement based on other code in
>>> MacroAssembler. The reason it is missing in the Solaris code is because
>>> I already removed it in the earlier changes that were done on Solaris:
>>>
>>> http://hg.openjdk.java.net/jdk9/hs-rt/hotspot/rev/d5b328043c10
>>>
>>> and nobody picked it up. :(
>>>
>>> That said, the absence of this code has not caused any problems so
>>> presumably we are normally aligned.
>>>
>>>> After having pushed what needed to be saved, the code above rounds SP
>>>> down and saves the old value. It will then also push r11, which 
>>>> results
>>>> in the expected alignment.
>>>>
>>>> The conterpart, after the VM call, is the:
>>>>   pop(r11);
>>>>   pop(rsp);
>>>
>>> I had assumed this extra manipulation was related to pushing the arg 
>>> for
>>> the call.
>>>
>>>>>> This alignment is no longer performed in the new (shared)
>>>>>> implementation
>>>>>> http://cr.openjdk.java.net/~dholmes/8132510/webrev.v6/src/cpu/x86/vm/macroAssembler_x86.cpp.udiff.html 
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Now, Solaris was not performing the alignment and Windows has a
>>>>>> separate
>>>>>> path for x86_64. Did we really need the alignment for linux 
>>>>>> x86_64 and
>>>>>> bsd_x86_64 ? Might it be needed for other ports ?
>>>>>>
>>>>>> IMHO, it might be safer to align the stack by default, knowing it
>>>>>> should
>>>>>> not be expensive since we call get_thread rarely for x86_64 
>>>>>> (result is
>>>>>> cached in r15). I'll let you see whether it is worth adding an
>>>>>> ifdef so
>>>>>> as to explicitly deactivate the alignment on some platforms
>>>>>> (solaris_x86_64 ?)
>>>>>
>>>>> I don't have enough knowledge to even begin to comment on this. I 
>>>>> will
>>>>> have to rely on our x86_64 macroassembler experts to explain the old
>>>>> code and what the requirements are.
>>>>
>>>> OK. This means that the alignment was not removed purposefully.
>>>>
>>>> In that case, you must either keep the per platform code x86_64 (since
>>>> it was not identical for all platforms) or use the safest version, 
>>>> with
>>>> the additional
>>>>
>>>>     // XXX
>>>>     mov(r10, rsp);
>>>>     andq(rsp, -16);
>>>>     push(r10);
>>>>
>>>> before the push(r11) and with the
>>>>
>>>>     pop(rsp);
>>>>
>>>> after the pop(r11). It should work on all x86_64 platforms.
>>>
>>> Right, I will add this back to the code - and replace the 
>>> meaningless //
>>> XXX with an actual comment!
>>>
>>> Many thanks,
>>> David
>>> -----
>>>
>>>> FYI, there is another way to do the alignment, based on the fact that
>>>> we are at least aligned on 8 bytes (see
>>>> MacroAssembler::call_VM_leaf_base). I assume that this second 
>>>> version is
>>>> more efficient (particularly thanks to specilative branches) but it
>>>> might be safer/simpler to continue using andq in get_thread.
>>>>
>>>> Bertrand.
>>>>
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Bertrand.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 23/11/2015 08:03, David Holmes wrote:
>>>>>>> After all the preliminary discussions here are final proposed
>>>>>>> changes:
>>>>>>>
>>>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8132510
>>>>>>>
>>>>>>> Open webrev: http://cr.openjdk.java.net/~dholmes/8132510/webrev.v6/
>>>>>>>
>>>>>>> A simple (in principle) but wide-ranging change which should
>>>>>>> appeal to
>>>>>>> our Code Deletion Engineer's. We implement Thread::current() 
>>>>>>> using a
>>>>>>> compiler/language-based thread-local variable eg:
>>>>>>>
>>>>>>>   static __thread Thread *_thr_current;
>>>>>>>
>>>>>>>   inline Thread* Thread::current() {
>>>>>>>     return _thr_current;
>>>>>>>   }
>>>>>>>
>>>>>>> with an appropriate setter of course. By doing this we can 
>>>>>>> completely
>>>>>>> remove the os_cpu-specific ThreadLocalStorage implementations, and
>>>>>>> the
>>>>>>> associated os::thread_local_storage* calls.
>>>>>>>
>>>>>>> As __thread is not async-signal-safe (either in theory or
>>>>>>> practice) we
>>>>>>> need a fallback library-based solution as used today. For this we
>>>>>>> use a
>>>>>>> very simple ThreadLocalStorage class and an implementation thereof
>>>>>>> for
>>>>>>> POSIX (covers AIX, BSD, Linux, OSX and Solaris) using
>>>>>>> pthread_get/setspecific; and one for Windows using its TLS library.
>>>>>>> While these library routines are not guaranteed async-signal-safe,
>>>>>>> they
>>>>>>> seem to be in practice and are what we have been using all along.
>>>>>>>
>>>>>>> We also allow for use of only the library-based TLS for platforms
>>>>>>> where
>>>>>>> compiler-based thread locals are not supported (as will be 
>>>>>>> needed in
>>>>>>> the
>>>>>>> Mobile project). This is enabled at build time by defining
>>>>>>> USE_LIBRARY_BASED_TLS_ONLY.
>>>>>>>
>>>>>>> Thanks to Andrew Haley for providing the Aarch64 code; and for 
>>>>>>> Thomas
>>>>>>> Stuefe for testing PPC and AIX.
>>>>>>>
>>>>>>> Testing:
>>>>>>>   - JPRT (core platforms)
>>>>>>>   - Jtreg tests (linux & solaris)
>>>>>>>   - vm.runtime (core platforms)
>>>>>>>
>>>>>>> Performance:
>>>>>>>   - still TBD - this is proving to be extremely difficult. If 
>>>>>>> anyone
>>>>>>> has
>>>>>>> any simple to run microbenchmarks to suggest I'd give them a try 
>>>>>>> as a
>>>>>>> sanity check. But I lack access to hardware for running serious
>>>>>>> benchmarking.
>>>>>>>
>>>>>>> Footprint:
>>>>>>> - varies by platform and the VM (server, client, minimal)
>>>>>>> - worst-case: ~1% increase for server VM and minimal VM
>>>>>>> - best-case:  0.4% decrease for client VM
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>>>
>>>>>>
>>>>
>>>>



More information about the hotspot-dev mailing list