RFR: 8132510: Replace ThreadLocalStorage with compiler/language-based thread-local variables

Daniel D. Daugherty daniel.daugherty at oracle.com
Wed Nov 25 14:29:46 UTC 2015


On 11/24/15 10:07 PM, David Holmes wrote:
> Hi Dan,
>
> Many thanks for the meticulous review!

You're welcome.


>
> On 25/11/2015 11:34 AM, Daniel D. Daugherty wrote:
>> On 11/23/15 7:19 PM, David Holmes wrote:
>>> Sorry about that:
>>>
>>> http://cr.openjdk.java.net/~dholmes/8132510/webrev.v7/
>>
>> Please double check the copyright updates before you push.
>
> Did you notice something missing? I use:
>
> hg status -man | xargs grep Copyright | grep -v 2015
>
> to find any copyright lines without 2015 present (it turns up the 
> non-Oracle copyrights in a few files).

src/os_cpu/linux_aarch64/vm/thread_linux_aarch64.hpp

L2  * Copyright (c) 2000, 2010, Oracle and/or its affiliates. All rights 
reserved.

>
> Trimming to files with comments ...
>
>> src/os/windows/vm/os_windows.cpp
>>      L5998:       os::os_exception_wrapper(
>> (java_call_t)call_wrapper_dummy,
>>          Wrong indent; should only be two spaces.
>>          Please delete space after first '('.
>>
>>      L5999:                                 NULL, NULL, NULL, NULL);
>>          Indent will have to change to match any fixes on previous line.
>
> Fixed.
>
>> src/os/windows/vm/os_windows.hpp
>>      L127:   static inline void set_thread_ptr_offset( int offset ) {
>>          Please delete space after '(' and before ')'.
>
> Fixed (was a copy of code from now deleted file)
>
>> src/share/vm/gc/parallel/gcTaskThread.cpp
>>      No comments.
>>
>>      Note: blank first line in this file (not your change)
>
> Fixed.
>
>> src/share/vm/prims/jni.cpp
>>      L4312:   // If the thread has already been detached the operations
>> is a no-op
>>          Typo: 'the operations is a' -> 'the operation is a'
>
> Fixed
>
>> src/share/vm/prims/jvmtiExport.cpp
>>      L377:       JavaThread* current_thread = JavaThread::current();
>>          Wrong indent; two less spaces
>
> Fixed.
>
>> src/share/vm/runtime/threadLocalStorage.hpp
>>      L37: // Platforms without compiler-based TLS (ie __thread
>> storage-class modifier)
>>          Typo: 'ie' -> 'i.e.'
>
> Fixed.
>
>> src/share/vm/runtime/vm_operations.cpp
>>
>>      (old) L396:   Thread * thr_cur =
>> ThreadLocalStorage::get_thread_slow();
>>      (new) L396:   Thread * thr_cur = Thread::current();
>>
>>          I think this should be 'current_or_null()' like L374.
>
> Actually L374 should just be Thread::current() as NULL is neither 
> possible nor expected. These VM_Exit routines can only be called by an 
> attached thread. E.g. set_vm_exited is only called from 
> Threads::destroy_vm(), while wait_for_threads_in_native_to_block is 
> only called on the VMThread.
>
>> src/os/posix/vm/threadLocalStorage_posix.cpp
>>      L38:     ThreadLocalStorage::set_thread((Thread*) p);
>>          Wrong indent; should be two spaces.
>
> Fixed.
>
>> src/os/windows/vm/threadLocalStorage_windows.cpp
>>      L54:   assert(current != 0 || GetLastError() == ERROR_SUCCESS, '
>>          'current != 0' -> 'current != NULL'
>>          jcheck will complain about the blank at the end of this line
>>
>>      L62:   assert(res, "TlsSetValue failed with error code: %lu",
>> GetLastError());
>>          jcheck will complain about the blank at the end of this line
>
> Fixed.

I missed flagging that this line:

     L55:          "TlsGetValue failed with error code: %lu", 
GetLastError());

also ends with a blank. Sorry about that.

>
> Generated new webrev:
>
> http://cr.openjdk.java.net/~dholmes/8132510/webrev.v8/

Re-reviewed by comparing the v7 and v8 patch files.

Thumbs up!

Dan

>
> which has been rebased to current hs-rt content.
>
> Thanks,
> David
>
>>
>> Dan
>>
>>
>>>
>>> David
>>>
>>> On 24/11/2015 11:17 AM, David Holmes wrote:
>>>> Updated webrev:
>>>>
>>>> cr.openjdk.java.net/~dholmes/8132510/webrev.v7/
>>>>
>>>> Only change is in macroAssembler_x86.cpp
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>> On 24/11/2015 7:42 AM, David Holmes wrote:
>>>>> On 23/11/2015 11:46 PM, Bertrand Delsart wrote:
>>>>>> On 23/11/2015 13:27, David Holmes wrote:
>>>>>>> Hi Bertrand,
>>>>>>>
>>>>>>> On 23/11/2015 8:14 PM, Bertrand Delsart wrote:
>>>>>>>> Hi David,
>>>>>>>>
>>>>>>>> Overall looks good.
>>>>>>>> Thanks for the #ifndef USE_LIBRARY_BASED_TLS_ONLY :-)
>>>>>>>>
>>>>>>>> One doubt, in case this has not been discussed before.
>>>>>>>>
>>>>>>>> I'm still catching up on x86_64 (after years of ARM assembly :-))
>>>>>>>> but it
>>>>>>>> seems that there are some stack alignment constraints on runtime
>>>>>>>> calls,
>>>>>>>> at least for some x86_64 ABIs.
>>>>>>>>
>>>>>>>> Some of the x86 MacroAssembler::get_thread implementations had
>>>>>>>> code to
>>>>>>>> align the stack before calling pthread_getspecific. See for 
>>>>>>>> instance
>>>>>>>> http://cr.openjdk.java.net/~dholmes/8132510/webrev.v6/src/os_cpu/linux_x86/vm/assembler_linux_x86.cpp.udiff.html 
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> Sorry I'm not that familiar with the MacroAssembler - is it this 
>>>>>>> odd
>>>>>>> fragment:
>>>>>>>
>>>>>>> -   push(r10);
>>>>>>> -   // XXX
>>>>>>> -   mov(r10, rsp);
>>>>>>> -   andq(rsp, -16);
>>>>>>> -   push(r10);
>>>>>>>
>>>>>>> I'm not at all clear what that is doing - and if it somehow
>>>>>>> changes the
>>>>>>> alignment wouldn't something need to be fixed up when popping the
>>>>>>> previous values ??
>>>>>>>
>>>>>>> To be honest I'm not even sure what an "unaligned stack" means.
>>>>>>
>>>>>> On some platforms, SP may have to be aligned on a 16 bytes boundary
>>>>>> when
>>>>>> calling another method.
>>>>>
>>>>> Okay - that seems to be an x64 ABI requirement based on other code in
>>>>> MacroAssembler. The reason it is missing in the Solaris code is 
>>>>> because
>>>>> I already removed it in the earlier changes that were done on 
>>>>> Solaris:
>>>>>
>>>>> http://hg.openjdk.java.net/jdk9/hs-rt/hotspot/rev/d5b328043c10
>>>>>
>>>>> and nobody picked it up. :(
>>>>>
>>>>> That said, the absence of this code has not caused any problems so
>>>>> presumably we are normally aligned.
>>>>>
>>>>>> After having pushed what needed to be saved, the code above 
>>>>>> rounds SP
>>>>>> down and saves the old value. It will then also push r11, which
>>>>>> results
>>>>>> in the expected alignment.
>>>>>>
>>>>>> The conterpart, after the VM call, is the:
>>>>>>   pop(r11);
>>>>>>   pop(rsp);
>>>>>
>>>>> I had assumed this extra manipulation was related to pushing the arg
>>>>> for
>>>>> the call.
>>>>>
>>>>>>>> This alignment is no longer performed in the new (shared)
>>>>>>>> implementation
>>>>>>>> http://cr.openjdk.java.net/~dholmes/8132510/webrev.v6/src/cpu/x86/vm/macroAssembler_x86.cpp.udiff.html 
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Now, Solaris was not performing the alignment and Windows has a
>>>>>>>> separate
>>>>>>>> path for x86_64. Did we really need the alignment for linux
>>>>>>>> x86_64 and
>>>>>>>> bsd_x86_64 ? Might it be needed for other ports ?
>>>>>>>>
>>>>>>>> IMHO, it might be safer to align the stack by default, knowing it
>>>>>>>> should
>>>>>>>> not be expensive since we call get_thread rarely for x86_64
>>>>>>>> (result is
>>>>>>>> cached in r15). I'll let you see whether it is worth adding an
>>>>>>>> ifdef so
>>>>>>>> as to explicitly deactivate the alignment on some platforms
>>>>>>>> (solaris_x86_64 ?)
>>>>>>>
>>>>>>> I don't have enough knowledge to even begin to comment on this. I
>>>>>>> will
>>>>>>> have to rely on our x86_64 macroassembler experts to explain the 
>>>>>>> old
>>>>>>> code and what the requirements are.
>>>>>>
>>>>>> OK. This means that the alignment was not removed purposefully.
>>>>>>
>>>>>> In that case, you must either keep the per platform code x86_64 
>>>>>> (since
>>>>>> it was not identical for all platforms) or use the safest version,
>>>>>> with
>>>>>> the additional
>>>>>>
>>>>>>     // XXX
>>>>>>     mov(r10, rsp);
>>>>>>     andq(rsp, -16);
>>>>>>     push(r10);
>>>>>>
>>>>>> before the push(r11) and with the
>>>>>>
>>>>>>     pop(rsp);
>>>>>>
>>>>>> after the pop(r11). It should work on all x86_64 platforms.
>>>>>
>>>>> Right, I will add this back to the code - and replace the
>>>>> meaningless //
>>>>> XXX with an actual comment!
>>>>>
>>>>> Many thanks,
>>>>> David
>>>>> -----
>>>>>
>>>>>> FYI, there is another way to do the alignment, based on the fact 
>>>>>> that
>>>>>> we are at least aligned on 8 bytes (see
>>>>>> MacroAssembler::call_VM_leaf_base). I assume that this second
>>>>>> version is
>>>>>> more efficient (particularly thanks to specilative branches) but it
>>>>>> might be safer/simpler to continue using andq in get_thread.
>>>>>>
>>>>>> Bertrand.
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>>>>
>>>>>>>> Regards,
>>>>>>>>
>>>>>>>> Bertrand.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 23/11/2015 08:03, David Holmes wrote:
>>>>>>>>> After all the preliminary discussions here are final proposed
>>>>>>>>> changes:
>>>>>>>>>
>>>>>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8132510
>>>>>>>>>
>>>>>>>>> Open webrev: 
>>>>>>>>> http://cr.openjdk.java.net/~dholmes/8132510/webrev.v6/
>>>>>>>>>
>>>>>>>>> A simple (in principle) but wide-ranging change which should
>>>>>>>>> appeal to
>>>>>>>>> our Code Deletion Engineer's. We implement Thread::current()
>>>>>>>>> using a
>>>>>>>>> compiler/language-based thread-local variable eg:
>>>>>>>>>
>>>>>>>>>   static __thread Thread *_thr_current;
>>>>>>>>>
>>>>>>>>>   inline Thread* Thread::current() {
>>>>>>>>>     return _thr_current;
>>>>>>>>>   }
>>>>>>>>>
>>>>>>>>> with an appropriate setter of course. By doing this we can
>>>>>>>>> completely
>>>>>>>>> remove the os_cpu-specific ThreadLocalStorage implementations, 
>>>>>>>>> and
>>>>>>>>> the
>>>>>>>>> associated os::thread_local_storage* calls.
>>>>>>>>>
>>>>>>>>> As __thread is not async-signal-safe (either in theory or
>>>>>>>>> practice) we
>>>>>>>>> need a fallback library-based solution as used today. For this we
>>>>>>>>> use a
>>>>>>>>> very simple ThreadLocalStorage class and an implementation 
>>>>>>>>> thereof
>>>>>>>>> for
>>>>>>>>> POSIX (covers AIX, BSD, Linux, OSX and Solaris) using
>>>>>>>>> pthread_get/setspecific; and one for Windows using its TLS 
>>>>>>>>> library.
>>>>>>>>> While these library routines are not guaranteed 
>>>>>>>>> async-signal-safe,
>>>>>>>>> they
>>>>>>>>> seem to be in practice and are what we have been using all along.
>>>>>>>>>
>>>>>>>>> We also allow for use of only the library-based TLS for platforms
>>>>>>>>> where
>>>>>>>>> compiler-based thread locals are not supported (as will be
>>>>>>>>> needed in
>>>>>>>>> the
>>>>>>>>> Mobile project). This is enabled at build time by defining
>>>>>>>>> USE_LIBRARY_BASED_TLS_ONLY.
>>>>>>>>>
>>>>>>>>> Thanks to Andrew Haley for providing the Aarch64 code; and for
>>>>>>>>> Thomas
>>>>>>>>> Stuefe for testing PPC and AIX.
>>>>>>>>>
>>>>>>>>> Testing:
>>>>>>>>>   - JPRT (core platforms)
>>>>>>>>>   - Jtreg tests (linux & solaris)
>>>>>>>>>   - vm.runtime (core platforms)
>>>>>>>>>
>>>>>>>>> Performance:
>>>>>>>>>   - still TBD - this is proving to be extremely difficult. If
>>>>>>>>> anyone
>>>>>>>>> has
>>>>>>>>> any simple to run microbenchmarks to suggest I'd give them a try
>>>>>>>>> as a
>>>>>>>>> sanity check. But I lack access to hardware for running serious
>>>>>>>>> benchmarking.
>>>>>>>>>
>>>>>>>>> Footprint:
>>>>>>>>> - varies by platform and the VM (server, client, minimal)
>>>>>>>>> - worst-case: ~1% increase for server VM and minimal VM
>>>>>>>>> - best-case:  0.4% decrease for client VM
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> David
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>>
>>



More information about the hotspot-dev mailing list