RFR: 8132510: Replace ThreadLocalStorage with compiler/language-based thread-local variables

David Holmes david.holmes at oracle.com
Wed Nov 25 22:55:24 UTC 2015


On 26/11/2015 12:29 AM, Daniel D. Daugherty wrote:
> On 11/24/15 10:07 PM, David Holmes wrote:
>> Hi Dan,
>>
>> Many thanks for the meticulous review!
>
> You're welcome.
>
>
>>
>> On 25/11/2015 11:34 AM, Daniel D. Daugherty wrote:
>>> On 11/23/15 7:19 PM, David Holmes wrote:
>>>> Sorry about that:
>>>>
>>>> http://cr.openjdk.java.net/~dholmes/8132510/webrev.v7/
>>>
>>> Please double check the copyright updates before you push.
>>
>> Did you notice something missing? I use:
>>
>> hg status -man | xargs grep Copyright | grep -v 2015
>>
>> to find any copyright lines without 2015 present (it turns up the
>> non-Oracle copyrights in a few files).
>
> src/os_cpu/linux_aarch64/vm/thread_linux_aarch64.hpp
>
> L2  * Copyright (c) 2000, 2010, Oracle and/or its affiliates. All rights
> reserved.

Thank you. I managed to miss that. I have now updated my grep command:

hg status -man | xargs grep Copyright | grep Oracle | grep -v 2015

Many thanks for the re-review. Now to address Thomas's suggestions.

David
-----


>>
>> Trimming to files with comments ...
>>
>>> src/os/windows/vm/os_windows.cpp
>>>      L5998:       os::os_exception_wrapper(
>>> (java_call_t)call_wrapper_dummy,
>>>          Wrong indent; should only be two spaces.
>>>          Please delete space after first '('.
>>>
>>>      L5999:                                 NULL, NULL, NULL, NULL);
>>>          Indent will have to change to match any fixes on previous line.
>>
>> Fixed.
>>
>>> src/os/windows/vm/os_windows.hpp
>>>      L127:   static inline void set_thread_ptr_offset( int offset ) {
>>>          Please delete space after '(' and before ')'.
>>
>> Fixed (was a copy of code from now deleted file)
>>
>>> src/share/vm/gc/parallel/gcTaskThread.cpp
>>>      No comments.
>>>
>>>      Note: blank first line in this file (not your change)
>>
>> Fixed.
>>
>>> src/share/vm/prims/jni.cpp
>>>      L4312:   // If the thread has already been detached the operations
>>> is a no-op
>>>          Typo: 'the operations is a' -> 'the operation is a'
>>
>> Fixed
>>
>>> src/share/vm/prims/jvmtiExport.cpp
>>>      L377:       JavaThread* current_thread = JavaThread::current();
>>>          Wrong indent; two less spaces
>>
>> Fixed.
>>
>>> src/share/vm/runtime/threadLocalStorage.hpp
>>>      L37: // Platforms without compiler-based TLS (ie __thread
>>> storage-class modifier)
>>>          Typo: 'ie' -> 'i.e.'
>>
>> Fixed.
>>
>>> src/share/vm/runtime/vm_operations.cpp
>>>
>>>      (old) L396:   Thread * thr_cur =
>>> ThreadLocalStorage::get_thread_slow();
>>>      (new) L396:   Thread * thr_cur = Thread::current();
>>>
>>>          I think this should be 'current_or_null()' like L374.
>>
>> Actually L374 should just be Thread::current() as NULL is neither
>> possible nor expected. These VM_Exit routines can only be called by an
>> attached thread. E.g. set_vm_exited is only called from
>> Threads::destroy_vm(), while wait_for_threads_in_native_to_block is
>> only called on the VMThread.
>>
>>> src/os/posix/vm/threadLocalStorage_posix.cpp
>>>      L38:     ThreadLocalStorage::set_thread((Thread*) p);
>>>          Wrong indent; should be two spaces.
>>
>> Fixed.
>>
>>> src/os/windows/vm/threadLocalStorage_windows.cpp
>>>      L54:   assert(current != 0 || GetLastError() == ERROR_SUCCESS, '
>>>          'current != 0' -> 'current != NULL'
>>>          jcheck will complain about the blank at the end of this line
>>>
>>>      L62:   assert(res, "TlsSetValue failed with error code: %lu",
>>> GetLastError());
>>>          jcheck will complain about the blank at the end of this line
>>
>> Fixed.
>
> I missed flagging that this line:
>
>      L55:          "TlsGetValue failed with error code: %lu",
> GetLastError());
>
> also ends with a blank. Sorry about that.
>
>>
>> Generated new webrev:
>>
>> http://cr.openjdk.java.net/~dholmes/8132510/webrev.v8/
>
> Re-reviewed by comparing the v7 and v8 patch files.
>
> Thumbs up!
>
> Dan
>
>>
>> which has been rebased to current hs-rt content.
>>
>> Thanks,
>> David
>>
>>>
>>> Dan
>>>
>>>
>>>>
>>>> David
>>>>
>>>> On 24/11/2015 11:17 AM, David Holmes wrote:
>>>>> Updated webrev:
>>>>>
>>>>> cr.openjdk.java.net/~dholmes/8132510/webrev.v7/
>>>>>
>>>>> Only change is in macroAssembler_x86.cpp
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>> On 24/11/2015 7:42 AM, David Holmes wrote:
>>>>>> On 23/11/2015 11:46 PM, Bertrand Delsart wrote:
>>>>>>> On 23/11/2015 13:27, David Holmes wrote:
>>>>>>>> Hi Bertrand,
>>>>>>>>
>>>>>>>> On 23/11/2015 8:14 PM, Bertrand Delsart wrote:
>>>>>>>>> Hi David,
>>>>>>>>>
>>>>>>>>> Overall looks good.
>>>>>>>>> Thanks for the #ifndef USE_LIBRARY_BASED_TLS_ONLY :-)
>>>>>>>>>
>>>>>>>>> One doubt, in case this has not been discussed before.
>>>>>>>>>
>>>>>>>>> I'm still catching up on x86_64 (after years of ARM assembly :-))
>>>>>>>>> but it
>>>>>>>>> seems that there are some stack alignment constraints on runtime
>>>>>>>>> calls,
>>>>>>>>> at least for some x86_64 ABIs.
>>>>>>>>>
>>>>>>>>> Some of the x86 MacroAssembler::get_thread implementations had
>>>>>>>>> code to
>>>>>>>>> align the stack before calling pthread_getspecific. See for
>>>>>>>>> instance
>>>>>>>>> http://cr.openjdk.java.net/~dholmes/8132510/webrev.v6/src/os_cpu/linux_x86/vm/assembler_linux_x86.cpp.udiff.html
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> Sorry I'm not that familiar with the MacroAssembler - is it this
>>>>>>>> odd
>>>>>>>> fragment:
>>>>>>>>
>>>>>>>> -   push(r10);
>>>>>>>> -   // XXX
>>>>>>>> -   mov(r10, rsp);
>>>>>>>> -   andq(rsp, -16);
>>>>>>>> -   push(r10);
>>>>>>>>
>>>>>>>> I'm not at all clear what that is doing - and if it somehow
>>>>>>>> changes the
>>>>>>>> alignment wouldn't something need to be fixed up when popping the
>>>>>>>> previous values ??
>>>>>>>>
>>>>>>>> To be honest I'm not even sure what an "unaligned stack" means.
>>>>>>>
>>>>>>> On some platforms, SP may have to be aligned on a 16 bytes boundary
>>>>>>> when
>>>>>>> calling another method.
>>>>>>
>>>>>> Okay - that seems to be an x64 ABI requirement based on other code in
>>>>>> MacroAssembler. The reason it is missing in the Solaris code is
>>>>>> because
>>>>>> I already removed it in the earlier changes that were done on
>>>>>> Solaris:
>>>>>>
>>>>>> http://hg.openjdk.java.net/jdk9/hs-rt/hotspot/rev/d5b328043c10
>>>>>>
>>>>>> and nobody picked it up. :(
>>>>>>
>>>>>> That said, the absence of this code has not caused any problems so
>>>>>> presumably we are normally aligned.
>>>>>>
>>>>>>> After having pushed what needed to be saved, the code above
>>>>>>> rounds SP
>>>>>>> down and saves the old value. It will then also push r11, which
>>>>>>> results
>>>>>>> in the expected alignment.
>>>>>>>
>>>>>>> The conterpart, after the VM call, is the:
>>>>>>>   pop(r11);
>>>>>>>   pop(rsp);
>>>>>>
>>>>>> I had assumed this extra manipulation was related to pushing the arg
>>>>>> for
>>>>>> the call.
>>>>>>
>>>>>>>>> This alignment is no longer performed in the new (shared)
>>>>>>>>> implementation
>>>>>>>>> http://cr.openjdk.java.net/~dholmes/8132510/webrev.v6/src/cpu/x86/vm/macroAssembler_x86.cpp.udiff.html
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Now, Solaris was not performing the alignment and Windows has a
>>>>>>>>> separate
>>>>>>>>> path for x86_64. Did we really need the alignment for linux
>>>>>>>>> x86_64 and
>>>>>>>>> bsd_x86_64 ? Might it be needed for other ports ?
>>>>>>>>>
>>>>>>>>> IMHO, it might be safer to align the stack by default, knowing it
>>>>>>>>> should
>>>>>>>>> not be expensive since we call get_thread rarely for x86_64
>>>>>>>>> (result is
>>>>>>>>> cached in r15). I'll let you see whether it is worth adding an
>>>>>>>>> ifdef so
>>>>>>>>> as to explicitly deactivate the alignment on some platforms
>>>>>>>>> (solaris_x86_64 ?)
>>>>>>>>
>>>>>>>> I don't have enough knowledge to even begin to comment on this. I
>>>>>>>> will
>>>>>>>> have to rely on our x86_64 macroassembler experts to explain the
>>>>>>>> old
>>>>>>>> code and what the requirements are.
>>>>>>>
>>>>>>> OK. This means that the alignment was not removed purposefully.
>>>>>>>
>>>>>>> In that case, you must either keep the per platform code x86_64
>>>>>>> (since
>>>>>>> it was not identical for all platforms) or use the safest version,
>>>>>>> with
>>>>>>> the additional
>>>>>>>
>>>>>>>     // XXX
>>>>>>>     mov(r10, rsp);
>>>>>>>     andq(rsp, -16);
>>>>>>>     push(r10);
>>>>>>>
>>>>>>> before the push(r11) and with the
>>>>>>>
>>>>>>>     pop(rsp);
>>>>>>>
>>>>>>> after the pop(r11). It should work on all x86_64 platforms.
>>>>>>
>>>>>> Right, I will add this back to the code - and replace the
>>>>>> meaningless //
>>>>>> XXX with an actual comment!
>>>>>>
>>>>>> Many thanks,
>>>>>> David
>>>>>> -----
>>>>>>
>>>>>>> FYI, there is another way to do the alignment, based on the fact
>>>>>>> that
>>>>>>> we are at least aligned on 8 bytes (see
>>>>>>> MacroAssembler::call_VM_leaf_base). I assume that this second
>>>>>>> version is
>>>>>>> more efficient (particularly thanks to specilative branches) but it
>>>>>>> might be safer/simpler to continue using andq in get_thread.
>>>>>>>
>>>>>>> Bertrand.
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> David
>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>>
>>>>>>>>> Bertrand.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 23/11/2015 08:03, David Holmes wrote:
>>>>>>>>>> After all the preliminary discussions here are final proposed
>>>>>>>>>> changes:
>>>>>>>>>>
>>>>>>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8132510
>>>>>>>>>>
>>>>>>>>>> Open webrev:
>>>>>>>>>> http://cr.openjdk.java.net/~dholmes/8132510/webrev.v6/
>>>>>>>>>>
>>>>>>>>>> A simple (in principle) but wide-ranging change which should
>>>>>>>>>> appeal to
>>>>>>>>>> our Code Deletion Engineer's. We implement Thread::current()
>>>>>>>>>> using a
>>>>>>>>>> compiler/language-based thread-local variable eg:
>>>>>>>>>>
>>>>>>>>>>   static __thread Thread *_thr_current;
>>>>>>>>>>
>>>>>>>>>>   inline Thread* Thread::current() {
>>>>>>>>>>     return _thr_current;
>>>>>>>>>>   }
>>>>>>>>>>
>>>>>>>>>> with an appropriate setter of course. By doing this we can
>>>>>>>>>> completely
>>>>>>>>>> remove the os_cpu-specific ThreadLocalStorage implementations,
>>>>>>>>>> and
>>>>>>>>>> the
>>>>>>>>>> associated os::thread_local_storage* calls.
>>>>>>>>>>
>>>>>>>>>> As __thread is not async-signal-safe (either in theory or
>>>>>>>>>> practice) we
>>>>>>>>>> need a fallback library-based solution as used today. For this we
>>>>>>>>>> use a
>>>>>>>>>> very simple ThreadLocalStorage class and an implementation
>>>>>>>>>> thereof
>>>>>>>>>> for
>>>>>>>>>> POSIX (covers AIX, BSD, Linux, OSX and Solaris) using
>>>>>>>>>> pthread_get/setspecific; and one for Windows using its TLS
>>>>>>>>>> library.
>>>>>>>>>> While these library routines are not guaranteed
>>>>>>>>>> async-signal-safe,
>>>>>>>>>> they
>>>>>>>>>> seem to be in practice and are what we have been using all along.
>>>>>>>>>>
>>>>>>>>>> We also allow for use of only the library-based TLS for platforms
>>>>>>>>>> where
>>>>>>>>>> compiler-based thread locals are not supported (as will be
>>>>>>>>>> needed in
>>>>>>>>>> the
>>>>>>>>>> Mobile project). This is enabled at build time by defining
>>>>>>>>>> USE_LIBRARY_BASED_TLS_ONLY.
>>>>>>>>>>
>>>>>>>>>> Thanks to Andrew Haley for providing the Aarch64 code; and for
>>>>>>>>>> Thomas
>>>>>>>>>> Stuefe for testing PPC and AIX.
>>>>>>>>>>
>>>>>>>>>> Testing:
>>>>>>>>>>   - JPRT (core platforms)
>>>>>>>>>>   - Jtreg tests (linux & solaris)
>>>>>>>>>>   - vm.runtime (core platforms)
>>>>>>>>>>
>>>>>>>>>> Performance:
>>>>>>>>>>   - still TBD - this is proving to be extremely difficult. If
>>>>>>>>>> anyone
>>>>>>>>>> has
>>>>>>>>>> any simple to run microbenchmarks to suggest I'd give them a try
>>>>>>>>>> as a
>>>>>>>>>> sanity check. But I lack access to hardware for running serious
>>>>>>>>>> benchmarking.
>>>>>>>>>>
>>>>>>>>>> Footprint:
>>>>>>>>>> - varies by platform and the VM (server, client, minimal)
>>>>>>>>>> - worst-case: ~1% increase for server VM and minimal VM
>>>>>>>>>> - best-case:  0.4% decrease for client VM
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> David
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>>
>>>
>


More information about the hotspot-dev mailing list