RFR: 8132510: Replace ThreadLocalStorage with compiler/language-based thread-local variables
Daniel D. Daugherty
daniel.daugherty at oracle.com
Wed Nov 25 14:29:46 UTC 2015
On 11/24/15 10:07 PM, David Holmes wrote:
> Hi Dan,
>
> Many thanks for the meticulous review!
You're welcome.
>
> On 25/11/2015 11:34 AM, Daniel D. Daugherty wrote:
>> On 11/23/15 7:19 PM, David Holmes wrote:
>>> Sorry about that:
>>>
>>> http://cr.openjdk.java.net/~dholmes/8132510/webrev.v7/
>>
>> Please double check the copyright updates before you push.
>
> Did you notice something missing? I use:
>
> hg status -man | xargs grep Copyright | grep -v 2015
>
> to find any copyright lines without 2015 present (it turns up the
> non-Oracle copyrights in a few files).
src/os_cpu/linux_aarch64/vm/thread_linux_aarch64.hpp
L2 * Copyright (c) 2000, 2010, Oracle and/or its affiliates. All rights
reserved.
>
> Trimming to files with comments ...
>
>> src/os/windows/vm/os_windows.cpp
>> L5998: os::os_exception_wrapper(
>> (java_call_t)call_wrapper_dummy,
>> Wrong indent; should only be two spaces.
>> Please delete space after first '('.
>>
>> L5999: NULL, NULL, NULL, NULL);
>> Indent will have to change to match any fixes on previous line.
>
> Fixed.
>
>> src/os/windows/vm/os_windows.hpp
>> L127: static inline void set_thread_ptr_offset( int offset ) {
>> Please delete space after '(' and before ')'.
>
> Fixed (was a copy of code from now deleted file)
>
>> src/share/vm/gc/parallel/gcTaskThread.cpp
>> No comments.
>>
>> Note: blank first line in this file (not your change)
>
> Fixed.
>
>> src/share/vm/prims/jni.cpp
>> L4312: // If the thread has already been detached the operations
>> is a no-op
>> Typo: 'the operations is a' -> 'the operation is a'
>
> Fixed
>
>> src/share/vm/prims/jvmtiExport.cpp
>> L377: JavaThread* current_thread = JavaThread::current();
>> Wrong indent; two less spaces
>
> Fixed.
>
>> src/share/vm/runtime/threadLocalStorage.hpp
>> L37: // Platforms without compiler-based TLS (ie __thread
>> storage-class modifier)
>> Typo: 'ie' -> 'i.e.'
>
> Fixed.
>
>> src/share/vm/runtime/vm_operations.cpp
>>
>> (old) L396: Thread * thr_cur =
>> ThreadLocalStorage::get_thread_slow();
>> (new) L396: Thread * thr_cur = Thread::current();
>>
>> I think this should be 'current_or_null()' like L374.
>
> Actually L374 should just be Thread::current() as NULL is neither
> possible nor expected. These VM_Exit routines can only be called by an
> attached thread. E.g. set_vm_exited is only called from
> Threads::destroy_vm(), while wait_for_threads_in_native_to_block is
> only called on the VMThread.
>
>> src/os/posix/vm/threadLocalStorage_posix.cpp
>> L38: ThreadLocalStorage::set_thread((Thread*) p);
>> Wrong indent; should be two spaces.
>
> Fixed.
>
>> src/os/windows/vm/threadLocalStorage_windows.cpp
>> L54: assert(current != 0 || GetLastError() == ERROR_SUCCESS, '
>> 'current != 0' -> 'current != NULL'
>> jcheck will complain about the blank at the end of this line
>>
>> L62: assert(res, "TlsSetValue failed with error code: %lu",
>> GetLastError());
>> jcheck will complain about the blank at the end of this line
>
> Fixed.
I missed flagging that this line:
L55: "TlsGetValue failed with error code: %lu",
GetLastError());
also ends with a blank. Sorry about that.
>
> Generated new webrev:
>
> http://cr.openjdk.java.net/~dholmes/8132510/webrev.v8/
Re-reviewed by comparing the v7 and v8 patch files.
Thumbs up!
Dan
>
> which has been rebased to current hs-rt content.
>
> Thanks,
> David
>
>>
>> Dan
>>
>>
>>>
>>> David
>>>
>>> On 24/11/2015 11:17 AM, David Holmes wrote:
>>>> Updated webrev:
>>>>
>>>> cr.openjdk.java.net/~dholmes/8132510/webrev.v7/
>>>>
>>>> Only change is in macroAssembler_x86.cpp
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>> On 24/11/2015 7:42 AM, David Holmes wrote:
>>>>> On 23/11/2015 11:46 PM, Bertrand Delsart wrote:
>>>>>> On 23/11/2015 13:27, David Holmes wrote:
>>>>>>> Hi Bertrand,
>>>>>>>
>>>>>>> On 23/11/2015 8:14 PM, Bertrand Delsart wrote:
>>>>>>>> Hi David,
>>>>>>>>
>>>>>>>> Overall looks good.
>>>>>>>> Thanks for the #ifndef USE_LIBRARY_BASED_TLS_ONLY :-)
>>>>>>>>
>>>>>>>> One doubt, in case this has not been discussed before.
>>>>>>>>
>>>>>>>> I'm still catching up on x86_64 (after years of ARM assembly :-))
>>>>>>>> but it
>>>>>>>> seems that there are some stack alignment constraints on runtime
>>>>>>>> calls,
>>>>>>>> at least for some x86_64 ABIs.
>>>>>>>>
>>>>>>>> Some of the x86 MacroAssembler::get_thread implementations had
>>>>>>>> code to
>>>>>>>> align the stack before calling pthread_getspecific. See for
>>>>>>>> instance
>>>>>>>> http://cr.openjdk.java.net/~dholmes/8132510/webrev.v6/src/os_cpu/linux_x86/vm/assembler_linux_x86.cpp.udiff.html
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> Sorry I'm not that familiar with the MacroAssembler - is it this
>>>>>>> odd
>>>>>>> fragment:
>>>>>>>
>>>>>>> - push(r10);
>>>>>>> - // XXX
>>>>>>> - mov(r10, rsp);
>>>>>>> - andq(rsp, -16);
>>>>>>> - push(r10);
>>>>>>>
>>>>>>> I'm not at all clear what that is doing - and if it somehow
>>>>>>> changes the
>>>>>>> alignment wouldn't something need to be fixed up when popping the
>>>>>>> previous values ??
>>>>>>>
>>>>>>> To be honest I'm not even sure what an "unaligned stack" means.
>>>>>>
>>>>>> On some platforms, SP may have to be aligned on a 16 bytes boundary
>>>>>> when
>>>>>> calling another method.
>>>>>
>>>>> Okay - that seems to be an x64 ABI requirement based on other code in
>>>>> MacroAssembler. The reason it is missing in the Solaris code is
>>>>> because
>>>>> I already removed it in the earlier changes that were done on
>>>>> Solaris:
>>>>>
>>>>> http://hg.openjdk.java.net/jdk9/hs-rt/hotspot/rev/d5b328043c10
>>>>>
>>>>> and nobody picked it up. :(
>>>>>
>>>>> That said, the absence of this code has not caused any problems so
>>>>> presumably we are normally aligned.
>>>>>
>>>>>> After having pushed what needed to be saved, the code above
>>>>>> rounds SP
>>>>>> down and saves the old value. It will then also push r11, which
>>>>>> results
>>>>>> in the expected alignment.
>>>>>>
>>>>>> The conterpart, after the VM call, is the:
>>>>>> pop(r11);
>>>>>> pop(rsp);
>>>>>
>>>>> I had assumed this extra manipulation was related to pushing the arg
>>>>> for
>>>>> the call.
>>>>>
>>>>>>>> This alignment is no longer performed in the new (shared)
>>>>>>>> implementation
>>>>>>>> http://cr.openjdk.java.net/~dholmes/8132510/webrev.v6/src/cpu/x86/vm/macroAssembler_x86.cpp.udiff.html
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Now, Solaris was not performing the alignment and Windows has a
>>>>>>>> separate
>>>>>>>> path for x86_64. Did we really need the alignment for linux
>>>>>>>> x86_64 and
>>>>>>>> bsd_x86_64 ? Might it be needed for other ports ?
>>>>>>>>
>>>>>>>> IMHO, it might be safer to align the stack by default, knowing it
>>>>>>>> should
>>>>>>>> not be expensive since we call get_thread rarely for x86_64
>>>>>>>> (result is
>>>>>>>> cached in r15). I'll let you see whether it is worth adding an
>>>>>>>> ifdef so
>>>>>>>> as to explicitly deactivate the alignment on some platforms
>>>>>>>> (solaris_x86_64 ?)
>>>>>>>
>>>>>>> I don't have enough knowledge to even begin to comment on this. I
>>>>>>> will
>>>>>>> have to rely on our x86_64 macroassembler experts to explain the
>>>>>>> old
>>>>>>> code and what the requirements are.
>>>>>>
>>>>>> OK. This means that the alignment was not removed purposefully.
>>>>>>
>>>>>> In that case, you must either keep the per platform code x86_64
>>>>>> (since
>>>>>> it was not identical for all platforms) or use the safest version,
>>>>>> with
>>>>>> the additional
>>>>>>
>>>>>> // XXX
>>>>>> mov(r10, rsp);
>>>>>> andq(rsp, -16);
>>>>>> push(r10);
>>>>>>
>>>>>> before the push(r11) and with the
>>>>>>
>>>>>> pop(rsp);
>>>>>>
>>>>>> after the pop(r11). It should work on all x86_64 platforms.
>>>>>
>>>>> Right, I will add this back to the code - and replace the
>>>>> meaningless //
>>>>> XXX with an actual comment!
>>>>>
>>>>> Many thanks,
>>>>> David
>>>>> -----
>>>>>
>>>>>> FYI, there is another way to do the alignment, based on the fact
>>>>>> that
>>>>>> we are at least aligned on 8 bytes (see
>>>>>> MacroAssembler::call_VM_leaf_base). I assume that this second
>>>>>> version is
>>>>>> more efficient (particularly thanks to specilative branches) but it
>>>>>> might be safer/simpler to continue using andq in get_thread.
>>>>>>
>>>>>> Bertrand.
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>>>>
>>>>>>>> Regards,
>>>>>>>>
>>>>>>>> Bertrand.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 23/11/2015 08:03, David Holmes wrote:
>>>>>>>>> After all the preliminary discussions here are final proposed
>>>>>>>>> changes:
>>>>>>>>>
>>>>>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8132510
>>>>>>>>>
>>>>>>>>> Open webrev:
>>>>>>>>> http://cr.openjdk.java.net/~dholmes/8132510/webrev.v6/
>>>>>>>>>
>>>>>>>>> A simple (in principle) but wide-ranging change which should
>>>>>>>>> appeal to
>>>>>>>>> our Code Deletion Engineer's. We implement Thread::current()
>>>>>>>>> using a
>>>>>>>>> compiler/language-based thread-local variable eg:
>>>>>>>>>
>>>>>>>>> static __thread Thread *_thr_current;
>>>>>>>>>
>>>>>>>>> inline Thread* Thread::current() {
>>>>>>>>> return _thr_current;
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> with an appropriate setter of course. By doing this we can
>>>>>>>>> completely
>>>>>>>>> remove the os_cpu-specific ThreadLocalStorage implementations,
>>>>>>>>> and
>>>>>>>>> the
>>>>>>>>> associated os::thread_local_storage* calls.
>>>>>>>>>
>>>>>>>>> As __thread is not async-signal-safe (either in theory or
>>>>>>>>> practice) we
>>>>>>>>> need a fallback library-based solution as used today. For this we
>>>>>>>>> use a
>>>>>>>>> very simple ThreadLocalStorage class and an implementation
>>>>>>>>> thereof
>>>>>>>>> for
>>>>>>>>> POSIX (covers AIX, BSD, Linux, OSX and Solaris) using
>>>>>>>>> pthread_get/setspecific; and one for Windows using its TLS
>>>>>>>>> library.
>>>>>>>>> While these library routines are not guaranteed
>>>>>>>>> async-signal-safe,
>>>>>>>>> they
>>>>>>>>> seem to be in practice and are what we have been using all along.
>>>>>>>>>
>>>>>>>>> We also allow for use of only the library-based TLS for platforms
>>>>>>>>> where
>>>>>>>>> compiler-based thread locals are not supported (as will be
>>>>>>>>> needed in
>>>>>>>>> the
>>>>>>>>> Mobile project). This is enabled at build time by defining
>>>>>>>>> USE_LIBRARY_BASED_TLS_ONLY.
>>>>>>>>>
>>>>>>>>> Thanks to Andrew Haley for providing the Aarch64 code; and for
>>>>>>>>> Thomas
>>>>>>>>> Stuefe for testing PPC and AIX.
>>>>>>>>>
>>>>>>>>> Testing:
>>>>>>>>> - JPRT (core platforms)
>>>>>>>>> - Jtreg tests (linux & solaris)
>>>>>>>>> - vm.runtime (core platforms)
>>>>>>>>>
>>>>>>>>> Performance:
>>>>>>>>> - still TBD - this is proving to be extremely difficult. If
>>>>>>>>> anyone
>>>>>>>>> has
>>>>>>>>> any simple to run microbenchmarks to suggest I'd give them a try
>>>>>>>>> as a
>>>>>>>>> sanity check. But I lack access to hardware for running serious
>>>>>>>>> benchmarking.
>>>>>>>>>
>>>>>>>>> Footprint:
>>>>>>>>> - varies by platform and the VM (server, client, minimal)
>>>>>>>>> - worst-case: ~1% increase for server VM and minimal VM
>>>>>>>>> - best-case: 0.4% decrease for client VM
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> David
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>>
>>
More information about the hotspot-dev
mailing list