[aarch64-port-dev ] Use TLS for Thread::current()
Andrew Haley
aph at redhat.com
Thu Jul 31 13:53:42 UTC 2014
Here's a nice piece of low-hanging fruit.
I did some profiling and discovered that at least 1-2% of the runtime
is spent in ThreadLocalStorage::thread()! We were using
pthread_getspecific(), which is not an efficient mechanism. It
doesn't matter much for Java, which keeps the current Thread in a
register, but it matters a lot for the C++ runtime, which uses
ThreadLocalStorage::thread() in 1667 different places.
Linux's TLS (thread-local storage) mechanism keeps a pointer to the
current thread's TLS block in the system register tpidr_el0, so a TLS
access first calculates the variable's offset within the static TLS
block and then adds that offset to the thread pointer, like this:
    adrp x0, 0x7fb7d4f000
    ldr x1, [x0,#3312]
    add x0, x0, #0xcf0
    blr x1

        ldr x0, [x0,#8]
        ret

    mrs x1, tpidr_el0
    ldr x20, [x1,x0]
which looks pretty good. I don't think that we could do very much
better than this with a custom mechanism.
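For anyone who wants to try the two mechanisms side by side outside
HotSpot, here is a minimal standalone sketch. It is not the HotSpot
code: the Thread struct and the names thread_key, current_thread,
set_current_thread, current_via_pthread_key, current_via_tls and
lookups_agree_on_new_thread are all made up for illustration. It
assumes GCC or Clang on Linux, since __thread is a compiler extension.
The setter writes both the pthread key and the __thread variable, in
the same spirit as the patched pd_set_thread().

```cpp
#include <cassert>
#include <pthread.h>

// Hypothetical stand-in for HotSpot's Thread class.
struct Thread {
  int id;
};

// Old path: a pthread key, read through a library call on every access.
static pthread_key_t thread_key;
static pthread_once_t thread_key_once = PTHREAD_ONCE_INIT;

static void create_thread_key() {
  int rc = pthread_key_create(&thread_key, nullptr);
  assert(rc == 0);
  (void)rc;  // silence the unused-variable warning when NDEBUG is set
}

// New path: a compiler-managed thread-local variable (GCC/Clang extension).
static __thread Thread* current_thread = nullptr;

// Keep both slots in sync, as the patched pd_set_thread() does.
void set_current_thread(Thread* t) {
  pthread_once(&thread_key_once, create_thread_key);
  pthread_setspecific(thread_key, t);
  current_thread = t;
}

// Lookup through the pthread key (the slow path discussed above).
Thread* current_via_pthread_key() {
  pthread_once(&thread_key_once, create_thread_key);
  return static_cast<Thread*>(pthread_getspecific(thread_key));
}

// Lookup through the __thread variable (the fast path).
Thread* current_via_tls() {
  return current_thread;
}

// Thread entry point: registers itself, then checks that both lookups
// return this thread's own object. Returns arg on success, null otherwise.
static void* worker(void* arg) {
  Thread* self = static_cast<Thread*>(arg);
  set_current_thread(self);
  bool ok = current_via_tls() == self && current_via_pthread_key() == self;
  return ok ? arg : nullptr;
}

// Runs worker() on a fresh thread and reports whether both lookups agreed.
bool lookups_agree_on_new_thread(Thread* t) {
  pthread_t tid;
  if (pthread_create(&tid, nullptr, worker, t) != 0) return false;
  void* result = nullptr;
  pthread_join(tid, &result);
  return result == t;
}
```

Build with -pthread. Calling set_current_thread() on the main thread
and then lookups_agree_on_new_thread() with a different object should
leave the main thread's value untouched, since each thread gets its own
copy of current_thread.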
Andrew.
changeset: 7204:5b248d10f0ae
tag: tip
user: aph
date: Thu Jul 31 04:53:53 2014 -0400
summary: Use TLS for ThreadLocalStorage::thread()
diff -r 5e238903a875 -r 5b248d10f0ae src/os_cpu/linux_aarch64/vm/globals_linux_aarch64.hpp
--- a/src/os_cpu/linux_aarch64/vm/globals_linux_aarch64.hpp Tue Jul 29 06:00:26 2014 -0400
+++ b/src/os_cpu/linux_aarch64/vm/globals_linux_aarch64.hpp Thu Jul 31 04:53:53 2014 -0400
@@ -39,4 +39,6 @@
// Used on 64 bit platforms for UseCompressedOops base address
define_pd_global(uintx,HeapBaseMinAddress, 2*G);
+extern __thread Thread *aarch64_currentThread;
+
#endif // OS_CPU_LINUX_AARCH64_VM_GLOBALS_LINUX_AARCH64_HPP
diff -r 5e238903a875 -r 5b248d10f0ae src/os_cpu/linux_aarch64/vm/threadLS_linux_aarch64.cpp
--- a/src/os_cpu/linux_aarch64/vm/threadLS_linux_aarch64.cpp Tue Jul 29 06:00:26 2014 -0400
+++ b/src/os_cpu/linux_aarch64/vm/threadLS_linux_aarch64.cpp Thu Jul 31 04:53:53 2014 -0400
@@ -26,32 +26,6 @@
#include "runtime/threadLocalStorage.hpp"
#include "runtime/thread.inline.hpp"
-// Map stack pointer (%esp) to thread pointer for faster TLS access
-//
-// Here we use a flat table for better performance. Getting current thread
-// is down to one memory access (read _sp_map[%esp>>12]) in generated code
-// and two in runtime code (-fPIC code needs an extra load for _sp_map).
-//
-// This code assumes stack page is not shared by different threads. It works
-// in 32-bit VM when page size is 4K (or a multiple of 4K, if that matters).
-//
-// Notice that _sp_map is allocated in the bss segment, which is ZFOD
-// (zero-fill-on-demand). While it reserves 4M address space upfront,
-// actual memory pages are committed on demand.
-//
-// If an application creates and destroys a lot of threads, usually the
-// stack space freed by a thread will soon get reused by new thread
-// (this is especially true in NPTL or LinuxThreads in fixed-stack mode).
-// No memory page in _sp_map is wasted.
-//
-// However, it's still possible that we might end up populating &
-// committing a large fraction of the 4M table over time, but the actual
-// amount of live data in the table could be quite small. The max wastage
-// is less than 4M bytes. If it becomes an issue, we could use madvise()
-// with MADV_DONTNEED to reclaim unused (i.e. all-zero) pages in _sp_map.
-// MADV_DONTNEED on Linux keeps the virtual memory mapping, but zaps the
-// physical memory page (i.e. similar to MADV_FREE on Solaris).
-
void ThreadLocalStorage::generate_code_for_get_thread() {
// nothing we can do here for user-level thread
}
@@ -59,6 +33,9 @@
void ThreadLocalStorage::pd_init() {
}
+__thread Thread *aarch64_currentThread;
+
void ThreadLocalStorage::pd_set_thread(Thread* thread) {
os::thread_local_storage_at_put(ThreadLocalStorage::thread_index(), thread);
+ aarch64_currentThread = thread;
}
diff -r 5e238903a875 -r 5b248d10f0ae src/os_cpu/linux_aarch64/vm/threadLS_linux_aarch64.hpp
--- a/src/os_cpu/linux_aarch64/vm/threadLS_linux_aarch64.hpp Tue Jul 29 06:00:26 2014 -0400
+++ b/src/os_cpu/linux_aarch64/vm/threadLS_linux_aarch64.hpp Thu Jul 31 04:53:53 2014 -0400
@@ -29,8 +29,8 @@
public:
- static Thread* thread() {
- return (Thread*) os::thread_local_storage_at(thread_index());
- }
+ static Thread *thread() {
+ return aarch64_currentThread;
+ }
#endif // OS_CPU_LINUX_AARCH64_VM_THREADLS_LINUX_AARCH64_HPP