[aarch64-port-dev ] Use TLS for Thread::current()

Andrew Haley aph at redhat.com
Thu Jul 31 13:53:42 UTC 2014


Here's a nice piece of low-hanging fruit.

I did some profiling and discovered that 1-2% or more of the runtime
is spent in ThreadLocalStorage::thread()!  We were using
pthread_getspecific(), which is not an efficient mechanism.  It
doesn't matter much for Java code, which keeps the current Thread in
a register, but it matters a lot for the C++ runtime, which calls
ThreadLocalStorage::thread() in 1667 different places.
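
To make the difference concrete, here is a minimal sketch of the two
lookup paths, assuming GCC or Clang on Linux (__thread is a compiler
extension; older glibc may need -pthread at link time).  The Thread
struct and all function names are stand-ins for illustration, not
HotSpot code.

```cpp
#include <cassert>
#include <pthread.h>

struct Thread { int id; };               // hypothetical stand-in for HotSpot's Thread

static pthread_key_t thread_key;         // slot read by pthread_getspecific()
static __thread Thread *current_thread;  // compiler-managed TLS slot

// Create the pthread key once, before any thread registers itself.
void init_thread_key() {
  int rc = pthread_key_create(&thread_key, nullptr);
  assert(rc == 0);
  (void)rc;  // silence the unused warning when NDEBUG is defined
}

// Record the current thread in both places, mirroring what the patch
// does in pd_set_thread().
bool set_thread(Thread *t) {
  current_thread = t;
  return pthread_setspecific(thread_key, t) == 0;
}

// Old path: a library call on every lookup.
Thread *thread_via_pthread() {
  return static_cast<Thread *>(pthread_getspecific(thread_key));
}

// New path: a plain read of a thread-local variable, which the
// compiler can inline at each call site.
Thread *thread_via_tls() {
  return current_thread;
}
```

Both functions return the same pointer after set_thread(); only the
cost of getting to it differs.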

Linux's TLS (thread-local storage) mechanism keeps a pointer to the
thread's TLS block in the system register tpidr_el0, so a TLS access
looks like this: first compute the variable's offset within the
static TLS block, then add that offset to the thread pointer:

       adrp    x0, 0x7fb7d4f000
       ldr     x1, [x0,#3312]
       add     x0, x0, #0xcf0
       blr     x1
           ldr     x0, [x0,#8]
           ret
       mrs     x1, tpidr_el0
       ldr     x20, [x1,x0]

which looks pretty good.  I don't think that we could do very much
better than this with a custom mechanism.
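
Because the address is formed from the thread pointer, each thread
sees its own copy of the variable.  A small sketch of that behaviour,
again assuming GCC or Clang on Linux; tls_value and the function
names are made up for the example:

```cpp
#include <cassert>
#include <thread>

static __thread int tls_value = 0;  // hypothetical per-thread variable

// Start a second thread, let it store into its own copy of tls_value,
// and report what that thread read back.
int value_seen_by_new_thread(int to_store) {
  int seen = -1;
  std::thread worker([&] {
    // A new thread starts from the initializer, not from whatever the
    // creating thread has stored.
    assert(tls_value == 0);
    tls_value = to_store;
    seen = tls_value;
  });
  worker.join();
  return seen;
}

// Read the calling thread's copy.
int value_in_this_thread() {
  return tls_value;
}
```

The store made on the worker thread does not disturb the caller's
copy, which is the property ThreadLocalStorage::thread() relies on.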

Andrew.


changeset:   7204:5b248d10f0ae
tag:         tip
user:        aph
date:        Thu Jul 31 04:53:53 2014 -0400
summary:     Use TLS for ThreadLocalStorage::thread()

diff -r 5e238903a875 -r 5b248d10f0ae src/os_cpu/linux_aarch64/vm/globals_linux_aarch64.hpp
--- a/src/os_cpu/linux_aarch64/vm/globals_linux_aarch64.hpp     Tue Jul 29 06:00:26 2014 -0400
+++ b/src/os_cpu/linux_aarch64/vm/globals_linux_aarch64.hpp     Thu Jul 31 04:53:53 2014 -0400
@@ -39,4 +39,6 @@
 // Used on 64 bit platforms for UseCompressedOops base address
 define_pd_global(uintx,HeapBaseMinAddress,       2*G);

+extern __thread Thread *aarch64_currentThread;
+
 #endif // OS_CPU_LINUX_AARCH64_VM_GLOBALS_LINUX_AARCH64_HPP
diff -r 5e238903a875 -r 5b248d10f0ae src/os_cpu/linux_aarch64/vm/threadLS_linux_aarch64.cpp
--- a/src/os_cpu/linux_aarch64/vm/threadLS_linux_aarch64.cpp    Tue Jul 29 06:00:26 2014 -0400
+++ b/src/os_cpu/linux_aarch64/vm/threadLS_linux_aarch64.cpp    Thu Jul 31 04:53:53 2014 -0400
@@ -26,32 +26,6 @@
 #include "runtime/threadLocalStorage.hpp"
 #include "runtime/thread.inline.hpp"

-// Map stack pointer (%esp) to thread pointer for faster TLS access
-//
-// Here we use a flat table for better performance. Getting current thread
-// is down to one memory access (read _sp_map[%esp>>12]) in generated code
-// and two in runtime code (-fPIC code needs an extra load for _sp_map).
-//
-// This code assumes stack page is not shared by different threads. It works
-// in 32-bit VM when page size is 4K (or a multiple of 4K, if that matters).
-//
-// Notice that _sp_map is allocated in the bss segment, which is ZFOD
-// (zero-fill-on-demand). While it reserves 4M address space upfront,
-// actual memory pages are committed on demand.
-//
-// If an application creates and destroys a lot of threads, usually the
-// stack space freed by a thread will soon get reused by new thread
-// (this is especially true in NPTL or LinuxThreads in fixed-stack mode).
-// No memory page in _sp_map is wasted.
-//
-// However, it's still possible that we might end up populating &
-// committing a large fraction of the 4M table over time, but the actual
-// amount of live data in the table could be quite small. The max wastage
-// is less than 4M bytes. If it becomes an issue, we could use madvise()
-// with MADV_DONTNEED to reclaim unused (i.e. all-zero) pages in _sp_map.
-// MADV_DONTNEED on Linux keeps the virtual memory mapping, but zaps the
-// physical memory page (i.e. similar to MADV_FREE on Solaris).
-
 void ThreadLocalStorage::generate_code_for_get_thread() {
     // nothing we can do here for user-level thread
 }
@@ -59,6 +33,9 @@
 void ThreadLocalStorage::pd_init() {
 }

+__thread Thread *aarch64_currentThread;
+
 void ThreadLocalStorage::pd_set_thread(Thread* thread) {
   os::thread_local_storage_at_put(ThreadLocalStorage::thread_index(), thread);
+  aarch64_currentThread = thread;
 }
diff -r 5e238903a875 -r 5b248d10f0ae src/os_cpu/linux_aarch64/vm/threadLS_linux_aarch64.hpp
--- a/src/os_cpu/linux_aarch64/vm/threadLS_linux_aarch64.hpp    Tue Jul 29 06:00:26 2014 -0400
+++ b/src/os_cpu/linux_aarch64/vm/threadLS_linux_aarch64.hpp    Thu Jul 31 04:53:53 2014 -0400
@@ -29,8 +29,8 @@

 public:

-   static Thread* thread() {
-     return (Thread*) os::thread_local_storage_at(thread_index());
-   }
+  static Thread *thread() {
+    return aarch64_currentThread;
+  }

 #endif // OS_CPU_LINUX_AARCH64_VM_THREADLS_LINUX_AARCH64_HPP


