Markus Gaisbauer markus.gaisbauer at
Fri Jul 13 16:35:21 UTC 2018


I am trying to use ThreadMXBean::getThreadAllocatedBytes
to get the amount of memory allocated by the current
thread in some performance-critical code.

Unfortunately, the current implementation can be rather slow, and the
duration of each call is unpredictable. I ran a test in a JVM with 500
threads. Depending on which thread was queried, getThreadAllocatedBytes
took between 100 ns and 2500 ns.
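
For readers unfamiliar with the API, the before/after pattern described above can be sketched as follows. This is a minimal illustration, not the test used for the numbers in this mail; the class name AllocationProbe and the helper allocatedBy are made up for the example, and the cast assumes a HotSpot-based JVM where the platform bean implements com.sun.management.ThreadMXBean.

```java
import java.lang.management.ManagementFactory;

final class AllocationProbe {
    // Assumption: on HotSpot the platform ThreadMXBean implements the
    // com.sun.management extension that exposes getThreadAllocatedBytes.
    private static final com.sun.management.ThreadMXBean THREADS =
            (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();

    /** Bytes allocated by the calling thread while running task, or -1 if unavailable. */
    static long allocatedBy(Runnable task) {
        if (!THREADS.isThreadAllocatedMemorySupported()
                || !THREADS.isThreadAllocatedMemoryEnabled()) {
            return -1;
        }
        long tid = Thread.currentThread().getId();
        long before = THREADS.getThreadAllocatedBytes(tid); // lookup by thread id
        task.run();
        long after = THREADS.getThreadAllocatedBytes(tid);  // second lookup by thread id
        return after - before;
    }

    public static void main(String[] args) {
        byte[][] sink = new byte[1][];
        long bytes = allocatedBy(() -> sink[0] = new byte[1 << 20]);
        System.out.println("allocated (approx.): " + bytes + " bytes");
    }
}
```

Each measurement performs two lookups by thread id, so the per-call cost discussed here is paid twice per measured operation.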

The root cause of the problem is ThreadsList::find_JavaThread_from_java_tid
which performs a linear scan through all Java threads in the current
process. The more threads a JVM has, the slower it gets. In the worst case,
the thread with the given TID is found as the last entry in the list.
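
The effect of the linear scan can be illustrated with a toy model. This is not the HotSpot code; it only assumes that thread ids are stored in a flat list that is searched front to back, and the names LinearScanModel and probesToFind are invented for the sketch.

```java
final class LinearScanModel {
    /** Number of entries inspected before tid is found, or -1 if it is absent. */
    static int probesToFind(long[] tids, long tid) {
        for (int i = 0; i < tids.length; i++) {
            if (tids[i] == tid) {
                return i + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // 500 entries, mirroring the thread count of the test described above.
        long[] tids = new long[500];
        for (int i = 0; i < tids.length; i++) {
            tids[i] = i + 1;
        }
        System.out.println("first entry: " + probesToFind(tids, 1) + " probe(s)");
        System.out.println("last entry:  " + probesToFind(tids, 500) + " probe(s)");
    }
}
```

In this model the first entry is found after one comparison and the last after 500, which matches the shape of the measurements: the cost depends on where the queried thread sits in the list.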

Before Java 10, the oldest thread is the slowest one to query.
Since Java 10, the youngest thread is the slowest one to query. I think
this was a side effect of introducing "Thread Safe Memory Reclamation
(Thread-SMR) support".

             Oldest Thread   Youngest Thread
Java 8             8740 ns             76 ns
Java 10             109 ns           2485 ns

A common use case is to query the metric for the current thread (e.g.
before and after performing some operation). This case can be optimized by
introducing a new method: getCurrentThreadAllocatedBytes.
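
From the caller's side, the proposed method could be used roughly as sketched below. The sketch assumes only the method name given above; because a given JDK may or may not provide getCurrentThreadAllocatedBytes, it probes for the method reflectively and falls back to the existing id-based call. The class name CurrentThreadAllocation is hypothetical. Reflection itself allocates and adds overhead, so it is used here only to keep the example compilable everywhere, not as something to put on a hot path.

```java
import java.lang.management.ManagementFactory;
import java.lang.reflect.Method;

final class CurrentThreadAllocation {
    // Assumption: HotSpot-based JVM, so the cast to the extension interface succeeds.
    private static final com.sun.management.ThreadMXBean THREADS =
            (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();

    // The proposed method, if this JDK happens to provide it; otherwise null.
    private static final Method FAST_PATH = lookupFastPath();

    private static Method lookupFastPath() {
        try {
            return com.sun.management.ThreadMXBean.class
                    .getMethod("getCurrentThreadAllocatedBytes");
        } catch (NoSuchMethodException e) {
            return null;
        }
    }

    /** Allocated bytes of the calling thread, via the proposed method when present. */
    static long allocatedBytes() {
        if (FAST_PATH != null) {
            try {
                return (Long) FAST_PATH.invoke(THREADS);
            } catch (ReflectiveOperationException e) {
                // Fall through to the id-based lookup below.
            }
        }
        return THREADS.getThreadAllocatedBytes(Thread.currentThread().getId());
    }

    public static void main(String[] args) {
        System.out.println("fast path available: " + (FAST_PATH != null));
        System.out.println("allocated so far: " + allocatedBytes() + " bytes");
    }
}
```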

I created a patch for this and, by using the
new method, I saw the following improvements in my test:

             Oldest Thread   Youngest Thread
Proposal             37 ns             37 ns

This is a 60x improvement over the worst case of the current API. In the
best case of the current API, the new method is still 3 times faster.
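
The ratios can be recomputed directly from the two tables above. The short sketch below only divides the published timings; the class name SpeedupArithmetic is made up for the example.

```java
final class SpeedupArithmetic {
    /** How many times slower slowNs is compared with fastNs. */
    static double ratio(double slowNs, double fastNs) {
        return slowNs / fastNs;
    }

    public static void main(String[] args) {
        double proposal = 37; // ns, from the "Proposal" row
        System.out.printf("Java 10 worst case (2485 ns): %.1fx%n", ratio(2485, proposal));
        System.out.printf("Java 10 best case   (109 ns): %.1fx%n", ratio(109, proposal));
        System.out.printf("Java 8 worst case  (8740 ns): %.1fx%n", ratio(8740, proposal));
        System.out.printf("Java 8 best case     (76 ns): %.1fx%n", ratio(76, proposal));
    }
}
```

With the Java 10 figures this gives roughly 67x for the worst case and just under 3x for the best case.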

// Based on JVM_SetNativeThreadName in jvm.cpp.
JVM_ENTRY(jlong, jmm_GetCurrentThreadAllocatedMemory(JNIEnv *env, jobject currentThread))
  // We don't use a ThreadsListHandle here because the current thread
  // must be alive.
  oop java_thread = JNIHandles::resolve_non_null(currentThread);
  JavaThread* thr = java_lang_Thread::thread(java_thread);
  if (thread == thr) {
    // only supported for the current thread
    return thr->cooked_allocated_bytes();
  }
  return -1;
JVM_END

The proposed method also fixes the problem that getThreadAllocatedBytes
itself allocates some memory on the current thread (two long arrays, 24
bytes) and can therefore slightly skew measurements. The new
method, getCurrentThreadAllocatedBytes, returns exactly the same value if
it is called twice without allocating any memory between those calls.
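
The skew of the existing method can be observed by calling it twice in a row with nothing in between. The sketch below is one way to check this; the class name SelfAllocationSkew is hypothetical, the cast assumes a HotSpot-based JVM, and the exact difference reported will depend on the JDK version in use.

```java
import java.lang.management.ManagementFactory;

final class SelfAllocationSkew {
    /** Difference between two back-to-back readings for the calling thread. */
    static long backToBackDelta() {
        // Assumption: HotSpot-based JVM with thread allocation tracking enabled.
        com.sun.management.ThreadMXBean threads =
                (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
        long tid = Thread.currentThread().getId();
        long first = threads.getThreadAllocatedBytes(tid);
        long second = threads.getThreadAllocatedBytes(tid);
        // Any non-zero result is memory attributed to the measurement call itself.
        return second - first;
    }

    public static void main(String[] args) {
        System.out.println("bytes attributed to the measurement: " + backToBackDelta());
    }
}
```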

I also built a variation of this method that could be used to query
allocated memory more efficiently for anyone who already has a
java.lang.Thread object:

JVM_ENTRY(jlong, jmm_GetThreadAllocatedMemory(JNIEnv *env, jobject threadObj))
  // based on code proposed in threadSMR.hpp
  ThreadsListHandle tlh;
  JavaThread* thr = NULL;
  bool is_alive = tlh.cv_internal_thread_to_JavaThread(threadObj, &thr, NULL);
  if (is_alive) {
    return thr->cooked_allocated_bytes();
  }
  return -1;
JVM_END

This method took 70 ns in my test, which is 85% slower
than GetCurrentThreadAllocatedMemory but still 30% faster than the best
case of the current API. I currently have no immediate need for this second
method, but I think it would also be a valuable addition to the API.

I attached a patch for getCurrentThreadAllocatedBytes. I can create a
second patch that also adds getThreadAllocatedMemory(java.lang.Thread) to
the API.

I am a first-time contributor and I am not 100% sure which process I must
follow to get a change like this into OpenJDK. Can someone have a look at
my proposal and help me through the process?

Best regards,
-------------- next part --------------
A non-text attachment was scrubbed...
Name: getCurrentThreadAllocatedBytes.diff
Type: application/octet-stream
Size: 5058 bytes
Desc: not available
URL: <>

More information about the serviceability-dev mailing list