ThreadMXBean::getCurrentThreadAllocatedBytes
Daniel D. Daugherty
daniel.daugherty at oracle.com
Fri Jul 13 20:52:52 UTC 2018
Markus,
I filed the following bug for you:
JDK-8207266 ThreadMXBean::getThreadAllocatedBytes() can be quicker
for self thread
https://bugs.openjdk.java.net/browse/JDK-8207266
Dan
On 7/13/18 4:46 PM, Daniel D. Daugherty wrote:
> On 7/13/18 2:44 PM, Daniel D. Daugherty wrote:
>> On 7/13/18 12:35 PM, Markus Gaisbauer wrote:
>>> Hello,
>>>
>>> I am trying to use ThreadMXBean::getThreadAllocatedBytes
>>> (com.sun.management) to get the amount of allocated memory of the
>>> current thread in some performance critical code.
>>>
>>> Unfortunately, the current implementation can be rather slow and the
>>> duration of each call unpredictable. I ran a test in a JVM with 500
>>> threads. Depending on which thread was queried,
>>> getThreadAllocatedBytes took between 100 ns and 2500 ns.
>>>
>>> The root cause of the problem is
>>> ThreadsList::find_JavaThread_from_java_tid which performs a linear
>>> scan through all Java threads in the current process. The more
>>> threads a JVM has, the slower it gets. In the worst case, the thread
>>> with the given TID is found as the last entry in the list.
>>>
>>> Before Java 10, the oldest thread is the slowest one to query.
>>> Since Java 10, the youngest thread is the slowest one to query. I
>>> think this was a side effect of introducing "Thread Safe Memory
>>> Reclamation (Thread-SMR) support".
>>>
>>> Oldest Thread Youngest Thread
>>> Java 8 8740 ns 76 ns
>>> Java 10 109 ns 2485 ns
>>
>> It is good to see that longest search is much faster. Erik and Robbin
>> will be pleased since speeding up traversal of the ThreadsList was one
>> of the things that we tried to do during the Thread-SMR project.
>>
>> A first step is get a new bug filed that documents the issue with
>> ThreadMXBean::getThreadAllocatedBytes(). Perhaps Gary or Serguei
>> will take care of that.
>>
>> Dan
>>
>>
>>> A common use case is to query the metric for the current thread
>>> (e.g. before and after performing some operation). This case can be
>>> optimized by introducing a new method: getCurrentThreadAllocatedBytes.
>>>
>>> I created a patch for http://hg.openjdk.java.net/jdk/jdk/ and by
>>> using the new method I saw the following improvements in my test:
>>> Oldest Thread Youngest Thread
>>> Proposal 37 ns 37 ns
>>>
>>> This is a 60x improvement over the worst case of the current API. In
>>> the best case of the current API, the new method is still 3 times
>>> faster.
>>>
>>> // based on JVM_SetNativeThreadName in jvm.cpp.
>>> JVM_ENTRY(jlong, jmm_GetCurrentThreadAllocatedMemory(JNIEnv *env,
>>> jobject currentThread))
>>> // We don't use a ThreadsListHandle here because the current thread
>>> // must be alive.
>>> oop java_thread = JNIHandles::resolve_non_null(currentThread);
>>> JavaThread* thr = java_lang_Thread::thread(java_thread);
>>> if (thread == thr) {
>>> // only supported for the current thread
>>> return thr->cooked_allocated_bytes();
>>> }
>>> return -1;
>>> JVM_END
>>>
>>> The proposed method also fixes the problem, that
>>> getThreadAllocatedBytes itself allocates some memory on the current
>>> thread (two long arrays, 24 bytes) and therefore can slightly skew
>>> measurements. The new method, getCurrentThreadAllocatedBytes,
>>> returns exactly the same value if it is called twice without
>>> allocating any memory between those calls.
>>>
>>> I also built a variation of this method that could be used to query
>>> allocated memory more efficiently for anyone who already has a
>>> java.lang.Thread object:
>>>
>>> JVM_ENTRY(jlong, jmm_GetThreadAllocatedMemory(JNIEnv *env, jobject
>>> threadObj))
>>> // based on code proposedin threadSMR.hpp
>>> ThreadsListHandle tlh;
>>> JavaThread* thr = NULL;
>>> bool is_alive = tlh.cv_internal_thread_to_JavaThread(threadObj,
>>> &thr, NULL);
>>> if (is_alive) {
>>> return thr->cooked_allocated_bytes();
>>> }
>>> return -1;
>>> JVM_END
>>>
>>> This method took 70 ns in my test, which is 85% slower
>>> than GetCurrentThreadAllocatedMemory but still 30% faster than the
>>> best case of the current API. I currently have no immediate need for
>>> this second method, but I think it would also be a valueable
>>> addition to the API.
>>>
>>> I attached a patch for getCurrentThreadAllocatedBytes. I can create
>>> a second patch for also adding
>>> getThreadAllocatedMemory(java.lang.Thread) to the API.
>>>
>>> I am a first time contributor and I am not 100% sure what process I
>>> must follow to get a change like this into OpenJDK. Can someone have
>>> a look at my proposal and help me through the process?
>>>
>>> Best regards,
>>> Markus
>>>
>>
>
> I believe this is the code that's causing you grief:
>
> open/src/hotspot/share/services/management.cpp:
>
> // Gets an array containing the amount of memory allocated on the Java
> // heap for a set of threads (in bytes). Each element of the array is
> // the amount of memory allocated for the thread ID specified in the
> // corresponding entry in the given array of thread IDs; or -1 if the
> // thread does not exist or has terminated.
> JVM_ENTRY(void, jmm_GetThreadAllocatedMemory(JNIEnv *env, jlongArray ids,
> jlongArray sizeArray))
> // Check if threads is null
> if (ids == NULL || sizeArray == NULL) {
> THROW(vmSymbols::java_lang_NullPointerException());
> }
>
> ResourceMark rm(THREAD);
> typeArrayOop ta = typeArrayOop(JNIHandles::resolve_non_null(ids));
> typeArrayHandle ids_ah(THREAD, ta);
>
> typeArrayOop sa = typeArrayOop(JNIHandles::resolve_non_null(sizeArray));
> typeArrayHandle sizeArray_h(THREAD, sa);
>
> // validate the thread id array
> validate_thread_id_array(ids_ah, CHECK);
>
> // sizeArray must be of the same length as the given array of thread IDs
> int num_threads = ids_ah->length();
> if (num_threads != sizeArray_h->length()) {
> THROW_MSG(vmSymbols::java_lang_IllegalArgumentException(),
> "The length of the given long array does not match the
> length of "
> "the given array of thread IDs");
> }
>
> ThreadsListHandle tlh;
> for (int i = 0; i < num_threads; i++) {
> JavaThread* java_thread =
> tlh.list()->find_JavaThread_from_java_tid(ids_ah->long_at(i));
> if (java_thread != NULL) {
> sizeArray_h->long_at_put(i, java_thread->cooked_allocated_bytes());
> }
> }
> JVM_END
>
>
> Perhaps something like this above the "ThreadsListHandle tlh;" line:
>
> if (num_threads == 1 && THREAD->is_Java_thread()) {
> // Only asking for 1 thread so if we're a JavaThread, then
> // see if this request is for ourself.
> JavaThread* jt = THREAD;
> oop tobj = jt->threadObj();
>
> if (ids_ah->long_at(0) == java_lang_Thread::thread_id(tobj)) {
> // Return the info for ourself.
> sizeArray_h->long_at_put(0, jt->cooked_allocated_bytes());
> return;
> }
> }
>
> I haven't checked to see if this will even compile, but I
> think you'll get the idea.
>
> Dan
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/serviceability-dev/attachments/20180713/1f435423/attachment.html>
More information about the serviceability-dev
mailing list