Re: [PATCH] JFR thread sampling issue
guangyu(greg) zhu
guangyu.zgy at alibaba-inc.com
Mon Dec 3 02:34:37 UTC 2018
It's good to see this problem being addressed. I have run into the same issue. My application has ~700 Java threads, most of which run in native state, so there is little chance that a thread in Java state gets sampled in any given interval. I tried a solution similar to Milan's: if all 5 attempts fail, keep trying until one sample succeeds. Then there is at least one successful Java sample in each 10 ms interval.
The solution works, but the overhead increased significantly. So I added a switch to enable/disable native samples, and I usually keep native samples disabled to leave more CPU for Java samples.
Besides the solution above, there are other workarounds. In most situations, we can run JFR for a longer time to collect more Java samples; we can also reduce the sampling interval to increase the sampling rate, as Erik mentioned.
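As a rough illustration of why a longer recording or a shorter interval helps, here is a back-of-the-envelope estimate in Python. It is only a sketch: it assumes each sampling attempt independently lands on a thread in Java state with probability java_threads / total_threads, and the figure of 7 threads in Java state is hypothetical. The function name expected_java_samples is invented for this example and is not part of JFR.

```python
def expected_java_samples(total_threads, java_threads, attempts_per_period,
                          period_ms, duration_s):
    """Estimate successful Java samples over a whole recording.

    Assumption (not from the JFR sources): every attempt hits a thread in
    Java state with probability java_threads / total_threads.
    """
    periods = duration_s * 1000 / period_ms        # sampling passes in the recording
    attempts = periods * attempts_per_period       # threads examined in total
    return attempts * java_threads / total_threads


if __name__ == "__main__":
    # ~700 threads, 5 attempts and a 10 ms interval come from the message
    # above; 7 threads in Java state is a made-up value for illustration.
    base = expected_java_samples(700, 7, 5, period_ms=10, duration_s=60)
    longer = expected_java_samples(700, 7, 5, period_ms=10, duration_s=120)
    faster = expected_java_samples(700, 7, 5, period_ms=5, duration_s=60)
    print(base, longer, faster)
```

Under this model, doubling the recording length and halving the interval each double the expected number of Java samples, but only the shorter interval also raises the sampling overhead per second.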
Thanks,
Guangyu
------------------------------------------------------------------
From: Markus Gronlund <markus.gronlund at oracle.com>
Sent: Friday, November 30, 2018 22:23
To: Milan Mimica <milan.mimica at gmail.com>
Cc: hotspot-jfr-dev <hotspot-jfr-dev at openjdk.java.net>
Subject: RE: [PATCH] JFR thread sampling issue
Hi Milan,
Thanks for pointing to this and providing a patch suggestion.
Overall it looks reasonable, but I need to take a closer look at how longer sampling sessions interact with the thread SMR list. We should ensure that an attempt does not wrap around the thread list.
Sorry for the delay on this issue. Your overall suggestion will be incorporated in some form, but we need to be careful that it does not have hidden side effects. I will get back to you on this.
Thanks again
Markus
-----Original Message-----
From: Milan Mimica <milan.mimica at gmail.com>
Sent: 19 November 2018 13:42
To: hotspot-jfr-dev at openjdk.java.net
Subject: [PATCH] JFR thread sampling issue
Hello
I regularly used JFR to profile my service, and the thread sampling results were very useful. Unfortunately, after I switched to Java 9 and later versions, it stopped working: the recording would catch too few samples for the results to be useful. My service has about 500 threads, most of them in some blocked state.
With the release of Java 11, I started digging into the JFR source code and I think I have found the problem. I think JfrThreadSampler::task_stacktrace is supposed to find at most 5 threads that are in state _thread_in_Java, or at most 1 thread that is in _thread_in_native, but instead it just picks the next 5 (or 1) threads and then ignores them if they are not in the right state. Without insight into the code changes prior to JDK 11 I can only guess, but there are some clues that lead me to think that's how it was supposed to work. One of the clues is that sample_task.do_sample_thread returns a result that is otherwise unused.
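The difference between the two behaviours described above can be sketched with a toy model. This is Python, not the HotSpot C++ code, and it is not the attached patch. The function names sample_next_n, sample_until_n and make_threads are hypothetical, and the thread list is a plain list of state strings. The second function visits each thread at most once per pass, which is one possible way to avoid wrapping around the thread list.

```python
IN_JAVA = "_thread_in_Java"
BLOCKED = "_thread_blocked"


def make_threads(total, java_indices):
    """Build a toy thread list: everything blocked except the given indices."""
    threads = [BLOCKED] * total
    for index in java_indices:
        threads[index] = IN_JAVA
    return threads


def sample_next_n(threads, start, wanted_state, limit):
    """Behaviour described as the problem: look at only the next `limit`
    threads and drop the ones that are not in `wanted_state`."""
    n = len(threads)
    sampled = []
    for offset in range(min(limit, n)):
        index = (start + offset) % n
        if threads[index] == wanted_state:
            sampled.append(index)
    return sampled


def sample_until_n(threads, start, wanted_state, limit):
    """Behaviour argued for: keep walking until `limit` matching threads are
    found, visiting each thread at most once so the pass never wraps."""
    n = len(threads)
    sampled = []
    for offset in range(n):
        if len(sampled) == limit:
            break
        index = (start + offset) % n
        if threads[index] == wanted_state:
            sampled.append(index)
    return sampled


if __name__ == "__main__":
    # 500 threads as in the message; the positions of the six threads in
    # Java state are made up for illustration.
    threads = make_threads(500, [40, 120, 250, 300, 410, 480])
    print(sample_next_n(threads, 0, IN_JAVA, 5))   # no thread in Java state among the first 5
    print(sample_until_n(threads, 0, IN_JAVA, 5))  # first five threads found in Java state
```

With mostly blocked threads, the first variant usually returns nothing for a pass, while the second finds up to 5 threads in Java state at the cost of a longer walk over the list.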
I'm attaching a patch.
Tested with test-jdk_jfr_sanity on a fastdebug build, and in my production environment.
I'm waiting for my Author status approval so I'll be able to create a proper changeset. I believe I also need a ticket for this.
--
Milan Mimica