[PATCH] JFR thread sampling issue

Markus Gronlund markus.gronlund at oracle.com
Thu Dec 20 17:45:14 UTC 2018


Hi again Milan,

 

Sorry for the delay on this. After some investigation I think your patch is good as-is.

 

I was concerned about the loop not having a terminating condition when the number of samples is less than sample_limit.

 

On re-checking next_thread(…), however, it actually provides this condition as part of:

…

return next != first_sampled ? next : NULL;
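
To illustrate why that return statement bounds the loop, here is a minimal standalone sketch. It is not the HotSpot source: FakeThread, the vector-backed list and count_visited are hypothetical stand-ins, and only the final return mirrors the line quoted above. The walk wraps at the end of the list and yields a null pointer as soon as it would arrive back at first_sampled, so a caller looping on the result makes at most one full pass.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a thread; not the HotSpot JavaThread type.
struct FakeThread {
    int id;
};

// Returns the thread after `current`, wrapping at the end of `list`.
// Returns nullptr once the walk would come back to `first_sampled`,
// which limits the caller's loop to at most one full pass.
const FakeThread* next_thread(const std::vector<FakeThread>& list,
                              const FakeThread* first_sampled,
                              const FakeThread* current) {
    if (list.empty()) return nullptr;
    if (current == nullptr) return &list[0];
    std::size_t index = static_cast<std::size_t>(current - list.data());
    index = (index + 1) % list.size();  // wrap around the list
    const FakeThread* next = &list[index];
    return next != first_sampled ? next : nullptr;
}

// Counts how many threads one pass visits when it starts at index `start`.
std::size_t count_visited(const std::vector<FakeThread>& list, std::size_t start) {
    if (list.empty()) return 0;
    assert(start < list.size());
    const FakeThread* first = &list[start];
    std::size_t visited = 0;
    for (const FakeThread* t = first; t != nullptr; t = next_thread(list, first, t)) {
        ++visited;
    }
    return visited;
}
```

In this model, a pass over a five-element list visits each element exactly once regardless of the starting index, even if the loop body never reaches its sample limit.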

 

This will restore the previous sampling behavior for Java threads that was part of JDK 8 and early JDK 9 – thank you.

I will shepherd and sponsor this change for you.

 

Thanks again

Markus

 

From: Milan Mimica <milan.mimica at gmail.com> 
Sent: den 20 december 2018 12:51
To: Markus Gronlund <markus.gronlund at oracle.com>
Cc: hotspot-jfr-dev at openjdk.java.net
Subject: Re: [PATCH] JFR thread sampling issue

 

Hi Markus

 

I'm gonna try to resurrect this (once again).

 

Did you get a chance to check this in more detail? At a glance, the code takes care not to wrap around the thread list.

 

 

 

On Fri, 30 Nov 2018 at 15:21, Markus Gronlund <markus.gronlund at oracle.com> wrote:

Hi Milan,

Thanks for pointing to this and providing a patch suggestion.

Overall it looks reasonable, but I need to take a closer look at how longer sampling sessions interplay with the thread SMR list - we should ensure that an attempt does not wrap around the thread list.

Sorry for the delay on this issue - your overall suggestion will be incorporated in some form, but we need to be careful that it does not have hidden side effects. I will get back to you on this.

Thanks again
Markus 

-----Original Message-----
From: Milan Mimica <milan.mimica at gmail.com> 
Sent: den 19 november 2018 13:42
To: hotspot-jfr-dev at openjdk.java.net
Subject: [PATCH] JFR thread sampling issue

Hello

I have been regularly using JFR to profile my service, and the thread sampling results have been very useful. Unfortunately, when I switched to Java 9 and later, it stopped working: the recordings catch too few samples for the results to be useful. My service has about 500 threads, most of them in some blocked state.

With the release of Java 11, I started digging into the JFR source code and I think I have found the problem. I think JfrThreadSampler::task_stacktrace is supposed to find at most 5 threads that are in state _thread_in_Java, or at most 1 thread that is in _thread_in_native, but instead it just picks the next 5 (or 1) threads and then ignores them if they are not in the right state. Without insight into the code changes prior to JDK 11 I can only guess, but there are some clues that lead me to think that's how it was supposed to work. One of the clues is that sample_task.do_sample_thread returns a result that is otherwise unused.
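
The difference between the two behaviors described above can be shown with a small standalone model. This is only a sketch under stated assumptions, not the JFR implementation: the State enum, both functions and make_threads are hypothetical, each function makes a single pass starting at index 0 (a real sampler may resume from where it last stopped), and the 1-in-50 ratio of running threads is invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical thread states, loosely named after those in the message.
enum class State { InJava, InNative, Blocked };

// Bounded by attempts: look at only the next `limit` threads and
// skip any that are not in the wanted state.
std::size_t sample_by_attempts(const std::vector<State>& threads,
                               State wanted, std::size_t limit) {
    std::size_t samples = 0;
    std::size_t attempts = 0;
    for (std::size_t i = 0; i < threads.size() && attempts < limit; ++i) {
        ++attempts;
        if (threads[i] == wanted) ++samples;
    }
    return samples;
}

// Bounded by successes: keep walking until `limit` threads in the
// wanted state have been found or the list is exhausted.
std::size_t sample_by_successes(const std::vector<State>& threads,
                                State wanted, std::size_t limit) {
    std::size_t samples = 0;
    for (std::size_t i = 0; i < threads.size() && samples < limit; ++i) {
        if (threads[i] == wanted) ++samples;
    }
    return samples;
}

// Builds `total` blocked threads with every `stride`-th one running Java code.
std::vector<State> make_threads(std::size_t total, std::size_t stride) {
    assert(stride > 0);
    std::vector<State> threads(total, State::Blocked);
    for (std::size_t i = stride - 1; i < total; i += stride) {
        threads[i] = State::InJava;
    }
    return threads;
}
```

With 500 modeled threads of which every 50th is in Java code and a limit of 5, the attempt-bounded pass inspects five blocked threads and records nothing, while the success-bounded pass keeps going and records five samples.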

I'm attaching a patch.
Tested with test-jdk_jfr_sanity on a fastdebug build, and in my production environment.

I'm waiting for my Author status approval so I'll be able to create a proper changeset. I believe I also need a ticket for this.


--
Milan Mimica




