JFR thread sampling mechanism

David Holmes david.holmes at oracle.com
Sun Jun 30 07:28:59 UTC 2019


Hi Gil,

Redirecting to the hotspot-jfr-dev alias.

Cheers,
David

On 30/06/2019 5:20 pm, Gil Tene wrote:
> I would like to discuss a potential improvement to the JFR thread
> sampling mechanism, and would like to see if the change we'd
> propose has already been considered in the past.
> 
> I believe that the current thread sampling mechanism (mostly via
> hotspot/share/jfr/periodic/sampling/jfrThreadSampler.cpp) can be
> summarized as follows: a control thread wakes up periodically (e.g. 100
> times per second) and in each period chooses a number (e.g. 5) of
> threads to sample (by rotating through the overall list of threads)
> only if they are "in java", and a number (e.g. 1) of threads (by
> separately rotating through the overall list of threads) to sample
> only if they are "in native". For each thread targeted for sampling, the
> control thread suspends the target thread (e.g. on Linux this is
> done by preparing a suspend request and sending a SIGUSR2 to
> make the thread deal with it), takes a stacktrace of the suspended
> thread, adds the stacktrace to JfrStackTraceRepository, and
> resumes the thread (e.g. on linux resumption involves setting up
> a resume request and again sending a SIGUSR2 to the thread to
> get it to handle it and resume).
> 
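The rotation described above can be sketched roughly as follows. This is a simplified illustration in C, not code from jfrThreadSampler.cpp: the thread_state enum and the pick_samples function are hypothetical names, and the sketch assumes one particular reading of the description, namely that the scan continues around the list (at most one full pass) until the per-period budget of matching threads is reached.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical thread states for illustration; not HotSpot's actual types. */
typedef enum { STATE_IN_JAVA, STATE_IN_NATIVE, STATE_OTHER } thread_state;

/*
 * Starting at *cursor, walk the thread list at most once and collect the
 * indices of up to `budget` threads whose state equals `wanted`.
 * The cursor is left just past the last thread visited, so the next period
 * resumes the rotation where this one stopped.
 * `picked` must have room for `budget` entries. Returns the number selected.
 */
size_t pick_samples(const thread_state *states, size_t n, size_t *cursor,
                    thread_state wanted, size_t budget, size_t *picked)
{
    assert(states != NULL && cursor != NULL && picked != NULL);
    size_t count = 0;
    for (size_t visited = 0; visited < n && count < budget; visited++) {
        size_t idx = *cursor;
        *cursor = (*cursor + 1) % n;   /* rotate through the list */
        if (states[idx] == wanted) {
            picked[count++] = idx;     /* this thread would be suspended and sampled */
        }
    }
    return count;
}
```

A sampler following this pattern would keep two independent cursors, one for the "in java" pass (budget e.g. 5) and one for the "in native" pass (budget e.g. 1), and call pick_samples once per pass in each period.
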
> We've been contemplating a change to make thread sampling use
> POSIX timers instead, such that each thread would use a separate
> timer, and threads would receive signals based on their CPU
> consumption (each timer, e.g. created with timer_create(2), would
> be clocked by the CPU time of its associated thread, and would
> signal that thread when that CPU time reaches a threshold
> [of e.g. 10 msec]). The signal handler would then perform the
> stacktrace sampling locally on the thread, and deliver the
> stacktrace to JfrStackTraceRepository (either directly or by
> enqueuing through an intermediary).
> 
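As a rough illustration of the per-thread timer scheme described above, here is a minimal Linux-only sketch in C. It is not taken from HotSpot or from any existing patch. The function names, the choice of SIGPROF, and the per-thread counter that stands in for actual stack walking are all assumptions made for the example. It relies on the Linux-specific SIGEV_THREAD_ID notification so that the signal is delivered to the thread that armed the timer.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* request Linux/glibc extensions such as syscall() */
#endif
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Some libc headers do not name this field; fall back to the raw member. */
#ifndef sigev_notify_thread_id
#define sigev_notify_thread_id _sigev_un._tid
#endif

#define SAMPLE_SIGNAL SIGPROF  /* assumption: a signal the process does not otherwise use */

/* Per-thread counter standing in for "capture a stack trace". */
static _Thread_local volatile sig_atomic_t samples_taken;

/* Runs on the thread whose CPU-time timer expired. A real sampler would walk
 * the stack from ucontext here, using only async-signal-safe operations. */
static void on_cpu_sample(int sig, siginfo_t *info, void *ucontext)
{
    (void)sig; (void)info; (void)ucontext;
    samples_taken++;
}

/* Install the process-wide handler once, before any timer is armed. */
int install_sample_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_cpu_sample;
    sa.sa_flags = SA_SIGINFO | SA_RESTART;
    sigemptyset(&sa.sa_mask);
    return sigaction(SAMPLE_SIGNAL, &sa, NULL);
}

/* Called by each thread that wants to be sampled: creates a timer clocked by
 * the calling thread's CPU time that signals this same thread every
 * interval_ns nanoseconds of CPU time consumed. Returns 0 on success. */
int start_thread_cpu_timer(long interval_ns, timer_t *out_timer)
{
    assert(interval_ns > 0 && out_timer != NULL);

    struct sigevent sev;
    memset(&sev, 0, sizeof sev);
    sev.sigev_notify = SIGEV_THREAD_ID;                       /* Linux-specific */
    sev.sigev_signo = SAMPLE_SIGNAL;
    sev.sigev_notify_thread_id = (pid_t)syscall(SYS_gettid);  /* this thread */

    if (timer_create(CLOCK_THREAD_CPUTIME_ID, &sev, out_timer) != 0)
        return -1;

    struct itimerspec its;
    its.it_value.tv_sec = interval_ns / 1000000000L;
    its.it_value.tv_nsec = interval_ns % 1000000000L;
    its.it_interval = its.it_value;                           /* periodic */
    if (timer_settime(*out_timer, 0, &its, NULL) != 0) {
        timer_delete(*out_timer);
        return -1;
    }
    return 0;
}

/* Number of samples taken so far on the calling thread. */
int samples_on_this_thread(void) { return (int)samples_taken; }

/* CPU time consumed so far by the calling thread, in nanoseconds. */
static long long thread_cpu_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Demo helper: spin until the calling thread has used `ns` more CPU time. */
void burn_cpu_ns(long long ns)
{
    long long end = thread_cpu_ns() + ns;
    while (thread_cpu_ns() < end) { /* busy loop */ }
}
```

A thread would call install_sample_handler() once per process and start_thread_cpu_timer(10 * 1000000L, &t) once per thread. Because the timer is clocked by thread CPU time, a thread that is blocked or idle receives no signals, while a busy thread is signalled roughly once per 10 msec of CPU it consumes. On older glibc versions the timer functions may require linking with -lrt.
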
> There are multiple potential benefits that may arise from switching
> to such a scheme, including significant reduction of sampling cost,
> improvement of density and focus of samples (fewer lost samples,
> ensuring that enough activity in a given thread will end up leading
> to a sample for that thread, etc.), and, potentially, an ability to
> (with additional changes) better account for time spent "outside
> of java" in e.g. native and runtime code.
> 
> Has this (using thread-cpu-time-based POSIX timer sampling) been
> considered before?
> 

