RFR: 8342818: Implement CPU Time Profiling for JFR

Erik Gahlin egahlin at openjdk.org
Tue Oct 29 11:00:23 UTC 2024


On Wed, 28 Aug 2024 16:47:21 GMT, Johannes Bechberger <jbechberger at openjdk.org> wrote:

> This is the code for the [JEP draft: CPU Time based profiling for JFR].

How is throttling handled? We don't want buffers to be filled up with sample events?

Not sure what is the best approach is, but I like to get a better understanding of the trade-offs.

In alternative 2, only use a rate / throttle, would you include the CPU time as field in the event as a weight factor, similar to the Object Allocation event?

Johannes, Markus, and I discussed the configuration, and we decided to have only one option called "rate," which might become "throttle". The option can accept either a period (e.g., 20ms) or a maximum rate (e.g., 1000/s). If the latter, the period is calculated from the number of cores when the setting is applied. In the future, it might be recalculated if the number of cores changes. The CPUTimeSample event will have a field that indicates what the CPU time was, perhaps called "weight" or "cpuTime". If needed, we will discuss this further next week.

Markus and I discussed the naming, and we are leaning against using "throttle" as the setting name. It is easier for users to understand if allocation, exception, socket read, etc., can share the same name for a rate like 500/s. The timespan or period for the CPUTimeSample event will serve as an override for advanced users. Other throttle settings in the future could have similar overrides, such as 1 MB for a memory-based event.

When it comes to naming the field, we think "sampleTime" could work. We don't want to give the impression that the method actually executed during that time, which would be the case with "cpuTime." We discussed the name "weight," but we believe users will interpret it as CPU time and create GUI aggregations with total CPU time per thread or method anyway. If we use "sampleTime", we can place a proper unit on the field, which doesn't make sense for "weight".

Regarding the implementation, I think you can simplify things by just selecting a CPUThrottleSetting implementation in the EventControl class. There's no need to add native support, as it seems unlikely that CPUThrottle will be useful for other events in the future.


    private static Control defineThrottle(PlatformEventType type) {
        ...
        type.add(PrivateAccess.getInstance().newSettingDescriptor(...);
        if (type.getName().equals("jdk.CPUTimeSample") {
            return new Control(new CPUThrottleSetting(type), def);
        }
        return new Control(new ThrottleSetting(type), def);
    }


The Rate annotation class can be removed.

src/hotspot/share/jfr/metadata/metadata.xml line 939:

> 937:   </Event>
> 938: 
> 939:   <Event name="CPUTimeExecutionSample" category="Java Virtual Machine, Profiling" label="CPU Time Method Profiling Sample" description="Snapshot of a threads state from the CPU time sampler"

Maybe we should leave out "Execution" and only have CPUTimeSample?

src/hotspot/share/jfr/metadata/metadata.xml line 939:

> 937:   </Event>
> 938: 
> 939:   <Event name="CPUTimeSample" category="Java Virtual Machine, Profiling" label="CPU Time Method Profiling Sample" description="Snapshot of a threads state from the CPU time sampler"

I think we can drop "Profiling" from the label, "CPU Time Method Sample"

src/hotspot/share/jfr/periodic/jfrPeriodic.cpp line 148:

> 146: TRACE_REQUEST_FUNC(NativeMethodSample) {
> 147: }
> 148: TRACE_REQUEST_FUNC(CPUTimeSample) {

The event doesn't need to be periodic.

src/jdk.jfr/share/classes/jdk/jfr/internal/MetadataRepository.java line 86:

> 84:                 if (pEventType.hasPeriod()) {
> 85:                     pEventType.setEventHook(true);
> 86:                     if (!pEventType.isMethodSampling() || !pEventType.isCPUTimeMethodSampling()) {

The event doesn't have a period. You should be able to remove this.

src/jdk.jfr/share/classes/jdk/jfr/internal/query/FieldBuilder.java line 132:

> 130:             return true;
> 131:         }
> 132:         if (fieldName.equals("stackTrace.isNull")) {

With the failed field, this is no longer needed, I believe?

src/jdk.jfr/share/classes/jdk/jfr/internal/query/view.ini line 404:

> 402: 
> 403: [application.cpu-time-hot-methods]
> 404: label = "Java Methods that Executes the Most from CPU time sampler"

Use headline-style capitalization, capitalize the first and last words, and all nouns, pronouns, adjectives, verbs and adverbs.

src/jdk.jfr/share/classes/jdk/jfr/internal/util/ParsedCPUThrottle.java line 33:

> 31:  * A rate or fixed period, see {@link jdk.jfr.internal.Rate}
> 32:  */
> 33: public sealed interface ParsedCPUThrottle {

This seems complicated. Why isn't possible to keep the Rate class as is (no renaming) and then only create a TimespanRate class and do something like this:

    public record TimespanRate(double rate) {
    
        public static TimespanRate of(String text) {
           Rate r = Rate.of(text);
           if (r != null) {
              double rate = r.amount() * ...
              return new TimespanRate(rate)
           } else {
              long v = ValueParser.parseTimespanWithInfinity(text);
              double rate = v * ...
              return new TimespanRate(rate);
          }
      }
    
      public boolean isHigher(TimespanRate that) {
          return rate() > that.rate();
      }
    }

test/jdk/jdk/jfr/event/profiling/TestCPUTimeEventThrottling.java line 76:

> 74: 
> 75:     private static EventCount countEvents(int timeMs, String rate) throws Exception {
> 76:         Recording recording = new Recording();

Use try-with-resources.

test/jdk/jdk/jfr/event/profiling/TestCPUTimeFullStackTrace.java line 31:

> 29:  * @test
> 30:  * @key jfr
> 31:  * @requires vm.hasJFR

What happens if you run the test on a platform that doesn't support CPU Time sample?

test/jdk/jdk/jfr/event/profiling/TestFullStackTrace.java line 44:

> 42:  * @requires vm.hasJFR
> 43:  * @library /test/lib
> 44:  * @run main/othervm jdk.jfr.event.profiling.TestFullStackTrace wall-clock

Create separate tests for CPU time sampling events. It makes them easier to problem list and disable for specific platforms. If you want to share verification code, create  helper class.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20752#issuecomment-2340781084
PR Comment: https://git.openjdk.org/jdk/pull/20752#issuecomment-2380263077
PR Comment: https://git.openjdk.org/jdk/pull/20752#issuecomment-2407736399
PR Comment: https://git.openjdk.org/jdk/pull/20752#issuecomment-2416197137
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r1751987754
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r1769540838
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r1806357857
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r1806357106
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r1806355696
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r1751894865
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r1806455856
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r1810756770
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r1769541205
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r1751940865


More information about the hotspot-dev mailing list