RFR: 8352251: Implement Cooperative JFR Sampling [v15]

Markus Grönlund mgronlun at openjdk.org
Wed Apr 30 17:05:49 UTC 2025


On Tue, 29 Apr 2025 16:47:17 GMT, Markus Grönlund <mgronlun at openjdk.org> wrote:

>> Greetings,
>> 
>> This is the implementation of JEP [JDK-8350338 Cooperative JFR Sampling](https://bugs.openjdk.org/browse/JDK-8350338).
>> 
>> Implementations in this change set are provided and have been tested on the following platforms:
>> 
>> - windows-x64
>> - windows-x64-debug
>> - linux-x64
>> - linux-x64-debug
>> - macosx-x64
>> - macosx-x64-debug
>> - linux-aarch64
>> - linux-aarch64-debug
>> - macosx-aarch64
>> - macosx-aarch64-debug
>> 
>> Testing: tier1-6, jdk_jfr, stress testing.
>> 
>> Platform porters note:
>> Some platform-specific code needs to be provided, mainly in the interpreter. Take a look at the following files for changes:
>> 
>> - src/hotspot/cpu/x86/frame_x86.cpp
>> - src/hotspot/cpu/x86/interp_masm_x86.cpp
>> - src/hotspot/cpu/x86/interp_masm_x86.hpp
>> - src/hotspot/cpu/x86/javaFrameAnchor_x86.hpp
>> - src/hotspot/cpu/x86/macroAssembler_x86.cpp
>> - src/hotspot/cpu/x86/macroAssembler_x86.hpp
>> - src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp
>> - src/hotspot/cpu/x86/templateTable_x86.cpp
>> - src/hotspot/os_cpu/linux_x86/javaThread_linux_x86.hpp
>> 
>> Thanks
>> Markus
>
> Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Configuration and test for jdk.SafepointLatency event

The issue is that the CPU context can be retrieved here after the safepoint poll has already been tested. That causes a race: a sample would be taken for an fp whose frame is about to be popped, which breaks the invariant of the sampling mechanism.

Only at a few sensitive interpreter positions do we need to inspect the correct fp (the sender's fp) to avoid this race.

On x64, we signal this in two steps: first by preemptively moving rbp, which updates the CPU context, and then by explicitly setting the sender_java_fp field in the LJF (last Java frame).

With your suggestion, we would always prioritize the sender fp, because it is always available. That is unnecessary, and it is incorrect because it biases the samples, except where we are about to pop an interpreter frame. Without an explicit signal, we cannot tell when that is the case.
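To make the trade-off concrete, here is a minimal Java model of the two selection rules. It is a sketch under stated assumptions, not HotSpot code. The class `FpSelectionModel`, the `LastJavaFrame` stand-in, and the methods `selectFp` and `alwaysPreferSender` are hypothetical. Frame pointers are modeled as plain `long` values, and 0 is assumed to mean "sender_java_fp not set".

```java
/**
 * Hypothetical model, not HotSpot code: how a sampler could choose the frame
 * pointer (fp) to record. Frame pointers are plain longs; 0 means "not set".
 */
public final class FpSelectionModel {

    /** Stand-in for the last Java frame (LJF) anchor. */
    static final class LastJavaFrame {
        // Set only at sensitive interpreter positions, e.g. just before an
        // interpreter frame is popped; 0 everywhere else.
        long senderJavaFp;
    }

    /** Signal-based rule: use the sender fp only when it was explicitly published. */
    static long selectFp(long contextFp, LastJavaFrame ljf) {
        return ljf.senderJavaFp != 0 ? ljf.senderJavaFp : contextFp;
    }

    /** Alternative rule: prefer the sender fp whenever one can be found. */
    static long alwaysPreferSender(long contextFp, long senderFp) {
        return senderFp != 0 ? senderFp : contextFp;
    }

    public static void main(String[] args) {
        final long calleeFp = 0x7000L; // frame currently executing
        final long callerFp = 0x7100L; // its sender
        LastJavaFrame ljf = new LastJavaFrame();

        // Ordinary position: the callee frame stays live, so it is sampled.
        System.out.println(Long.toHexString(selectFp(calleeFp, ljf)));                // 7000
        // The alternative attributes the same sample to the caller (bias).
        System.out.println(Long.toHexString(alwaysPreferSender(calleeFp, callerFp))); // 7100

        // About to pop: in this model, the published sender fp takes
        // precedence over whatever fp the captured context holds.
        ljf.senderJavaFp = callerFp;
        System.out.println(Long.toHexString(selectFp(calleeFp, ljf)));                // 7100
    }
}
```

The model shows why the rule is keyed on a signal rather than on availability. The sender fp can always be found, so a rule that fires whenever it is present fires at every position and shifts samples from the executing frame to its caller. An explicit signal confines the override to the few positions where the current frame is about to disappear.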

For testing, you will need to run some longer stress tests to see the effect of a racy sampling attempt.

To provoke more samples, you can decrease the JFR sampling interval by applying the following change to default.jfc and/or profile.jfc:

```diff
diff --git a/src/jdk.jfr/share/conf/jfr/profile.jfc b/src/jdk.jfr/share/conf/jfr/profile.jfc
index 4c9f4b4f8ec..75f8d75c580 100644
--- a/src/jdk.jfr/share/conf/jfr/profile.jfc
+++ b/src/jdk.jfr/share/conf/jfr/profile.jfc
@@ -198,12 +198,12 @@

     <event name="jdk.ExecutionSample">
       <setting name="enabled" control="method-sampling-enabled">true</setting>
-      <setting name="period" control="method-sampling-java-interval">10 ms</setting>
+      <setting name="period" control="method-sampling-java-interval">1 ms</setting>
     </event>

     <event name="jdk.NativeMethodSample">
       <setting name="enabled" control="method-sampling-enabled">true</setting>
-      <setting name="period" control="method-sampling-native-interval">20 ms</setting>
+      <setting name="period" control="method-sampling-native-interval">1 ms</setting>
     </event>

     <event name="jdk.SafepointLatency">
```

Then try running a longer stress test or benchmark, passing:

`-XX:StartFlightRecording:settings=profile.jfc`
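Instead of editing the .jfc files in the JDK image, you can apply the same settings programmatically through the `jdk.jfr` API. This is a minimal sketch that assumes JDK 11 or later. The class `StressSamplingRecording`, the output file `stress.jfr`, the 30-second duration, and the `runWorkload` busy loop are hypothetical placeholders for your own stress test or benchmark.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.text.ParseException;
import java.time.Duration;
import jdk.jfr.Configuration;
import jdk.jfr.Recording;

/** Hypothetical helper: a "profile" recording with 1 ms sampling periods. */
public final class StressSamplingRecording {

    /** Starts from the built-in "profile" settings, then shortens both sampling periods. */
    static Recording newStressRecording() throws IOException, ParseException {
        Recording recording = new Recording(Configuration.getConfiguration("profile"));
        // Same intent as the profile.jfc change: sample every 1 ms instead of 10/20 ms.
        recording.enable("jdk.ExecutionSample").withPeriod(Duration.ofMillis(1));
        recording.enable("jdk.NativeMethodSample").withPeriod(Duration.ofMillis(1));
        return recording;
    }

    /** Placeholder workload: spins until the deadline so the sampler has Java frames to hit. */
    static long runWorkload(Duration duration) {
        long deadline = System.nanoTime() + duration.toNanos();
        long acc = 0;
        while (System.nanoTime() < deadline) {
            acc += Long.toString(acc).hashCode();
        }
        return acc;
    }

    public static void main(String[] args) throws Exception {
        try (Recording recording = newStressRecording()) {
            recording.start();
            long result = runWorkload(Duration.ofSeconds(30));
            recording.stop();
            recording.dump(Path.of("stress.jfr"));
            System.out.println("workload result: " + result);
        }
    }
}
```

Starting from `Configuration.getConfiguration("profile")` keeps every other profile setting unchanged, so only the two sampling periods differ from a `settings=profile.jfc` run.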

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24296#issuecomment-2842656922


More information about the hotspot-dev mailing list