RFR: 8361716 : GCTraceCPUTime may report incorrect times during high load from JNI code

Thomas Schatzl tschatzl at openjdk.org
Tue Dec 16 09:46:10 UTC 2025


On Mon, 15 Dec 2025 12:50:55 GMT, Jonas Norlinder <jnorlinder at openjdk.org> wrote:

> GCTraceCPUTime is used by Serial, Parallel and G1 to estimate GC CPU time spent during safepoints. It estimates CPU time by sampling the CPU time for entire process by using `os::getTimesSecs`.
> 
> This will not be correct if native code from e.g. JNI is used and is incurring a high load on the machine as native code is not paused during safepoints.
> 
> A minimal example:
> 
>     public static void main(String[] args) {
>         // Initial GC that clean up start-up garbage etc.
>         System.gc();
> 
>         for (int i = 1; i <= 128; i++) {
>             addLoad(); // Creates 128 threads, that spins in a loop
>             try {
>                 Thread.sleep(2000);
>             } catch (InterruptedException e) {}
>             System.gc();
>         }
>         System.exit(0);
>     }
> 
> where addLoad() is a native call to a C++ program that create 128 pthreads that busy-spin forever with the largest nice value allowed.
> 
> In such scenario we can observe using G1
> 
> 
> [0.026s][info][gc,cpu] GC(0) User=0.00s Sys=0.00s Real=0.01s
> [2.103s][info][gc,cpu] GC(1) User=0.89s Sys=0.00s Real=0.03s
> [4.129s][info][gc,cpu] GC(2) User=0.51s Sys=0.00s Real=0.01s
> [6.195s][info][gc,cpu] GC(3) User=1.72s Sys=0.00s Real=0.06s
> [8.244s][info][gc,cpu] GC(4) User=1.46s Sys=0.00s Real=0.05s
> ...
> [201.191s][info][gc,cpu] GC(76) User=37.91s Sys=0.00s Real=1.19s
> [205.064s][info][gc,cpu] GC(77) User=52.44s Sys=0.00s Real=1.64s
> [208.068s][info][gc,cpu] GC(78) User=23.49s Sys=0.00s Real=0.74s
> [211.795s][info][gc,cpu] GC(79) User=46.96s Sys=0.00s Real=1.47s
> 
> 
> Since the Java code does not do anything the time should be close to 0.
> 
> # Implementation Comment
> 
> Total CPU time for GC operations on the VM thread is obtained by cumulatively adding to a counter. Since system time is calculated (on Linux) with `total - user` is value contains a small error delta. If we continuously increment a value this error would grow to an unacceptable level. Therefore, the implementation will rely on the fact that no other VM operation may run during GC safepoint, making it safe to query the VM thread for it's detailed CPU time at the start and end of GC pause.

Changes requested by tschatzl (Reviewer).

src/hotspot/os/aix/os_aix.cpp line 2475:

> 2473:       -1,
> 2474:       -1
> 2475:     };

If there are no comments to the elements, I would prefer to just use a single line for the return statement.
Same for below.

src/hotspot/os/bsd/os_bsd.cpp line 2312:

> 2310:     ((jlong) tinfo.user_time.seconds * (jlong)1000000000) + ((jlong) tinfo.user_time.microseconds * (jlong)1000),
> 2311:     ((jlong) tinfo.system_time.seconds * (jlong)1000000000) + ((jlong)tinfo.system_time.microseconds * (jlong)1000)
> 2312:   };

There are several known constants for these large numbers (e.g. `NANOSECS_PER_SEC`) already defined. I would prefer to use them.

src/hotspot/os/linux/os_linux.hpp line 2:

> 1: /*
> 2:  * Copyright (c) 1999, 2025, Oracle and/or its affiliates. All rights reserved.

The copyright change seems to be the only change in this file, and should be reverted.

src/hotspot/os/windows/os_windows.cpp line 4909:

> 4907:   return {
> 4908:     FT2INT64(UserTime) * 100,
> 4909:     FT2INT64(KernelTime) * 100

maybe create a local constant for the `100` to indicate that this is a value derived from MS API.

src/hotspot/os/windows/os_windows.cpp line 4940:

> 4938:     return -1;
> 4939:   }
> 4940:   return user_sys_cpu_time ? cpu_time.user + cpu_time.system : cpu_time.user;

All the `os::thread_cpu_time` implementation look the same now, but the BSD one. Maybe that distinction could be moved into `os::detailed_cpu_thread_time` (i.e. the `Unimplemented` bailout for that and `os::thread_cpu_time()` extracted out to avoid code duplication there?

src/hotspot/share/gc/shared/gcTraceTime.hpp line 40:

> 38:   bool _active;                 // true if times will be measured and printed
> 39:   jlong _starting_user_time;   // user time at start of measurement
> 40:   jlong _starting_system_time; // system time at start of measurement

Suggestion:

  jlong _starting_user_time;    // User time at start of measurement
  jlong _starting_system_time;  // System time at start of measurement

src/hotspot/share/runtime/os.hpp line 173:

> 171:   jlong user;
> 172:   jlong system;
> 173: };

Fwiw, `jlong` is a "Java" long, so maybe some C type should be used here. I am aware that the OS interface traditionally uses it, but changes need to start somewhere. However the runtime team may have additional comments about this.

-------------

PR Review: https://git.openjdk.org/jdk/pull/28824#pullrequestreview-3582152280
PR Review Comment: https://git.openjdk.org/jdk/pull/28824#discussion_r2622493322
PR Review Comment: https://git.openjdk.org/jdk/pull/28824#discussion_r2622504990
PR Review Comment: https://git.openjdk.org/jdk/pull/28824#discussion_r2622507542
PR Review Comment: https://git.openjdk.org/jdk/pull/28824#discussion_r2622510993
PR Review Comment: https://git.openjdk.org/jdk/pull/28824#discussion_r2622533968
PR Review Comment: https://git.openjdk.org/jdk/pull/28824#discussion_r2622515849
PR Review Comment: https://git.openjdk.org/jdk/pull/28824#discussion_r2622519179


More information about the hotspot-gc-dev mailing list