RFR: 8346880: [aix] java/lang/ProcessHandle/InfoTest.java still fails: "reported cputime less than expected" [v2]
Joachim Kern
jkern at openjdk.org
Thu Jan 9 11:37:17 UTC 2025
> The test java/lang/ProcessHandle/InfoTest.java still fails sporadically on AIX. The test exclusion was removed through [JDK-8211847](https://bugs.openjdk.org/browse/JDK-8211847) under the assumption that the problem was gone, but that assumption turned out to be wrong.
>
> We can see an exception like:
>
> java.lang.AssertionError: reported cputime less than expected: PT0.2S, actual: Optional[PT0.021179882S]
> at org.testng.Assert.fail(Assert.java:99)
> at InfoTest.test1(InfoTest.java:110)
>
> After a discussion with Roger Riggs and the team, we came to the following conclusion.
> The problem has two independent causes: one fundamental and one AIX-specific.
>
> The fundamental cause is as follows:
> Modern hardware provides many hardware threads (up to several hundred), which allow the worker threads of processes to run truly in parallel. To ensure that a worker thread does not monopolize a hardware thread, the OS rolls it out (deschedules it) after a few ms at the latest to make room for another worker thread, possibly from another process.
> The OS continuously sums the time during which each worker thread of a process is active and reports the total as the process CPU time.
>
> It is easy to see that there is no fixed relationship between the CPU time of a process and the real time (wall time).
>
> On a system with many hardware threads and few worker threads, the worker threads are active almost all the time. If they are rolled out, they are immediately rolled back in because nothing competes for the hardware thread. If a process has several worker threads, its CPU time will increase faster than the real time. In this case, CPU time > real time is to be expected, which is what the test wants.
>
> However, if the same system is heavily loaded, i.e. many worker threads compete for the same hardware thread, each individual worker thread becomes active only relatively rarely. Even if a process has several worker threads, its total CPU time will be less than the elapsed real time. This is even more pronounced if the individual worker threads have to wait for each other via synchronization objects. Since this is the normal case, CPU time < real time usually applies.
>
> Therefore, such a test makes little sense in principle.
>
> The AIX-specific cause of the problem lies in the API used to get the cpu time. The `/proc/<pid>/psinfo` file is evaluated to obtain the cpu time. The /proc directory is only present on AIX for portability reasons. The data in it is only updated at long intervals. For example, the cpu time is only up...
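
The fundamental cause described in the quoted text can be observed with a small experiment. The following is only a sketch, not the code of InfoTest.java or of the patch: the class `CpuVsWallTime`, its `Sample` holder, and the `measure` method are hypothetical names introduced for illustration. It relies only on `ProcessHandle.Info.totalCpuDuration()`, the standard API that reports process CPU time as an `Optional<Duration>`, and treats a missing value as zero.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of the JDK or of the test.
class CpuVsWallTime {

    /** One measurement: elapsed wall time and CPU time charged to this process. */
    static final class Sample {
        final Duration wall;
        final Duration cpu;

        Sample(Duration wall, Duration cpu) {
            this.wall = wall;
            this.cpu = cpu;
        }
    }

    /** Busy-spins the given number of threads for about millis ms and samples both clocks. */
    static Sample measure(int threads, long millis) {
        ProcessHandle self = ProcessHandle.current();
        // Assumption: a platform that reports no CPU time is treated as zero.
        Duration cpuBefore = self.info().totalCpuDuration().orElse(Duration.ZERO);
        long wallStart = System.nanoTime();
        long deadline = wallStart + millis * 1_000_000L;

        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> {
                // Burn CPU until the wall-clock deadline has passed.
                while (System.nanoTime() - deadline < 0) {
                    Thread.onSpinWait();
                }
            });
            workers.add(t);
            t.start();
        }
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }

        Duration wall = Duration.ofNanos(System.nanoTime() - wallStart);
        Duration cpuAfter = self.info().totalCpuDuration().orElse(Duration.ZERO);
        return new Sample(wall, cpuAfter.minus(cpuBefore));
    }

    public static void main(String[] args) {
        Sample s = measure(4, 500);
        System.out.println("wall=" + s.wall + " cpu=" + s.cpu);
    }
}
```

On an idle machine with several free hardware threads the reported CPU time would be expected to exceed the wall time. Under heavy load, or on a platform that refreshes the underlying counter only at long intervals, it may be far smaller, so a fixed lower bound on the CPU time is fragile.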
Joachim Kern has updated the pull request incrementally with two additional commits since the last revision:
- remove extra white space
- omit unused variable
-------------
Changes:
- all: https://git.openjdk.org/jdk/pull/22966/files
- new: https://git.openjdk.org/jdk/pull/22966/files/1a5a1951..29007e66
Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=22966&range=01
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=22966&range=00-01
Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod
Patch: https://git.openjdk.org/jdk/pull/22966.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/22966/head:pull/22966
PR: https://git.openjdk.org/jdk/pull/22966