[jdk16] RFR: 8227695: assert(pss->trim_ticks().seconds() == 0.0) failed: Unexpected partial trimming during evacuation
Thomas Schatzl
tschatzl at openjdk.java.net
Thu Jan 21 10:05:07 UTC 2021
Hi all,
can I have reviews for this change that makes an assert, which checks that no "trimming" took place, stable?
We found that only on Windows Server 2012 and 2016 (not 2019), on many AMD Epyc machines, the assert
`pss->trim_ticks().seconds() == 0.0`
sometimes fails in random tests. The `seconds()` method is
`return (double)value * ((double)unit / (double)TimeSource::frequency());`
where `value` is always zero, and `unit` and `TimeSource::frequency()` are some constant integers, i.e.
`(double) 0 * ((double) 1 / (double) 1000...000)`
somehow does not compare equal to `0.0`.
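For reference, the conversion can be reproduced in isolation. The following is a minimal sketch, not the HotSpot code: `ticks_to_seconds` and `demo_zero_ticks` are hypothetical names that mirror the expression above, and the frequency of 10,000,000 ticks per second is an assumed example value. Under IEEE 754 arithmetic a zero tick count is expected to give exactly `0.0`, which is what makes the failure surprising.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the body of seconds(): converts a tick count to
// seconds with the same expression as quoted above.
// 'unit' and 'frequency' are assumed to be positive constants.
inline double ticks_to_seconds(std::int64_t value, std::int64_t unit,
                               std::int64_t frequency) {
  return (double)value * ((double)unit / (double)frequency);
}

// Demonstrates the expectation the failing assert relies on.
inline void demo_zero_ticks() {
  double tt = ticks_to_seconds(0, 1, 10000000);  // assumed example frequency
  assert(tt == 0.0);  // expected to hold under IEEE 754 arithmetic
}
```

Calling `demo_zero_ticks()` on a conforming platform should pass silently; the report here is about an environment where the equivalent check occasionally does not.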
Code like this:
double tt = pss->trim_ticks().seconds();
assert(tt == 0.0, ".... %2.f " PTR_FORMAT, tt, julong_cast(tt));
gives something like:
`assert(tt == 0.0," .... 0.0 0x00000....0000"`
so somehow the bit pattern 0x00...000 does not compare equal to FP 0.0.
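To make the observation concrete: on an IEEE 754 platform, the only double values that compare equal to `0.0` are +0.0 (all bits zero) and -0.0 (only the sign bit set), so an all-zero pattern failing the comparison is not explained by language-level semantics. Below is a minimal standalone sketch for inspecting the bits. `double_bits` and `is_positive_zero_bits` are hypothetical helpers, not HotSpot functions, and the sketch assumes that `julong_cast` in the snippet above performs a similar bitwise reinterpretation, as the printed pattern suggests.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Returns the raw bit pattern of a double (assumes a 64-bit IEEE 754 double).
inline std::uint64_t double_bits(double d) {
  static_assert(sizeof(double) == sizeof(std::uint64_t),
                "64-bit double assumed");
  std::uint64_t bits;
  std::memcpy(&bits, &d, sizeof bits);  // well-defined, unlike a pointer cast
  return bits;
}

// True if the value is positive zero, judged purely by its bits.
// No floating-point comparison instruction is involved.
inline bool is_positive_zero_bits(double d) {
  return double_bits(d) == 0;
}
```

A check such as `is_positive_zero_bits(tt)` sidesteps the floating-point comparison entirely, which is one way to tell whether the value or the comparison itself is misbehaving.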
I've investigated this quite a bit (littering the code with this assert) with no conclusive result, except that it seems to be related to the `QueryPerformanceCounter()` call: most of the time the assert fails right after taking a timestamp.
Dumping the FP+XMM register state (via `fxsave`) right after the comparison `tt == 0.0` goes wrong did not show anything obviously wrong (to me); `val1` and `val2` of this `Tickspan` are still zero.
There is no known issue with release code (crashes in this or other particular locations), just that the failures are very annoying in the CI.
The fix changes the FP comparison to an integer comparison, which should have been used from the start. There is also precedent in the code that does exactly this integer comparison for the same reason, and it has never failed so far. However, I/we could not explain why binary 0x00..00 is not always FP "0.0".
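The idea of the fix can be sketched as follows. This is an illustration under assumptions, not the actual patch (see the "Changes" link for that): `TickSpanSketch` and `no_trimming_happened` are hypothetical names, and the class is a simplified stand-in for HotSpot's `Tickspan`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for a tick span; not the HotSpot class.
class TickSpanSketch {
  std::int64_t _ticks;  // raw tick count
 public:
  explicit TickSpanSketch(std::int64_t ticks) : _ticks(ticks) {}
  std::int64_t value() const { return _ticks; }
};

// Integer check: compares the raw tick count directly, so no conversion to
// double and no floating-point comparison takes place.
inline bool no_trimming_happened(const TickSpanSketch& span) {
  return span.value() == 0;
}
```

An assert built on the raw tick count, e.g. `assert(no_trimming_happened(span))`, expresses the same invariant as the original check while avoiding the floating-point path that misbehaved.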
Testing: In a total of 8k iterations of two tests that seemed to trigger this issue more often than others, there has been no assertion failure (4k runs with this patch, 4k runs with this assert duplicated all over the place). Also hs-tier1-5.
-------------
Commit messages:
- Initial commit
Changes: https://git.openjdk.java.net/jdk16/pull/128/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk16&pr=128&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8227695
Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod
Patch: https://git.openjdk.java.net/jdk16/pull/128.diff
Fetch: git fetch https://git.openjdk.java.net/jdk16 pull/128/head:pull/128
PR: https://git.openjdk.java.net/jdk16/pull/128