[jdk16] RFR: 8227695: assert(pss->trim_ticks().seconds() == 0.0) failed: Unexpected partial trimming during evacuation

Thomas Schatzl tschatzl at openjdk.java.net
Thu Jan 21 10:05:07 UTC 2021


Hi all,

  can I have reviews for this change that makes an assert stable? The assert checks that no "trimming" action has taken place.

We found that only on Windows Server 2012 and 2016 (not 2019) on many AMD Epyc machines sometimes 

  `pss->trim_ticks().seconds() == 0.0`

fails on random tests. The `seconds()` method is

  `return (double)value * ((double)unit / (double)TimeSource::frequency());`

where `value` is always zero, and `unit` and `TimeSource::frequency()` are some constant integers, i.e.

`(double) 0 * ((double) 1 / (double) 1000...000)`

does not equal `0.0`.
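For reference, the arithmetic can be reproduced in isolation. The following is only a minimal sketch: `ticks_to_seconds` and `zero_ticks_is_zero_seconds` are hypothetical names that mirror the quoted expression, not the HotSpot source, and the frequency is assumed to be a nonzero constant. Under IEEE 754 rules, zero multiplied by a finite value is zero, so the helper is expected to return true.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the quoted conversion; the parameter names
// mirror the snippet above, but this is not the HotSpot implementation.
inline double ticks_to_seconds(int64_t value, int64_t unit, int64_t frequency) {
  return static_cast<double>(value) *
         (static_cast<double>(unit) / static_cast<double>(frequency));
}

// Returns true if zero ticks convert to exactly 0.0 seconds for the
// given (assumed nonzero) timer frequency.
inline bool zero_ticks_is_zero_seconds(int64_t frequency) {
  return ticks_to_seconds(0, 1, frequency) == 0.0;
}
```

Calling `zero_ticks_is_zero_seconds(10000000)` from a small test program is one way to check the expectation on a given machine.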

Code like this:

  double tt = pss->trim_ticks().seconds();
  assert(tt == 0.0, ".... %2.f " PTR_FORMAT, tt, julong_cast(tt));

gives something like:

`assert(tt == 0.0," .... 0.0 0x00000....0000"`

so somehow the bit pattern 0x00...000 does not compare equal to FP 0.0.
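To illustrate why this is surprising, here is a small sketch of how a double's bit pattern can be inspected. The helper `double_bits` is a hypothetical name, and the sketch assumes the platform uses the IEEE 754 binary64 format, where an all-zero pattern encodes +0.0 and only the sign bit differs for -0.0. Both zeros compare equal to `0.0`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Copies the raw bits of a double into a 64-bit integer. memcpy is used
// so that the reinterpretation does not violate aliasing rules.
inline uint64_t double_bits(double d) {
  static_assert(sizeof(uint64_t) == sizeof(double), "unexpected double size");
  uint64_t bits;
  std::memcpy(&bits, &d, sizeof(bits));
  return bits;
}
```

On an IEEE 754 platform, `double_bits(0.0)` is expected to be `0`, so a value that prints with an all-zero pattern would normally satisfy `tt == 0.0`.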

I've investigated this quite a bit (littering the code with this assert) with no particular result, except that it seems to be related to the `QueryPerformanceCounter()` call: most of the time the assert fails right after taking a timestamp.
Dumping FP+XMM register state (via `fxsave`) right after the comparison `tt == 0.0` goes wrong did not yield anything (to me) obviously wrong (still `val1` and `val2` of this `Tickspan` are zero).

There is no known issue with release code (crashes in this or other particular locations), just that the failures are very annoying in the CI.

The fix changes the FP comparison to an integer comparison which should have been done initially (there is also precedent in the code that does exactly this integer comparison for the same reason which never failed so far), but I/we could not explain why binary 0x00..00 is not always FP "0.0".
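The idea of comparing the integer tick count instead of the derived seconds value can be sketched as follows. This is an illustration only: `TickSpanSketch` and `no_time_recorded` are hypothetical names, and the class is a simplified stand-in rather than the actual `Tickspan` type or the actual patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified tick span that stores a raw integer tick count.
class TickSpanSketch {
  int64_t _ticks;

 public:
  explicit TickSpanSketch(int64_t ticks) : _ticks(ticks) {}

  // Raw tick count; no floating-point conversion involved.
  int64_t value() const { return _ticks; }

  // Derived value in seconds for an assumed nonzero timer frequency.
  double seconds(int64_t frequency) const {
    return static_cast<double>(_ticks) / static_cast<double>(frequency);
  }
};

// Checks "no time recorded" on the integer count, so the result does not
// depend on floating-point state.
inline bool no_time_recorded(const TickSpanSketch& span) {
  return span.value() == 0;
}
```

An assert written against `no_time_recorded(span)` tests the same condition as `seconds() == 0.0` whenever the stored count is zero, but without going through floating point.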

Testing: hs-tier1-5. In addition, a total of 8k iterations of two tests that seemed to trigger this issue more often than usual showed no assertion failure (4k runs with this patch, 4k runs with this assert duplicated all over the place).

-------------

Commit messages:
 - Initial commit

Changes: https://git.openjdk.java.net/jdk16/pull/128/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk16&pr=128&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8227695
  Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk16/pull/128.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk16 pull/128/head:pull/128

PR: https://git.openjdk.java.net/jdk16/pull/128



More information about the hotspot-gc-dev mailing list