RFR: 8273139: C2: assert(f <= 1 && f >= 0) failed: Incorrect frequency [v2]

Tue Jan 18 10:58:28 UTC 2022

On Tue, 18 Jan 2022 08:35:59 GMT, Emanuel Peter <duke at openjdk.java.net> wrote:

>> Relaxed assert which can be triggered falsely upon an unfortunate sequence of float additions, such that the sum of all probabilities exceeds 1 by some very small amount in the order of float precision.
>> 
>> Before: assert(f <= 1 && f >= 0)
>> Now: assert(f >= 0), truncate f to be <=1
>> 
>> Frequency computation is done with floats, which are calculated through counting occurrences and dividing by totals.
>> This is done in multiple locations in the code.
>> 
>> We considered three options (in conversation with @rwestrel @chhagedorn @tobiasholenstein ):
>> 1. Consistently use `fesetround` to correctly round up/down (depends on context) if the results are ever used for frequency/probability calculations (many locations, multiple files). This option requires more code and maintainance. Implementing and testing is difficult. This is fragile if new code is added that impacts frequency computation - would have to remember to round correctly.
>> 2. Modify assert to `f <= 1+eps && 1 >= 0`, where `eps` is related to float precision, and is as close to zero as possible. However, since there can be an arbitrary number of additions, this error could grow arbitrarily. It is thus not evident how to determine `eps`.
>> 3. Drop the `f <= 1` condition (our choice). The exact comparison is inadequate for floats, and there is no evident replacement. This requires less code, is easier to maintain. Disadvantage: a developer may break something and not realize because the assert is no longer present. The impact would most likely be limited to performance, and not crash the VM or cause incorrect execution.
>> 
>> Checked that tests are not affected.
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
> 
>   collapse two lines of code

> 

Thanks for your comment, @dean-long.

I do not think that changing from `float` to `double` makes the problem occur less often. The errors will be smaller, yes, but there are still going to be rounding issues, and `f <= 1` could still occasionally be violated because of rouding.
We could log all cases where `f > 1`. But most of these cases are rounding errors and not bugs.

The only use of `double` I can see is if we use the option 2, with `f <= 1+eps`. The double mantissa has 52bit instead of 23bit for float. Maybe we can find an `eps` large enough so that it can never be exceeded by rounding errors, over all operations (add, mul, etc.), while still keeping it small enough that the condition is useful to detect actual bugs. If we assume we only have adds, with a maximal rounding error of 2^-52, and we expect at most 2^32 additions, then we could have `eps=2^-20`. However, we would have to be sure that these assumptions hold, and also hold in the future.

Changing from `float` to `double` is not trivial. We seem to use `float` to do counts in multiple files, and other code depends on it as well. A quick `grep "float.*cnt" . -rE` gets me a good number of hits, and not just in the file I edited (`loopPredicate.cpp`).

@dean-long : what would you recommend?

@rwestrel : what do you think?

-------------

PR: https://git.openjdk.java.net/jdk/pull/7113