Logically extend strictfp to intrinsics (was RFR(S): 8063086: Math.pow yields different results upon repeated calls)

Sat Jan 10 15:44:35 UTC 2015

>> Why is this considered a bug? Doing so seems to be opening a can of worms.
>> As an example any expression which lowers to contain FMA like instructions
>> will yield different results once compiled.  It seems more reasonable to
>> declare it proper and expected behavior.

> That comment:
>
> https://bugs.openjdk.java.net/browse/JDK-8063086?focusedCommentId=13594090&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13594090
>
> should answer the question.

Mr. Darcy's comment doesn't address the point I was attempting to make, which
is that the problem becomes too complex once you include fused operations as
well as transforms on intrinsics.  I'll use these comments as a jumping-off
point to attempt to clarify.
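
To make the fused-operation concern concrete, here is a minimal sketch of how
a fused multiply-add can differ from the two-step form.  It assumes Java 9 or
later for Math.fma; the class and method names are made up for illustration,
and the inputs are chosen so that the product rounds to exactly 1.0.

```java
final class FusedVsUnfused {
    // Two-step evaluation: the product a*b is rounded to double first,
    // then the sum is rounded again.
    static double unfused(double a, double b, double c) {
        return a * b + c;
    }

    // Fused evaluation: a*b+c is computed exactly and rounded once.
    // Math.fma requires Java 9 or later.
    static double fused(double a, double b, double c) {
        return Math.fma(a, b, c);
    }

    public static void main(String[] args) {
        double a = 1.0 + 0x1.0p-30;
        double b = 1.0 - 0x1.0p-30;
        double c = -1.0;
        // The exact product is 1 - 2^-60, which rounds to 1.0 as a double,
        // so the two-step form yields 0.0 while the fused form keeps -2^-60.
        System.out.println("unfused: " + unfused(a, b, c));
        System.out.println("fused:   " + fused(a, b, c));
    }
}
```

Neither result is "wrong"; the fused one simply has one rounding step fewer,
which is why code lowered to FMA instructions can change its bits.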

comment 1: "Whenever a numerical intrinsification is added, it should be
added to the interpreter, C1, C2, (C3, C4, etc.) so that the numerical
behavior is humane."

Going this route implies the following:
1) The interpreter and all compilers within a given VM version must be kept
lock-stepped.
2) The modified interpreter for each transform must detect that the
compiler(s) 'must' perform the one in question and only then compute the
alternate form.  If there are any errors in this then the 'bug' still exists
and it simply occurs in different situations.
3) Likewise the compiler can only perform the transforms in the cases that
the interpreter successfully detected that it must.  Which in turn will only
be the most trivial of cases, thereby marginalizing the transform to near
uselessness.

As an example from the original thread, consider transforming pow(a,b) -> a*a
if b==2.  The interpreter cannot simply test whether 'b' is 2 at each invoke,
since 'b' may have been computed; doing so would make the two disagree in
that case.  Simply checking whether a constant of '2' has been loaded makes
the transform not worth implementing.  At additional complexity (increased
engineering cost and reduced performance of the interpreter) this can be
somewhat improved upon by adding additional cases, but it cannot reach the
point of what the compiler might be able to deduce, since that requires
analysis, some of which depends on the interpreting phase having already
occurred.
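
The mismatch described above can be modelled in a few lines.  This is only a
sketch: genericPow is a hypothetical stand-in for a general-purpose pow path
with no special case for an exponent of 2 (it is not how any JVM implements
pow), and the 'provedTwo' flag stands in for whatever the compiler's analysis
managed to establish.

```java
final class PowTransformSketch {
    // Hypothetical stand-in for a general pow routine without a b == 2
    // special case. StrictMath keeps the result reproducible.
    static double genericPow(double a, double b) {
        return StrictMath.exp(b * StrictMath.log(a));
    }

    // Compiler model: rewrites to a*a only when analysis proved b == 2.
    static double compiled(double a, double b, boolean provedTwo) {
        return provedTwo ? a * a : genericPow(a, b);
    }

    // Interpreter model using a naive value test: it sees only the
    // run-time value of b, not what the compiler could prove.
    static double interpreted(double a, double b) {
        return b == 2.0 ? a * a : genericPow(a, b);
    }

    public static void main(String[] args) {
        double a = 3.0;
        double b = Math.sqrt(4.0); // equals 2.0, but is a computed value

        // The interpreter model takes the a*a path; the compiler model
        // does not if it failed to prove b == 2, so the two may disagree.
        System.out.println("interpreted:            " + interpreted(a, b));
        System.out.println("compiled (not proved):  " + compiled(a, b, false));
        System.out.println("compiled (proved):      " + compiled(a, 2.0, true));
    }
}
```

Whether the first two lines print the same bits depends entirely on whether
the generic path happens to round the same way as a*a for that input, which
is exactly the kind of agreement that cannot be relied upon.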

The front-end AOT compiler is in a much better position to perform these
types of limited transforms than the runtime.

comment 2: "However, it is clearly ugly misbehavior of the system when this
kind of inconsistency occurs external from any sort of reasonable user
control."

I claim that in the vast majority of cases this is desirable behavior.
Lower latency and decreased error bounds are of much greater interest than
bit-exact computations in floating-point.  In the case of fused operations,
the user does have control over the situation: specify strictfp.  It seems
like a simple logical extension of this notion to include non-bit-exact
transforms related to intrinsic functions.  (Obviously requiring the use of
StrictMath wouldn't be a reasonable solution.)  This would mean that no
intrinsic-related code is needed in the interpreter and all compilers are
independent of one another.  The compiler only needs to know not to perform
any non-bit-exact transforms if the invoke is inside strictfp.
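
From the user's side, the proposal would look roughly like the sketch below.
The semantics in the comments are the ones proposed here, not something the
VM currently promises, and the class and method names are invented for the
example.  (On Java 17 and later the strictfp modifier is redundant and javac
warns about it, hence the annotation.)

```java
final class StrictfpIntrinsicSketch {
    // Default mode: under the proposal, a compiler would be free to
    // rewrite this call (for example to a * a) even if the bits change.
    static double squareDefault(double a) {
        return Math.pow(a, 2.0);
    }

    // strictfp mode: under the proposal, the compiler would leave the
    // call alone unless the rewrite is known to be bit-exact.
    @SuppressWarnings("strictfp")
    static strictfp double squareStrict(double a) {
        return Math.pow(a, 2.0);
    }

    public static void main(String[] args) {
        double a = 3.0;
        System.out.println("default: " + squareDefault(a));
        System.out.println("strict:  " + squareStrict(a));
    }
}
```

The call site is identical in both methods; only the enclosing strictfp
scope tells the compiler which transforms are permitted, so the interpreter
needs no matching logic.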