Should JLS 5.1.7 require that boxing uses `valueOf`?

Sun Mar 10 03:11:50 UTC 2024

More about this “deliberate design choice” that Brian mentions…

The JLS is right to avoid over-specifying translation strategy, in general.  It’s also right to avoid over-specifying the identity characteristics of boxes that are built implicitly by the language.

The result may be less predictability for some users, when they are extremely observant of tiny details of classfile contents or the identity of Integer or Long instances.  (Since those are value-based classes, or VBCs, you are explicitly discouraged from caring about their identities.)

Another result can outweigh the disadvantage of unpredictability, and that is more freedom to optimize Java programs.  Better optimization tends to help everybody.

For example, maybe there is some invokedynamic-based mechanism that is more appropriate (than Long::valueOf), when the javac compiler concludes that a cached value is profitable.  This will happen if javac notices that the long expression is constant, but may happen in other cases too.  The indy expression in a classfile might simply yield a cached box that is computed once and reused every time the same constant appears, at that point in the code.  Mandating Long::valueOf would prevent such work.
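To make the indy idea a little more concrete, here is a minimal sketch of what such a mechanism might look like.  Everything named here (`CachedBoxSketch`, `constantBox`, `linkOnceInvokeTwice`) is hypothetical; javac emits nothing like this today, and Java source cannot express an invokedynamic instruction, so the sketch calls the bootstrap method by hand to imitate a single linked call site.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class CachedBoxSketch {
    // Hypothetical bootstrap method: boxes the constant once, at link time,
    // and binds the call site to a handle that always returns that one box.
    public static CallSite constantBox(MethodHandles.Lookup lookup, String name,
                                       MethodType type, long value) {
        Long box = Long.valueOf(value);
        MethodHandle target = MethodHandles.constant(Long.class, box);
        return new ConstantCallSite(target.asType(type));
    }

    // Imitates one indy call site that is linked once and executed twice.
    static Long[] linkOnceInvokeTwice(long value) {
        try {
            CallSite site = constantBox(MethodHandles.lookup(), "box",
                    MethodType.methodType(Long.class), value);
            MethodHandle invoker = site.dynamicInvoker();
            Long first = (Long) invoker.invokeExact();
            Long second = (Long) invoker.invokeExact();
            return new Long[] { first, second };
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        Long[] r = linkOnceInvokeTwice(123_456_789L);
        // Both executions of the "call site" see the same cached box.
        System.out.println(r[0] == r[1]); // true
    }
}
```

The point is only that the box is computed once per call site and then reused, which a mandated call to Long::valueOf at every boxing conversion would rule out.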

More subtly, the JIT also observes constant expressions, and usually finds them more frequently than javac (or the JLS).  If the JIT finds that some long value is in fact constant X (or is the same value X 90% of the time), perhaps the JIT should be allowed to preallocate a box of that special value X and use it in preference to a call to Long::valueOf.
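As a source-level picture of that JIT idea (a sketch only; a real JIT would do this on its internal representation, driven by its own profile data), the transformation might look roughly like the following.  The class name, the constant `HOT_VALUE`, and both method names are made up for illustration.

```java
final class PreallocatedBoxSketch {
    // Hypothetical value that profiling found to be very common.
    static final long HOT_VALUE = 1_000_000L;

    // Box for the hot value, allocated once ahead of time.
    private static final Long HOT_BOX = Long.valueOf(HOT_VALUE);

    // What the program says: box via Long::valueOf every time.
    static Long boxAsWritten(long x) {
        return Long.valueOf(x);
    }

    // What an optimizer might prefer: reuse the preallocated box on the
    // hot path and fall back to Long::valueOf otherwise.
    static Long boxSpecialized(long x) {
        return (x == HOT_VALUE) ? HOT_BOX : Long.valueOf(x);
    }

    public static void main(String[] args) {
        // The two versions always agree on value ...
        System.out.println(boxAsWritten(HOT_VALUE).equals(boxSpecialized(HOT_VALUE)));
        // ... and the specialized one also returns the same object each time.
        System.out.println(boxSpecialized(HOT_VALUE) == boxSpecialized(HOT_VALUE));
    }
}
```

A program can tell the two versions apart only by comparing box identities, which is exactly the observation the spec declines to pin down.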

In any case, when Valhalla lands, these particular questions will be moot, since all Long boxes containing the same value will themselves be the same value-object (modulo acmp, which is the most visibility you can get).

But Valhalla or not, it’s still a good principle to avoid over-specifying the output of javac.

A final thought about Long::valueOf:  The javadoc API spec is admirably restrained.  (That is, from my present POV, which is to maintain breathing room for optimizations.)  Long::valueOf NEVER promises that two successive calls with the same argument will produce DIFFERENT results.  That, to my mind, allows all kinds of optimizations to occur, including the JIT-based one I mentioned.  Even if the bytecode of Long::valueOf has no visible ability to return the same box twice (outside some limited range), the JIT could, arguably, give Long::valueOf that ability, simply by subjecting two calls to that method to common subexpression elimination.  Such CSE doesn’t work for any old method call, but it would be (I think) within spec for Long::valueOf.
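Here is a small before/after sketch of that CSE, written out by hand at the source level.  It is an illustration under the reading of the javadoc given above, not a description of what any particular JIT does; the class and method names are hypothetical.

```java
final class ValueOfCseSketch {
    // As written: two separate calls with the same argument.  Whether the
    // two results are the same object is unspecified for large values.
    static Long[] asWritten(long x) {
        Long a = Long.valueOf(x);
        Long b = Long.valueOf(x);
        return new Long[] { a, b };
    }

    // After hand-applied CSE: the second call is replaced by the result
    // of the first, so both references denote one box.
    static Long[] afterCse(long x) {
        Long t = Long.valueOf(x);
        return new Long[] { t, t };
    }

    public static void main(String[] args) {
        long big = 1L << 40; // well outside the always-cached small range
        Long[] before = asWritten(big);
        Long[] after = afterCse(big);
        // equals() gives the same answer either way.
        System.out.println(before[0].equals(before[1])); // true
        System.out.println(after[0].equals(after[1]));   // true
        // Only an identity comparison could notice the difference.
        System.out.println(after[0] == after[1]);        // true
    }
}
```

Code that compares boxes with equals() behaves identically under both versions, so only identity-sensitive code is exposed to the optimization.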

On 9 Mar 2024, at 8:52, Brian Goetz wrote:

> In general, the specification does not prescribe most details of how a source program is translated into classfiles.  (Some exceptions are made in the section on binary compatibility.)  This is a deliberate design choice; the JLS defines the meaning of a Java program, but not a prescriptive recipe for compiling it into a classfile.
>
>> … So would it make sense that the JLS requires that `valueOf` is used?
>> (This would also have the nice side effect that this JLS section could be simplified …