<div dir="ltr"><br><div>Thanks, John, for the correction. Much different from what I assumed! Very useful to know.</div><div><br></div><div>John</div><div><br></div><div>PS:  Thanks for the JDK-8255024 reference.</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Feb 28, 2024 at 1:28 AM John Rose &lt;<a href="mailto:john.r.rose@oracle.com">john.r.rose@oracle.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">This is an interesting discussion.  I’m not surprised<br>

that people are surprised by ==/acmp semantics.<br>

We surprised ourselves, at first.<br>

<br>

On 27 Feb 2024, at 17:04, John Bossons wrote:<br>

<br>

> Hi Remy,<br>

><br>

> Even for a value object field in which the stored VO value is a reference?<br>

<br>

Yes, that is Remy’s point.  It doesn’t matter how the virtual<br>

machine concretely stores the value.  The semantics are derived<br>

from virtual data, not physical data.<br>

<br>

The VM makes optimization decisions below the “virtual metal”.<br>

Users should expect that these decisions have no effect on program<br>

semantics (and so program outputs), other than perhaps speed<br>

or footprint.  How would users like it if their methods with<br>

floating point computations got slightly different results,<br>

after the JIT kicked in?  Answer: It would be a bug, and they<br>

wouldn’t.<br>

<br>

It’s the same with the VM’s choice of how to implement the<br>

virtual semantics of values, using a variety of techniques,<br>

including “buffering” as a separate node in the heap.<br>

“Buffering” is like what we have called “boxing” in the past.<br>

But it’s different: The identity of a “box” is sadly exposed<br>

in VM semantics, while a “buffer”’s identity is suppressed.<br>

<br>

If a value is not flattened in its container, it may instead<br>

be buffered on the heap, with a pointer left behind in the<br>

container.  (It might also be lifted to registers after<br>

inlining and escape analysis; this is a different kind of<br>

flattening we call “scalarization”.)  But the meaning of<br>

==/acmp is the same regardless of details of physical<br>

representation of its operands.<br>

<br>

> As I understand the spec, it's up to the JVM to choose how to handle a<br>

> value object (even a null-restricted VO), depending on its size. If a VO<br>

> field value is a VO, it's optional as to whether the JVM stores that field<br>

> value as a reference or as a value. What the JVM does will depend on the VO<br>

> size, right?<br>

<br>

Sort of right, but also an oversimplification.  There are many<br>

factors besides size.  Another factor can be whether the container<br>

is final or not, and yet another is whether it is volatile or not.<br>

As a rule of thumb, users should always expect the VM to surprise them,<br>

if they decide to open the hood and peek below the virtual metal.<br>

<br>

> If a VO field value is stored as a reference, calling == on<br>

> that reference will behave the same way as calling == on an identity object<br>

> reference.<br>

<br>

I am at a loss to understand the grounds of your confident assertion<br>

here.  In any case, it is false.  Consider the consequences of such<br>

a design:<br>

<br>

A. The VM will unpredictably choose to buffer, flatten, or scalarize<br>

values.<br>

<br>

B. These decisions depend partly on heap layout policy, which<br>

is stable over time (as of today), and partly on JIT<br>

activity, which varies over time.<br>

<br>

C. Therefore, from run to run, or even within one run, a value which<br>

compares equal to itself at one point will compare unequal to<br>

itself later on, or vice versa.  We think users would not welcome<br>

this kind of uncertainty in Java code.<br>

<br>

<history><br>

<br>

Full disclosure:  About 10 years ago, I thought we could get away<br>

with it.  We’d have to tell users not to fully trust the results<br>

of ==/acmp.  Instead they must learn to always follow up == with<br>

a call to Object::equals.  I thought this might be tolerable<br>

because users already do this as a matter of habit.<br>

<br>

The basic rule would have been, if two values compare ==,<br>

you know they must be the same value, but if they don’t,<br>

they still might be the same value.  You must call equals<br>

if you need the accurate result.  So doing the extra call<br>

restores predictability, of the expression as a whole.<br>

<br>

I called this the Heisenbox Model, because you would always<br>

have a kind of “uncertainty principle” about whether a given<br>

value would be equal to itself, or whether it would suddenly<br>

turn into two separately buffered copies, whose pointers would<br>

be suddenly different.<br>

<br>

This can happen, for example, if JIT code deoptimizes to the<br>

interpreter, and the two copies of the same value are buffered<br>

separately, perhaps because they are in two stack frames<br>

related by inlining.  Reminder: You will be surprised if<br>

you peek inside the VM.  It is the VM’s responsibility to<br>

keep an evenhanded pretense of consistent behavior, whatever<br>

its internal gymnastics.<br>

<br>

So, changing execution paths to look at values from<br>

a new angle might cause a deopt, and change the result<br>

of your comparison.  Observation collapses the<br>

configuration to a random outcome… voilà Heisenboxes.<br>

(Apologies to the real physicists.)<br>

<br>

FTR, my first public writing on value classes in 2012<br>

despaired of assigning any kind of predictable value<br>

to ==/acmp, which anticipated Heisenboxes as well:<br>

<a href="https://cr.openjdk.org/~jrose/oblog/value-types-in-the-vm.html" rel="noreferrer" target="_blank">https://cr.openjdk.org/~jrose/oblog/value-types-in-the-vm.html</a><br>

I am glad most of it has proven wrong, including<br>

the bit about ==.  My colleagues finally beat it<br>

out of me. The last gasp was in 2016:<br>

<a href="https://bugs.openjdk.org/browse/JDK-8163133" rel="noreferrer" target="_blank">https://bugs.openjdk.org/browse/JDK-8163133</a><br>

<br>

One reason I thought we could get away with Heisenboxes is<br>

there is a similar behavior, historically, in Common Lisp.<br>

The EQ predicate distinguishes certain numeric values that<br>

are numerically indistinguishable, if they happen to be<br>

buffered separately.  The EQV predicate fixes that, at a<br>

higher cost.  (And EQ and EQV are the same on Lisp’s version<br>

of identity objects, which is basically all objects except<br>

numbers.)  So, relative to Lisp, our decision is that values<br>

are always compared by value, like EQV, and never by buffer<br>

identity like EQ.  Dropping EQ from the model takes out a<br>

bunch of shifty uncertain behavior; now you can trust ==/acmp<br>

to do the same thing all the time for the same inputs.<br>

<br>

</history><br>

<br>

> If the VO is stored as a value (a set of values which could<br>

> include another VO field), then the == will be recursive. Am I incorrect?<br>

<br>

Regardless of how the VM stores the VO, the == will be recursive.<br>

<br>

Some find the following corollary to this principle surprising:<br>

<br>

Neither the type nor the implementation of the variable containing<br>

the value affects the equality comparison.  Therefore, if you have<br>

two Object pointers compared with ==/acmp, and they are both values<br>

(of the same class), there will be a recursive descent.  It doesn’t<br>

matter that you thought you had mere Object pointers.  The VM<br>

does the same work for value comparison regardless of context.<br>

<br>

This also means that if a value contains an Object field, then<br>

the recursive ==/acmp on a pair of such values will, in fact,<br>

test the corresponding Object references (as if by ==/acmp)<br>

and recurse if and only if the two objects are both values<br>

of the same class.<br>

<br>
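The two corollaries above can be modeled with a minimal sketch in plain Java.<br>
It is only an illustration, not JEP 401 syntax and not the VM's algorithm:<br>
the ValueLike interface, the fields() method, and the same() helper are<br>
hypothetical names, records stand in for value classes, and a boxed Integer<br>
stands in for a primitive int field.<br>

```java
// Hypothetical model of the recursive ==/acmp rule; not real JEP 401 syntax.
final class AcmpSketch {
    // Assumption: marks a class that the sketch treats as a value class.
    interface ValueLike {
        Object[] fields(); // field values in declaration order
    }

    record Point(int x, int y) implements ValueLike {
        public Object[] fields() { return new Object[] { x, y }; }
    }

    // A value with a field of static type Object.
    record Holder(Object ref) implements ValueLike {
        public Object[] fields() { return new Object[] { ref }; }
    }

    static boolean same(Object a, Object b) {
        if (a == b) return true;                        // same pointer, or both null
        if (a == null || b == null) return false;
        if (a.getClass() != b.getClass()) return false; // different classes never match
        if (a instanceof Integer) return a.equals(b);   // models a primitive int field
        if (!(a instanceof ValueLike)) return false;    // identity object: pointer test already failed
        Object[] fa = ((ValueLike) a).fields();
        Object[] fb = ((ValueLike) b).fields();
        for (int i = 0; i != fa.length; i++) {
            if (!same(fa[i], fb[i])) return false;      // recurse field by field
        }
        return true;
    }

    public static void main(String[] args) {
        Object p = new Point(1, 2);   // static type is Object; that does not matter
        Object q = new Point(1, 2);
        System.out.println(same(p, q));                           // true
        System.out.println(same(new Holder(p), new Holder(q)));   // true
        Object id = new StringBuilder("x");                       // an identity object
        System.out.println(same(new Holder(id), new Holder(id))); // true
        System.out.println(same(new Holder(id), new Holder(new StringBuilder("x")))); // false
    }
}
```

In this model the static type of the variable never enters the comparison;<br>
only the run-time classes and field contents of the two operands do.<br>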

Finally, this also means that if one has a value with two<br>

Object fields, an ==/acmp operation might possibly perform<br>

an unbounded recursive descent of a tree.  At this point,<br>

a class designer might have to think hard about what is<br>

the desired behavior.  One way to break the recursive<br>

descent is to use an intermediate heap node that is an<br>

identity object, such as an array.  That is in fact how<br>

recursive structures are often written.  But for a pure<br>

immutable fixed-arity tree class, refactoring to a value<br>

will involve consideration of the cost of ==/acmp.  The<br>

design of such classes (such as HAMTs) is rarely attempted,<br>

and then only by experts.  I am not worried about<br>

such experts being overwhelmed by the odd scaling of<br>

recursive descent in their exquisitely tuned classes.<br>

<br>
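The cost question can be illustrated with a rough sketch in plain Java, under<br>
the same caveats as any model: ValueLike, Pair, Boxed, same(), and the visits<br>
counter are hypothetical names, and records merely stand in for value classes.<br>
It counts comparison steps for a tree whose nodes have two Object fields, and<br>
for a variant whose children sit behind an array (an identity object).<br>

```java
// Hypothetical cost model for recursive ==/acmp on tree-shaped values.
final class TreeCostSketch {
    interface ValueLike { Object[] fields(); }

    // Models a value class with two Object fields: comparison descends into both.
    record Pair(Object left, Object right) implements ValueLike {
        public Object[] fields() { return new Object[] { left, right }; }
    }

    // Models a value class whose children sit behind an array, an identity object.
    record Boxed(Object[] kids) implements ValueLike {
        public Object[] fields() { return new Object[] { kids }; }
    }

    static int visits = 0; // number of same() calls, for illustration only

    static boolean same(Object a, Object b) {
        visits++;
        if (a == b) return true;
        if (a == null || b == null) return false;
        if (a.getClass() != b.getClass()) return false;
        if (!(a instanceof ValueLike)) return false; // arrays stop the descent here
        Object[] fa = ((ValueLike) a).fields();
        Object[] fb = ((ValueLike) b).fields();
        for (int i = 0; i != fa.length; i++) {
            if (!same(fa[i], fb[i])) return false;
        }
        return true;
    }

    static Object pairTree(int depth) {
        if (depth == 0) return "leaf"; // interned literal, so leaves share one pointer
        return new Pair(pairTree(depth - 1), pairTree(depth - 1));
    }

    static Object boxedTree(int depth) {
        if (depth == 0) return "leaf";
        return new Boxed(new Object[] { boxedTree(depth - 1), boxedTree(depth - 1) });
    }

    public static void main(String[] args) {
        visits = 0;
        boolean r1 = same(pairTree(3), pairTree(3));
        System.out.println(r1 + " " + visits); // true 15

        visits = 0;
        boolean r2 = same(boxedTree(3), boxedTree(3));
        System.out.println(r2 + " " + visits); // false 2
    }
}
```

The array cuts the descent short, but it also changes the answer: two<br>
separately built Boxed trees are no longer the same, because their arrays are<br>
distinct identity objects.  That trade-off is the design decision in question.<br>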

So we have been thinking about all this diligently and<br>

carefully for a decade, and JEP 401 is the result.<br>

<br>

Some might think it mandates too much recursion for ==/acmp.<br>

Users should prepare, however, to be surprised by what the VM<br>

will do to make such things fast.  Remember the old<br>

conventional wisdom about bytecode-based VMs and pervasive<br>

virtual calls:  Java was supposed to be slow because of<br>

such things… until it wasn’t slow anymore.  One promising<br>

idea (IMO) for reducing ==/acmp costs is folding them with<br>

equivalent tests in Object::equals:<br>

<a href="https://bugs.openjdk.org/browse/JDK-8255024" rel="noreferrer" target="_blank">https://bugs.openjdk.org/browse/JDK-8255024</a><br>

<br>

So VM internals are surprising.  That’s why (a) we are<br>

required, and (b) we are able, to design high-level VM<br>

semantics, with no uncertainties, even if they might<br>

seem “too expensive” at first blush.<br>

<br>

I hope these perspectives will be useful.  They have<br>

been hard-won.<br>

<br>

— John<br>

</blockquote></div>