Recursively comparing

Wed Feb 28 18:35:43 UTC 2024

Thanks, John, for the correction. Much different from what I assumed! Very
useful to know.

John

PS:  Thanks for the JDL-8255024 reference.

On Wed, Feb 28, 2024 at 1:28 AM John Rose <john.r.rose at oracle.com> wrote:

> This is an interesting discussion.  I’m not surprised
> that people are surprised by ==/acmp semantics.
> We surprised ourselves, at first.
>
> On 27 Feb 2024, at 17:04, John Bossons wrote:
>
> > Hi Remy,
> >
> > Even for a value object field in which the stored VO value is a
> reference?
>
> Yes, that is Remy’s point.  It doesn’t matter how the virtual
> machine concretely stores the value.  The semantics are derived
> from virtual data, not physical data.
>
> The VM makes optimization decisions below the “virtual metal”.
> Users should expect that these decisions have no effect on program
> semantics (and so program outputs), other than perhaps speed
> or footprint.  How would users like it if their methods with
> floating point computations got slightly different results,
> after the JIT kicked in?  Answer: It would be a bug, and they
> wouldn’t.
>
> It’s the same with the VM’s choice of how to implement the
> virtual semantics of values, using a variety of techniques,
> including “buffering” as a separate node in the heap.
> “Buffering” is like what we have called “boxing” in the past.
> But it’s different: The identity of a “box” is sadly exposed
> in VM semantics, while a “buffer”’s identity is suppressed.
>
> If a value is not flattened in its container, it may instead
> be buffered on the heap, with a pointer left behind in the
> container.  (It might also be lifted to registers after
> inlining and escape analysis; this is a different kind of
> flattening we call “scalarization”.)  But the meaning of
> ==/acmp is the same regardless of details of physical
> representation of its operands.
>
> > As I understand the spec, it's up to the JVM to choose how to handle a
> > value object (even a null-restricted VO), depending on its size. If a VO
> > field value is a VO, it's optional as to whether the JVM stores that
> field
> > value as a reference or as a value. What the JVM does will depend on the
> VO
> > size, right?
>
> Sort of right, but also an oversimplification.  There are many
> factors besides size.  Another factor can be whether the container
> is final or not, and yet another is whether it is volatile or not.
> As a rule of thumb, users should always expect the VM to surprise them,
> if they decide to open the hood and peek below the virtual metal.
>
> > If a VO field value is stored as a reference, calling == on
> > that reference will behave the same way as calling == on an identity
> object
> > reference.
>
> I am at a loss to understand the grounds of your confident assertion
> here.  In any case, it is false.  Consider the consequences of such
> a design:
>
> A. The VM will unpredictably choose to buffer, flatten, or scalarize
> values.
>
> B. These decisions are partially dependent on heap layout policy, which
> is stable over time (at this current date) but also dependent on JIT
> activity, which varies over time.
>
> C. Therefore, from run to run, or even within one run, a value which
> compares equal to itself at one point will compare unequal to
> itself later on, or vice versa.  We think users would not welcome
> this kind of uncertainty in Java code.
>
> <history>
>
> Full disclosure:  About 10 years ago, I thought we could get away
> with it.  We’d have to tell users not to fully trust the results
> of ==/acmp.  Instead they must learn to always follow up == with
> a call to Object::equals.  I thought this might be tolerable
> because users already do this as a matter of habit.
>
> The basic rule would have been, if two values compare ==,
> you know they must be the same value, but if they don’t,
> they still might be the same value.  You must call equals
> if you need the accurate result.  So doing the extra call
> restores predictability, of the expression as a whole.
>
> I called this the Heisenbox Model, because you would always
> have a kind of “uncertainty principle” about whether a given
> value would be equal to itself, or whether it would suddenly
> turn into two separately buffered copies, whose pointers would
> be suddenly different.
>
> This can happen, for example, if JIT code deoptimizes to the
> interpreter, and the two copies of the same value are buffered
> separately, perhaps because they are in two stack frames
> related by inlining.  Reminder: You will be surprised if
> you peek inside the VM.  It is the VM’s responsibility to
> keep an evenhanded pretense of consistent behavior, whatever
> its internal gymnastics.
>
> So, changing execution paths to looking at values from
> a new angle might cause a deopt, and change the result
> of your comparison.  Observation collapses the
> configuration to a random outcome… voilà Heisenboxes.
> (Apologies to the real physicists.)
>
> FTR, my first public writing on value classes in 2012
> despaired of assigning any kind of predictable value
> to ==/acmp, which anticipated Heisenboxes as well:
> https://cr.openjdk.org/~jrose/oblog/value-types-in-the-vm.html
> I am glad most of it has proven wrong, including
> the bit about ==.  My colleagues finally beat it
> out of me. The last gasp was in 2016:
> https://bugs.openjdk.org/browse/JDK-8163133
>
> One reason I thought we could get away with Heisenboxes is
> there is a similar behavior, historically, in Common Lisp.
> The EQ predicate distinguishes certain numeric values that
> are numerically indistinguishable, if they happen to be
> buffered separately.  The EQV predicate fixes that, at a
> higher cost.  (And EQ and EQV are the same on Lisp’s version
> of identity objects, which is basically all objects except
> numbers.)  So, relative to Lisp, our decision is that values
> are always compared by value, like EQV, and never by buffer
> identity like EQ.  Dropping EQ from the model takes out a
> bunch of shifty uncertain behavior; now you can trust ==/acmp
> to do the same thing all the time for the same inputs.
>
> </history>
>
> > If the VO is stored as a value (a set of values which could
> > include another VO. field), then the == will be recursive. Am I
> incorrect?
>
> Regardless of how the VM stores the VO, the == will be recursive.
>
> Some find the following corollary to this principle surprising:
>
> Neither the type nor the implementation of the variable containing
> the value affects the equality comparison.  Therefore, if you have
> two Object pointers compared with ==/acmp, and they are both values
> (of the same class), there will be a recursive descent.  It doesn’t
> matter that you thought you have mere Object pointers.  The VM
> does the same work for value comparison regardless of context.
>
> This also means that if a value contains an Object field, then
> the recursive ==/acmp on a pair of such values will, in fact,
> test the corresponding Object references (as if by ==/acmp)
> and recurse if and only if the two objects are both values
> of the same class.
>
> Finally, this also means that if one has a value with two
> Object fields, an ==/acmp operation might possibly perform
> an unbounded recursive descent of a tree.  At this point,
> a class designer might have to think hard about what is
> the desired behavior.  One way to break the recursive
> descent is to use an intermediate heap node that is an
> identity object, such as an array.  That is in fact how
> recursive structures are often written.  But for a pure
> immutable fixed-arity tree class, refactoring to a value
> will involve consideration of the cost of ==/acmp.  The
> design of such classes (such as HAMTs) is rarely attempted,
> and by experts when done at all.  I am not worried about
> such experts being overwhelmed by the odd scaling of
> recursive descent in their exquisitely tuned classes.
>
> So we have been thinking about all this diligently and
> carefully for a decade, and JEP 401 is the result.
>
> Some might think it mandates too much recursion for ==/acmp.
> Users should prepare, however, to be surprised by what the VM
> will do to make such things fast.  Remember the old
> conventional wisdom about bytecode-based VMs and pervasive
> virtual calls:  Java was supposed to be slow because of
> such things… until it wasn’t slow anymore.  One promising
> idea (IMO) for reducing ==/acmp costs is folding them with
> equivalent tests in Object::equals:
> https://bugs.openjdk.org/browse/JDK-8255024
>
> So VM internals are surprising.  That’s why (a) we are
> required, and (b) we are able, to design high-level VM
> semantics, with no uncertainties. Even if they might
> seem “too expensive” at first blush.
>
> I hope these perspectives will be useful.  They have
> been hard-won.
>
> — John
>

-- 
Phone:  (416) 450-3584 (cell)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/valhalla-dev/attachments/20240228/1d6433d6/attachment.htm>