visualizing objects on two heaps

Wed Aug 10 19:30:06 UTC 2022

Are values like 42 objects, in the same way that some `new Object()` 
(call it X) is an object?  We are (mostly) saying “yes” because that 
lets us make new 
kinds of primitives which code like objects (classes) but act like 
values (primitives).  Also, unifying concepts (where they *can* be 
unified) will probably produce a user model that is easier to use.

But this viewpoint (everything is an object) has a downside, because 
objects like X seem “thick and chunky” while 42 seems to be 
different, maybe “light and airy”.  The following is a rambling 
exploration of why those seemings might differ, and how perhaps to 
realign them.

I tend to think of Valhalla value objects and primitives in traditional 
mathematical terms:  The entity 42 exists independently of any context, 
and can be summoned as needed, many times if needed, in a given Java 
application.  (As Brian points out, that’s what `CONSTANT_Integer` is 
for, up to 32 bits.)  And the same goes for the 2D point value `new 
Point(42,42)`.  There are surely integers (bigger than 32 bits) which 
nobody is currently thinking of and no application is currently 
processing, but any one of them can be summoned at a moment’s notice 
when an application needs it.  The same reasoning applies to structures 
built up from integers like 2D points.

(Summoning an integer or a 2D point in software requires a mechanical 
procedure to derive its bits.  You can’t say in a real program, “the 
least crypto-key not yet used anywhere on this planet”, even though in 
some sense that bit pattern is well-defined.  For fun discussions of 
numbers which are at the far edge of the thinkable, see sites like 
http://jdh.hamkins.org/largest-number-contest/ and 
https://medium.com/@joshkerr/who-can-name-the-biggest-number-contest-a2211d21be09 
.  Today I was reminded that any given volume of physical space can only 
represent a limited range of integers, in any scheme.  As an amusing 
corollary, if I were able briefly to wrap my brain around the bits of 
something really big like Graham’s number, the region of space 
containing that brain would require cosmic inflation, and/or would 
collapse into a black hole.)

In some sense, the entity X which my program is about to call into 
existence by executing `new Object()` could be said to exist 
independently of any context as well.  This entity certainly possesses a 
hidden identity property that is (presumably) different in all space and 
time from any other similarly created object.  Both the object X and its 
identity can be viewed as outside of time, eternally pre-existent.  And 
yet, it is hardly ever useful to think of X in these terms.  X is 
clearly something inside a physical box somewhere, and pretending it was 
eternally pre-existent feels like a mind game.  I will say, however, 
that when writing formal semantics in the language of math, you *do* 
play that mind game.  And when reasoning about compiler optimizations, 
it is sometimes useful to play such games.  The C2 JIT models immutable 
fields (like `oop._klass` and `arrayOop._length`) as indefinitely 
pre-existent in memory, whether the containing object’s allocation was 
recent or not.  This model is chosen not because the authors of C2 are 
platonists, but because it is the simplest model to use inside the 
limited horizon of a compilation task.

(If X had mutable fields, then a timeless mathematical model of it 
would require some representation of the varying memory states that X 
might have.  You need to say things like “X has these field values in 
this overall memory state.”  It can be done, and in fact optimizers 
*must* do this.  But today I’m not dealing with mutable fields, nor 
with synchronization state, which is also mutable.)

So, it’s possible to think of both 42 and X as existing outside of 
time in some platonic mathematical universe.  But most people will find 
it tolerable to think of a platonic 42 but not X.

After all, the way most of us learn about X is by being shown a diagram 
of X as a data structure, probably a box with a header field.  There 
might be an arrow from header to a type metadata entity (maybe labeled 
`Object.class`).  There is certainly an arrow from any other place that 
is referring to X.  I call this the “boxes and arrows” presentation 
of data structures.  Clearly if something is a box, it’s sitting on a 
whiteboard or page, or in a warehouse where such boxes live.  We are 
taught (early on) that such boxes sit in “memory” or in “the 
heap”.

In search of consistency between 42 and X, we can go to the other 
extreme, and require that all entities (in software) are confined to 
real, existing, physical boxes of computer stuff.  This is easy for X, 
and not hard for 42 either.  You simply say that 42 exists wherever its 
bits have been computed and stored (in a variable or a 
`CONSTANT_Integer` structure).  Then, you agree that there is a way to 
detect that any two summonings of 42 are in fact both 42 (or to tell 
that they differ).  And you should also agree to talk of “some 
42 somewhere”, not “the value 42”.  At most you can say “some 42 
somewhere which will be detectably equal to the 42 I am working with 
right now”.  I think most people will find this to be a mind game as 
well; they are platonists for 42 but not for our friend X.

As a historical note, there are schools of mathematical thought 
called “constructivist”, predating the era of computing, that 
bravely reject the taint of platonism.  Such schools take the view 
that every mathematical (and hence computational) object and reasoning 
is a matter of construction, with a finite sequence of steps.  Without 
being an expert, I would guess that constructivists might prefer the 
extreme account of 42:  It’s a formal pattern which doesn’t exist 
until someone constructs it; it is constructed many times; constructed 
versions are distinct but can be proven equal; such equality proofs are 
again constructive in nature.  How many versions of 42 can dance on the 
head of a constructivist?  Maybe many, but most people would say no more 
than one.

If we decide that 42, like X, is merely a bit pattern in the computer 
(perhaps replicated many times), then we get a nice, concrete model of 
boxes and arrows everywhere.  We will need to sidestep an embarrassing 
infinite regress when we try to draw an arrow to 42 coming from the field 
`Integer.value`; it’s doable without arrows thankfully, when you just 
write the label “42” in the box.  So in a graph of boxes and arrows, 
such labels are usually necessary also.  This is why Java has 
primitives, and a distinction between `Integer` (which is a box) and 
`int` (which is the label in one of the fields of that box).

Given such a distinction between labels and arrows, one might think that 
the Valhalla goal of unifying `int` and `Integer` is impossible.  But this 
leads me to a new thought, which is (a) put everything on the heap, and (b) 
distinguish sharply between the value heap and the identity heap.

So, although I prefer a platonic viewpoint in most cases, here’s a 
non-platonic, constructive, boxes-and-arrows viewpoint that one might 
prefer for teaching and visualization.

There is a value heap and an identity heap in the Valhalla VM.  Every 
identity object lives as a box on the identity heap, and conversely 
every value object lives as a box on the value heap.  All non-reference 
values are visualized as “value arrows” into the value heap.  All 
non-null reference values are visualized as “identity arrows” into 
the identity heap.

For every non-abstract class, instances are allocated on one heap or the 
other; no class has instances on both heaps.  Instances of `String` and 
`Integer` are allocated on the identity and value heaps, respectively.

Our friend X is visualized as an identity arrow into the identity heap.  
He has a header and no other fields.  (We might give him a secret field 
to hold his identity, but this model does not require it, unlike the 
platonic model.)

The value 42 is visualized as a value arrow into the value heap, to a 
box labeled 42.  The box also has a header, which says 
“Integer/int”.  Its backward-compatible `value` field points to 
itself.  The label is not the same as the `value` field; it is part of 
the header I guess.

As a clever optimization, compiled code might dispense with the arrow 
and just hoist the bit pattern of the label into a register.  This is 
what we mean when we say that value types are monomorphic:  They can be 
manipulated in terms of the characteristic labels, instead of their 
arrows.  Nevertheless, the concept of value arrow comes first, and only 
the performance model or JIT-writer’s manual mentions the possibility 
of unboxed labels.

A variable of type `Object` is visualized as the root of an arrow (into 
either heap), or else the special label or pseudo-arrow `null` which is 
not an arrow into either heap.

Other than `null`, primitive value labels (like 42) can exist only 
inside the value heap.  In fact, they properly exist only inside the 
primitive objects themselves.  Everything else is arrows (or `null`).

Normally identity and value arrows look and feel the same.  Field access 
works the same for both.  (That is why `Integer.value` loops back to 
`this`.)  But they differ when the `==` (`acmp`) operator looks at them.  
Value arrows are proven equal or unequal using a field-wise recursive 
descent.  This descent bottoms out at primitives (consulting their 
labels) or at `null` or an identity arrow.  Identity arrows appeal 
solely to the identity of the object in the identity heap.  Of course 
`null` is equal only to itself.
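
As a rough illustration of that comparison rule, here is a minimal sketch in present-day Java.  The names `Slot`, `Label`, `IdentityArrow`, `ValueArrow`, and `same` are all hypothetical, invented for this note; they are not Valhalla API and say nothing about how a real VM implements `acmp`.  The sketch also collapses a primitive's box and its label into a single `Label`, which is a simplification of the picture above.

```java
import java.util.List;

// Hypothetical model of the arrows described above; none of these
// names are Valhalla API, and a real VM need not work this way.
final class ArrowEquality {
    // Anything a variable or field can hold (besides null).
    interface Slot {}

    // A primitive box, collapsed to its label (an assumption of this sketch).
    record Label(long bits) implements Slot {}

    // An arrow into the identity heap; 'target' stands in for the box.
    record IdentityArrow(Object target) implements Slot {}

    // An arrow into the value heap: the box's class name and its fields.
    record ValueArrow(String type, List<Slot> fields) implements Slot {}

    // Models what == (acmp) would do with two slots.
    static boolean same(Slot a, Slot b) {
        if (a == null || b == null) {
            return a == b;                       // null is equal only to itself
        }
        if (a instanceof Label x && b instanceof Label y) {
            return x.bits() == y.bits();         // consult the labels
        }
        if (a instanceof IdentityArrow x && b instanceof IdentityArrow y) {
            return x.target() == y.target();     // identity only, no descent
        }
        if (a instanceof ValueArrow x && b instanceof ValueArrow y) {
            if (!x.type().equals(y.type()) || x.fields().size() != y.fields().size()) {
                return false;
            }
            for (int i = 0; i < x.fields().size(); i++) {
                if (!same(x.fields().get(i), y.fields().get(i))) {
                    return false;                // field-wise recursive descent
                }
            }
            return true;
        }
        return false;                            // different kinds of slot
    }

    public static void main(String[] args) {
        Slot p = new ValueArrow("Point", List.of(new Label(42), new Label(42)));
        Slot q = new ValueArrow("Point", List.of(new Label(42), new Label(42)));
        System.out.println(same(p, q));          // true
        System.out.println(same(new IdentityArrow(new Object()),
                                new IdentityArrow(new Object()))); // false
    }
}
```

Two separately built value arrows for `Point(42,42)` compare equal, while two identity arrows compare equal only when they share a target.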

As noted above, new identity objects are created only by the `new` 
bytecode.  New value objects are created by `aconst_init` or `withfield` 
or by any bytecode which produces a primitive value!

For example, incrementing 42 (maybe with `iinc`) produces a new value 
arrow to the value heap, which just so happens to contain an object 
labeled 43.  How did we get so lucky?  Perhaps Plato is smiling on us, 
and it has always been there.  (This is basically what Java mandates 
today for auto-boxing, for small-enough values!)  More constructively, 
the VM ensures that, if 42 must be incremented, it either finds a 
previously created copy of 43, or makes a new copy on the fly.  The VM 
can flip a coin in real-time and do either, because it is allowed to 
make many copies of 43 in the value heap.  This is OK because there is 
nothing the user can do to distinguish such multiple copies, just as 
there is nothing the user can do to tell if the GC has moved an object 
in the identity heap.
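
The auto-boxing remark can be checked in today's Java.  The following is a small sketch (the class name `SmallValueCache` and the helper `increment` are made up for this note) that relies only on the documented behavior of `Integer.valueOf`, which caches boxes for values from -128 to 127.  Outside that range nothing is promised about sharing, so the sketch compares with `equals` there.

```java
// Sketch using present-day java.lang.Integer; the helper names are hypothetical.
final class SmallValueCache {
    // Boxes n + 1 through Integer.valueOf, as auto-boxing does.
    static Integer increment(int n) {
        return Integer.valueOf(n + 1);
    }

    public static void main(String[] args) {
        Integer a = increment(42);
        Integer b = Integer.valueOf(43);
        // 43 is inside the range -128..127, where boxes are always cached.
        System.out.println(a == b);        // true

        Integer c = increment(999_999);
        Integer d = Integer.valueOf(1_000_000);
        // Outside the cached range the boxes may or may not be shared,
        // so only equals() is a reliable comparison here.
        System.out.println(c.equals(d));   // true
    }
}
```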

What goes for `iinc` goes for all the other value-producing bytecodes, 
whether primitive or not.  The VM is always free to recycle a previously 
existing object in the value heap, if it can find one, or to make a new 
box.
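
Here is one way to picture that freedom, as a toy model only.  The class `ValueHeapSketch`, its `Box` record, and the methods `summon` and `same` are hypothetical names for this note, and a real VM would not keep a lookup table like this.  The point is that the coin flip is invisible as long as the user can compare boxes only by label.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Toy model of a value heap; all names here are hypothetical.
final class ValueHeapSketch {
    // A box in the value heap, carrying only its label.
    record Box(int label) {}

    private final Map<Integer, Box> existing = new HashMap<>();
    private final Random coin;

    ValueHeapSketch(long seed) {
        this.coin = new Random(seed);
    }

    // Returns an arrow to some box labeled 'label': either a previously
    // created copy or a fresh one, chosen by coin flip.
    Box summon(int label) {
        Box found = existing.get(label);
        if (found != null && coin.nextBoolean()) {
            return found;                  // recycle an existing copy
        }
        Box fresh = new Box(label);        // make a new copy on the fly
        existing.put(label, fresh);
        return fresh;
    }

    // The only comparison offered to the user: by label, never by address.
    static boolean same(Box a, Box b) {
        return a.label() == b.label();
    }
}
```

However the coin falls, `same(heap.summon(43), heap.summon(43))` holds, which is the sense in which the copies cannot be told apart.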

All of this is a visualization exercise.  One might prefer, after all, 
to visualize a single heap with two kinds of objects in it.  In any 
case, a less platonic, more constructive visualization can be obtained 
by insisting that all objects, including all values, even primitives, are 
uniformly accessed via arrows.

I guess my point is that, if we are willing to pretend that arrows are 
everywhere, we need not worry whether something is “really” an 
`Integer` or “really” an `int`.

I haven’t said anything yet about flattening.  That requires 
additional work to visualize the container property of the arrows in the 
heap.  There are at least two ways to do it:  Color the value arrows 
that are to be flattened, or just nest one box inside the other.  I 
think I would start with colored arrows, explaining that the VM is being 
invited to flatten, and then show nested boxes.  But the nested boxes 
violate the “everything is an arrow” symmetry of the visualization, 
so they are a sort of commentary.

Again, as commentary on the VM’s likely optimization, you might 
“hoist” 42 onto the stack or into a field by erasing the arrow (to a 
copy of 42 in the value heap) and writing the label “42” in its 
place.  And you might do this in heap fields as well.  Turning back to 
2D points, you might “hoist” `Point(42,42)` into a stack variable or 
heap field by erasing the arrow and replacing it with a nested box (for 
the point).  In either case (unboxed 42 or `Point(42,42)`), there is a 
caveat that when you use that value, you are “really” using an arrow 
into the value heap.
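
To make the nested-box picture a little more concrete, here is a sketch in present-day Java, with a record `Point` standing in for a Valhalla value class.  The class `FlattenSketch` and the helpers `flatten` and `read` are hypothetical, and the two-ints-per-point layout is just one layout I picked for illustration, not a claim about what the VM would choose.

```java
// Present-day Java stand-in; Point is a record here, not a Valhalla value class.
final class FlattenSketch {
    record Point(int x, int y) {}

    // "Nested box" layout: write each point's labels inline, two ints per
    // point, instead of storing an arrow to a separate Point box.
    static int[] flatten(Point[] arrows) {
        int[] flat = new int[arrows.length * 2];
        for (int i = 0; i < arrows.length; i++) {
            flat[2 * i] = arrows[i].x();
            flat[2 * i + 1] = arrows[i].y();
        }
        return flat;
    }

    // Using a flattened point "really" means summoning an arrow to it again.
    static Point read(int[] flat, int index) {
        return new Point(flat[2 * index], flat[2 * index + 1]);
    }

    public static void main(String[] args) {
        Point[] arrows = { new Point(42, 42), new Point(1, 2) };
        int[] flat = flatten(arrows);
        System.out.println(read(flat, 0).equals(arrows[0])); // true
    }
}
```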

So, FWIW, and in the spirit of brainstorming, that is one way to (a) 
make values and objects more like each other, while (b) staying 
relentlessly constructive, and (c) avoiding the question of whether an 
object is an `Integer` or an `int`.

The distinction of `Integer` vs. `int`, and also `Point` vs. 
`Point.val`, is therefore a matter of viewpoint and not essence.  
Everything is an object, such as `Integer` or `Point`.  (Except `null`.)  
The `.val`/`.ref` distinction, like other non-value-set distinctions 
(`final` vs. non-`final`, or `Object` vs. a narrower type), is a way for 
the programmer to annotate the program to express a richer view of the 
programmer’s reasonings about the program, and to unlock 
optimizations.