visualizing objects on two heaps
John Rose
john.r.rose at oracle.com
Wed Aug 10 19:30:06 UTC 2022
Are values like 42 objects in the same way that some `new Object()` (call it X) is an
object? We are (mostly) saying “yes” because that lets us make new
kinds of primitives which code like objects (classes) but act like
values (primitives). Also, unifying concepts (where they *can* be
unified) will probably produce a user model that is easier to use.
But this viewpoint (everything is an object) has a downside, because
objects like X seem “thick and chunky” while 42 seems to be
different, maybe “light and airy”. The following is a rambling
exploration of why those seemings might differ, and how perhaps to
realign them.
I tend to think of Valhalla value objects and primitives in traditional
mathematical terms: The entity 42 exists independently of any context,
and can be summoned as needed, many times if needed, in a given Java
application. (As Brian points out, that’s what `CONSTANT_Integer` is
for, up to 32 bits.) And the same goes for the 2D point value `new
Point(42,42)`. There are surely integers (bigger than 32 bits) which
nobody is currently thinking of and no application is currently
processing, but any one of them can be summoned at a moment’s notice
when an application needs it. The same reasoning applies to structures
built up from integers like 2D points.
(Summoning an integer or a 2D point in software requires a mechanical
procedure to derive its bits. You can’t say in a real program, “the
least crypto-key not yet used anywhere on this planet”, even though in
some sense that bit pattern is well-defined. For fun discussions of
numbers which are at the far edge of the thinkable, see sites like
http://jdh.hamkins.org/largest-number-contest/ and
https://medium.com/@joshkerr/who-can-name-the-biggest-number-contest-a2211d21be09
. Today I was reminded that any given volume of physical space can only
represent a limited range of integers, in any scheme. As an amusing
corollary, if I were able briefly to wrap my brain around the bits of
something really big like Graham’s number, the region of space
containing that brain would require cosmic inflation, and/or would
collapse into a black hole.)
In some sense, the entity X which my program is about to call into
existence by executing `new Object()` could be said to exist
independently of any context as well. This entity certainly possesses a
hidden identity property that is (presumably) different in all space and
time from any other similarly created object. Both the object X and its
identity can be viewed as outside of time, eternally pre-existent. And
yet, it is hardly ever useful to think of X in these terms. X is
clearly something inside a physical box somewhere, and pretending it was
eternally pre-existent feels like a mind game. I will say, however,
that when writing formal semantics in the language of math, you *do*
play that mind game. And when reasoning about compiler optimizations,
it is sometimes useful to play such games. The C2 JIT models immutable
fields (like `oop._klass` and `arrayOop._length`) as indefinitely
pre-existent in memory, whether the containing object’s allocation was
recent or not. This model is chosen not because the authors of C2 are
platonists, but because it is the simplest model to use inside the
limited horizon of a compilation task.
(If X had mutable fields, then a timeless mathematical model of it
would require some representation of the varying memory states that X
might have. You need to say things like “X has these field values in
this overall memory state.” It can be done, and in fact optimizers
*must* do this. But today I’m not dealing with mutable fields, nor
with synchronization state, which is also mutable.)
So, it’s possible to think of both 42 and X as existing outside of
time in some platonic mathematical universe. But most people will find
it tolerable to think of a platonic 42 but not X.
After all, the way most of us learn about X is by being shown a diagram
of X as a data structure, probably a box with a header field. There
might be an arrow from header to a type metadata entity (maybe labeled
`Object.class`). There is certainly an arrow from any other place that
is referring to X. I call this the “boxes and arrows” presentation
of data structures. Clearly if something is a box, it’s sitting on a
whiteboard or page, or in a warehouse where such boxes live. We are
taught (early on) that such boxes sit in “memory” or in “the
heap”.
In search of consistency between 42 and X, we can go to the other
extreme, and require that all entities (in software) are confined to
real, existing, physical boxes of computer stuff. This is easy for X,
and not hard for 42 either. You simply say that 42 exists wherever its
bits have been computed and stored (in a variable or a
`CONSTANT_Integer` structure). Then, you agree that there is a way to
detect that any two summonings of 42 are in fact both 42 (or
to tell that they differ). And you should also agree to talk of “some
42 somewhere”, not “the value 42”. At most you can say “some 42
somewhere which will be detectably equal to the 42 I am working with
right now”. I think most people will find this to be a mind game as
well; they are platonists for 42 but not for our friend X.
As a historical note, there are schools of mathematical thought
called “constructivist”, predating the era of computing, that
bravely reject the taint of platonism. Such schools take the view
that every mathematical (and hence computational) object and reasoning
is a matter of construction, with a finite sequence of steps. Without
being an expert, I would guess that constructivists might prefer the
extreme account of 42: It’s a formal pattern which doesn’t exist
until someone constructs it; it is constructed many times; constructed
versions are distinct but can be proven equal; such equality proofs are
again constructive in nature. How many versions of 42 can dance on the
head of a constructivist? Maybe many, but most people would say no more
than one.
If we decide that 42, like X, is merely a bit pattern in the computer
(perhaps replicated many times), then we get a nice, concrete model of
boxes and arrows everywhere. We will need to sidestep an embarrassing
infinite regress when we try to draw an arrow to 42 coming from the field
`Integer.value`; it’s doable without arrows thankfully, when you just
write the label “42” in the box. So in a graph of boxes and arrows,
such labels are usually necessary also. This is why Java has
primitives, and a distinction between `Integer` (which is a box) and
`int` (which is the label in one of the fields of that box).
Given such a distinction between labels and arrows, one might think that
the Valhalla goal of unifying `int` and `Integer` is impossible. But this
leads me to a new thought, which is (a) put everything on the heap, and (b)
distinguish sharply between the value heap and the identity heap.
So, although I prefer a platonic viewpoint in most cases, here’s a
non-platonic, constructive, boxes-and-arrows viewpoint that one might
prefer for teaching and visualization.
There is a value heap and an identity heap in the Valhalla VM. Every
identity object lives as a box on the identity heap, and conversely
every value object lives as a box on the value heap. All non-reference
values are visualized as “value arrows” into the value heap. All
non-null reference values are visualized as “identity arrows” into
the identity heap.
For every non-abstract class, instances are allocated on one heap or the
other; no class is allocated on both heaps. Instances of `String` and
`Integer` are allocated on the identity and value heaps, respectively.
Our friend X is visualized as an identity arrow into the identity heap.
He has a header and no other fields. (We might give him a secret field
to hold his identity, but this model does not require it, unlike the
platonic model.)
The value 42 is visualized as a value arrow into the value heap, to a
box labeled 42. The box also has a header, which says
“Integer/int”. Its backward-compatible `value` field points to
itself. The label is not the same as the `value` field; it is part of
the header I guess.
As a clever optimization, compiled code might dispense with the arrow
and just hoist the bit pattern of the label into a register. This is
what we mean when we say that value types are monomorphic: They can be
manipulated in terms of the characteristic labels, instead of their
arrows. Nevertheless, the concept of value arrow comes first, and only
the performance model or JIT-writer’s manual mentions the possibility
of unboxed labels.
A variable of type `Object` is visualized as the root of an arrow (into
either heap), or else the special label or pseudo-arrow `null` which is
not an arrow into either heap.
Other than `null`, primitive value labels (like 42) can exist only
inside the value heap. In fact, they properly exist only inside the
primitive objects themselves. Everything else is arrows (or `null`).
Normally identity and value arrows look and feel the same. Field access
works the same for both. (That is why `Integer.value` loops back to
`this`.) But they differ when the `==` (`acmp`) operator looks at them.
Value arrows are proven equal or unequal using a field-wise recursive
descent. This descent bottoms out at primitives (consulting their
labels) or at `null` or an identity arrow. Identity arrows appeal
solely to the identity of the object in the identity heap. Of course
`null` is equal only to itself.
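
The comparison rule above can be sketched in today's Java, with no Valhalla syntax. This is only a rough model under stated assumptions: records stand in for value classes, the primitive wrappers stand in for labeled primitive boxes, and everything else counts as an identity object. The names `ArrowEquality`, `same`, and `Tagged`, and the definition of `Point` as a record, are invented for illustration.

```java
import java.lang.reflect.RecordComponent;

public class ArrowEquality {
    // Assumption: records play the role of value classes in this model.
    record Point(int x, int y) {}
    record Tagged(Point p, Object tag) {}

    // Assumption: records and primitive wrappers live on the "value heap".
    static boolean isValueBox(Object o) {
        return o instanceof Record
            || o instanceof Integer || o instanceof Long
            || o instanceof Short || o instanceof Byte
            || o instanceof Character || o instanceof Boolean
            || o instanceof Float || o instanceof Double;
    }

    // Models the `==` (`acmp`) rule for two arrows (or nulls).
    static boolean same(Object a, Object b) {
        if (a == null || b == null) return a == b;            // null equals only itself
        if (!isValueBox(a) || !isValueBox(b)) return a == b;  // identity arrows
        if (a.getClass() != b.getClass()) return false;
        if (!(a instanceof Record)) return a.equals(b);       // primitive: compare labels
        try {
            // Value arrows: field-wise recursive descent.
            for (RecordComponent c : a.getClass().getRecordComponents()) {
                Object fa = c.getAccessor().invoke(a);
                Object fb = c.getAccessor().invoke(b);
                if (!same(fa, fb)) return false;
            }
            return true;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Object x = new Object();
        // Two separately summoned points with the same labels.
        System.out.println(same(new Point(42, 42), new Point(42, 42)));
        // Two separately created identity objects.
        System.out.println(same(x, new Object()));
        // The descent bottoms out at an identity arrow to the same X.
        System.out.println(same(new Tagged(new Point(1, 2), x),
                                new Tagged(new Point(1, 2), x)));
    }
}
```

The reflection is only there to keep the sketch generic; a real VM would of course know the layout of each value class directly.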
As noted above, new identity objects are created only by the `new`
bytecode. New value objects are created by `aconst_init` or `withfield`
or by any bytecode which produces a primitive value!
For example, incrementing 42 (maybe with `iinc`) produces a new value
arrow to the value heap, which just so happens to contain an object
labeled 43. How did we get so lucky? Perhaps Plato is smiling on us,
and it has always been there. (This is basically what Java mandates
today for auto-boxing, for small-enough values!) More constructively,
the VM ensures that, if 42 must be incremented, it either finds a
previously created copy of 43, or makes a new copy on the fly. The VM
can flip a coin in real-time and do either, because it is allowed to
make many copies of 43 in the value heap. This is OK because there is
nothing the user can do to distinguish such multiple copies, just as
there is nothing the user can do to tell if the GC has moved an object
in the identity heap.
What goes for `iinc` goes for all the other value-producing bytecodes,
whether primitive or not. The VM is always free to recycle a previously
existing object in the value heap, if it can find one, or to make a new
box.
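
Here is a toy model of that freedom, again in plain Java and purely as a sketch. The class `ValueHeapSketch`, the record `IntBox`, and the methods `summon` and `increment` are all hypothetical names, and the seeded `Random` merely stands in for whatever policy a real VM might use. The point is only that the caller cannot tell, by comparing labels, which path was taken.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

public class ValueHeapSketch {
    // A box on the (hypothetical) value heap; a record's equals() compares labels.
    record IntBox(int label) {}

    private final Map<Integer, IntBox> existing = new HashMap<>();
    private final Random coin;

    ValueHeapSketch(long seed) {
        coin = new Random(seed);
    }

    // Summon a box labeled n: recycle an old copy or make a new one, at whim.
    IntBox summon(int n) {
        IntBox old = existing.get(n);
        if (old != null && coin.nextBoolean()) {
            return old;                  // found a previously created copy
        }
        IntBox fresh = new IntBox(n);    // made a new copy on the fly
        existing.put(n, fresh);
        return fresh;
    }

    // Models an `iinc`-style increment: the result is "some 43 somewhere".
    IntBox increment(IntBox box) {
        return summon(box.label() + 1);
    }

    public static void main(String[] args) {
        ValueHeapSketch heap = new ValueHeapSketch(1L);
        IntBox a = heap.increment(heap.summon(42));
        IntBox b = heap.increment(heap.summon(42));
        // Label comparison is the only user-visible question in this model.
        System.out.println(a.equals(b));
        // Today's auto-boxing shares small values: JLS 5.1.7 covers -128..127.
        Integer x = 43, y = 43;
        System.out.println(x == y);
    }
}
```

In this plain-Java model a curious user could still apply `==` to two `IntBox` references and observe the coin flip; the visualization assumes that `==` on value arrows has been redefined so that this is not possible.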
All of this is a visualization exercise. One might prefer, after all,
to visualize a single heap with two kinds of objects in it. In any
case, a less platonic, more constructive visualization can be obtained
by insisting that all objects, including all values even primitives, are
uniformly accessed via arrows.
I guess my point is that, if we are willing to pretend that arrows are
everywhere, we need not worry whether something is “really” an
`Integer` or “really” an `int`.
I haven’t said anything yet about flattening. That requires
additional work to visualize the container property of the arrows in the
heap. There are at least two ways to do it: Color the value arrows
that are to be flattened, or just nest one box inside the other. I
think I would start with colored arrows, explaining that the VM is being
invited to flatten, and then show nested boxes. But the nested boxes
violate the “everything is an arrow” symmetry of the visualization,
so they are a sort of commentary.
Again, as commentary on the VM’s likely optimization, you might
“hoist” 42 onto the stack or into a field by erasing the arrow (to a
copy of 42 in the value heap) and writing the label “42” in its
place. And you might do this in heap fields as well. Turning back to
2D points, you might “hoist” `Point(42,42)` into a stack variable or
heap field by erasing the arrow and replacing it with a nested box (for
the point). In either case (unboxed 42 or `Point(42,42)`), there is a
caveat that when you use that value, you are “really” using an arrow
into the value heap.
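
As a hand-made illustration of that caveat, one can flatten points manually in current Java. This is only an analogy, not a description of what the Valhalla VM does: the class `FlatPoints`, its `labels` array, and the record definition of `Point` are assumptions made for the sketch. The container stores bare labels, and every read hands back an arrow to some box with those labels.

```java
public class FlatPoints {
    // Assumption: a record stands in for the value class Point.
    record Point(int x, int y) {}

    // Nested boxes, flattened by hand: x0, y0, x1, y1, ...
    private final int[] labels;

    FlatPoints(int length) {
        labels = new int[2 * length];
    }

    // Erase the arrow and write the point's labels into the container.
    void set(int i, Point p) {
        labels[2 * i] = p.x();
        labels[2 * i + 1] = p.y();
    }

    // Using the value again yields "some Point(x,y) somewhere".
    Point get(int i) {
        return new Point(labels[2 * i], labels[2 * i + 1]);
    }

    public static void main(String[] args) {
        FlatPoints points = new FlatPoints(4);
        points.set(0, new Point(42, 42));
        // Compared by label, the point that comes out is the point that went in.
        System.out.println(points.get(0).equals(new Point(42, 42)));
    }
}
```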
So, FWIW, and in the spirit of brainstorming, that is one way to (a)
make values and objects more like each other, while (b) staying
relentlessly constructive, and (c) avoiding the question of whether an
object is an `Integer` or an `int`.
The distinction of `Integer` vs. `int`, and also `Point` vs.
`Point.val`, is therefore a matter of viewpoint and not essence.
Everything is an object, such as `Integer` or `Point`. (Except `null`.)
The `.val`/`.ref` distinction, like other non-value-set distinctions
(`final` vs. non-`final`, or `Object` vs. a narrower type), is a way for
the programmer to annotate the program to express a richer view of the
programmer’s reasonings about the program, and to unlock
optimizations.
More information about the valhalla-spec-observers
mailing list