<!DOCTYPE html><html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body><div style="font-family: sans-serif;"><div class="markdown" style="white-space: normal;">
<p dir="auto">Are values like 42 objects in the same way that some <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">new Object()</code> (call it X) is an object? We are (mostly) saying “yes” because that lets us make new kinds of primitives which are coded like objects (classes) but act like values (primitives). Also, unifying concepts (where they <em>can</em> be unified) will probably produce a user model that is easier to use.</p>
<p dir="auto">But this viewpoint (everything is an object) has a downside, because objects like X seem “thick and chunky” while 42 seems to be different, maybe “light and airy”. The following is a rambling exploration of why those seemings might differ, and how perhaps to realign them.</p>
<p dir="auto">I tend to think of Valhalla value objects and primitives in traditional mathematical terms: The entity 42 exists independently of any context, and can be summoned as needed, many times if needed, in a given Java application. (As Brian points out, that’s what <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">CONSTANT_Integer</code> is for, up to 32 bits.) And the same goes for the 2D point value <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">new Point(42,42)</code>. There are surely integers (bigger than 32 bits) which nobody is currently thinking of and no application is currently processing, but any one of them can be summoned at a moment’s notice when an application needs it. The same reasoning applies to structures built up from integers like 2D points.</p>
<p dir="auto">(Summoning an integer or a 2D point in software requires a mechanical procedure to derive its bits. You can’t say in a real program, “the least crypto-key not yet used anywhere on this planet”, even though in some sense that bit pattern is well-defined. For fun discussions of numbers which are at the far edge of the thinkable, see sites like <a href="http://jdh.hamkins.org/largest-number-contest/" style="color: #3983C4;">http://jdh.hamkins.org/largest-number-contest/</a> and <a href="https://medium.com/@joshkerr/who-can-name-the-biggest-number-contest-a2211d21be09" style="color: #3983C4;">https://medium.com/@joshkerr/who-can-name-the-biggest-number-contest-a2211d21be09</a> . Today I was reminded that any given volume of physical space can only represent a limited range of integers, in any scheme. As an amusing corollary, if I were able briefly to wrap my brain around the bits of something really big like Graham’s number, the region of space containing that brain would require cosmic inflation, and/or would collapse into a black hole.)</p>
<p dir="auto">In some sense, the entity X which my program is about to call into existence by executing <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">new Object()</code> could be said to exist independently of any context as well. This entity certainly possesses a hidden identity property that is (presumably) different in all space and time from any other similarly created object. Both the object X and its identity can be viewed as outside of time, eternally pre-existent. And yet, it is hardly ever useful to think of X in these terms. X is clearly something inside a physical box somewhere, and pretending it was eternally pre-existent feels like a mind game. I will say, however, that when writing formal semantics in the language of math, you <em>do</em> play that mind game. And when reasoning about compiler optimizations, it is sometimes useful to play such games. The C2 JIT models immutable fields (like <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">oop._klass</code> and <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">arrayOop._length</code>) as indefinitely pre-existent in memory, whether the containing object’s allocation was recent or not. This model is chosen not because the authors of C2 are platonists, but because it is the simplest model to use inside the limited horizon of a compilation task.</p>
<p dir="auto">(If X had mutable fields, then a timeless mathematical model of it would require some representation of the varying memory states that X might have. You need to say things like “X has these field values in this overall memory state.” It can be done, and in fact optimizers <em>must</em> do this. But today I’m not dealing with mutable fields, nor with synchronization state, which is also mutable.)</p>
<p dir="auto">So, it’s possible to think of both 42 and X as existing outside of time in some platonic mathematical universe. But most people will find it tolerable to think of a platonic 42 but not X.</p>
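<p dir="auto">The contrast between 42 and X can be seen in a few lines of present-day Java. This is only a sketch: the record <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">Point</code> and the class <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">SummonDemo</code> are names invented for illustration, and a record (Java 16 or later) merely stands in for a Valhalla value class, since today a record instance still has identity and only its <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">equals</code> method is field-wise.</p>

```java
// Illustrative only: a record stands in for a Valhalla value class.
// (Today a record instance still has identity; only equals() is field-wise.)
record Point(int x, int y) {}

final class SummonDemo {
    // Two separate "summonings" of 42 cannot be told apart.
    static boolean sameFortyTwo() {
        int a = 42;
        int b = 40 + 2;
        return a == b;
    }

    // Two summonings of the 2D point agree field by field.
    static boolean samePoint() {
        return new Point(42, 42).equals(new Point(42, 42));
    }

    // Two executions of new Object() always yield distinguishable objects.
    static boolean sameObject() {
        Object x1 = new Object();
        Object x2 = new Object();
        return x1 == x2;
    }

    public static void main(String[] args) {
        System.out.println(sameFortyTwo()); // true
        System.out.println(samePoint());    // true
        System.out.println(sameObject());   // false
    }
}
```

<p dir="auto">Nothing in the first two methods depends on where or when the value was produced, which is what makes the platonic reading of 42 comfortable; the third method is where that reading starts to feel like a mind game.</p>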
<p dir="auto">After all, the way most of us learn about X is by being shown a diagram of X as a data structure, probably a box with a header field. There might be an arrow from header to a type metadata entity (maybe labeled <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">Object.class</code>). There is certainly an arrow from any other place that is referring to X. I call this the “boxes and arrows” presentation of data structures. Clearly if something is a box, it’s sitting on a whiteboard or page, or in a warehouse where such boxes live. We are taught (early on) that such boxes sit in “memory” or in “the heap”.</p>
<p dir="auto">In search of consistency between 42 and X, we can go to the other extreme, and require that all entities (in software) are confined to real, existing, physical boxes of computer stuff. This is easy for X, and not hard for 42 either. You simply say that 42 exists wherever its bits have been computed and stored (in a variable or a <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">CONSTANT_Integer</code> structure). Then, you agree that there is a way to detect that any two summonings of 42 are in fact both 42 (or to tell that they differ). And you should also agree to talk of “some 42 somewhere”, not “the value 42”. At most you can say “some 42 somewhere which will be detectably equal to the 42 I am working with right now”. I think most people will find this to be a mind game as well; they are platonists for 42 but not for our friend X.</p>
<p dir="auto">As a historical note, there are schools of mathematical thought called “constructivist”, predating the era of computing, that bravely reject the taint of platonism. Such viewpoints take the view that every mathematical (and hence computational) object and reasoning is a matter of construction, with a finite sequence of steps. Without being an expert, I would guess that constructivists might prefer the extreme account of 42: It’s a formal pattern which doesn’t exist until someone constructs it; it is constructed many times; constructed versions are distinct but can be proven equal; such equality proofs are again constructive in nature. How many versions of 42 can dance on the head of a constructivist? Maybe many, but most people would say no more than one.</p>
<p dir="auto">If we decide that 42, like X, is merely a bit pattern in the computer (perhaps replicated many times), then we get a nice, concrete model of boxes and arrows everywhere. We will need to sidestep an embarrassing infinite regress when we try to draw an arrow to 42 coming from the field <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">Integer.value</code>; it’s doable without arrows thankfully, when you just write the label “42” in the box. So in a graph of boxes and arrows, such labels are usually necessary also. This is why Java has primitives, and a distinction between <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">Integer</code> (which is a box) and <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">int</code> (which is the label in one of the fields of that box).</p>
<p dir="auto">Given such a distinction between labels and arrows, one might think that the Valhalla goal of unifying <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">int</code> and <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">Integer</code> is impossible. But this leads me to a new thought, which is (a) put everything on the heap, and (b) distinguish sharply between the value heap and the identity heap.</p>
<p dir="auto">So, although I prefer a platonic viewpoint in most cases, here’s a non-platonic, constructive, boxes-and-arrows viewpoint that one might prefer for teaching and visualization.</p>
<p dir="auto">There is a value heap and an identity heap in the Valhalla VM. Every identity object lives as a box on the identity heap, and conversely every value object lives as a box on the value heap. All non-reference values are visualized as “value arrows” into the value heap. All non-null reference values are visualized as “identity arrows” into the identity heap.</p>
<p dir="auto">For every non-abstract class, instances are allocated on one heap or the other; no class is allocated on both heaps. Instances of <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">String</code> and <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">Integer</code> are allocated on the identity and value heaps, respectively.</p>
<p dir="auto">Our friend X is visualized as an identity arrow into the identity heap. He has a header and no other fields. (We might give him a secret field to hold his identity, but this model does not require it, unlike the platonic model.)</p>
<p dir="auto">The value 42 is visualized as a value arrow into the value heap, to a box labeled 42. The box also has a header, which says “Integer/int”. Its backward-compatible <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">value</code> field points to itself. The label is not the same as the <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">value</code> field; it is part of the header I guess.</p>
<p dir="auto">As a clever optimization, compiled code might dispense with the arrow and just hoist the bit pattern of the label into a register. This is what we mean when we say that value types are monomorphic: They can be manipulated in terms of the characteristic labels, instead of their arrows. Nevertheless, the concept of value arrow comes first, and only the performance model or JIT-writer’s manual mentions the possibility of unboxed labels.</p>
<p dir="auto">A variable of type <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">Object</code> is visualized as the root of an arrow (into either heap), or else the special label or pseudo-arrow <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">null</code> which is not an arrow into either heap.</p>
<p dir="auto">Other than <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">null</code>, primitive value labels (like 42) can exist only inside the value heap. In fact, they properly exist only inside the primitive objects themselves. Everything else is arrows (or <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">null</code>).</p>
<p dir="auto">Normally identity and value arrows look and feel the same. Field access works the same for both. (That is why <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">Integer.value</code> loops back to <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">this</code>.) But they differ when the <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">==</code> (<code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">acmp</code>) operator looks at them. Value arrows are proven equal or unequal using a field-wise recursive descent. This descent bottoms out at primitives (consulting their labels) or at <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">null</code> or an identity arrow. Identity arrows appeal solely to the identity of the object in the identity heap. Of course <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">null</code> is equal only to itself.</p>
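<p dir="auto">The comparison rule just described can be sketched as a small teaching model. Everything here is hypothetical: <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">Box</code>, <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">PrimBox</code>, <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">ValueBox</code>, <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">IdentityBox</code> and <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">AcmpSketch</code> are invented names, the model assumes a single kind of primitive, and it says nothing about how a real VM implements <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">acmp</code>.</p>

```java
// Hypothetical teaching model; none of these classes exist in the JDK.
abstract class Box {}

// A primitive object in the value heap: its label is its bit pattern.
final class PrimBox extends Box {
    final long label;
    PrimBox(long label) { this.label = label; }
}

// A non-primitive value object: a type name plus its fields (arrows).
final class ValueBox extends Box {
    final String type;
    final Box[] fields;
    ValueBox(String type, Box... fields) { this.type = type; this.fields = fields; }
}

// An object in the identity heap; its Java reference plays the role of its identity.
final class IdentityBox extends Box {}

final class AcmpSketch {
    static boolean same(Box a, Box b) {
        if (a == b) return true;                   // same arrow, or both null
        if (a == null || b == null) return false;  // null is equal only to itself
        // Identity arrows appeal solely to identity, and a == b already failed.
        if (a instanceof IdentityBox || b instanceof IdentityBox) return false;
        if (a instanceof PrimBox) {                // descent bottoms out at labels
            if (!(b instanceof PrimBox)) return false;
            return ((PrimBox) a).label == ((PrimBox) b).label;
        }
        if (!(a instanceof ValueBox)) return false;
        if (!(b instanceof ValueBox)) return false;
        ValueBox va = (ValueBox) a;
        ValueBox vb = (ValueBox) b;
        if (!va.type.equals(vb.type)) return false;
        if (va.fields.length != vb.fields.length) return false;
        for (int i = 0; i != va.fields.length; i++) {
            if (!same(va.fields[i], vb.fields[i])) return false; // field-wise recursion
        }
        return true;
    }
}
```

<p dir="auto">In this model two separately built <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">ValueBox("Point", 42, 42)</code> boxes compare as the same, two separately built <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">IdentityBox</code> instances never do, and a value box holding an identity arrow is the same as another only if both hold the very same identity box.</p>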
<p dir="auto">As noted above, new identity objects are created only by the <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">new</code> bytecode. New value objects are created by <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">aconst_init</code> or <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">withfield</code> or by any bytecode which produces a primitive value!</p>
<p dir="auto">For example, incrementing 42 (maybe with <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">iinc</code>) produces a new value arrow to the value heap, which just so happens to contain an object labeled 43. How did we get so lucky? Perhaps Plato is smiling on us, and it has always been there. (This is basically what Java mandates today for auto-boxing, for small-enough values!) More constructively, the VM ensures that, if 42 must be incremented, it either finds a previously created copy of 43, or makes a new copy on the fly. The VM can flip a coin in real-time and do either, because it is allowed to make many copies of 43 in the value heap. This is OK because there is nothing the user can do to distinguish such multiple copies, just as there is nothing the user can do to tell if the GC has moved an object in the identity heap.</p>
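<p dir="auto">The parenthetical about auto-boxing can be checked in today’s Java. The sketch below relies on two documented guarantees: the boxing rule in JLS 5.1.7 for constants in the range -128 to 127, and the caching promise in the documentation of <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">Integer.valueOf</code> for the same range. The class name <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">SmallBoxDemo</code> is invented for illustration.</p>

```java
final class SmallBoxDemo {
    // Boxing a constant in the range -128..127 yields the same Integer each time.
    static boolean smallBoxesAreShared() {
        Integer a = 43;
        Integer b = 43;
        return a == b;
    }

    // Incrementing 42 and boxing the result lands on the already existing box for 43.
    static boolean incrementFindsSharedBox() {
        int n = 42;
        return Integer.valueOf(n + 1) == Integer.valueOf(43);
    }

    // Outside that range sharing is not promised, so only equals() is reliable.
    static boolean largeBoxesAreEqual() {
        Integer a = 1_000_000;
        Integer b = 1_000_000;
        return a.equals(b);
    }
}
```

<p dir="auto">For large values the result of <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">==</code> on two boxes is left unspecified, which is exactly the leak of identity that the value-heap picture is meant to remove.</p>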
<p dir="auto">What goes for <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">iinc</code> goes for all the other value-producing bytecodes, whether primitive or not. The VM is always free to recycle a previously existing object in the value heap, if it can find one, or to make a new box.</p>
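<p dir="auto">As a rough sketch of the “find an old copy or make a new one” freedom, here is a toy value heap for <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">int</code> boxes. The names <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">IntBox</code> and <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">ValueHeapSketch</code>, the 64-entry table, and the explicit coin are all assumptions made for illustration; no claim is made that any VM works this way.</p>

```java
import java.util.Random;

// Hypothetical model of a value heap holding int boxes; not a JDK or HotSpot API.
final class IntBox {
    final int label;
    IntBox(int label) { this.label = label; }
}

final class ValueHeapSketch {
    private final IntBox[] recent = new IntBox[64]; // small direct-mapped memory of past boxes
    private final Random coin;

    ValueHeapSketch(long seed) { this.coin = new Random(seed); }

    // Summon a box with the given label: recycle an old copy or make a new one.
    IntBox summon(int label) {
        int slot = Math.floorMod(label, recent.length);
        IntBox old = recent[slot];
        if (old != null) {
            if (old.label == label) {
                if (coin.nextBoolean()) return old; // heads: recycle the existing box
            }
        }
        IntBox fresh = new IntBox(label);           // tails, or nothing suitable found
        recent[slot] = fresh;
        return fresh;
    }

    // Stand-in for an increment by 1: the result is some box labeled label + 1.
    IntBox increment(IntBox box) { return summon(box.label + 1); }

    // The only comparison offered to "user code": by label, never by box address.
    static boolean same(IntBox a, IntBox b) { return a.label == b.label; }
}
```

<p dir="auto">Because <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">same</code> looks only at labels, the outcome of the coin flip is invisible to callers, in the same way that a GC move is invisible for identity objects.</p>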
<p dir="auto">All of this is a visualization exercise. One might prefer, after all, to visualize a single heap with two kinds of objects in it. In any case, a less platonic, more constructive visualization can be obtained by insisting that all objects, including all values, even primitives, are uniformly accessed via arrows.</p>
<p dir="auto">I guess my point is that, if we are willing to pretend that arrows are everywhere, we need not worry whether something is “really” an <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">Integer</code> or “really” an <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">int</code>.</p>
<p dir="auto">I haven’t said anything yet about flattening. That requires additional work to visualize the container property of the arrows in the heap. There are at least two ways to do it: Color the value arrows that are to be flattened, or just nest one box inside the other. I think I would start with colored arrows, explaining that the VM is being invited to flatten, and then show nested boxes. But the nested boxes violate the “everything is an arrow” symmetry of the visualization, so they are a sort of commentary.</p>
<p dir="auto">Again, as commentary on the VM’s likely optimization, you might “hoist” 42 onto the stack or into a field by erasing the arrow (to a copy of 42 in the value heap) and writing the label “42” in its place. And you might do this in heap fields as well. Turning back to 2D points, you might “hoist” <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">Point(42,42)</code> into a stack variable or heap field by erasing the arrow and replacing it with a nested box (for the point). In either case (unboxed 42 or <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">Point(42,42)</code>), there is a caveat that when you use that value, you are “really” using an arrow into the value heap.</p>
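<p dir="auto">A hand-written approximation of such a nested box, in present-day Java, might look like the following. It is only a sketch: the record <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">Pt</code> (Java 16 or later) stands in for a value class, and <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">FlatPtArray</code> is a hypothetical container that does by hand what the VM would be invited to do automatically.</p>

```java
// Illustrative stand-in for a value class; today a record instance still has identity.
record Pt(int x, int y) {}

// Hypothetical container that stores points "flattened": labels side by side, no arrows.
final class FlatPtArray {
    private final int[] labels; // layout: x0, y0, x1, y1, ...

    FlatPtArray(int length) { labels = new int[2 * length]; }

    // Storing erases the arrow and copies the point's labels into the nested box.
    void set(int i, Pt p) {
        labels[2 * i] = p.x();
        labels[2 * i + 1] = p.y();
    }

    // Loading hands back an arrow to some point equal to the one stored.
    Pt get(int i) {
        return new Pt(labels[2 * i], labels[2 * i + 1]);
    }
}
```

<p dir="auto">A caller that stores <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">new Pt(42, 42)</code> and later reads it back gets an equal point, though not necessarily the same box, which matches the caveat that using the value is “really” using an arrow into the value heap.</p>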
<p dir="auto">So, FWIW, and in the spirit of brainstorming, that is one way to (a) make values and objects more like each other, while (b) staying relentlessly constructive, and (c) avoiding the question of whether an object is an <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">Integer</code> or an <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">int</code>.</p>
<p dir="auto">The distinction of <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">Integer</code> vs. <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">int</code>, and also <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">Point</code> vs. <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">Point.val</code>, is therefore a matter of viewpoint and not essence. Everything is an object, such as <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">Integer</code> or <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">Point</code>. (Except <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">null</code>.) The <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">.val</code>/<code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">.ref</code> distinction, like other non-value-set distinctions (<code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">final</code> vs. non-<code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">final</code>, or <code style="margin: 0; padding: 0 0.4em; border-radius: 3px; background-color: #F7F7F7;">Object</code> vs. a narrower type), is a way for the programmer to annotate the program to express a richer view of the programmer’s reasonings about the program, and to unlock optimizations.</p>
</div></div></body>
</html>