where are all the objects?

John Rose john.r.rose at oracle.com
Fri Jul 22 19:02:23 UTC 2022



On 22 Jul 2022, at 10:55, Brian Goetz wrote:

>
>> So then, would we call an instance of `Complex.val` a "non-heap 
>> object" or an "inlined object" or what? We need to flesh out a whole 
>> lexicon. The phrase "value object" becomes useless for this 
>> particular distinction as it will apply to both.
>
>  Yes, in the taxonomy I’m pushing, a “value object” is one 
> without identity, and is the kind of object you can store directly in 
> variables without going through a reference.  But I don’t think that 
> there are instances of Complex.val and instances of Complex.ref; I 
> think there are instances of *Complex*, and multiple ways to 
> describe/store/access them.

FTR, I enthusiastically agree with this viewpoint, even though I am also 
probing for weaknesses and alternatives.  (FTR I feel the same about 
Brian’s summary in his previous short message.)

And under this viewpoint, the terms “instance” and “object” have 
the same denotation, though different connotations.  (When I say 
“instance” you may well think, “instance of what”?  But you 
don’t ask that question so much if I say “object”.)

>>> That `int/Integer` decision you've been making has always been 
>>> between (1) value and (2) (reference-to) object, and that decision 
>>> is still exactly between (1) value and (2) (reference-to) object 
>>> now, and btw the definitions of 'reference' and 'object' remain 
>>> precisely wedded to each other as always.
>>
>> The "heap object" alternative strikes me (and I am trying to be fair, 
>> here) as:
>>
>>> Now, that's an object either way, and you're going to apply that old 
>>> thought process toward which *kind* of object you mean, either a (1) 
>>> "inline object" or a (2) "(reference-to) heap object". It's now just 
>>> heap objects and references that are paired together.

I think, Kevin, you are going wrong at this point:  It’s not a *kind* 
of object, it is a *placement* of an object.  What “kind” of person 
am I when I am driving to the office?  Surely the same “kind” as when 
I am at home.  But when I am driving, I am equipped with a car and a 
road, much like a heap-placed object is equipped with a header and 
references.

Likewise, an int/Integer is (in Valhalla) the same “kind” of object 
(if we go all the way to making primitives be honorary objects) whether 
it is placed in heap or on stack or inside another object.

The distinction that comes from the choice of equipping an int with a 
header in heap storage is a distinction of placement (and corresponding 
representation).  So an int/Integer does not intrinsically have a header 
because it is an object (because of its “kind”).  It *may* have a 
header if the JVM needs to give it one, because it is stuck in the heap.

(My points about int/Integer could partly fail if we fail to align int 
and Integer in the end.  So transfer the argument to C.val/C.ref if you 
prefer.  It is the same argument.)

And I would say the *placement* of an object is in three broad cases 
which are worth teaching even to beginners:

  - “in the heap”:  therefore referred to by a machine word address, 
and presumably equipped with a header and maybe surrounded by some 
alignment waste; a JVM might have multiple heaps but at this level of 
discourse we say “the heap”

  - “on the stack”:  therefore manipulated directly by its 
components, which are effectively separated into scalars (it is 
“scalarized”, we sometimes say); we might sometimes wish to say 
“JVM stack or locals” instead of “stack”, or, with increasing 
detail, “on stack, in locals, and/or in registers, and/or as 
immediates in the machine code”

  - “contained in another object”: in a field or array element, 
therefore piggy-backing on the other object’s placement; and note that 
even arrays are scalarized sometimes, lifting their elements into 
registers etc.

To summarize:  `Placement =  Heap | Stack | Contained[Placement]`.

One might use the term “inline” somewhere in there, either to mean 
`Contained` or `Stack|Contained[*]`.
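For readers who like to see such a taxonomy as code, here is a minimal Java 17 sketch of `Placement = Heap | Stack | Contained[Placement]` as a sealed interface, together with the two readings of “inline” just mentioned.  All the names (`Placements`, `root`, `isInlineNarrow`, `isInlineBroad`) are hypothetical, invented for illustration; no JVM exposes an API like this.

```java
// A toy model of the placement taxonomy; all names are hypothetical.
final class Placements {
    private Placements() {}

    // Placement = Heap | Stack | Contained[Placement]
    sealed interface Placement permits Heap, Stack, Contained {}

    // In the heap: reached by an address, presumably with a header.
    record Heap() implements Placement {}

    // On the stack: scalarized into components (stack, locals, registers, immediates).
    record Stack() implements Placement {}

    // In a field or array element of another object, piggy-backing on
    // that container's own placement.
    record Contained(Placement container) implements Placement {}

    // Follow the chain of containers out to the top-level placement.
    static Placement root(Placement p) {
        while (p instanceof Contained c) {
            p = c.container();
        }
        return p;
    }

    // First reading of "inline": exactly the Contained case.
    static boolean isInlineNarrow(Placement p) {
        return p instanceof Contained;
    }

    // Second reading of "inline": Stack | Contained[*],
    // i.e. anything not placed directly in the heap.
    static boolean isInlineBroad(Placement p) {
        return !(p instanceof Heap);
    }
}
```

For example, an array element stored in a field of a heap object would be `new Contained(new Contained(new Heap()))`: inline under both readings, with `Heap` as its root placement.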

Static field values are a special case, but they can be classified in 
one of the above ways.  HotSpot places static fields inside a special 
per-class object (the mirror, in fact), so their values are either 
contained or separate in the heap (JVM’s choice again).

One might be pedantic and say that an instance can be contained “in 
static memory” (neither heap nor stack) if the JVM implements storage 
for static fields outside of the heap.  But in that case I’d rather 
say that they are in a funny corner of the heap, where perhaps headers 
are not needed, because some static metadata somewhere dictates what is 
stored.

(Hence I like to be cagey about whether a heap-object actually has a 
physical header.  It might not in some JVM implementations.)

>>
>> Starting to prefer the first way (as I did) did not feel like going 
>> rogue: after all, did we not gravitate toward ".ref" and ".val" as 
>> our placeholder syntaxes, not ".inline" and ".heap" or anything else?
>
>  With you on this.  I think asking users to reason about “heap 
> objects” vs “inline objects” is pushing them towards the 
> implementation, not the concepts.  They may have to reason about this 
> to understand the performance model, but that’s already advanced 
> material.

Yes.  And even more specifically in the implementation, users who think 
about “heap objects” are really (IMO) trying to predict the 
*placement* of the objects, *where* the JVM will choose to place their 
bits in physical memory.

This question of placement is very interesting to the “alert” 
performance-minded programmer. Not every programmer is in that state; 
for my part, I try to practice “first make it work, then make it fast”.  I 
get “alert” to performance only in the “make it fast phase”, a 
phase which many of my codes never reach.

As a sort of “siren song” the question of placement is *also* 
interesting to the beginning student who is struggling to build a mental 
image of Java data, and is reaching for visualizations in terms of 
memory and addresses, or (what is about the same) boxes and arrows.  But 
the JVM will make a hash of all that, if it is doing a good job.  So the 
student must be told to hold those mental models lightly.

Kevin is insisting (for his own good reasons) on his answer to “where 
are the objects”:  They are always “in the heap” and thus “with 
headers, accessed by pointers”.  I suspect (but haven’t seen from 
Kevin himself yet) that this is in part due to a desire to work with, 
rather than work against, the student’s desire to make simple visual 
models of Java data.

Crucially, in a literal “boxes and arrows” model, an arrow (perhaps 
a `C.ref` reference to an instance) looks very different from a nested 
box (perhaps a `C.val` instance), and the naive user might insist that 
such differences are part of the contract between the user and the JVM.  
But they are not.  The JVM might introduce invisible “arrows” 
(because of heap buffering) and it might remove arrows (because of 
scalarization for a number of possible reasons).
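A small Java sketch of the point, using only standard Java 16+ records: the loop below creates many temporary `Complex` objects.  In a boxes-and-arrows picture each one is a heap box reached by an arrow, but a JIT that inlines `plus` and `times` and applies escape analysis may keep their components in registers instead.  Whether HotSpot actually does so depends on its own compilation decisions; the value the code computes cannot depend on that choice.  (`PlacementDemo`, `Complex`, and `sumOfSquares` are hypothetical names chosen for the example.)

```java
final class PlacementDemo {
    private PlacementDemo() {}

    // A plain record. Before Valhalla, record instances still have identity,
    // but this code never observes it (no ==, no locking, no identityHashCode).
    record Complex(double re, double im) {
        Complex plus(Complex o) {
            return new Complex(re + o.re, im + o.im);
        }

        Complex times(Complex o) {
            return new Complex(re * o.re - im * o.im, re * o.im + im * o.re);
        }
    }

    // Sums (k + i)^2 for k = 1..n. Each iteration allocates temporaries; the JIT
    // may heap-allocate them or, after inlining and escape analysis, keep their
    // re/im components in registers. The returned value is the same either way.
    static Complex sumOfSquares(int n) {
        Complex acc = new Complex(0, 0);
        for (int k = 1; k <= n; k++) {
            Complex z = new Complex(k, 1);
            acc = acc.plus(z.times(z));
        }
        return acc;
    }
}
```

For `n = 3` the result is `Complex[re=11.0, im=12.0]`, whatever placement the JVM chose for the intermediate objects.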

So if the student is told that the arrows and boxes are “what’s 
really going on,” then a student who uses that assurance to predict 
performance and footprint will feel cheated in the end.

To summarize: Any given instance/object has logically independent 
properties of class and placement.

And thus:  The choice of companion type does not affect class but may 
(may!) affect placement.

Circling back to the language design, it might seem odd that there are 
three ways to place an object but just two companion types.  But this 
oddness goes away if you realize that `C.val` and `C.ref` are not 
placement directives.  The choice between the two is a net-binary 
selection from a sizeable menu of “affordances” that the user might 
be expecting or disavowing at any given point in the code.  (See my 
lists of “affordances” and “alternative affordances” in 
[encapsulating-val].)

The user is given this simplified switch to influence the JVM’s 
decisions about placement (and therefore representation).  It is useful 
because the JVM can employ different implementation tactics depending on 
the differences between the user-visible contracts of `C.ref` and of 
`C.val`.  In the choice of implementation tactics, the JVM has the final 
say.
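To make “simplified switch, final say” concrete, here is a toy Java model.  The policy in `choose` is invented purely for illustration and is not HotSpot's actual heuristic; `Companion`, `Site`, and the `smallEnoughToFlatten` flag are hypothetical.  What it tries to show is structural: the class `C` is not an input at all, the companion is one input among several, and at some sites it makes no difference.

```java
// A toy model; the rules below are made up and describe no real JVM.
final class CompanionModel {
    private CompanionModel() {}

    // Hypothetical stand-ins for the two companion types C.val and C.ref.
    enum Companion { VAL, REF }

    // The three broad placements: heap, stack (scalarized), contained in another object.
    enum Placement { HEAP, STACK, CONTAINED }

    // Hypothetical facts about a use site that a JVM might take into account.
    enum SiteKind { LOCAL, FIELD_OR_ELEMENT }
    record Site(SiteKind kind, boolean escapes, boolean smallEnoughToFlatten) {}

    // Note what is *not* a parameter: the class C itself.
    static Placement choose(Companion companion, Site site) {
        if (site.kind() == SiteKind.LOCAL) {
            // In this toy policy the companion decides nothing for locals:
            // a non-escaping value is scalarized (arrows removed), an escaping
            // one is buffered in the heap (invisible arrows added).
            return site.escapes() ? Placement.HEAP : Placement.STACK;
        }
        // For fields and array elements, C.val permits flattening into the
        // container, but the JVM may still decline (here: if it is too large).
        if (companion == Companion.VAL && site.smallEnoughToFlatten()) {
            return Placement.CONTAINED;
        }
        // Otherwise the container holds a reference to a separate heap object.
        return Placement.HEAP;
    }
}
```

Under these made-up rules, `C.ref` in a non-escaping local ends up scalarized, while `C.val` in a large field ends up as a separate heap object, so neither companion guarantees a placement.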

[encapsulating-val]: 
<http://cr.openjdk.java.net/~jrose/values/encapsulating-val.html#affordances-of-c.ref>




More information about the valhalla-spec-observers mailing list