where are all the objects?

Brian Goetz brian.goetz at oracle.com
Mon Jul 25 14:05:32 UTC 2022


I had another read through your Values document 
(https://docs.google.com/document/d/1J-a_K87P-R3TscD4uW2Qsbt5BlBR_7uX_BekwJ5BLSE/edit#). 
Let me try to summarize.

Values.  You want to use Values to describe "free floating pieces of 
data."  They don't live any place specific, they have no identity, they 
are immutable.  Every value has a type, but values do not necessarily 
incorporate their own typestate; this may live elsewhere (e.g., field 
descriptors.)

Variables.  Variables can hold values.  Variables have types, which 
determine which values may be written to them and what we can assume 
about values read from them.

Containers.  Variables live in containers (classes, instances, arrays, 
stack frames.)

Kinds of values.  Values are primitives, references to objects, or the 
special reference null.

Objects.  Objects have an independent existence, are self-describing 
(e.g., Object::getClass), may have identity, and can only be interacted 
with through references.


I think this is a valid model of where things are today, though I think 
that some of the "essential characteristics" of Objects in your model 
may be more accidental than you give them credit for.  That is, some of 
these characteristics are of "objects that are the target of an object 
*reference*", which happens to be all the objects today.  Similarly, 
"has its own independent existence" may feel more accidental once 
references are optional.

Of course something will have to change, and we want that change to feel 
natural, not like pulling the rug out from under users' mental models.

The change I'm proposing in this model is:

Instead of values being "primitives and object references", values 
become "value objects and object references".  A Complex.val is a 
value.  3+2i still meets all the requirements of values: free floating, 
no canonical location, no identity, immutable.  It's just a "bigger" 
value than we could have before.  Primitives become value objects.  I 
think people can understand (and will like) this story.

Variables and containers are unchanged.

Objects are instances of classes.  Instances of identity classes remain 
dependent on references to interact with them; instances of value 
classes can also be the target of references, *and* are values on their 
own.  (This is not excessively weird, since "Complex" and "reference to 
Complex" are like `int` and `int *` in C.)
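The nearest analogue in today's Java is `int` versus `Integer`.  The 
following is only a sketch of that analogy in current Java, not of the 
proposed `Complex.val` syntax; the class and method names are 
hypothetical.

```java
public class ValueVsReference {
    // Assigning an int copies the value itself; no reference is involved,
    // so changing the copy cannot affect the caller's variable.
    static int copyAndChange(int original) {
        int copy = original;
        copy += 1;
        return copy;
    }

    // Boxing puts the same value behind a reference ("reference to int").
    // The value seen through the reference is unchanged.
    static boolean boxedEqualsValue(int value) {
        Integer boxed = value;
        Object asObject = boxed;
        return asObject.equals(value);
    }

    public static void main(String[] args) {
        int a = 3;
        int b = copyAndChange(a);
        System.out.println(a + " " + b);          // prints "3 4"
        System.out.println(boxedEqualsValue(a));  // prints "true"
    }
}
```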

Let's take a look at your essential characteristics of objects again.

  - Objects are entities, they have their own independent existence.  I 
think this one is a consequence of "objects only can be interacted with 
through references."  That is, there is a kind of value called 
"reference to object", and the reference refers to ... an object, which 
is a thing separate from the reference.

So, *if* an object is the target of a reference, then yes, it must be an 
entity that is somewhere else, with its independent existence.  "Thing 
that is the target of an object reference" is one reasonable definition 
for "Object", but I don't think it is the only one.  What I'm saying is 
that I think it's fair to say an instance of Complex is an object (and 
further, that saying "it's an instance, but not an object" is likely to 
be more confusing than beneficial).  I think the term for what you are 
describing is *referent*, and not all objects are referents.

  - Objects are self-describing.  By this I'll assume you mean 
Object::getClass.  Here, I say that objects remain self-describing under 
the "instances are objects" model, but something interesting happens 
under the hood about *where* the description lives.  If I have a 
`Complex` in a variable of type `Complex.val`, there is sufficient 
information *in the container* to know the class of the instance, so the 
instance doesn't have to carry it with it.  There is an operation for 
"take a reference of" that can be applied to value objects. This 
operation (logically, though this is frequently optimized away) takes 
that information out of the container and puts it in an object header.  
But regardless of whether the typestate is in the container or the 
object itself, objects are self-describing.

  - Only an object can have identity.  Remains true; new thing is that 
not all objects have identity.

  - An object is always accessed by reference.  This is what I'm saying 
changes; value objects are values.
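
The "self-describing" point has a rough analogue in today's Java, 
sketched here with `int` standing in for a value class and boxing 
standing in for "take a reference of".  This is an assumption-laden 
illustration only; the class and method names are hypothetical.

```java
public class SelfDescribing {
    // The container (an int[]) records the element type once, for all of
    // its elements; the individual ints carry no type information.
    static Class<?> typeKnownToContainer(int[] container) {
        return container.getClass().getComponentType();
    }

    // Boxing stands in for "take a reference of": the resulting object
    // can answer getClass() on its own.
    static Class<?> typeKnownToObject(int value) {
        Object ref = value;   // autoboxes to java.lang.Integer
        return ref.getClass();
    }

    public static void main(String[] args) {
        int[] container = {3, 2};
        System.out.println(typeKnownToContainer(container));  // prints "int"
        System.out.println(typeKnownToObject(container[0]));  // prints "class java.lang.Integer"
    }
}
```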

So I think that what you are describing as essential characteristics of 
objects, are really essential characteristics of *referents of object 
references*.  And I would argue that while this is a well-defined 
concept, it's not the most important distinction we want to put in the 
user's face. Instead, we can say that an instance of Complex can be a 
value, or it can be a referent, but it's the same Complex either way, and 
the user gets to decide what packaging they want to put it in.
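
To make the identity point concrete: in today's Java every class 
instance has identity, so a record can only approximate a value class.  
The sketch below assumes Java 16 or later for records, and `Complex` 
here is a hypothetical stand-in, not the class from the proposal.  It 
shows the distinction between state and identity that exists today; the 
identity observed by `==` is the part an identity-free object would not 
have.

```java
public class IdentityDemo {
    // Hypothetical stand-in for the Complex class discussed above.
    record Complex(double re, double im) {}

    // State comparison: the record's generated equals compares components.
    static boolean sameState(Complex a, Complex b) {
        return a.equals(b);
    }

    // Identity comparison: are these the very same object?
    static boolean sameIdentity(Complex a, Complex b) {
        return a == b;
    }

    public static void main(String[] args) {
        Complex a = new Complex(3, 2);
        Complex b = new Complex(3, 2);
        System.out.println(sameState(a, b));     // prints "true"
        System.out.println(sameIdentity(a, b));  // prints "false": two distinct objects today
        System.out.println(sameIdentity(a, a));  // prints "true"
    }
}
```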









On 7/22/2022 7:16 PM, Brian Goetz wrote:
>> Now I wonder if these points, at least, might be uncontroversial:
>>
>> 1. There exist useful well-defined concepts of "value" and "object" 
>> that are disjoint and that *have been* valid up to now. (I'll hazard 
>> a claim that my paper still defends at least /this/ much well enough.)
>> 2. Also, you've had to treat the two quite differently from each 
>> other in your programs.
>> 3. We *are* changing (improving) #2 through this project.
>
> I claim we are changing #1 as well, though to a lesser degree.  #2 
> should “mostly go away”; #1 should transform into other terms, such as 
> e.g. “object stored directly” vs “reference to object”.  It is those 
> other terms that I think we are searching for consensus on, but #1 is 
> moving.
>
>> 4. But users may still need #1's disjoint concepts when they are 
>> trying to reason about the *performance* model (tho they'll also need 
>> to understand that the VM is empowered to "fake" one as the other 
>> when the spirit so moves it).
>
> Yes, though I think these are concepts that are more _derived from_ 
> the distinction in #1.  John’s notion of “placement” is good here; the 
> choice of ref vs val constrains the placement, and placement informs 
> the performance model.  I think part of what has been missing until 
> today is a good attempt to name the intermediate actors, like 
> placement.  I hope that if we refine those terms a bit, things will 
> get clearer.
>
>> 5. The questions at hand in this thread are not foremost about the 
>> performance model but about the basic "start-here" user model.
>> 6. These miiight be fair descriptions of the 2 camps?
>>
>>     A. Because you'll get to program mostly the same way in both
>>     cases, we can and should de-emphasize the distinction. There
>>     might be a reference sitting in between you and the data/"object"
>>     or there might not. It's mostly in the VM's hands. If you ever
>>     think you care about the distinction, you probably are dipping
>>     down into the performance model. There is a "just don't worry
>>     about it!" flavor to this option.
>>     B. It's still helpful to have a solid sense of the distinction,
>>     even as we benefit from getting to code the same way to each.
>>     Even though the VM might really fake one as the other; again,
>>     that's performance model.
>>
>>
>> Anything controversial about the above?
>
> No, and I want to choose both A and B!  I don’t think they are 
> opposed, I think they are different angles on the elephant.
>
>> (If I had to explain why I've been so dogged about B, maybe it's the 
>> sense that we simply won't "get away with" A. It feels hard (to me) 
>> to tell users simultaneously that they should stop caring about a 
>> distinction AND that we're changing up how all kinds of stuff works 
>> across that distinction. It feels more solid to firm up the 
>> distinction so that we can talk about how things are changing, and 
>> then let that distinction just slowly matter less and less over time.)
>
> Agree that we need a good “start here” story, but I think a good one 
> will have aspects of A and B.  I think we’re making progress?
>
>>
>>
>> On Fri, Jul 22, 2022 at 12:02 PM John Rose <john.r.rose at oracle.com> 
>> wrote:
>>
>>     On 22 Jul 2022, at 10:55, Brian Goetz wrote:
>>
>>>>
>>             So then, would we call an instance of `Complex.val` a
>>             "non-heap object" or an "inlined object" or what? We need
>>             to flesh out a whole lexicon. The phrase "value object"
>>             becomes useless for this particular distinction as it
>>             will apply to both.
>>
>>         Yes, in the taxonomy I’m pushing, a “value object” is one
>>         without identity, and is the kind of object you can store
>>         directly in variables without going through a reference. But
>>         I don’t think that there are instances of Complex.val and
>>         instances of Complex.ref; I think there are instances of
>>         *Complex*, and multiple ways to describe/store/access them.
>>
>>     FTR, I enthusiastically agree with this viewpoint, even though I
>>     am also probing for weaknesses and alternatives. (FTR I feel the
>>     same about Brian’s summary in his previous short message.)
>>
>>     And under this viewpoint, the terms “instance” and “object” have
>>     the same denotation, though different connotations. (When I say
>>     “instance” you may well think, “instance of what”? But you don’t
>>     ask that question so much if I say “object”.)
>>
>>                 That `int/Integer` decision you've been making has
>>                 always been between (1) value and (2) (reference-to)
>>                 object, and that decision is still exactly between
>>                 (1) value and (2) (reference-to) object now, and btw
>>                 the definitions of 'reference' and 'object' remain
>>                 precisely wedded to each other as always.
>>
>>             The "heap object" alternative strikes me (and I am trying
>>             to be fair, here) as:
>>
>>                 Now, that's an object either way, and you're going to
>>                 apply that old thought process toward which *kind* of
>>                 object you mean, either a (1) "inline object" or a
>>                 (2) "(reference-to) heap object". It's now just heap
>>                 objects and references that are paired together.
>>
>>     I think, Kevin, you are going wrong at this point: It’s not a
>>     /kind/ of object, it is a /placement/ of an object. What “kind”
>>     of person am I when I am driving to the office? Surely the same
>>     “kind” as when I am at home. But when I am driving, I am equipped
>>     with a car and a road, much like a heap-placed object is equipped
>>     with a header and references.
>>
>>     Likewise, an int/Integer is (in Valhalla) the same “kind” of
>>     object (if we go all the way to making primitives be honorary
>>     objects) whether it is placed in heap or on stack or inside
>>     another object.
>>
>>     The distinction that comes from the choice of equipping an int
>>     with a header in heap storage is a distinction of placement (and
>>     corresponding representation). So an int/Integer does not
>>     intrinsically have a header because it is an object (because of
>>     its “kind”). It /may/ have a header if the JVM needs to give it
>>     one, because it is stuck in the heap.
>>
>>     (My points about int/Integer could partly fail if we fail to
>>     align int and Integer in the end. So transfer the argument to
>>     C.val/C.ref if you prefer. It is the same argument.)
>>
>>     And I would say the /placement/ of an object is in three broad
>>     cases which are worth teaching even to beginners:
>>
>>      *
>>
>>         “in the heap”: therefore referred to by a machine word
>>         address, and presumably equipped with a header and maybe
>>         surrounded by some alignment waste; a JVM might have multiple
>>         heaps but at this level of discourse we say “the heap”
>>
>>      *
>>
>>         “on the stack”: therefore manipulated directly by its
>>         components, which are effectively separated into scalars (it
>>         is “scalarized”, we sometimes say); we might sometimes wish
>>         to say “JVM stack or locals” instead of “stack”, or, with
>>         increasing detail, “on stack, in locals, and/or in registers,
>>         and/or as immediates in the machine code”
>>
>>      *
>>
>>         “contained in another object”: in a field or array element,
>>         therefore piggy-backing on the other object’s placement; and
>>         note that even arrays are scalarized sometimes, lifting their
>>         elements into registers etc.
>>
>>     To summarize: |Placement = Heap | Stack | Contained[Placement]|.
>>
>>     One might use the term “inline” somewhere in there, either to
>>     mean |Contained| or |Stack|Contained[*]|.
>>
>>     Static field values are a special case, but they can be
>>     classified in one of the above ways. HotSpot places static fields
>>     inside a special per-class object (the mirror, in fact), so their
>>     values are either contained or separate in the heap (JVM’s choice
>>     again).
>>
>>     One might be pedantic and say that an instance can be contained
>>     “in static memory” (neither heap nor stack) if the JVM implements
>>     storage for static fields outside of the heap. But in that case
>>     I’d rather say that they are in a funny corner of the heap, where
>>     perhaps headers are not needed, because some static metadata
>>     somewhere dictates what is stored.
>>
>>     (Hence I like to be cagey about whether a heap-object actually
>>     has a physical header. It might not in some JVM implementations.)
>>
>>             Starting to prefer the first way (as I did) did not feel
>>             like going rogue: after all, did we not gravitate toward
>>             ".ref" and ".val" as our placeholder syntaxes, not
>>             ".inline" and ".heap" or anything else?
>>
>>         With you on this. I think asking users to reason about “heap
>>         objects” vs “inline objects” is pushing them towards the
>>         implementation, not the concepts. They may have to reason
>>         about this to understand the performance model, but that’s
>>         already advanced material.
>>
>>     Yes. And even more specifically in the implementation, users who
>>     think about “heap objects” are really (IMO) trying to predict the
>>     /placement/ of the objects, /where/ the JVM will choose to place
>>     their bits in physical memory.
>>
>>     This question of placement is very interesting to the “alert”
>>     performance-minded programmer. Not every programmer is in that
>>     state; for me I try to practice “first make it work then make it
>>     fast”. I get “alert” to performance only in the “make it fast”
>>     phase, a phase which many of my codes never reach.
>>
>>     As a sort of “siren song” the question of placement is /also/
>>     interesting to the beginning student who is struggling to build a
>>     mental image of Java data, and is reaching for visualizations in
>>     terms of memory and addresses, or (what is about the same) boxes
>>     and arrows. But the JVM will make a hash of all that, if it is
>>     doing a good job. So the student must be told to hold those
>>     mental models lightly.
>>
>>     Kevin is insisting (for his own good reasons) on his answer to
>>     “where are the objects”: They are always “in the heap” and thus
>>     “with headers, accessed by pointers”. I suspect (but haven’t seen
>>     from Kevin himself yet) that this is in part due to a desire to
>>     work with, rather than work against, the student’s desire to make
>>     simple visual models of Java data.
>>
>>     Crucially, in a literal “boxes and arrows” model, an arrow
>>     (perhaps a |C.ref| reference to an instance) looks very different
>>     from a nested box (perhaps a |C.val| instance), and the naive
>>     user might insist that such differences are part of the contract
>>     between the user and the JVM. But they are not. The JVM might
>>     introduce invisible “arrows” (because of heap buffering) and it
>>     might remove arrows (because of scalarization for a number of
>>     possible reasons).
>>
>>     So if the student is told that the arrows and boxes are “what’s
>>     really going on”, the student who uses that assurance to predict
>>     performance and footprint will feel cheated in the end.
>>
>>     To summarize: Any given instance/object has logically independent
>>     properties of class and placement.
>>
>>     And thus: The choice of companion type does not affect class but
>>     may (may!) affect placement.
>>
>>     Circling back to the language design, it might seem odd that
>>     there are three ways to place an object but just two companion
>>     types. But this oddness goes away if you realize that |C.val| and
>>     |C.ref| are not placement directives. The choice between the two
>>     is a net-binary selection from a sizeable menu of “affordances”
>>     that the user might be expecting or disavowing at any given point
>>     in the code. (See my lists of “affordances” and “alternative
>>     affordances” in encapsulating-val
>>     <http://cr.openjdk.java.net/~jrose/values/encapsulating-val.html#affordances-of-c.ref>.)
>>
>>     The user is given this simplified switch to influence the JVM’s
>>     decisions about placement (and therefore representation). It is
>>     useful because the JVM can employ different implementation
>>     tactics depending on the differences between the user-visible
>>     contracts of |C.ref| and of |C.val|. In the choice of
>>     implementation tactics, the JVM has the final say.
>>
>>
>>
>> -- 
>> Kevin Bourrillion | Java Librarian | Google, Inc. |kevinb at google.com
>