hg: mlvm/mlvm/hotspot: value-obj: first cut

David Chase david.r.chase at oracle.com
Thu Oct 18 06:20:21 PDT 2012


Note: I did go back through the archives for a few months to see if I had missed some earlier discussion of this.  Is there an even earlier discussion that I missed?

On 2012-10-18, at 3:21 AM, Remi Forax <forax at univ-mlv.fr> wrote:
> You can't forget backward compatibility. In Java 9, a program that was 
> written for Java 1.0
> should still work. This means either you have 2 different classes of 
> Integer (one value type and one boxed type)
> or you are in mixed mode and you allow Integers to be flagged as value 
> type or not.

Can we work with an example program to make this more concrete?
"Failure" is losing track of object identity; the unbox/box game for legacy/modern compatibility only goes wrong when an identity that something depends on is dropped on the floor.  If no == comparison would have been true in the first place, then reboxing just costs time.

If I imagine the version-specific-compilation game, a refInteger only ceases to be a refInteger within the internals of modern library code (or when an application is not recompiled but some other library it depends on is), and then only when it is not statically typed as Object or Number (its reference supertypes).  So old code retains its semantics exactly, and the library retains its semantics in those cases where value types are referred to as Objects.  I'm assuming that we get efficient, value-handling generics when we get reified generics.

Offhand, I don't know of any library code that manipulates Integers as Integers and makes any sort of promise about their reference identity, except for the nonsense about small-value interning (which we probably replicate, because it is easy to do so).  ArrayLists and HashMaps store them in Object-typed containers, so they'll retain their reference identity there.  (But there's a lot of library code, and I don't know it now as well as I used to.)
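
To make the two identity promises above concrete, here is a minimal sketch as it behaves on a current JVM, where every Integer is a reference.  The class and method names (BoxedIdentityDemo, sameObject, roundTrip) are made up for illustration; only Integer.valueOf and ArrayList are real library API.

```java
import java.util.ArrayList;
import java.util.List;

public class BoxedIdentityDemo {
    // True when both arguments are the very same object (reference identity).
    public static boolean sameObject(Object a, Object b) {
        return a == b;
    }

    // Stores a box in an Object-typed container and reads it back.
    public static Object roundTrip(Object box) {
        List<Object> container = new ArrayList<>();
        container.add(box);
        return container.get(0);
    }

    public static void main(String[] args) {
        // Integer.valueOf is documented to cache values in -128..127,
        // so both calls yield the same object.
        Integer small1 = Integer.valueOf(100);
        Integer small2 = Integer.valueOf(100);
        System.out.println(sameObject(small1, small2)); // prints true

        // Outside the cached range no identity is promised between two boxes,
        // but a box stored as an Object comes back as the same object.
        Integer big = Integer.valueOf(100_000);
        System.out.println(sameObject(big, roundTrip(big))); // prints true
    }
}
```

These are the two behaviors a value-typed Integer would have to keep reproducing for legacy callers.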

The more likely screw cases, I think, would involve String: in particular, cases where a library method promises to return an interned String, or library code written by third parties that does who-knows-what.

> But if you want the extra ball (as John 
> said) you need a way to
> flag which Integer is a value type or not, hence a tag bit in the object 
> header.

And I think this ends up being a runtime-static property, because value types have a completely different representation -- wider or narrower than a pointer, and no object header.  You must at least have two entry points: the old code expects (for behavioral reasons) pointer semantics, and the new code expects (for performance reasons) value semantics.  (Remarks about cache inefficiency seem distracting until we figure out whether we like the semantics, unless our choices send performance completely into the toilet.)  I assume the interpreter acts as if it were running "legacy" code.

If we use an instance flag instead, don't you end up in the same boat with any Integer resulting from a call to an Integer-allocation method in the (modern implementation) library?  Integer.valueOf returns a value-tagged Integer, because the "new" occurs in code that will be recompiled into the modern world -- what if the result is used as a lock in old code?
Or what if it is passed back into modern-compiled code?
(I'm trying to come up with an example, I think I have to use String instead.)

Here's an example -- String.toString().  The specification (at least for Java 6, which is handy in my browser) promises that the object itself is returned, i.e. that this is implemented with "return this;".  This is a screw case for either strategy, naively implemented:

// This is "old" code, not recompiled, calling "new" code from the library.
  String cat0 = "cat"; // a legacy string.
  // substring(1) returns a modern-allocated "new" string ("at");
  // in the current implementation substring(0) would just return cat0 itself.
  String cat1 = cat0.substring(1);
  String cat2 = cat0.toString(); // "same string"
  String cat3 = cat1.toString(); // "same string"
  assert (cat0 == cat2); // works, tagged instances; fails, tagged code
  assert (cat1 == cat3); // fails, tagged code or tagged instance.

or, instead of "==", two threads could be dispatched, using cat1 and cat3 for their respective locks to coordinate execution (yes, I know the author of such code should be shot).  
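
For the lock variant, a small sketch of what legacy code can observe today, when all Strings are references.  LockIdentityDemo and sharesMonitor are hypothetical names; Thread.holdsLock is the real library call.  The string built with "new" merely stands in for one allocated inside modern library code.

```java
public class LockIdentityDemo {
    // Reports whether holding the monitor of 'held' also means
    // holding the monitor of 'other', i.e. whether they are one object.
    public static boolean sharesMonitor(Object held, Object other) {
        synchronized (held) {
            return Thread.holdsLock(other);
        }
    }

    public static void main(String[] args) {
        String cat1 = new String("cat"); // stand-in for a library-allocated string
        String cat3 = cat1.toString();   // documented to return the object itself
        String copy = new String(cat1);  // equal contents, distinct object

        System.out.println(sharesMonitor(cat1, cat3)); // prints true
        System.out.println(sharesMonitor(cat1, copy)); // prints false
    }
}
```

If cat3 came back as a reboxed copy, the first line would print false, and two threads locking on cat1 and cat3 would no longer exclude each other.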

To avoid the second failure, either we special-case the implementation of toString, or we tag both the code and the instances created within it.

Another way to put this is that legacy code will expect to see pointer identity observed, even if the original source of the pointer was in modern code.  The modern code won't care, but if the legacy code ever observes the pointer, it will expect it to behave "like a pointer".  That's why I'm skeptical about an instance tag that depends on allocation site, and why I think that code identity is what matters more.

Ugh, a somewhat more annoying question, inspired by trying to find an example.
What happens if we have a "volatile Complex" (using either strategy)?
Assume, for the sake of amusement, that Complex is implemented with a pair of doubles, hence is 128 bits at minimum in its value representation.  (Possible implementation -- as a value if there's a native CAS of the right size, otherwise as a reference.)
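
As a rough sketch of the "otherwise as a reference" fallback: the 128-bit payload lives in an immutable object and only the pointer to it is updated atomically.  Complex, ComplexCell and addAndGet are hypothetical names for illustration; AtomicReference is the real java.util.concurrent class.  Note that compareAndSet compares references, so this fallback itself leans on object identity.

```java
import java.util.concurrent.atomic.AtomicReference;

public class ComplexCell {
    // Immutable pair of doubles; a hypothetical stand-in for a value type.
    public static final class Complex {
        public final double re;
        public final double im;

        public Complex(double re, double im) {
            this.re = re;
            this.im = im;
        }

        public Complex plus(Complex other) {
            return new Complex(re + other.re, im + other.im);
        }
    }

    // The reference is what gets read and written atomically.
    private final AtomicReference<Complex> cell;

    public ComplexCell(Complex initial) {
        cell = new AtomicReference<>(initial);
    }

    public Complex get() {
        return cell.get();
    }

    // Atomically adds delta, retrying if another thread won the race.
    public Complex addAndGet(Complex delta) {
        while (true) {
            Complex old = cell.get();
            Complex next = old.plus(delta);
            if (cell.compareAndSet(old, next)) {
                return next;
            }
        }
    }

    public static void main(String[] args) {
        ComplexCell c = new ComplexCell(new Complex(1.0, 2.0));
        Complex sum = c.addAndGet(new Complex(3.0, 4.0));
        System.out.println(sum.re + " + " + sum.im + "i");
    }
}
```

A reader can never see a half-updated pair here, which is the property a native 128-bit CAS would otherwise have to provide.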

Another screw case is array-of-Complex.  Do we have a single representation for such a type, or two?  If there is just one, then either it uses refs and loses performance in a big way, or it uses values and loses object identity in a big way.
It seems like this has to wait on either reified generics, or a different, new "array" type.
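
The two candidate representations can be sketched by hand in today's Java.  Everything here (ComplexArrays, Complex, FlatComplexArray) is a hypothetical illustration, not a proposed API: the first form is an ordinary array of references, the second interleaves the doubles in a single double[] and reboxes on every read.

```java
public class ComplexArrays {
    // Immutable pair of doubles; a hypothetical stand-in for a value type.
    public static final class Complex {
        public final double re;
        public final double im;

        public Complex(double re, double im) {
            this.re = re;
            this.im = im;
        }
    }

    // Flattened form: element i occupies data[2*i] (re) and data[2*i + 1] (im).
    public static final class FlatComplexArray {
        private final double[] data;

        public FlatComplexArray(int length) {
            data = new double[2 * length];
        }

        public void set(int i, Complex c) {
            data[2 * i] = c.re;
            data[2 * i + 1] = c.im;
        }

        // Reboxes on every call, so each read has a fresh identity.
        public Complex get(int i) {
            return new Complex(data[2 * i], data[2 * i + 1]);
        }

        public int length() {
            return data.length / 2;
        }
    }

    public static void main(String[] args) {
        Complex z = new Complex(1.5, -2.0);

        Complex[] refs = new Complex[1];      // reference form
        refs[0] = z;
        System.out.println(refs[0] == z);     // prints true: identity kept

        FlatComplexArray flat = new FlatComplexArray(1); // value form
        flat.set(0, z);
        System.out.println(flat.get(0) == z); // prints false: identity lost
        System.out.println(flat.get(0).re == z.re && flat.get(0).im == z.im); // prints true
    }
}
```

The reference form pays a pointer chase per element; the flat form is dense, but legacy code comparing elements with == would see different answers.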

>> - we can use different compilation strategies for code depending on its bytecode version number.
> 
> No, you can't.

Is this a "no, derived from compatibility arguments not sufficiently explained here", or a "no, that is architecturally impossible given the current system"?

I imagine this is a flag not unlike strictfp, so I assume this derives from compatibility arguments, and I imagine that the interpreter does the conservative and slow thing, which is to preserve reference identity for all the code that it runs.
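
For what it's worth, the version number is cheap to get at: it sits in the first eight bytes of every class file (magic, minor_version, major_version).  A minimal sketch follows; ClassVersionProbe, majorVersion and mayUseValueRepresentation are hypothetical names, and the policy method is purely an assumption about how such a switch might look, not anything HotSpot does.

```java
import java.nio.ByteBuffer;

public class ClassVersionProbe {
    // Reads major_version from the class file header:
    // u4 magic (0xCAFEBABE), u2 minor_version, u2 major_version, big-endian.
    public static int majorVersion(byte[] classFile) {
        ByteBuffer buf = ByteBuffer.wrap(classFile); // big-endian by default
        if (buf.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file");
        }
        buf.getShort();                 // skip minor_version
        return buf.getShort() & 0xFFFF; // major_version as unsigned
    }

    // HYPOTHETICAL policy: code compiled for an older class file version
    // keeps reference semantics; newer code may use value representations.
    public static boolean mayUseValueRepresentation(int major, int firstValueAwareMajor) {
        return major >= firstValueAwareMajor;
    }

    public static void main(String[] args) {
        // Header of a Java 7 class file (major version 51), built by hand.
        byte[] header = {
            (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 51
        };
        int major = majorVersion(header);
        System.out.println(major); // prints 51
        // 53 is an arbitrary threshold chosen only for this illustration.
        System.out.println(mayUseValueRepresentation(major, 53)); // prints false
    }
}
```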

David



More information about the mlvm-dev mailing list