hg: mlvm/mlvm/hotspot: value-obj: first cut

Thu Oct 18 15:36:17 PDT 2012

On 10/18/2012 03:20 PM, David Chase wrote:
> Note: I did go back through the archives for a few months to see if I had missed some earlier discussion of this.  Is there an even earlier discussion that I missed?

John wrote a long blog post about value types this year,
   https://blogs.oracle.com/jrose/entry/value_types_in_the_vm
and there is the Array 2.0 presentation at the summit:
http://www.oracle.com/technetwork/java/javase/community/jvmls2012-1840099.html

Otherwise, some of these ideas have been floating around for a long time :)

>
> On 2012-10-18, at 3:21 AM, Remi Forax <forax at univ-mlv.fr> wrote:
>> You can't forget backward compatibility. In Java 9, a program that was
>> written for Java 1.0
>> should still work. This means either you have 2 different classes of
>> Integer (one value type and one boxed type)
>> or you are in mixed mode and you allow Integers to be flagged as value
>> type or not.
> Can we work with an example program to make this more concrete?
> "Failure" is losing track of object identity; the unbox/box game for legacy/modern compatibility only goes wrong when an important identity is dropped on the floor.  If there's no ==true relation, then reboxing just costs time.
>
> If I imagine the version-specific-compilation game, refInteger only ceases to be refInteger within the internals of modern Library code (or in the case that an application is not recompiled, but some other library on which it depends is), and then only when it is not statically typed as Object or Number (reference supertypes).  So old code retains its semantics, exactly, and library retains its semantics in those cases where value types are referred to as Objects.  I'm assuming that we get to efficient, value-handling generics when we get to reified generics.
>
> Offhand, I don't know of any library code that manipulates Integers as Integers and makes any sort of promises about their reference identity, except for the nonsense about small-value interns (which we probably replicate, because it is easy to do so).  ArrayLists and HashMaps store them in Object-typed containers, so they'll retain their reference identity there. (But there's a lot of library code, and I don't know it now as well as I used to).

IdentityHashMap<Integer, X>,
https://duckduckgo.com/?q=IdentityHashMap%3CInteger
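
To make the IdentityHashMap case concrete, here is a minimal sketch (the class and method names are made up for illustration). It shows that an identity-keyed map distinguishes two boxes holding the same int, which is exactly the behavior that would change if the boxes silently lost their identity.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical demo class, not taken from any library.
class IdentityKeys {
    // Always allocates a new box, so two calls never return the same reference.
    @SuppressWarnings({"deprecation", "removal"})
    public static Integer freshBox(int value) {
        return new Integer(value);
    }

    // Number of entries when keys are compared with equals()/hashCode().
    public static int sizeWithEquals(Integer a, Integer b) {
        Map<Integer, String> map = new HashMap<>();
        map.put(a, "first");
        map.put(b, "second");
        return map.size();
    }

    // Number of entries when keys are compared by reference (==).
    public static int sizeWithIdentity(Integer a, Integer b) {
        Map<Integer, String> map = new IdentityHashMap<>();
        map.put(a, "first");
        map.put(b, "second");
        return map.size();
    }

    public static void main(String[] args) {
        Integer a = freshBox(1000);
        Integer b = freshBox(1000);
        System.out.println(sizeWithEquals(a, b));   // prints 1
        System.out.println(sizeWithIdentity(a, b)); // prints 2
    }
}
```

Code like this observes the identity of each Integer, so it is the kind of code that has to keep working unchanged.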

>
> The more-likely screw cases I think would involve String, in particular those cases where a library method promises to return an interned String, or other library code written by 3rd parties that does who-knows-what?

I am not sure String is a good candidate to be treated as a value type.
The char array behind a String can be big; what you want for String is just
colocation of the String object and its array of chars.
This can be done with a Maxine-like hybrid object.

>
>> But if you want the extra ball (as John
>> said) you need a way to
>> flag which Integer is a value type or not, hence a tag bit in the object
>> header.
> And I think this ends up being a runtime-static property, because value types have a completely different representation -- wider or narrower than a pointer, and no object header.  You must at least have two entrypoints; the old code is expecting (for behavioral reasons) pointer semantics, the new code is expecting (for performance reasons) value semantics.  (Remarks about cache inefficiency seem distracting until we figure out if we like the semantics, unless our choices send performance completely into the toilet.)  The interpreter I assume acts like it is "legacy" code.

I'm not sure you need two entry points in all cases; you can also deopt
if you have to be compatible with the boxing semantics.
A value object is just a way to say "I don't care about the identity", so
the JIT may optimize.

>
> If we use an instance-flag instead, don't you end up in the same boat with any Integer resulting from a call to an Integer-allocation method in the (modern implementation) library?  Integer.valueOf returns a value-tagged Integer, because the "new" occurs in code that will be recompiled into the modern world -- what if the result is used in a lock, in old code?

If a user calls valueOf, he actually has no way to control the identity
of the resulting object. The JLS says that values between -128 and 127
must be cached (boxing the same value twice yields the same reference),
but it doesn't say that values greater than 127 cannot be cached (the
OpenJDK implementation allows you to change the upper bound on the
command line, BTW).
So you can safely rebox the object if it is used in a lock; the
semantics will be just as broken as the original semantics.
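
A small sketch of what a caller of valueOf can and cannot rely on (the class and method names are made up for illustration). Only the result for the small-value range is guaranteed; the result for larger values depends on the implementation and its cache settings, so the sketch deliberately does not state it.

```java
// Hypothetical demo class, not taken from any library.
class ValueOfIdentity {
    // True if two independent valueOf calls return the very same box.
    public static boolean sameBox(int value) {
        return Integer.valueOf(value) == Integer.valueOf(value);
    }

    public static void main(String[] args) {
        // Guaranteed by the spec: values in [-128, 127] are cached.
        System.out.println(sameBox(127));   // prints true
        System.out.println(sameBox(-128));  // prints true

        // Not specified either way: may be true or false depending on the
        // implementation and how its cache is configured.
        System.out.println(sameBox(1000));
    }
}
```

Code that synchronizes on the result of valueOf(1000) therefore already cannot know whether it shares a lock with some other caller.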

> Or what if that is re-passed in to modern-compiled code?
> (I'm trying to come up with an example, I think I have to use String instead.)

It will be unboxed at the boundary between old and new code.

>
> Here's an example -- String.toString(), the specification (at least, Java 6, which is handy in my browser) promises that this is implemented with "return self;".  This is a screw case for either strategy, naively implemented:
>
> // This is "old" code, not recompiled, calling "new" code from the library.
>    String cat0 = "cat"; // a legacy string.
>    String cat1 =  cat0.substring(0); // returns a modern-allocated "new" string.
>    String cat2 = cat0.toString(); // "same string"
>    String cat3 = cat1.toString(); // "same string"
>    assert (cat0 == cat2); // works, tagged instances; fails, tagged code
>    assert (cat1 == cat3); // fails, tagged code or tagged instance.
>
> or, instead of "==", two threads could be dispatched, using cat1 and cat3 for their respective locks to coordinate execution (yes, I know the author of such code should be shot).
>
> To avoid the second fail, either we special-case the implementation of toString, or we both tag the code and tag the instances created within it.
>
> Another way to put this is that legacy code will expect to see pointer identity observed, even if the original source of the pointer was in modern code.  The modern code won't care, but if the legacy code ever observes the pointer, it will expect it to behave "like a pointer".  That's why I'm skeptical about an instance tag that depends on allocation site, and why I think that code identity is what matters more.

I disagree, at least for wrappers: if the code uses valueOf(), it means
it doesn't care about the identity.
Given that, because of overriding, you can have a mix of old code and
new code in the very same method (with inlining),
I don't think that the version of the code is something useful here. And
as I said earlier, it would not be backward compatible,
i.e. old code compiled with the new version would behave differently.

>
> Ugh, a somewhat more annoying question, inspired by trying to find an example.
> What happens if we have a "volatile Complex" (using either strategy)?
> Assume, for the sake of amusement, that Complex is implemented with a pair of doubles, hence is 128 bits at minimum in its value representation.  (Possible implementation -- as a value if there's a native CAS of the right size, otherwise as a reference.)

Good question. You can hope that the current CPU supports an
instruction like CMPXCHG16B; otherwise the JIT will not be able to unbox the
Complex value, I suppose.
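
For reference, this is roughly what the "as a reference" fallback looks like in today's Java: a minimal sketch, assuming an immutable Complex class of my own invention, where the 128 bits are updated atomically by swapping a pointer to a fresh box.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo class; Complex here is an illustration, not a JDK type.
class AtomicComplexDemo {
    // Immutable pair of doubles: 128 bits of payload behind one reference.
    public static final class Complex {
        public final double re;
        public final double im;

        public Complex(double re, double im) {
            this.re = re;
            this.im = im;
        }

        public Complex plus(Complex other) {
            return new Complex(re + other.re, im + other.im);
        }
    }

    // Atomically adds delta to the cell with a CAS retry loop.
    // compareAndSet compares references, so the update is a pointer-sized CAS.
    public static Complex addAtomically(AtomicReference<Complex> cell, Complex delta) {
        while (true) {
            Complex current = cell.get();
            Complex next = current.plus(delta);
            if (cell.compareAndSet(current, next)) {
                return next;
            }
        }
    }

    public static void main(String[] args) {
        AtomicReference<Complex> cell = new AtomicReference<>(new Complex(0.0, 0.0));
        addAtomically(cell, new Complex(1.0, 2.0));
        Complex result = addAtomically(cell, new Complex(1.0, 2.0));
        System.out.println(result.re + " " + result.im); // prints 2.0 4.0
    }
}
```

Note that the CAS here relies on the identity of the box, which is why a flattened 128-bit representation would need a native instruction of that width instead.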

>
> Another screw case is array-of-Complex.  Do we have a single representation for such a type, or two?  If it is just one, then if it uses refs, then it loses performance in a big way, if it uses values, it loses object identity in a big way.

As you said earlier, you can switch from one representation to the other,
but I admit that this part is fuzzy to me.

> It seems like this has to wait on either reified generics, or a different, new "array" type.

Note that you can already use the tagged array interface of Jim Laskey 
for that.

>
>>> - we can use different compilation strategies for code depending on its bytecode version number.
>> No, you can't.
> Is this a "no, derived from compatibility arguments not sufficiently explained here", or a "no, that is architecturally impossible given the current system"?

No, because it will break backward compatibility, exactly like new
ArrayList() doesn't infer its type argument even if the Java version is
greater than 5.
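
To illustrate the ArrayList analogy, a minimal sketch (the class and method names are made up for illustration): pre-generics code keeps its old meaning whatever the compiler version, so a raw list still accepts elements of any type.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not taken from any library.
class RawListDemo {
    // Written the way Java 1.4 code was written: raw types, no type argument.
    @SuppressWarnings({"rawtypes", "unchecked"})
    public static List legacy() {
        List list = new ArrayList();   // stays a raw type; nothing is inferred
        list.add("cat");
        list.add(Integer.valueOf(42)); // still accepted, as it was before generics
        return list;
    }

    public static void main(String[] args) {
        System.out.println(legacy().size()); // prints 2
    }
}
```

The compiler only warns about such code; it does not reinterpret it, and that is the same constraint I am arguing for here.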

>
> I imagine this is a flag not unlike strictfp, so I assume this derives from compatibility arguments, and I imagine that the interpreter does the conservative and slow thing, which is to preserve reference identity for all the code that it runs.

It seems easier, and the JIT will generate optimized code depending on
the profile.

>
> David

Rémi