RFR 8156073 : 2-slot LiveStackFrame locals (long and double) are incorrect

Thu Aug 25 02:17:23 UTC 2016

John,

This is useful, thanks.  Probably more questions will follow after doing more homework. 

Mandy

> On Aug 24, 2016, at 10:07 AM, John Rose <john.r.rose at oracle.com> wrote:
> 
> On Aug 22, 2016, at 9:30 PM, Mandy Chung <mandy.chung at oracle.com> wrote:
>> 
>> We need to follow up on this issue to understand what the interpreter and compiler do with this unused slot and whether it is always zeroed out.
> 
> These slot pairs are a curse, in the same league as endian-ness.
> 
> Suppose a 64-bit long x lives in L[0] and L[1].   Now suppose
> that the interpreter (as well it might) has adjacent 32-bit words
> for those locals.  There are four reasonable conventions for
> apportioning the bits of x into L[0:1].  Call HI(x) the arithmetically
> high part of x, and LO(x) the other part.  Also, call FST(x) the
> lower-addressed 32-bit component of x, when stored in memory,
> and SND(x) the other part.  Depending on your machine's
> endian-ness, HI=FST (big-endian, SPARC) or HI=SND
> (little-endian, x86).
> For portable code there are obviously four ways to pack L[0:1].
> I've personally seen them all, sometimes as hard-to-find VM bugs.
> 
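
A minimal sketch of the four packings described above, in Java. The class, enum, and method names (SlotPacking, Convention, hi, lo, fst, snd, pack) are hypothetical, and a ByteBuffer with an explicit byte order stands in for machine memory; none of this is HotSpot code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class SlotPacking {
    // Hypothetical names for the four conventions: what goes in L[0], then in L[1].
    enum Convention { HI_LO, LO_HI, FST_SND, SND_FST }

    // Arithmetically high and low 32-bit halves of x.
    static int hi(long x) { return (int) (x >>> 32); }
    static int lo(long x) { return (int) x; }

    // Lower-addressed 32-bit word of x when x is stored with the given byte order.
    static int fst(long x, ByteOrder order) {
        return ByteBuffer.allocate(8).order(order).putLong(0, x).getInt(0);
    }

    // Higher-addressed 32-bit word of x when x is stored with the given byte order.
    static int snd(long x, ByteOrder order) {
        return ByteBuffer.allocate(8).order(order).putLong(0, x).getInt(4);
    }

    // Returns { L[0], L[1] } for two adjacent 32-bit slots under convention c.
    static int[] pack(long x, Convention c, ByteOrder order) {
        switch (c) {
            case HI_LO:   return new int[] { hi(x), lo(x) };
            case LO_HI:   return new int[] { lo(x), hi(x) };
            case FST_SND: return new int[] { fst(x, order), snd(x, order) };
            default:      return new int[] { snd(x, order), fst(x, order) };
        }
    }
}
```

In this model FST_SND coincides with HI_LO under big-endian order and with LO_HI under little-endian order, so the four conventions collapse to two on any single machine and only diverge when code is ported.
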
> We're just getting started, though.  Now let the interpreter generously
> allocate 64 bits to each local.  The above four cases are still possible,
> but now we have four 32-bit storage units to play with.  That makes
> (if you do the math) 4x3=12 more theoretically possible ways to
> store the bits of x into the 128 bits of L[0:1].  I've not seen all 12,
> but there are several variations that HotSpot has used over time.
> 
> Confused yet?  There's more:  All current HotSpot implementations
> grow the stack downward, which means that the address of L[0]
> is *higher* than L[1].  This means that the pair of storage units
> for L[0:1] can be viewed as a memory buffer, but the bits of L[1]
> come at a lower address.  (Once we had a tagged-stack interpreter
> in which there were extra tag words between the words of L[0]
> and L[1], for extra fun.  We got tired of that.)
> 
> There's one more annoyance:  The memory block located at L[0:1]
> must be at least 64 bits wide, but it need not be 64-bit aligned,
> if the size of a local slot is 32 bits.  So on machines that cannot
> perform unaligned 64-bit access, the interpreter needs to load
> and store 64-bit values as 32-bit halves.  But we can put that
> aside for now; that's a separable cost borne by 32-bit RISCs.
> 
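
The split access described above can be sketched as follows. This is an illustration only: Java's ByteBuffer does not fault on unaligned access, so the code models just the decomposition into two 32-bit halves, and the names SplitAccess, storeLongAsHalves, and loadLongFromHalves are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class SplitAccess {
    // Store x at addr as two 32-bit words, respecting the buffer's byte order.
    static void storeLongAsHalves(ByteBuffer mem, int addr, long x) {
        boolean big = mem.order() == ByteOrder.BIG_ENDIAN;
        int hi = (int) (x >>> 32);
        int lo = (int) x;
        mem.putInt(addr, big ? hi : lo);      // lower-addressed word
        mem.putInt(addr + 4, big ? lo : hi);  // higher-addressed word
    }

    // Load a 64-bit value from addr as two 32-bit words and recombine them.
    static long loadLongFromHalves(ByteBuffer mem, int addr) {
        boolean big = mem.order() == ByteOrder.BIG_ENDIAN;
        int first = mem.getInt(addr);
        int second = mem.getInt(addr + 4);
        long hi = big ? first : second;
        long lo = big ? second : first;
        return (hi << 32) | (lo & 0xFFFFFFFFL);
    }
}
```
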
> How do we simplify this?  For one thing, view all references
> to HI and LO with extreme suspicion.  That goes for misleadingly
> simple terms like "the low half of x".  On Intel everybody
> knows that's also FST (the first memory word of x), and
> nods in agreement, and then when you port to SPARC
> (that was my job) the nods turn into glassy-eyed stares.
> 
> Next, don't trust L[0] and L[1] to work like array elements.
> Although the bytecode interpreter refers directly to L[0]
> and indirectly to L[1], when storing 'x', realize that you
> don't know exactly how those guys are laid out in memory.
> The interpreter will make some local decision to avoid
> the obvious-in-retrospect bug of storing 64 bits to L[0]
> on a 32-bit machine.  The decision might be to form the
> address of L[1] and treat *that* as the base address of
> a memory block.  The more subtle and principled thing
> to do would be to form the address of the *end* of L[0]
> and treat that as the *end* address of a memory block.
> The two approaches are equivalent on a 32-bit machine,
> but on a 64-bit machine one puts the payload only
> in L[1] and one only in L[0].
> 
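
The two addressing choices above can be checked with a little arithmetic. The sketch below assumes a downward-growing locals area in which L[i] starts at l0Addr - i * slotSize; the names LocalsLayout, slotAddr, storeAtBaseOfL1, and storeAtEndOfL0 are hypothetical and do not come from HotSpot.

```java
final class LocalsLayout {
    // Start address of L[i] when L[0] starts at l0Addr and the stack grows downward.
    static int slotAddr(int l0Addr, int slotSize, int i) {
        return l0Addr - i * slotSize;
    }

    // Choice 1: use the address of L[1] as the base of the 8-byte block.
    static int storeAtBaseOfL1(int l0Addr, int slotSize) {
        return slotAddr(l0Addr, slotSize, 1);
    }

    // Choice 2: use the end of L[0] as the end of the 8-byte block.
    static int storeAtEndOfL0(int l0Addr, int slotSize) {
        return slotAddr(l0Addr, slotSize, 0) + slotSize - 8;
    }

    public static void main(String[] args) {
        for (int slotSize : new int[] { 4, 8 }) {
            System.out.println("slotSize=" + slotSize
                + " baseOfL1=" + storeAtBaseOfL1(100, slotSize)
                + " endOfL0=" + storeAtEndOfL0(100, slotSize));
        }
    }
}
```

With 4-byte slots both choices yield the same block start, spanning L[1] and L[0]. With 8-byte slots the first choice places the whole payload in L[1] and the second places it in L[0].
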
> Meanwhile, the JIT, with its free-wheeling approach
> to storage allocation, will probably try its best to ignore
> and forget stupid L[1], allocating a right-sized register
> or stack slot for L[0].
> 
> Thus interpreter and JIT can have independent internal
> conventions for how they assign storage units to L[0:1] and
> how they use those units to store a 64-bit value.  Those
> independent schemes have to be reconciled along mode
> change paths:  C2I and I2C adapters, deoptimization, and
> on-stack replacement (= reoptimization).
> 
> The vframe_hp code does this.  A strong global convention
> would be best, such as always using L[0] and always storing
> all of x in L[0] if it fits, else SND(x) in L[0] and FST(x) in L[1].
> I'm not sure (and I doubt) that we are actually that clean.
> 
> Any reasonable high-level API for dealing with this stuff
> will do like the JIT does, and pretend that, whatever the
> size of L[0] is physically, it contains the whole value assigned
> to it, without any need to inspect L[1].  That's the best policy
> for virtualizing stack frames, because it aligns with the
> plain meaning of bytecodes like "lload_0", which don't mention
> L[1].  The role of L[1] is to provide "slop space" for internal
> storage in a tiny interpreter; it has no external role.  The
> convention used in HotSpot and the JVM verifier is to
> assign a special type to L[1], "Top", which means "do not
> look at me; I contain no bits".  A virtualized API which
> produces a view on such an L[1] needs to return some
> default value (if pressed), and to indicate that the slot
> has no payload.
> 
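
One way to picture such a virtualized view is sketched below. It is a hypothetical model, not the LiveStackFrame API: the names VirtualLocals, Kind, Slot, and view are invented for illustration, and only int and long values are handled.

```java
import java.util.ArrayList;
import java.util.List;

final class VirtualLocals {
    enum Kind { INT, LONG, TOP }

    static final class Slot {
        final Kind kind;
        final long value;
        Slot(Kind kind, long value) { this.kind = kind; this.value = value; }
        // A TOP slot carries no payload; its value is only a default.
        boolean hasPayload() { return kind != Kind.TOP; }
    }

    // Builds a slot view: a long puts its whole value in slot i, and slot i+1 is TOP.
    static List<Slot> view(Object... values) {
        List<Slot> slots = new ArrayList<>();
        for (Object v : values) {
            if (v instanceof Long) {
                slots.add(new Slot(Kind.LONG, (Long) v));
                slots.add(new Slot(Kind.TOP, 0L)); // default value, no payload
            } else if (v instanceof Integer) {
                slots.add(new Slot(Kind.INT, (Integer) v));
            } else {
                throw new IllegalArgumentException("unsupported local: " + v);
            }
        }
        return slots;
    }
}
```

A caller reading a long local consults only the first slot of the pair and can use hasPayload() to skip the second.
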
> HTH
> 
> — John
> 
> P.S.  If all goes well with Valhalla, we will probably get
> rid of slot pairs altogether in a future version of the JVM
> bytecodes.  They spoil generics over longs and doubles.
> The 32-bit implementations of JVM interpreters will have
> to do extra work, such as have 64-bit slot sizes for methods
> that work with longs or doubles, but it's worth it.
>