Change 6420645 causes SIGBUS during deoptimisation at a safepoint on 64bit-SPARC

Mon Aug 31 15:35:26 PDT 2009

Hi,

The code that is causing this crash was changed during development of 
compressed oops and I erroneously thought it was equivalent to the 
original code, and I found it easier to understand, so I checked it in.
So no, the change was not needed to make compressed oops work. 
It should be backed out (and your test added to our test system).  
I will file a new bug if there isn't one already.
As for the strange encoding of floating point registers on sparc, I 
don't know the rationale behind it.  Maybe someone on the compiler 
mailing list can answer that.

Thank you for finding this.
Coleen

Volker Simonis wrote:
> Hi Coleen,
>
> I discovered a problem during deoptimisation at a safepoint which
> leads to a SIGBUS on 64bit-SPARC. The problem was introduced by the
> change "6420645: Create a vm that uses compressed oops for up to 32gb
> heapsizes" which has been submitted by you. The problem is easily
> reproducible with the attached test program. Just run:
>
> java -d64 -server -showversion -Xcomp -Xbatch "-XX:CompileCommand=compileonly DeoptTest deopt_compiledframe_at_safepoint" -XX:+PrintCompilation DeoptTest
>
> and you will get a VM crash like:
>
> CompilerOracle: compileonly DeoptTest.deopt_compiledframe_at_safepoint
> java version "1.6.0_14"
> Java(TM) SE Runtime Environment (build 1.6.0_14-b08)
> Java HotSpot(TM) 64-Bit Server VM (build 14.0-b16, compiled mode)
>
>   1   b   DeoptTest::deopt_compiledframe_at_safepoint (220 bytes)
>   1   made not entrant  DeoptTest::deopt_compiledframe_at_safepoint (220 bytes)
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGBUS (0xa) at pc=0xffffffff7e37514c, pid=9314, tid=15
> #
> # JRE version: 6.0_14-b08
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (14.0-b16 compiled mode solaris-sparc )
> # Problematic frame:
> # V  [libjvm.so+0x77514c]
>
> As noted above, the error is a regression caused by change 6420645. It
> doesn't happen with earlier versions of HotSpot. For example, 6u13
> with HS 11 runs the test just fine:
>
> CompilerOracle: compileonly DeoptTest.deopt_compiledframe_at_safepoint
> java version "1.6.0_13"
> Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
> Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02, compiled mode)
>
>   1   b   DeoptTest::deopt_compiledframe_at_safepoint (220 bytes)
>   1   made not entrant  DeoptTest::deopt_compiledframe_at_safepoint (220 bytes)
> OK
>
>
> Notice that the problem is still present in the HS head revision. I've
> tried with 7-ea b70:
>
> java version "1.7.0-ea"
> Java(TM) SE Runtime Environment (build 1.7.0-ea-b70)
> Java HotSpot(TM) 64-Bit Server VM (build 16.0-b07, compiled mode)
>
>   1   b   DeoptTest::deopt_compiledframe_at_safepoint (220 bytes)
>   1   made not entrant  DeoptTest::deopt_compiledframe_at_safepoint (220 bytes)
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGBUS (0xa) at pc=0xffffffff7e38c7fc, pid=10714, tid=15
> #
> # JRE version: 7.0-b70
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (16.0-b07 compiled mode solaris-sparc )
> # Problematic frame:
> # V  [libjvm.so+0x78c7fc]
>
>
> The problem is caused by the following changes in frame.cpp:
>
>
> --- a/src/share/vm/runtime/frame.cpp Sat Dec 01 00:00:00 2007 +0000
> +++ b/src/share/vm/runtime/frame.cpp Sun Apr 13 17:43:42 2008 -0400
> @@ -1153,9 +1153,8 @@ oop* frame::oopmapreg_to_location(VMReg
>      // If it is passed in a register, it got spilled in the stub frame.
>      return (oop *)reg_map->location(reg);
>    } else {
> -    int sp_offset_in_stack_slots = reg->reg2stack();
> -    int sp_offset = sp_offset_in_stack_slots >> (LogBytesPerWord - LogBytesPerInt);
> -    return (oop *)&unextended_sp()[sp_offset];
> +    int sp_offset_in_bytes = reg->reg2stack() * VMRegImpl::stack_slot_size;
> +    return (oop*)(((address)unextended_sp()) + sp_offset_in_bytes);
>    }
>  }
>
> and the fact that reg->reg2stack() returns odd values for float
> registers >= F32. This finally leads to a BUS error due to an
> unaligned double read when the location of the register is accessed
> through the reg_map during deoptimisation in
> StackValue::create_stack_value(). In the old implementation, this was
> hidden by the right shift in frame::oopmapreg_to_location() which
> mapped F32 and F33 to the same stack offset.
>
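
The arithmetic behind the two versions of frame::oopmapreg_to_location()
can be sketched as follows. This is only a model, not HotSpot code: the
constant values are assumed for 64-bit SPARC (4-byte stack slots, 8-byte
words), the slot indices 10 and 11 are hypothetical stand-ins for an
even/odd pair, and the stack pointer is assumed to be 8-byte aligned.

```cpp
#include <cassert>

// Sketch only: names mirror the diff above, values are assumed for
// 64-bit SPARC (4-byte stack slots, 8-byte words).
constexpr int LogBytesPerInt  = 2;
constexpr int LogBytesPerWord = 3;
constexpr int stack_slot_size = 4;

// Old code: the slot index is shifted down to a word index, which then
// indexes an intptr_t* (8-byte stride). The low bit of the slot is lost.
constexpr int old_offset_in_bytes(int slot) {
  return (slot >> (LogBytesPerWord - LogBytesPerInt)) << LogBytesPerWord;
}

// New code: the slot index is scaled directly to a byte offset.
constexpr int new_offset_in_bytes(int slot) {
  return slot * stack_slot_size;
}

// Hypothetical slots 10 and 11 form an even/odd pair.
static_assert(old_offset_in_bytes(10) == 40, "even slot, old code");
static_assert(old_offset_in_bytes(11) == 40, "odd slot folds onto even one");
static_assert(new_offset_in_bytes(10) == 40, "even slot, new code");
static_assert(new_offset_in_bytes(11) == 44, "odd slot keeps its own offset");
static_assert(new_offset_in_bytes(11) % 8 != 0, "not 8-byte aligned");
```

With an 8-byte aligned stack pointer, the odd slot yields an address that
is only 4-byte aligned under the new code, which is what an 8-byte double
load on SPARC cannot tolerate.
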
> The problem can be easily solved by switching back to the old
> implementation of frame::oopmapreg_to_location(), but I assume there
> was a rationale behind the change and that the new implementation is
> probably necessary for compressed oops (at least that's what the whole
> change was all about). So I dug a little further and found that in my
> opinion the root cause of the whole problem is the strange numbering
> of the 16 upper double registers in sparc.ad. They are defined as
> follows:
>
> reg_def R_D32x(SOC, SOC, Op_RegD,255, F32->as_VMReg());
> reg_def R_D32 (SOC, SOC, Op_RegD,  1, F32->as_VMReg()->next());
> reg_def R_D34x(SOC, SOC, Op_RegD,255, F34->as_VMReg());
> reg_def R_D34 (SOC, SOC, Op_RegD,  3, F34->as_VMReg()->next());
> ...
> reg_def R_D62x(SOC, SOC, Op_RegD,255, F62->as_VMReg());
> reg_def R_D62 (SOC, SOC, Op_RegD, 31, F62->as_VMReg()->next());
>
> This maps the invalid half (R_D32x, R_D34x, ..) of the double
> registers F32-F62 to even VMReg numbers (96, 98, ..) and the valid
> part (R_D32, R_D34, ..) to odd VMReg numbers (97, 99, ..). Later on,
> when the locals array for the safepoint is constructed in
> Compile::FillLocArray(), the call to OptoReg::as_VMReg(regnum) for a
> valid, even double register >= F32 (e.g. 96) returns the invalid, odd
> part (e.g. 97). This odd VMReg number is then stored in the Location
> part of the local and leads to the undesired behaviour in the new
> implementation of frame::oopmapreg_to_location() as described before.
>
> I don't know why this strange encoding has been chosen for the 16
> upper double registers in sparc.ad, but changing it to:
>
> reg_def R_D32x(SOC, SOC, Op_RegD,255, F32->as_VMReg()->next());
> reg_def R_D32 (SOC, SOC, Op_RegD,  1, F32->as_VMReg());
> reg_def R_D34x(SOC, SOC, Op_RegD,255, F34->as_VMReg()->next());
> reg_def R_D34 (SOC, SOC, Op_RegD,  3, F34->as_VMReg());
> ...
> reg_def R_D62x(SOC, SOC, Op_RegD,255, F62->as_VMReg()->next());
> reg_def R_D62 (SOC, SOC, Op_RegD, 31, F62->as_VMReg());
>
> which seems more natural to me, solved the SIGBUS issue and didn't
> reveal any other problems in the tests I have run so far.
>
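
The effect of the two sparc.ad numberings on the valid half of an upper
double register can be modelled like this. It is a rough sketch: the
base value 64 is an assumption inferred from the numbers quoted above
(F32 maps to VMReg 96), and the function names are made up for
illustration.

```cpp
#include <cassert>

// Assumption: the VMReg number of F<n> is 64 + n, inferred from the
// statement that F32, F34, .. map to 96, 98, ..
constexpr int kFirstFloatVMReg = 64;

constexpr int vmreg_of_float(int n) { return kFirstFloatVMReg + n; }

// Current sparc.ad: the valid half is F<n>->as_VMReg()->next().
constexpr int valid_half_current(int n) { return vmreg_of_float(n) + 1; }

// Proposed sparc.ad: the valid half is F<n>->as_VMReg().
constexpr int valid_half_proposed(int n) { return vmreg_of_float(n); }

static_assert(valid_half_current(32) == 97, "odd number today");
static_assert(valid_half_proposed(32) == 96, "even number after the change");
static_assert(valid_half_proposed(62) % 2 == 0, "stays even up to F62");
```

Under this model every valid half in F32-F62 gets an even VMReg number
with the proposed encoding, so the stored Location would no longer be odd.
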
> Could you please comment on the proposed solution of changing the
> VMReg numbering of F32-F62 or advise a better solution if you think
> that the proposed one will not work in the general case?
>
> Thank you and best regards,
> Volker
>
> PS: while I was hunting the error, I also stumbled across the
> following code in RegisterSaver::save_live_registers():
>
>   // Save all the FP registers
>   int offset = d00_offset;
>   for( int i=0; i<64; i+=2 ) {
>     FloatRegister f = as_FloatRegister(i);
>     __ stf(FloatRegisterImpl::D,  f, SP, offset+STACK_BIAS);
>     map->set_callee_saved(VMRegImpl::stack2reg(offset>>2), f->as_VMReg());
>     if (true) {
>       map->set_callee_saved(VMRegImpl::stack2reg((offset + sizeof(float))>>2), f->as_VMReg()->next());
>     }
>     offset += sizeof(double);
>   }
>
> In my opinion, this could be changed to:
>
>   // Save all the FP registers
>   int offset = d00_offset;
>   for( int i=0; i<64; i+=2 ) {
>     FloatRegister f = as_FloatRegister(i);
>     __ stf(FloatRegisterImpl::D,  f, SP, offset+STACK_BIAS);
>     map->set_callee_saved(VMRegImpl::stack2reg(offset>>2), f->as_VMReg());
>     if (i < 32) { // VS 2009-08-31: the 16 upper double registers can't be used as floats anyway
>       map->set_callee_saved(VMRegImpl::stack2reg((offset + sizeof(float))>>2), f->as_VMReg()->next());
>     }
>     offset += sizeof(double);
>   }
>
> because the 16 upper double registers can't be used as floats anyway.
> Again, I didn't find any regressions in my few tests. What do you
> think?
>
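
To see what the suggested `i < 32` guard changes, the loop from
RegisterSaver::save_live_registers() can be modelled without the
assembler. This is a simplified sketch: SavedSlot and record_fp_slots
are hypothetical names, and the model only records which stack slots
would be entered into the map, not the actual VMReg objects.

```cpp
#include <cassert>
#include <vector>

// Hypothetical record of one set_callee_saved() call in the loop above.
struct SavedSlot {
  int  stack_slot;   // argument that would be passed to stack2reg()
  int  fp_reg;       // index i of the double register being saved
  bool second_half;  // true for the f->as_VMReg()->next() entry
};

// Mirrors the loop structure: one entry per double register, plus an
// entry for the second half unless the guard skips it.
std::vector<SavedSlot> record_fp_slots(int d00_offset, bool guard_upper) {
  std::vector<SavedSlot> map;
  int offset = d00_offset;
  for (int i = 0; i < 64; i += 2) {
    map.push_back({offset >> 2, i, false});
    if (!guard_upper || i < 32) {
      map.push_back({(offset + static_cast<int>(sizeof(float))) >> 2, i, true});
    }
    offset += static_cast<int>(sizeof(double));
  }
  return map;
}
```

In this model the original `if (true)` records 64 entries, while the
guarded version records 48: the 16 upper double registers no longer get
a second-half entry.
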