Problem with HSAIL->interpreter deopt with many variables

Tom Rodriguez tom.rodriguez at oracle.com
Tue Aug 26 21:20:29 UTC 2014


On Aug 26, 2014, at 11:50 AM, Caspole, Eric <Eric.Caspole at amd.com> wrote:

> Hi everybody,
> Is it normal to have the deoptimization of a compiled frame sitting
> right on top of a call_stub frame called from the C++ code? I don't see any comments in
> deoptimization or in the stub generators that mention anything about this.

I think the problem is a disagreement between the number of arguments passed through the JavaCall when invoking the trampoline and the number of arguments in the deopt frame state.  The deopt code expects the call_stub frame to have already been adjusted to hold the initial arguments for the root of the compile, but that hasn’t been done, so it overwrites part of the call_stub frame.

This should probably be the responsibility of the call_stub, but that would mean passing it some extra info about how much space to pad the frame.  Injecting some extra pushes before pushing the real arguments appears to let your test work, or at least not crash.  Maybe Gilles has some ideas on other ways to fix this?
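
As a rough sketch of how the padding could be sized, assuming the shortfall is simply the difference between the locals the deopt frame state expects and the arguments actually pushed through the JavaCall. The function and parameter names are made up for illustration and are not HotSpot code; a real fix may need to account for more than this.

```cpp
#include <cstddef>

// Hypothetical helper, not part of HotSpot: number of extra stack words to
// push before the real arguments so that the interpreter frame built during
// deoptimization has room for all of its locals.
// Assumption: the shortfall is exactly (locals in frame state - args passed).
std::size_t padding_words(std::size_t frame_state_locals,
                          std::size_t passed_arguments) {
    if (frame_state_locals <= passed_arguments) {
        return 0;  // the pushed arguments already cover every local
    }
    return frame_state_locals - passed_arguments;
}
```

For example, with 9 arguments passed and a frame state describing 12 locals, this would ask for 3 extra pushes.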

-XX:+VerifyStack should have complained about this but it’s missing some logic to describe the entry frame so it didn’t detect the overlap.
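
To make the overlap concrete, here is a small sketch of the address arithmetic such a check would need. It is not what VerifyStack actually does. It assumes 8-byte stack words (linux-amd64) and that the call_stub sp abbreviated as 7ff0340 in the report below is 0x7ffff7ff0340 in full; the helper names are made up for illustration.

```cpp
#include <cstdint>

// Assumption: 8-byte stack words, as on linux-amd64.
constexpr std::uint64_t kWordSize = 8;

// Signed distance in words from a frame's sp to addr. A positive result
// means addr is above sp, i.e. towards the caller, since the stack grows down.
std::int64_t words_above_sp(std::uint64_t addr, std::uint64_t sp) {
    return (static_cast<std::int64_t>(addr) - static_cast<std::int64_t>(sp))
           / static_cast<std::int64_t>(kWordSize);
}

// True if addr lies inside a frame that starts at sp and spans size_words words.
bool inside_frame(std::uint64_t addr, std::uint64_t sp, std::uint64_t size_words) {
    return addr >= sp && addr - sp < size_words * kWordSize;
}
```

With the values from the gdb session below, words_above_sp(0x7ffff7ff0380, 0x7ffff7ff0340) is 8, so the first local is written 8 words into the call_stub frame rather than below it.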

tom

> 
> Here is an example of the stack when this problem happens during a
> kernel deoptimization:
> 
> 
> - Hsail::execute_kernel_void_1d_internal sp=7ff0680
> 
> - [JavaCall frames]
> 
> - StubRoutines::call_stub sp = 7ff0340
> 
> - hsail.test.lambda.MoreThanEightArgsOOBTest.lambda$innerTest$139 sp = 7ff0320
>   (This is the compiled "trampoline" for HSAIL deoptimization)
> 
> - UncommonTrapStub.uncommonTrapHandler sp = 7ff01a8
> 
> #2  0x00007ffff645d310 in Deoptimization::unpack_frames
> (thread=0x7ffff000d800, exec_mode=2) at
> /home/ecaspole/views/graal-default/graal/src/share/vm/runtime/deoptimization.cpp:611
> 
> #1  0x00007ffff6a99921 in vframeArray::unpack_to_stack
> (this=0x7ffff0954d58, unpack_frame=..., exec_mode=2,
> caller_actual_parameters=9) at
> /home/ecaspole/views/graal-default/graal/src/share/vm/runtime/vframeArray.cpp:598
> 
> #0  vframeArrayElement::unpack_on_stack (this=0x7ffff0955450,
> caller_actual_parameters=9, callee_parameters=0, callee_locals=0,
> caller=0x7ffff7fef780, is_top_frame=true, is_bottom_frame=true,
> exec_mode=2) at
> /home/ecaspole/views/graal-default/graal/src/share/vm/runtime/vframeArray.cpp:328
> 
> top frame $rsp=0x7ffff7fee7e0
> 
> _this->_frame size = 10,
> sender frame size = 283 (this is the StubRoutines::call_stub frame)
> 
> 
> (gdb) p this->_frame
> $5 = {
>   _sp = 0x7ffff7ff02c8,
>   _pc = 0x7fffdc00a7a0 "H\307", <incomplete sequence \360>,
>   _cb = 0x7fffdc005390,
>   _deopt_state = frame::not_deoptimized,
>   static _check_value = {
>     <OopClosure> = {
>       <Closure> = {
>         <StackObj> = {
>           <AllocatedObj> = {
>             _vptr.AllocatedObj = 0x7ffff72e57b0
>           }, <No data fields>},
>         members of Closure:
>         _abort = false
>       }, <No data fields>}, <No data fields>},
>   static _check_oop = {
>     <OopClosure> = {
>       <Closure> = {
>         <StackObj> = {
>           <AllocatedObj> = {
>             _vptr.AllocatedObj = 0x7ffff72e5770
>           }, <No data fields>},
>         members of Closure:
>         _abort = false
>       }, <No data fields>}, <No data fields>},
>   static _zap_dead = {
>     <OopClosure> = {
>       <Closure> = {
>         <StackObj> = {
>           <AllocatedObj> = {
>             _vptr.AllocatedObj = 0x7ffff72e5730
>           }, <No data fields>},
>         members of Closure:
>         _abort = false
>       }, <No data fields>}, <No data fields>},
>   _fp = 0x7ffff7ff0308,
>   _unextended_sp = 0x7ffff7ff02c8
> }
> (gdb)
> 
> 365       for(i = 0; i < locals()->size(); i++) {
> (gdb)
> 366         StackValue *value = locals()->at(i);
> (gdb)
> 367         intptr_t* addr  = iframe()->interpreter_frame_local_at(i);
> (gdb)
> 368         switch(value->type()) {
> (gdb) p addr
> $6 = (intptr_t *) 0x7ffff7ff0380
> 
> So here you can see that addr, where the locals will be written, is well
> above the SP of StubRoutines::call_stub (7ff0340), so the writes overwrite
> the callee-saved registers stored in the call_stub frame. Depending on how
> many locals get restored here in the wrong place, this may or may not cause
> a crash after returning all the way back to execute_kernel_void_1d_internal.
> 
> I put a better test than before at: http://cr.openjdk.java.net/~ecaspole/MoreThanEightArgsOOBTest.java
> 
> I run it like:  ./mx.sh -V  --vmbuild debug  --vm server unittest  -Xms2g -Xmx2g -XX:+TraceGPUInteraction -XX:+PrintGCDetails  -XX:-UseCompressedOops -Dkerneltester.runOkraFirst=true hsail.test.lambda.MoreThanEightArgsOOBTest
> 
> Thanks for any advice on this,
> Eric
> 
> 
> 
> 
> 
> -------- Original Message --------
> Subject: Problem with HSAIL->interpreter deopt with many variables
> Date: Thu, 21 Aug 2014 22:50:11 +0000
> From: Caspole, Eric <Eric.Caspole at amd.com>
> To: graal-dev at openjdk.java.net <graal-dev at openjdk.java.net>
> 
> I think I found a problem with the HSAIL deoptimization back to the
> interpreter when there are a lot of locals in the offloaded lambda. From
> what I have seen so far, it happens when there are more than about 8
> locals, though I am not sure whether the mix of ints and objects matters.
> When the locals get restored into the new interpreter frame in
> vframeArrayElement::unpack_on_stack(), they are written into the stack
> frame of call_stub() that is used when calling from the HSAIL C++ code to
> the x86 trampoline for the method.
> 
> I put a test case that shows working/crashing just by switching 2 lines
> of code at
>  http://cr.openjdk.java.net/~ecaspole/OtherArgsWithCompSafepointTest.java
> Just switch the lines around at line 47 to see it work or crash.
> In this test to see the crash you have to take a safepoint and
> deoptimize on the compiler safepoint in the loop in the kernel.
> Run it with : ./mx.sh --vmbuild debug --vm server unittest
> -XX:+TraceGPUInteraction  hsail.test.lambda.OtherArgsWithCompSafepointTest
> 
> When the problem happens, it overwrites the callee-saved registers in
> call_stub, so it ends up crashing in or near the HSAIL C++ code.
> I am not sure if this problem has always been there since we have very
> few test cases with this many variables.
> 
> I am not familiar with how the frames are created on a deopt. Could
> someone give me some hints about this? How is the newly created frame
> placed relative to the caller frames? How is the size of that frame
> determined?
> 
> Thanks,
> Eric
> 
> 
> 
> 
> Here is one example crash from this test case -
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x00007fd250000770, pid=835, tid=140541751068416
> #
> # JRE version: OpenJDK Runtime Environment (8.0) (build
> 1.8.0-internal-ecaspole_2014_06_09_09_40-b00)
> # Java VM: OpenJDK 64-Bit Server VM
> (25.0-b63-internal-graal-0.5-dev-debug mixed mode linux-amd64 )
> # Problematic frame:
> # v  ~StubRoutines::call_stub
> #
> # Core dump written. Default location:
> /home/ecaspole/views/graal-deopt-size/graal/core or core.835
> #
> # An error report file with more information is saved as:
> # /home/ecaspole/views/graal-deopt-size/graal/hs_err_pid835.log
> Loaded disassembler from
> /home/ecaspole/views/graal-deopt-size/graal/jdk1.8.0-internal/debug/jre/lib/amd64/hsdis-amd64.so
> #
> # If you would like to submit a bug report, please visit:
> #   http://bugreport.sun.com/bugreport/crash.jsp
> #
> 
> 
> 


