RFR(L): 8032410: compiler/uncommontrap/TestStackBangRbp.java times out on Solaris-Sparc V9

Mon Apr 14 13:27:17 UTC 2014

Here is a new webrev that implements Vladimir’s suggestion (use max stack in interpreter frame size computation):

http://cr.openjdk.java.net/~roland/8032410/webrev.04/

The diff from the previous webrev:

http://cr.openjdk.java.net/~roland/8032410/webrev.03-04/

Roland.
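
A minimal sketch of the page arithmetic from the two examples quoted below, for readers without a whiteboard. It is not HotSpot code: the Stack struct, the function names, the 4096-byte page, the 4000-byte interpreter frame and the 128 bytes of outgoing arguments are all assumptions made for illustration. Depth grows in the direction of stack growth, so page P is page 0 and P+1 is page 1.

```cpp
#include <cassert>

constexpr long kPageSize = 4096;        // assumed page size
constexpr long kStackShadowPages = 2;   // value used in both examples

// Hypothetical model: remembers the deepest page touched so far and the
// first page that was jumped over, if any.
struct Stack {
    long highest_touched_page = 0;  // page P is page 0
    long first_skipped_page = -1;   // -1 means no page was skipped

    void touch(long depth) {
        assert(depth >= 0);
        long page = depth / kPageSize;
        if (page > highest_touched_page + 1 && first_skipped_page < 0)
            first_skipped_page = highest_touched_page + 1;
        if (page > highest_touched_page)
            highest_touched_page = page;
    }
};

// Interpreter: frame already allocated, bang sp+1*page ... sp+StackShadowPages*page.
void interpreter_bang(Stack& s, long sp) {
    for (long i = 1; i <= kStackShadowPages; ++i)
        s.touch(sp + i * kPageSize);
}

// Compiled code: bang sp + start_pages*page and the next frame_size/page_size pages.
void compiled_bang(Stack& s, long sp, long frame_size,
                   long start_pages = kStackShadowPages) {
    for (long i = 0; i <= frame_size / kPageSize; ++i)
        s.touch(sp + (start_pages + i) * kPageSize);
}

// First example. Returns the first skipped page, or -1 if none.
long run_example_one(bool include_max_stack) {
    const long interp_frame = 4000;  // assumed: just below one page
    const long out_args = 128;       // assumed: outgoing arguments
    Stack s;
    long sp = 100;                   // 1) SP in page P
    s.touch(sp);
    interpreter_bang(s, sp);         //    interpreter bangs P+1 and P+2
    sp = kPageSize - 16;             // 2) still in P, right before the boundary
    s.touch(sp);
    compiled_bang(s, sp, interp_frame + (include_max_stack ? out_args : 0));
    sp += interp_frame;              // 3) after deoptimization SP is in P+1
    s.touch(sp);
    sp += out_args;                  // 4) arguments pushed, SP is in P+2
    s.touch(sp);
    compiled_bang(s, sp, 0);         //    callee with a small frame bangs P+4
    return s.first_skipped_page;
}

// Second example: compiled code bangs at sp + (StackShadowPages+1)*page.
long run_example_two() {
    Stack s;
    long sp = kPageSize - 16;        // 1) SP in P, right before the boundary
    s.touch(sp);
    interpreter_bang(s, sp);         //    interpreter bangs P+1 and P+2
    sp += 128;                       // 2) arguments pushed, SP is in P+1
    s.touch(sp);
    compiled_bang(s, sp, 0, kStackShadowPages + 1);  // bangs P+4
    return s.first_skipped_page;
}
```

Under these assumptions run_example_one(false) and run_example_two() both return 3, meaning P+3 is never touched before P+4 is banged, while run_example_one(true) returns -1 because the bang in step 2 then covers P+2 and P+3.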

On Apr 10, 2014, at 10:26 AM, Roland Westrelin <roland.westrelin at oracle.com> wrote:

> Hi Vladimir,
> 
>>>> I tried to point this out during review ;) I thought using (<=StackShadowPages) from the 8026775 changes should touch it.
>>> 
>>> Sorry Vladimir. I didn’t realize there could be a problem even before the end of the stack is reached. I should have read 8026775 more carefully.
>>> 
>>>> Can you spend some time and write down in the bug report all the places where we do a stack bang and how many pages we bang, so we can see the whole picture?
>>>> 
>>>> I think we should bang all sequential pages and do the same in all places. Banging StackShadowPages or StackShadowPages+1 is secondary if we do the same in all places.
>>> 
>>> Ready for some headaches?
>>> 
>>> The interpreter: allocates the current frame and then stack bangs all pages at sp+1*page … sp+StackShadowPages*page included
>>> The compiler bangs (before my change): sp + StackShadowPages*page and the next frame_size/page_size pages
>>> In the deopt blob we bang: sp (once the compiled frame is popped) +1 page … sp+(StackShadowPages+1)*page and the next frame_size/page_size pages
>>> 
>>> I talked with Mikael and the reason we bang up to (StackShadowPages+1)*page in the deopt blob is because in the interpreter, banging happens once the frame is set up. So banging up to StackShadowPages*page in the deopt blob with no frame pushed doesn’t bang as far as the interpreter would.
>> 
>> So far I am following :)
>> 
>>> 
>>> Let’s take an example with my change (no banging in the deopt blob) and if the compiler bangs at sp+StackShadowPages*page. I think something like this is possible:
>>> Let’s say StackShadowPages=2
>>> 
>>> 1) SP points in page P. We enter an interpreted frame. The frame is allocated. SP is still in P. The interpreter bangs P+1 and P+2.
>> 
>> Do you mean when the frame is small we stay on the same page after the frame is allocated? Okay.
> 
> Yes.
> 
>> 
>>> 2) The interpreter calls a compiled method. The compiled method is entered with SP still in P but right before the boundary with the next page. The compiler bangs P+2.
>> 
>> Could you remind me about your change? Does compiled code bang the whole range from min(interpr_frame_size, comp_frame_size) to max(interpr_frame_size, comp_frame_size) plus StackShadowPages? Or only (max + StackShadowPages)?
> 
> It bangs pages at sp + StackShadowPages*page_size and the next max(interpr_frame_size, comp_frame_size)/page_size pages if any. Before my change, the compiled code banged at sp + StackShadowPages*page_size and the next comp_frame_size/page_size pages if any.
> 
>> 
>>> 3) We deoptimize. We pop the frame. SP is in P right before the page boundary. The method has a lot of locals and the interpreter frame size is just below 1 page. After deoptimization SP is in P+1 right before the boundary.
>> 
>> So SP is the same as on entry to the compiled method after the frame pop?
> 
> Yes.
> 
>> Do we touch all stack slots when we reconstruct the interpreter frame during deoptimization? I am asking about the case where the last slots are in the next page and we don't touch it.
> 
> I assume we do.
> 
>> 
>>> 4) We’re at a call, push some arguments and SP moves to P+2 and we call a compiled method. The compiled method bangs P+4. P+3 was never touched.
>> 
>> Method's max_stack should take into account the space for output arguments. It needs to be taken into account when we bang in compiled code. In 2) compiled code should have banged P+2 and P+3.
> 
> Ok. That would work indeed.
> 
>>> 
>>> Had the compiler banged at P+3 (StackShadowPages+1) in 2), there would be no problem in that example. But then another example with my change and if the compiler bangs at sp+(StackShadowPages+1)*page. Let’s say StackShadowPages=2.
>>> 
>>> 1) SP points in page P. We enter an interpreted frame. The frame is allocated. SP is still in P but right before the page boundary. The interpreter bangs P+1 and P+2.
>>> 2) The interpreter pushes some arguments and we are now in P+1 and calls a compiled method. The compiler bangs P+4. P+3 was never touched.
>>> 
>>> So that doesn’t work either.
>>> 
>>> Wishing we had a whiteboard again? ;-)
>> 
>> Yes and yes!
>> 
>>> 
>>> Maybe the solution is for the compiler to bang at sp + StackShadowPages*page + (interpreter_frame_size % page) and the next interpreter_frame_size/page_size pages. That would mimic what the interpreter does and would work in both examples above, I think. interpreter_frame_size would have to exclude what’s on the expression stack of the top frame to be as close as possible to the interpreter behavior.
>> 
>> I don't see how you can determine "next interpreter_frame_size/page_size pages"
>> 
>> As I said before if compiled code takes into account max stack size then first solution should work, I think.
> 
> Let me try that. Thanks!
> 
> Roland.
> 
>> 
>> Thanks,
>> Vladimir
>> 
>> 
>>> 
>>> Roland.
>>> 
>>> 
>>>> 
>>>> Thanks,
>>>> Vladimir
>>>> 
>>>> On 4/2/14 1:46 AM, Roland Westrelin wrote:
>>>>>> The question is why you got EXCEPTION_ACCESS_VIOLATION for a normal stack bang. Maybe it is 8026775 again, where one page is skipped during banging. Windows requires pages to be touched sequentially.
>>>>> 
>>>>> I wasn’t aware of this requirement on windows. Thanks, Vladimir.
>>>>> The interpreter bangs up to and including sp + StackShadowPages while the compiled code, with this change, bangs at sp + StackShadowPages + 1. So a page can be skipped and the requirement that all pages be touched sequentially cannot be guaranteed. So we either have to go back to banging at sp + StackShadowPages for the compiled code or enable the code that I pointed to in the signal handler on 32 bit. What do you think?
>>>>> 
>>>>> Roland.
>>>>> 
>>>>>> 
>>>>>> Thanks,
>>>>>> Vladimir
>>>>>> 
>>>>>> On 4/1/14 8:00 AM, Roland Westrelin wrote:
>>>>>>> I tried to push that change and couldn’t because of a crash on windows 32 bit. The VM crashes at a stack banging instruction in compiled code but the sp looks to be perfectly valid (not in the yellow zone or red zone, within the stack bounds). I noticed this code in the windows signal handler:
>>>>>>> 
>>>>>>> #ifdef _WIN64
>>>>>>>          //
>>>>>>>          // If it's a legal stack address map the entire region in
>>>>>>>          //
>>>>>>>          PEXCEPTION_RECORD exceptionRecord = exceptionInfo->ExceptionRecord;
>>>>>>>          address addr = (address) exceptionRecord->ExceptionInformation[1];
>>>>>>>          if (addr > thread->stack_yellow_zone_base() && addr < thread->stack_base() ) {
>>>>>>>                  addr = (address)((uintptr_t)addr &
>>>>>>>                         (~((uintptr_t)os::vm_page_size() - (uintptr_t)1)));
>>>>>>>                  os::commit_memory((char *)addr, thread->stack_base() - addr,
>>>>>>>                                    !ExecMem);
>>>>>>>                  return EXCEPTION_CONTINUE_EXECUTION;
>>>>>>>          }
>>>>>>>          else
>>>>>>> #endif
>>>>>>> 
>>>>>>> If I enable it on 32 bit, the jprt tests pass. Does anybody know why this is needed? Why is this WIN64 only?
>>>>>>> 
>>>>>>> Roland.
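
For reference, the address arithmetic in the quoted WIN64 handler can be sketched on its own: the faulting address is rounded down to the start of its page, and everything from there up to the stack base is committed. The sketch below uses plain integers and hypothetical helper names (page_align_down, commit_size); it makes no VM or OS calls and assumes the page size is a power of two.

```cpp
#include <cassert>
#include <cstdint>

// Round addr down to the start of its page, using the same mask as the
// quoted handler. Assumes page_size is a power of two.
inline std::uintptr_t page_align_down(std::uintptr_t addr,
                                      std::uintptr_t page_size) {
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return addr & ~(page_size - 1);
}

// Bytes from the start of the faulting page up to stack_base, that is,
// the size passed to the commit call once addr has been aligned.
inline std::uintptr_t commit_size(std::uintptr_t fault_addr,
                                  std::uintptr_t stack_base,
                                  std::uintptr_t page_size) {
    return stack_base - page_align_down(fault_addr, page_size);
}
```

With a 0x1000-byte page, a fault at 0x12345 aligns down to 0x12000, and with a stack base of 0x20000 the committed range is 0xE000 bytes.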