Request for reviews (S): 7125136: SIGILL on linux amd64 in gc/ArrayJuggle/Juggle29

Wed Feb 15 12:28:50 PST 2012

On Feb 15, 2012, at 11:47 AM, Vladimir Kozlov wrote:

> The prolog code is changed as Tom suggested and I moved prolog assembler code generation into macroassembler since it is almost the same for 32 and 64-bit. But I kept format code separate.

I like that a lot.  Could you put on a comment on the new _imm32 function indicating that they force the use of a 4 byte value even if it's fit into 8bit?

The frame_complete location will be different but only when extra flags are on, which is probably fine.

Otherwise it looks great.

tom

> 
> http://cr.openjdk.java.net/~kvn/7125136/webrev
> 
> Thanks,
> Vladimir
> 
> Vladimir Kozlov wrote:
>> > Is there any issue with having to different prologs?  Why not use the SUB/MOV variant all the time?
>> No problems, I think, only size of prolog: push is one byte and mov could be from 3 to 7 bytes depending on framesize and bitness, sub is 3 bytes if framesize fits into byte. So I would like to keep 2 versions separate as in your example.
>> And I will move VerifyStackAtCalls and in_24_bit_fp_mode logic down. It will eliminate a need for fat_nop() use.
>> Thanks,
>> Vladimir
>> Tom Rodriguez wrote:
>>> On Feb 14, 2012, at 12:51 PM, Vladimir Kozlov wrote:
>>> 
>>>> I updated changes to use fixed fat_nop(). Also changed MachProlog code to use masm instructions (have to add subptr_imm32() for that).
>>> 
>>> Would it be possible refactor the code to make this whole thing simpler?  The VerifyStackAtCalls and in_24_bit_fp_mode logic make it very complicated.  Can't that stuff all be done after a fairly standard prolog?  VerifyStackAtCalls would have to use mov instead of push but that would be ok I think.  So it would be something more like:
>>> 
>>>    if (C->need_stack_bang(framesize)) {
>>>      st->print_cr("# stack bang"); st->print("\t");
>>>      st->print_cr("PUSHL  EBP"); st->print("\t");
>>>      st->print("SUB    ESP,%d\t# Create frame",framesize);
>>>    } else {
>>>      st->print("SUB    ESP,%d\t# Create frame",framesize);
>>>      st->print_cr(""); st->print("\t");
>>>      st->print("MOV  [ESP + #%d],EBP",framesize);
>>>    }
>>> 
>>>    if( C->in_24_bit_fp_mode() ) {
>>>      st->print("FLDCW  24 bit fpu control word");
>>>      st->print_cr(""); st->print("\t");
>>>    }
>>> 
>>> !   if (VerifyStackAtCalls) { // Majik cookie to verify stack depth
>>>      st->print("MOV  [ESP + #%d], 0xBADB100D\t# Majik cookie for stack depth check", magic_offset);
>>>      st->print_cr(""); st->print("\t");
>>>      framesize -= wordSize;
>>>    }
>>> 
>>> Is there any issue with having to different prologs?  Why not use the SUB/MOV variant all the time?
>>> 
>>> tom
>>> 
>>> 
>>>> http://cr.openjdk.java.net/~kvn/7125136/webrev
>>>> 
>>>> Thanks,
>>>> Vladimir
>>>> 
>>>> Vladimir Kozlov wrote:
>>>>> Yes, the fat_nop() sequence is wrong. It was from old Guide, in new version of the Guide address nops are recommended. So I will change it and use in my code.
>>>>> Vladimir
>>>>> Vladimir Kozlov wrote:
>>>>>> I found that we have already such function fat_nop() but I am not sure it is correct. I am worried that it is 2 nops in 64-bit VM:
>>>>>> 
>>>>>> emit_byte(0x66);
>>>>>> emit_byte(0x66);
>>>>>> emit_byte(0x90);
>>>>>> emit_byte(0x66);
>>>>>> emit_byte(0x90);
>>>>>> 
>>>>>> Thanks,
>>>>>> Vladimir
>>>>>> 
>>>>>> Vladimir Kozlov wrote:
>>>>>>> Christian Thalinger wrote:
>>>>>>>> On Feb 14, 2012, at 4:56 PM, Vladimir Kozlov wrote:
>>>>>>>> 
>>>>>>>>> Thank you, Christian
>>>>>>>>> 
>>>>>>>>> We can't use address nop on old cpus (amd before ss2 and intel before p6, see vm_version_x86.cpp).
>>>>>>>> Right, I missed that.  It would be a good idea to have an assert or guarantee checking UseAddressNop in these methods.
>>>>>>> I'm thinking now to add new asm function nop_for_patching() which will generated either addr_nop_5 or stack_bang based on UseAddressNop flag.
>>>>>>> 
>>>>>>> I will add asserts into addr_nop_*().
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Vladimir
>>>>>>> 
>>>>>>>> -- Chris
>>>>>>>> 
>>>>>>>>> I will look on masm definition in MachProlog.
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> Vladimir
>>>>>>>>> 
>>>>>>>>> On 2/14/12 1:08 AM, Christian Thalinger wrote:
>>>>>>>>>> On Feb 14, 2012, at 5:02 AM, Vladimir Kozlov wrote:
>>>>>>>>>> 
>>>>>>>>>>> http://cr.openjdk.java.net/~kvn/7125136/webrev
>>>>>>>>>>> 
>>>>>>>>>>> 7125136: SIGILL on linux amd64 in gc/ArrayJuggle/Juggle29
>>>>>>>>>>> 
>>>>>>>>>>> Saving RBP register on nmethod's entry broke nmethod's verified entry patching when it become non-entrant. There is big comment in MachPrologNode::emit() about first instruction which should be at lest 5 bytes long. And push(rbp) is one byte instruction. VerifyFPU code also broken (first instruction is pushf). The same with C1 generated code with VerifyFPU and C1Breakpoint.
>>>>>>>>>>> 
>>>>>>>>>>> The only reason we did not noticed this until now is stack bang instruction is usually generated first and it is big (store to stack with big offset). But C2 does not generated it if compiled stack frame is small and no calls in compiled method.
>>>>>>>>>>> 
>>>>>>>>>>> For C2 moved saving EBP after ESP adjustment. And other cleanup in prolog code.
>>>>>>>>>>> For C1 generated stack bang with small offset (-256) first if needed.
>>>>>>>>>> Why are we using bang_stack_with_offset in this case?  How about addr_nop_5?
>>>>>>>>>> 
>>>>>>>>>> It would be better if you would define masm at the beginning of MachPrologNode::emit instead of doing it multiple times in if clauses.
>>>>>>>>>> 
>>>>>>>>>> Otherwise this looks good.
>>>>>>>>>> 
>>>>>>>>>> -- Chris
>>>>>>>>>> 
>>>>>>>>>>> Verified by examining generated code.
>>>>>>>>>>> 
>>>>>>>>>>> SPARC code is safe since we generate 1 instruction (load from 0) for patching and use signal handler.
>>>>>>>>>>> 
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Vladimir
>>>