Possible bug in nmethod reloc verification/scanning in G1

Thu Nov 21 16:45:35 PST 2013

Hi Igor,

Correct, this is a Graal emitted method which I am pretty sure we do not
guarantee the 5 byte invariant for the patchingŠ

Thanks for the info.

Christos

On 11/22/13 1:35 AM, "Igor Veresov" <igor.veresov at oracle.com> wrote:

>Christos,
>
>This must be a Graal-emitted method that fails. C1 & C2 take special care
>to emit enough bytes in the prologue to make space for the jump
>instruction of the patch. Graal should do the same.
>
>The method in the example:
>
>static final Object someobject = new Integer(0xdeadbabe);
>  final static Object getSomeObject() {
>    return someobject;
>  }
>
>Looks like this with C2:
>  0x000000011185d440: sub $0x18,%rsp
>  0x000000011185d447: mov %rbp,0x10(%rsp) ;*synchronization entry
>                                                ; - X::getSomeObject at -1
>(line 6)
>
>  0x000000011185d44c: movabs $0x74006b240,%rax ; {oop(a
>'java/lang/Integer' = -559039810)}
>  0x000000011185d456: add $0x10,%rsp
>  0x000000011185d45a: pop %rbp
>  0x000000011185d45b: test %eax,-0x246a461(%rip) # 0x000000010f3f3000
>                                                ; {poll_return}
>  0x000000011185d461: retq
>
>Notice a long version of the sub instruction allocating the frame, that's
>on purpose, to make sure that jump will fit. See
>MacroAssembler::verified_entry() in macroAssembler_x86.cpp.
>
>C1, respectively generates either a stack bang (always!):
> 0x0000000110fa5e20: mov %eax,-0x16000(%rsp)
>  0x0000000110fa5e27: push %rbp
>  0x0000000110fa5e28: sub $0x30,%rsp ;*getstatic someobject
>                                                ; - X::getSomeObject at 0
>(line 6)
>
>  ;; block B0 [0, 3]
>
>  0x0000000110fa5e2c: movabs $0x74006b428,%rax ; {oop(a
>'java/lang/Integer' = -559039810)}
>  0x0000000110fa5e36: add $0x30,%rsp
>  0x0000000110fa5e3a: pop %rbp
>  0x0000000110fa5e3b: test %eax,-0x262fd41(%rip) # 0x000000010e976100
>                                                ; {poll_return}
>  0x0000000110fa5e41: retq
>
>Or if you turn stack bangs off it puts a 5-byte nop (to accommodate for
>the jump):
>
>  0x00000001080a1f20: nopl 0x0(%rax,%rax,1)
>  0x00000001080a1f25: push %rbp
>  0x00000001080a1f26: sub $0x30,%rsp ;*getstatic someobject
>                                                ; - X::getSomeObject at 0
>(line 6)
>
>  ;; block B0 [0, 3]
>
>  0x00000001080a1f2a: movabs $0x74006b428,%rax ; {oop(a
>'java/lang/Integer' = -559039810)}
>  0x00000001080a1f34: add $0x30,%rsp
>  0x00000001080a1f38: pop %rbp
>  0x00000001080a1f39: test %eax,-0x24dde3f(%rip) # 0x0000000105bc4100
>                                                ; {poll_return}
>  0x00000001080a1f3f: retq
>
>
>igor
>
>On Nov 21, 2013, at 12:46 PM, Christos Kotselidis
><christos.kotselidis at oracle.com> wrote:
>
>> Thanks for the info. Yes, I cc'ed initially the compiler team but I
>>guess
>> the original email is pending for verification.
>> 
>> Regards
>> 
>> Christos
>> 
>> On 11/21/13 7:31 PM, "Vladimir Kozlov" <vladimir.kozlov at oracle.com>
>>wrote:
>> 
>>> CC to compiler because it is about compiled nmethods and their state
>>> change. So it is interesting for us.
>>> 
>>> Note, usually such methods are inlinied (accessor methods) and big
>>> methods have stack bang code at the beginning. Here is criteria used to
>>> generate stack bang:
>>> 
>>> bool Compile::need_stack_bang(int frame_size_in_bytes) const {
>>>  // Determine if we need to generate a stack overflow check.
>>>  // Do it if the method is not a stub function and
>>>  // has java calls or has frame size > vm_page_size/8.
>>>  return (UseStackBanging && stub_function() == NULL &&
>>>          (has_java_calls() || frame_size_in_bytes >
>>> os::vm_page_size()>>3));
>>> }
>>> 
>>> In your case stack bang is not generated that is why you have embedded
>>> oop at the beginning of the code.
>>> We can mark such nmethods so it will be easier to see such nmethod when
>>> we need unregister them.
>>> 
>>> And thank you for catching both problems.
>>> 
>>> Thanks,
>>> Vladimir
>>> 
>>> On 11/21/13 5:54 AM, Thomas Schatzl wrote:
>>>> Hi,
>>>> 
>>>> On Thu, 2013-11-21 at 14:42 +0100, Christos Kotselidis wrote:
>>>>> On 11/21/13 2:34 PM, "Thomas Schatzl" <thomas.schatzl at oracle.com>
>>>>> wrote:
>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> On Thu, 2013-11-21 at 12:40 +0100, Christos Kotselidis wrote:
>>>>>>> Hi,
>>>>>>> 
>>>>>>> I came across the following scenario, I already discussed with
>>>>>>>Thomas
>>>>>>>  Schatzl, that I would like to share with you.
>>>>>>> There is a method:
>>>>>>> 
>>>>>>> private static final HotSpotGraalRuntime instance = new
>>>>>>> HotSpotGraalRuntime();
>>>>>>> 
>>>>>>> public static HotSpotGraalRuntime runtime() {
>>>>>>>         return instance;
>>>>>>> }
>>>>>> 
>>>>>>> The instance field is a constant oop in the code and the method is
>>>>>>> installed.
>>>>>>> At some stage there is a GC with post verification
>>>>>>> (including G1VerifyHeapRegionCodeRoots) that passes.
>>>>>>> Before the next GC cycle and during the mutation cycle, another
>>>>>>> method
>>>>>>> is being installed which causes the eviction of our method.
>>>>>>> Consequently our method is being made not_entrant and it is being
>>>>>>> patched with a jmp instruction at its verified entry point
>>>>>>> (nmethod.cpp::make_not_entrant_or_zombie). The patching causes the
>>>>>>> oop
>>>>>>> to be overwritten by the jmp instruction. Furthermore, that oop was
>>>>>>> the only oop that was responsible for attaching this method to its
>>>>>>> correspondent HeapRegion.
>>>>>>> 
>>>>>>> The next GC cycle (Pre-verification) comes and the
>>>>>>>HeapRegionCodeRoot
>>>>>>> verification starts (nmethod.cpp::oops_do). There is a logic there
>>>>>>> (which I also believe it may be buggy as I will explain later) that
>>>>>>> checks whether the method is not_entrant. If the method is
>>>>>>> not_entrant
>>>>>>> then the logic correctly assumes that the method has been already
>>>>>>> been
>>>>>>> patched and therefore starts scanning the relocs of the method in
>>>>>>>+5
>>>>>>> bytes (for x86)/+4 bytes (for SPARC) (as the comment states). This
>>>>>>> results in not scanning the single oop that was attaching this
>>>>>>> specific method to its heap region. The verification then correctly
>>>>>>> asserts by saying that "there is a method attached to this specific
>>>>>>> heap region with no oops pointing into it".
>>>>>> 
>>>>>> Thanks for finding this - I already filed a bug for that
>>>>>> (https://bugs.openjdk.java.net/browse/JDK-8028736). Please check if
>>>>>>it
>>>>>> matches your investigation.
>>>>>>> 
>>>>>>> Concerning the second bug in the method::oops_do logic, here is the
>>>>>>> code for manipulating the reloc starting address after a method has
>>>>>>> been patched:
>>>>>> 
>>>>>>> // If the method is not entrant or zombie then a JMP is plastered
>>>>>>> over
>>>>>>> // the
>>>>>>> // first few bytes.  If an oop in the old code was there, that oop
>>>>>>> // should not get GC'd.  Skip the first few bytes of oops on
>>>>>>> // not-entrant methods.
>>>>>>>   address low_boundary = verified_entry_point();
>>>>>>>   if (is_not_entrant()) {
>>>>>>>     low_boundary += NativeJump::instruction_size;
>>>>>>>     // %%% Note:  On SPARC we patch only a 4-byte trap, not a full
>>>>>>> NativeJump.
>>>>>>>     // (See comment above.)
>>>>>>>   }
>>>>>>> 
>>>>>>> The code above accounts only for the "is not entrant" scenario and
>>>>>>> not
>>>>>>> with the "is zombie" scenario.
>>>>>> 
>>>>>> nmethod::do_oops will not be called for zombie methods in the
>>>>>>default
>>>>>> case, so it does not matter. I.e. the default overload of
>>>>>> oops_do(Closure* cl, bool zombie) is oops_do(cl, false);
>>>>> 
>>>>> When unregistering a method we call the oops_do allowing zombie
>>>>>methods
>>>>> (G1CollectedHeap::unregister_nmethod). That was causing a problem in
>>>>>my
>>>>> case and after adding "|| is_zombie()", it got fixed.
>>>> 
>>>> I think you are right, we will need to fix that too as the oop map is
>>>> not changed after overwriting the oop in the first few bytes.
>>>> 
>>>> Thomas
>>>> 
>>>> 
>> 
>> 
>