Possible bug in nmethod reloc verification/scanning in G1
Igor Veresov
igor.veresov at oracle.com
Thu Nov 21 16:35:57 PST 2013
Christos,
This must be a Graal-emitted method that fails. C1 & C2 take special care to emit enough bytes in the prologue to make space for the jump instruction of the patch. Graal should do the same.
The method in the example:
static final Object someobject = new Integer(0xdeadbabe);
final static Object getSomeObject() {
return someobject;
}
Looks like this with C2:
0x000000011185d440: sub $0x18,%rsp
0x000000011185d447: mov %rbp,0x10(%rsp) ;*synchronization entry
; - X::getSomeObject at -1 (line 6)
0x000000011185d44c: movabs $0x74006b240,%rax ; {oop(a 'java/lang/Integer' = -559039810)}
0x000000011185d456: add $0x10,%rsp
0x000000011185d45a: pop %rbp
0x000000011185d45b: test %eax,-0x246a461(%rip) # 0x000000010f3f3000
; {poll_return}
0x000000011185d461: retq
Notice a long version of the sub instruction allocating the frame, that's on purpose, to make sure that jump will fit. See MacroAssembler::verified_entry() in macroAssembler_x86.cpp.
C1, respectively generates either a stack bang (always!):
0x0000000110fa5e20: mov %eax,-0x16000(%rsp)
0x0000000110fa5e27: push %rbp
0x0000000110fa5e28: sub $0x30,%rsp ;*getstatic someobject
; - X::getSomeObject at 0 (line 6)
;; block B0 [0, 3]
0x0000000110fa5e2c: movabs $0x74006b428,%rax ; {oop(a 'java/lang/Integer' = -559039810)}
0x0000000110fa5e36: add $0x30,%rsp
0x0000000110fa5e3a: pop %rbp
0x0000000110fa5e3b: test %eax,-0x262fd41(%rip) # 0x000000010e976100
; {poll_return}
0x0000000110fa5e41: retq
Or if you turn stack bangs off it puts a 5-byte nop (to accommodate for the jump):
0x00000001080a1f20: nopl 0x0(%rax,%rax,1)
0x00000001080a1f25: push %rbp
0x00000001080a1f26: sub $0x30,%rsp ;*getstatic someobject
; - X::getSomeObject at 0 (line 6)
;; block B0 [0, 3]
0x00000001080a1f2a: movabs $0x74006b428,%rax ; {oop(a 'java/lang/Integer' = -559039810)}
0x00000001080a1f34: add $0x30,%rsp
0x00000001080a1f38: pop %rbp
0x00000001080a1f39: test %eax,-0x24dde3f(%rip) # 0x0000000105bc4100
; {poll_return}
0x00000001080a1f3f: retq
igor
On Nov 21, 2013, at 12:46 PM, Christos Kotselidis <christos.kotselidis at oracle.com> wrote:
> Thanks for the info. Yes, I cc'ed initially the compiler team but I guess
> the original email is pending for verification.
>
> Regards
>
> Christos
>
> On 11/21/13 7:31 PM, "Vladimir Kozlov" <vladimir.kozlov at oracle.com> wrote:
>
>> CC to compiler because it is about compiled nmethods and their state
>> change. So it is interesting for us.
>>
>> Note, usually such methods are inlinied (accessor methods) and big
>> methods have stack bang code at the beginning. Here is criteria used to
>> generate stack bang:
>>
>> bool Compile::need_stack_bang(int frame_size_in_bytes) const {
>> // Determine if we need to generate a stack overflow check.
>> // Do it if the method is not a stub function and
>> // has java calls or has frame size > vm_page_size/8.
>> return (UseStackBanging && stub_function() == NULL &&
>> (has_java_calls() || frame_size_in_bytes >
>> os::vm_page_size()>>3));
>> }
>>
>> In your case stack bang is not generated that is why you have embedded
>> oop at the beginning of the code.
>> We can mark such nmethods so it will be easier to see such nmethod when
>> we need unregister them.
>>
>> And thank you for catching both problems.
>>
>> Thanks,
>> Vladimir
>>
>> On 11/21/13 5:54 AM, Thomas Schatzl wrote:
>>> Hi,
>>>
>>> On Thu, 2013-11-21 at 14:42 +0100, Christos Kotselidis wrote:
>>>> On 11/21/13 2:34 PM, "Thomas Schatzl" <thomas.schatzl at oracle.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> On Thu, 2013-11-21 at 12:40 +0100, Christos Kotselidis wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I came across the following scenario, I already discussed with Thomas
>>>>>> Schatzl, that I would like to share with you.
>>>>>> There is a method:
>>>>>>
>>>>>> private static final HotSpotGraalRuntime instance = new
>>>>>> HotSpotGraalRuntime();
>>>>>>
>>>>>> public static HotSpotGraalRuntime runtime() {
>>>>>> return instance;
>>>>>> }
>>>>>
>>>>>> The instance field is a constant oop in the code and the method is
>>>>>> installed.
>>>>>> At some stage there is a GC with post verification
>>>>>> (including G1VerifyHeapRegionCodeRoots) that passes.
>>>>>> Before the next GC cycle and during the mutation cycle, another
>>>>>> method
>>>>>> is being installed which causes the eviction of our method.
>>>>>> Consequently our method is being made not_entrant and it is being
>>>>>> patched with a jmp instruction at its verified entry point
>>>>>> (nmethod.cpp::make_not_entrant_or_zombie). The patching causes the
>>>>>> oop
>>>>>> to be overwritten by the jmp instruction. Furthermore, that oop was
>>>>>> the only oop that was responsible for attaching this method to its
>>>>>> correspondent HeapRegion.
>>>>>>
>>>>>> The next GC cycle (Pre-verification) comes and the HeapRegionCodeRoot
>>>>>> verification starts (nmethod.cpp::oops_do). There is a logic there
>>>>>> (which I also believe it may be buggy as I will explain later) that
>>>>>> checks whether the method is not_entrant. If the method is
>>>>>> not_entrant
>>>>>> then the logic correctly assumes that the method has been already
>>>>>> been
>>>>>> patched and therefore starts scanning the relocs of the method in +5
>>>>>> bytes (for x86)/+4 bytes (for SPARC) (as the comment states). This
>>>>>> results in not scanning the single oop that was attaching this
>>>>>> specific method to its heap region. The verification then correctly
>>>>>> asserts by saying that "there is a method attached to this specific
>>>>>> heap region with no oops pointing into it".
>>>>>
>>>>> Thanks for finding this - I already filed a bug for that
>>>>> (https://bugs.openjdk.java.net/browse/JDK-8028736). Please check if it
>>>>> matches your investigation.
>>>>>>
>>>>>> Concerning the second bug in the method::oops_do logic, here is the
>>>>>> code for manipulating the reloc starting address after a method has
>>>>>> been patched:
>>>>>
>>>>>> // If the method is not entrant or zombie then a JMP is plastered
>>>>>> over
>>>>>> // the
>>>>>> // first few bytes. If an oop in the old code was there, that oop
>>>>>> // should not get GC'd. Skip the first few bytes of oops on
>>>>>> // not-entrant methods.
>>>>>> address low_boundary = verified_entry_point();
>>>>>> if (is_not_entrant()) {
>>>>>> low_boundary += NativeJump::instruction_size;
>>>>>> // %%% Note: On SPARC we patch only a 4-byte trap, not a full
>>>>>> NativeJump.
>>>>>> // (See comment above.)
>>>>>> }
>>>>>>
>>>>>> The code above accounts only for the "is not entrant" scenario and
>>>>>> not
>>>>>> with the "is zombie" scenario.
>>>>>
>>>>> nmethod::do_oops will not be called for zombie methods in the default
>>>>> case, so it does not matter. I.e. the default overload of
>>>>> oops_do(Closure* cl, bool zombie) is oops_do(cl, false);
>>>>
>>>> When unregistering a method we call the oops_do allowing zombie methods
>>>> (G1CollectedHeap::unregister_nmethod). That was causing a problem in my
>>>> case and after adding "|| is_zombie()", it got fixed.
>>>
>>> I think you are right, we will need to fix that too as the oop map is
>>> not changed after overwriting the oop in the first few bytes.
>>>
>>> Thomas
>>>
>>>
>
>
More information about the hotspot-compiler-dev
mailing list