Possible bug in nmethod reloc verification/scanning in G1

Christos Kotselidis christos.kotselidis at oracle.com
Thu Nov 21 13:42:18 UTC 2013


On 11/21/13 2:34 PM, "Thomas Schatzl" <thomas.schatzl at oracle.com> wrote:

>Hi,
>
>On Thu, 2013-11-21 at 12:40 +0100, Christos Kotselidis wrote:
>> Hi,
>>
>> I came across the following scenario, I already discussed with Thomas
>>  Schatzl, that I would like to share with you.
>> There is a method:
>>
>> private static final HotSpotGraalRuntime instance = new
>>HotSpotGraalRuntime();
>>
>> public static HotSpotGraalRuntime runtime() {
>>         return instance;
>>}
>
>> The instance field is a constant oop in the code and the method is
>> installed.
>> At some stage there is a GC with post verification
>> (including G1VerifyHeapRegionCodeRoots) that passes.
>> Before the next GC cycle and during the mutation cycle, another method
>> is being installed which causes the eviction of our method.
>> Consequently our method is being made not_entrant and it is being
>> patched with a jmp instruction at its verified entry point
>> (nmethod.cpp::make_not_entrant_or_zombie). The patching causes the oop
>> to be overwritten by the jmp instruction. Furthermore, that oop was
>> the only oop that was responsible for attaching this method to its
>> correspondent HeapRegion.
>> 
>> The next GC cycle (Pre-verification) comes and the HeapRegionCodeRoot
>> verification starts (nmethod.cpp::oops_do). There is a logic there
>> (which I also believe it may be buggy as I will explain later) that
>> checks whether the method is not_entrant. If the method is not_entrant
>> then the logic correctly assumes that the method has been already been
>> patched and therefore starts scanning the relocs of the method in +5
>> bytes (for x86)/+4 bytes (for SPARC) (as the comment states). This
>> results in not scanning the single oop that was attaching this
>> specific method to its heap region. The verification then correctly
>> asserts by saying that "there is a method attached to this specific
>> heap region with no oops pointing into it".
>
>Thanks for finding this - I already filed a bug for that
>(https://bugs.openjdk.java.net/browse/JDK-8028736). Please check if it
>matches your investigation.
>> 
>> Concerning the second bug in the method::oops_do logic, here is the
>> code for manipulating the reloc starting address after a method has
>> been patched:
>
>> // If the method is not entrant or zombie then a JMP is plastered over
>>// the
>> // first few bytes.  If an oop in the old code was there, that oop
>> // should not get GC'd.  Skip the first few bytes of oops on
>> // not-entrant methods.
>>   address low_boundary = verified_entry_point();
>>   if (is_not_entrant()) {
>>     low_boundary += NativeJump::instruction_size;
>>     // %%% Note:  On SPARC we patch only a 4-byte trap, not a full
>> NativeJump.
>>     // (See comment above.)
>>   }
>>
>> The code above accounts only for the "is not entrant" scenario and not
>> with the "is zombie" scenario.
>
>nmethod::do_oops will not be called for zombie methods in the default
>case, so it does not matter. I.e. the default overload of
>oops_do(Closure* cl, bool zombie) is oops_do(cl, false);

When unregistering a method we call the oops_do allowing zombie methods
(G1CollectedHeap::unregister_nmethod). That was causing a problem in my
case and afteradding "|| is_zombie()", it got fixed.

The bug report matches my conclusion.

>
>Also the first assert in oops_do() checks.
>
>> I hope I got the story rightŠwhat do you think?
>
>Yes, thanks for finding this issue (and keep investigating it).
>
>Thomas
>
>
>

Thanks Thomas

Regards

Christos
>





More information about the hotspot-gc-dev mailing list