Possible bug in nmethod reloc verification/scanning in G1

Thu Nov 21 13:34:41 UTC 2013

Hi,

On Thu, 2013-11-21 at 12:40 +0100, Christos Kotselidis wrote:
> Hi,
>
> I came across the following scenario, I already discussed with Thomas
>  Schatzl, that I would like to share with you.
> There is a method:
>
> private static final HotSpotGraalRuntime instance = new HotSpotGraalRuntime();
>
> public static HotSpotGraalRuntime runtime() {
>         return instance;
>}

> The instance field is a constant oop in the code and the method is
> installed.
> At some stage there is a GC with post verification
> (including G1VerifyHeapRegionCodeRoots) that passes. 
> Before the next GC cycle and during the mutation cycle, another method
> is being installed which causes the eviction of our method.
> Consequently our method is being made not_entrant and it is being
> patched with a jmp instruction at its verified entry point
> (nmethod.cpp::make_not_entrant_or_zombie). The patching causes the oop
> to be overwritten by the jmp instruction. Furthermore, that oop was
> the only oop that was responsible for attaching this method to its
> correspondent HeapRegion.
> 
> The next GC cycle (Pre-verification) comes and the HeapRegionCodeRoot
> verification starts (nmethod.cpp::oops_do). There is a logic there
> (which I also believe it may be buggy as I will explain later) that
> checks whether the method is not_entrant. If the method is not_entrant
> then the logic correctly assumes that the method has been already been
> patched and therefore starts scanning the relocs of the method in +5
> bytes (for x86)/+4 bytes (for SPARC) (as the comment states). This
> results in not scanning the single oop that was attaching this
> specific method to its heap region. The verification then correctly
> asserts by saying that "there is a method attached to this specific
> heap region with no oops pointing into it".

Thanks for finding this - I already filed a bug for that
(https://bugs.openjdk.java.net/browse/JDK-8028736). Please check if it
matches your investigation.
> 
> Concerning the second bug in the method::oops_do logic, here is the
> code for manipulating the reloc starting address after a method has
> been patched:

> // If the method is not entrant or zombie then a JMP is plastered over
>// the
> // first few bytes.  If an oop in the old code was there, that oop
> // should not get GC'd.  Skip the first few bytes of oops on
> // not-entrant methods.
>   address low_boundary = verified_entry_point();
>   if (is_not_entrant()) {
>     low_boundary += NativeJump::instruction_size;
>     // %%% Note:  On SPARC we patch only a 4-byte trap, not a full
> NativeJump.
>     // (See comment above.)
>   }
>
> The code above accounts only for the "is not entrant" scenario and not
> with the "is zombie" scenario. 

nmethod::do_oops will not be called for zombie methods in the default
case, so it does not matter. I.e. the default overload of
oops_do(Closure* cl, bool zombie) is oops_do(cl, false);

Also the first assert in oops_do() checks.

> I hope I got the story right…what do you think?

Yes, thanks for finding this issue (and keep investigating it).

Thomas