[9] RFR(S): 8029443: 'assert(klass->is_loader_alive(_is_alive)) failed: must be alive' during VM_CollectForMetadataAllocation

Mikael Gerdin mikael.gerdin at oracle.com
Tue Jul 22 11:27:52 UTC 2014


Tobias,

On Tuesday 22 July 2014 11.22.27 Tobias Hartmann wrote:
> On 21.07.2014 20:59, Vladimir Kozlov wrote:
> > On 7/21/14 1:44 AM, Tobias Hartmann wrote:
> >> Vladimir, Coleen, thanks for the reviews!
> >> 
> >> On 18.07.2014 20:09, Vladimir Kozlov wrote:
> >>> On 7/18/14 11:02 AM, Coleen Phillimore wrote:
> >>>> On 7/18/14, 11:06 AM, Vladimir Kozlov wrote:
> >>>>> On 7/18/14 4:38 AM, Tobias Hartmann wrote:
> >>>>>> Hi,
> >>>>>> 
> >>>>>> I spend some more days and was finally able to implement a test that
> >>>>> 
> >>>>>> deterministically triggers the bug:
> >>>>> Why do you need to switch off compressed oops? Do you need to switch
> >>>>> off compressed klass pointers too (UseCompressedClassPointers)?
> >>>> 
> >>>> CompressedOops when off turns off CompressedClassPointers.
> >>> 
> >>> You are right, I forgot that. Still the question is why switch off
> >>> coop?
> >> 
> >> I'm only able to reproduce the bug without compressed oops. The original
> >> bug also only reproduces with -XX:-UseCompressedOops. I tried to figure
> >> out why (on Sparc):
> >> 
> >> With compressed oops enabled, Method* metadata referencing 'WorkerClass'
> >> is added to 'doWork' in MacroAssembler::set_narrow_klass(..). In
> >> CodeBuffer::finalize_oop_references(..) the metadata is processed and an
> >> oop to the class loader 'URLClassLoader' is added. This oop leads to the
> >> unloading of 'doWork', hence the verification code is never executed.
> >> 
> >> I'm not sure what set_narrow_klass(..) is used for in this case. I
> >> assume it stores a 'WorkerClass' Klass* in a register as part of an
> >> optimization? Because 'doWork' potentially works on any class.
> >> Apparently this optimization is not performed without compressed oops.
> > 
> > I would suggest to compare 'doWork' assembler
> > (-XX:CompileCommand=print,TestMethodUnloading::doWork) with coop and
> > without it. Usually loaded into register class is used for klass
> > compare do guard inlining code. Or to initialize new object.
> > 
> > I don't see loading (constructing) uncompressed (whole) klass pointer
> > from constant in sparc.ad. It could be the reason for different
> > behavior. It could be loaded from constants section. But constants
> > section should have metadata relocation info in such case.
> 
> I did as you suggested and found the following:
> 
> During the profiling phase the class given to 'doWork' always is
> 'WorkerClass'. The C2 compiler therefore optimizes the compiled version
> to expect a 'WorkerClass'. The branch that instantiates a new object is
> guarded by an uncommon trap (class_check). The difference between the
> two versions (with and without compressed oops) is the loading of the
> 'WorkerClass' Klass to check if the given class is equal:
> 
> With compressed oops:
>     SET    narrowklass: precise klass WorkerClass:
> 0x00000001004a0d40:Constant:exact *,R_L1 ! compressed klass ptr
>     CWBne  R_L2,R_L1,B8      ! compressed ptr  P=0.000001 C=-1.000000
> 
> Without:
>     SET    precise klass WorkerClass: 0x00000001004aeab0:Constant:exact
> *,R_L1      ! non-oop ptr
>     CXBpne R_L2,R_L1,B8     ! ptr  P=0.000001 C=-1.000000
> 
> R_L2: class given as parameter
> B8: location of uncommon trap
> 
> In the first case, the Klass is loaded by a 'loadConNKlass' instruction
> that calls MacroAssembler::set_narrow_klass(..) which then creates a
> metadata_Relocation for the 'WorkerClass'. This metada_Relocation is
> processed by CodeBuffer::finalize_oop_references(..) and an oop to
> 'WorkerClass' is added. This oop causes the unloading of the method.
> 
> In the second case, the Klass is loaded by a 'loadConP_no_oop_cheap'
> instruction that does not create a metadata_Relocation.
> 
> I don't understand why the metadata_Relocation in the first case is
> needed? As the test shows it is better to only unload the method if we
> hit the uncommon trap because we could still use other (potentially
> complex) branches of the method.

It sounds to me like the case without compressed oops is incorrect.
If WorkerClass is unloaded then the memory address 0x00000001004aeab0 could 
potentially be used for some other class at some point in the future.
This could cause a false positive in the class check and cause corruption 
and/or a crash, right?

Currently the only course of action in this case is to force the method to be 
unloaded. Another alternative approach would be to use a metadata reloaction 
to set the class check to an impossible value, effectively making the code for 
the dead class unreachable.

It seems most likely to me that this is a leftover bug from perm gen removal, 
since Klasses are no longer oops they can be materialized as a 
immP_no_oop_cheap and not generate a relocation entry even though one should 
be needed.

Is there a similar construct on other platforms?

/Mikael

> 
> Thanks,
> Tobias
> 
> > thanks,
> > Vladimir
> > 
> >> Best,
> >> Tobias
> >> 
> >>> Vladimir
> >>> 
> >>>>>> http://cr.openjdk.java.net/~thartmann/8029443/webrev.02/
> >>>>> 
> >>>>> Very nice!
> >>>> 
> >>>> Yes, I agree.  Impressive.
> >>>> 
> >>>> The refactoring in nmethod.cpp looks good to me.  I have no further
> >>>> comments.
> >>>> Thanks!
> >>>> Coleen
> >>>> 
> >>>>>> @Vladimir: The test shows why we should only clean the ICs but not
> >>>>>> unload the nmethod if possible. The method ' doWork'
> >>>>>> is still valid after WorkerClass was unloaded and depending on the
> >>>>>> complexity of the method we should avoid unloading it.
> >>>>> 
> >>>>> Make sense.
> >>>>> 
> >>>>>> On Sparc my patch fixes the bug and leads to the nmethod not being
> >>>>>> unloaded. The compiled version is therefore used even
> >>>>>> after WorkerClass is unloaded.
> >>>>>> 
> >>>>>> On x86 the nmethod is unloaded anyway because of a dead oop. This is
> >>>>>> probably due to a slightly different implementation
> >>>>>> of the ICs. I'll have a closer look to see if we can improve that.
> >>>>> 
> >>>>> Thanks,
> >>>>> Vladimir
> >>>>> 
> >>>>>> Thanks,
> >>>>>> Tobias
> >>>>>> 
> >>>>>> On 16.07.2014 10:36, Tobias Hartmann wrote:
> >>>>>>> Sorry, forgot to answer this question:
> >>>>>>>> Were you able to create a small test case for it that would be
> >>>>>>>> useful to add?
> >>>>>>> 
> >>>>>>> Unfortunately I was not able to create a test. The bug only
> >>>>>>> reproduces on a particular system with a > 30 minute run
> >>>>>>> of runThese.
> >>>>>>> 
> >>>>>>> Best,
> >>>>>>> Tobias
> >>>>>>> 
> >>>>>>> On 16.07.2014 09:54, Tobias Hartmann wrote:
> >>>>>>>> Hi Coleen,
> >>>>>>>> 
> >>>>>>>> thanks for the review.
> >>>>>>>> 
> >>>>>>>>> *+           if (csc->is_call_to_interpreted() &&
> >>>>>>>>> stub_contains_dead_metadata(is_alive, csc->destination())) {*
> >>>>>>>>> *+             csc->set_to_clean();*
> >>>>>>>>> *+           }*
> >>>>>>>>> 
> >>>>>>>>> This appears in each case.  Can you fold it and the new function
> >>>>>>>>> into a function like
> >>>>>>>>> clean_call_to_interpreted_stub(is_alive, csc)?
> >>>>>>>> 
> >>>>>>>> I folded it into the function clean_call_to_interpreter_stub(..).
> >>>>>>>> 
> >>>>>>>> New webrev:
> >>>>>>>> http://cr.openjdk.java.net/~thartmann/8029443/webrev.01/
> >>>>>>>> 
> >>>>>>>> Thanks,
> >>>>>>>> Tobias
> >>>>>>>> 
> >>>>>>>>> Thanks,
> >>>>>>>>> Coleen
> >>>>>>>>> 
> >>>>>>>>>> So before the permgen removal embedded method* were oops and
> >>>>>>>>>> they
> >>>>>>>>>> were processed in relocInfo::oop_type loop.
> >>>>>>>>>> 
> >>>>>>>>>> May be instead of specializing opt_virtual_call_type and
> >>>>>>>>>> static_call_type call site you can simple add a loop for
> >>>>>>>>>> relocInfo::metadata_type (similar to oop_type loop)?
> >>>>>>>>>> 
> >>>>>>>>>> Thanks,
> >>>>>>>>>> Vladimir
> >>>>>>>>>> 
> >>>>>>>>>> On 7/14/14 4:56 AM, Tobias Hartmann wrote:
> >>>>>>>>>>> Hi,
> >>>>>>>>>>> 
> >>>>>>>>>>> please review the following patch for JDK-8029443.
> >>>>>>>>>>> 
> >>>>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8029443
> >>>>>>>>>>> Webrev:
> >>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/8029443/webrev.00/
> >>>>>>>>>>> 
> >>>>>>>>>>> *Problem*
> >>>>>>>>>>> After the tracing/marking phase of GC,
> >>>>>>>>>>> nmethod::do_unloading(..)
> >>>>>>>>>>> checks
> >>>>>>>>>>> if a nmethod can be unloaded because it contains dead oops. If
> >>>>>>>>>>> class
> >>>>>>>>>>> unloading occurred we additionally clear all ICs where the
> >>>>>>>>>>> cached
> >>>>>>>>>>> metadata refers to an unloaded klass or method. If the nmethod
> >>>>>>>>>>> is not
> >>>>>>>>>>> unloaded, nmethod::verify_metadata_loaders(..) finally
> >>>>>>>>>>> checks if
> >>>>>>>>>>> all
> >>>>>>>>>>> metadata is alive. The assert in CheckClass::check_class fails
> >>>>>>>>>>> because
> >>>>>>>>>>> the nmethod contains Method* metadata corresponding to a dead
> >>>>>>>>>>> Klass.
> >>>>>>>>>>> The Method* belongs to a to-interpreter stub [1] of an
> >>>>>>>>>>> optimized
> >>>>>>>>>>> compiled IC. Normally we clear those stubs prior to
> >>>>>>>>>>> verification to
> >>>>>>>>>>> avoid dangling references to Method* [2], but only if the stub
> >>>>>>>>>>> is not in
> >>>>>>>>>>> use, i.e. if the IC is not in to-interpreted mode. In this
> >>>>>>>>>>> case the
> >>>>>>>>>>> to-interpreter stub may be executed and hand a stale Method*
> >>>>>>>>>>> to the
> >>>>>>>>>>> interpreter.
> >>>>>>>>>>> 
> >>>>>>>>>>> *Solution
> >>>>>>>>>>> *The implementation of nmethod::do_unloading(..) is changed to
> >>>>>>>>>>> clean
> >>>>>>>>>>> compiled ICs and compiled static calls if they call into a
> >>>>>>>>>>> to-interpreter stub that references dead Method* metadata.
> >>>>>>>>>>> 
> >>>>>>>>>>> The patch was affected by the G1 class unloading changes
> >>>>>>>>>>> (JDK-8048248)
> >>>>>>>>>>> because the method nmethod::do_unloading_parallel(..) was
> >>>>>>>>>>> added. I
> >>>>>>>>>>> adapted the implementation as well.
> >>>>>>>>>>> *
> >>>>>>>>>>> Testing
> >>>>>>>>>>> *Failing test (runThese)
> >>>>>>>>>>> JPRT
> >>>>>>>>>>> 
> >>>>>>>>>>> Thanks,
> >>>>>>>>>>> Tobias
> >>>>>>>>>>> 
> >>>>>>>>>>> [1] see CompiledStaticCall::emit_to_interp_stub(..)
> >>>>>>>>>>> [2] see nmethod::verify_metadata_loaders(..),
> >>>>>>>>>>> static_stub_reloc()->clear_inline_cache() clears the stub



More information about the hotspot-dev mailing list