[9] RFR(S): 8029443: 'assert(klass->is_loader_alive(_is_alive)) failed: must be alive' during VM_CollectForMetadataAllocation

Mikael Gerdin mikael.gerdin at oracle.com
Fri Jul 25 14:13:56 UTC 2014


Tobias,

On Friday 25 July 2014 15.54.38 Tobias Hartmann wrote:
> Mikael, Vladimir, thanks for the review.
> 
> The problem is indeed caused by a missing check for a metadata pointer
> in sparc.ad.
> 
> Adding 'n->bottom_type()->isa_klassptr()' checks to immP_load() and
> immP_no_oop_cheap() fixes the problem. The Klass pointer is then loaded
> from the constant table (loadConP_load()) and a metadata relocation is
> added by Compile::ConstantTable::emit().

There shouldn't be anything stopping the Klass* from being emitted as an 
immediate just as long as the appropriate relocation entry is created.
We don't need to update the immediate in the instruction stream for the Klass* 
since we don't move klasses any more.

I can't find any usages of {oop,metadata}_Relocation::spec_for_immediate from 
C2 though, so I don't know how that works in C2-land.

/Mikael

> 
> I had a look at the Aurora chessboard and it looks like as if the bug
> recently occured on x86_32 as well. I was not yet able to reproduce it
> but will try again next week.
> 
> Thanks,
> Tobias
> 
> On 22.07.2014 20:04, Vladimir Kozlov wrote:
> > I agree with Mikael, the case without compressed oops is incorrect.
> > The problem is immP_load() and immP_no_oop_cheap() operands miss the
> > check for metadata pointer, they only check for oop. For class loading
> > loadConP_load() should be used instead of loadConP_no_oop_cheap() and
> > it should check relocation type and do what loadConP_set() does.
> > 
> > 
> > X64 seems fine. X86_64.ad use $$$emit32$src$$constant; in such case
> > (load_immP31) which is expanded to the same code as load_immP:
> > 
> > if ( opnd_array(1)->constant_reloc() != relocInfo::none ) {
> > 
> >   emit_d32_reloc(cbuf, opnd_array(1)->constant(),
> > 
> > opnd_array(1)->constant_reloc(), 0);
> > 
> > Thanks,
> > Vladimir
> > 
> > On 7/22/14 2:22 AM, Tobias Hartmann wrote:
> >> On 21.07.2014 20:59, Vladimir Kozlov wrote:
> >>> On 7/21/14 1:44 AM, Tobias Hartmann wrote:
> >>>> Vladimir, Coleen, thanks for the reviews!
> >>>> 
> >>>> On 18.07.2014 20:09, Vladimir Kozlov wrote:
> >>>>> On 7/18/14 11:02 AM, Coleen Phillimore wrote:
> >>>>>> On 7/18/14, 11:06 AM, Vladimir Kozlov wrote:
> >>>>>>> On 7/18/14 4:38 AM, Tobias Hartmann wrote:
> >>>>>>>> Hi,
> >>>>>>>> 
> >>>>>>>> I spend some more days and was finally able to implement a test
> >>>>>>>> that
> >>>>>>> 
> >>>>>>>> deterministically triggers the bug:
> >>>>>>> Why do you need to switch off compressed oops? Do you need to
> >>>>>>> switch
> >>>>>>> off compressed klass pointers too (UseCompressedClassPointers)?
> >>>>>> 
> >>>>>> CompressedOops when off turns off CompressedClassPointers.
> >>>>> 
> >>>>> You are right, I forgot that. Still the question is why switch off
> >>>>> coop?
> >>>> 
> >>>> I'm only able to reproduce the bug without compressed oops. The
> >>>> original
> >>>> bug also only reproduces with -XX:-UseCompressedOops. I tried to
> >>>> figure
> >>>> out why (on Sparc):
> >>>> 
> >>>> With compressed oops enabled, Method* metadata referencing
> >>>> 'WorkerClass'
> >>>> is added to 'doWork' in MacroAssembler::set_narrow_klass(..). In
> >>>> CodeBuffer::finalize_oop_references(..) the metadata is processed
> >>>> and an
> >>>> oop to the class loader 'URLClassLoader' is added. This oop leads
> >>>> to the
> >>>> unloading of 'doWork', hence the verification code is never executed.
> >>>> 
> >>>> I'm not sure what set_narrow_klass(..) is used for in this case. I
> >>>> assume it stores a 'WorkerClass' Klass* in a register as part of an
> >>>> optimization? Because 'doWork' potentially works on any class.
> >>>> Apparently this optimization is not performed without compressed oops.
> >>> 
> >>> I would suggest to compare 'doWork' assembler
> >>> (-XX:CompileCommand=print,TestMethodUnloading::doWork) with coop and
> >>> without it. Usually loaded into register class is used for klass
> >>> compare do guard inlining code. Or to initialize new object.
> >>> 
> >>> I don't see loading (constructing) uncompressed (whole) klass pointer
> >>> from constant in sparc.ad. It could be the reason for different
> >>> behavior. It could be loaded from constants section. But constants
> >>> section should have metadata relocation info in such case.
> >> 
> >> I did as you suggested and found the following:
> >> 
> >> During the profiling phase the class given to 'doWork' always is
> >> 'WorkerClass'. The C2 compiler therefore optimizes the compiled version
> >> to expect a 'WorkerClass'. The branch that instantiates a new object is
> >> guarded by an uncommon trap (class_check). The difference between the
> >> two versions (with and without compressed oops) is the loading of the
> >> 'WorkerClass' Klass to check if the given class is equal:
> >> 
> >> With compressed oops:
> >>     SET    narrowklass: precise klass WorkerClass:
> >> 0x00000001004a0d40:Constant:exact *,R_L1 ! compressed klass ptr
> >> 
> >>     CWBne  R_L2,R_L1,B8      ! compressed ptr  P=0.000001 C=-1.000000
> >> 
> >> Without:
> >>     SET    precise klass WorkerClass: 0x00000001004aeab0:Constant:exact
> >> 
> >> *,R_L1      ! non-oop ptr
> >> 
> >>     CXBpne R_L2,R_L1,B8     ! ptr  P=0.000001 C=-1.000000
> >> 
> >> R_L2: class given as parameter
> >> B8: location of uncommon trap
> >> 
> >> In the first case, the Klass is loaded by a 'loadConNKlass' instruction
> >> that calls MacroAssembler::set_narrow_klass(..) which then creates a
> >> metadata_Relocation for the 'WorkerClass'. This metada_Relocation is
> >> processed by CodeBuffer::finalize_oop_references(..) and an oop to
> >> 'WorkerClass' is added. This oop causes the unloading of the method.
> >> 
> >> In the second case, the Klass is loaded by a 'loadConP_no_oop_cheap'
> >> instruction that does not create a metadata_Relocation.
> >> 
> >> I don't understand why the metadata_Relocation in the first case is
> >> needed? As the test shows it is better to only unload the method if we
> >> hit the uncommon trap because we could still use other (potentially
> >> complex) branches of the method.
> >> 
> >> Thanks,
> >> Tobias
> >> 
> >>> thanks,
> >>> Vladimir
> >>> 
> >>>> Best,
> >>>> Tobias
> >>>> 
> >>>>> Vladimir
> >>>>> 
> >>>>>>>> http://cr.openjdk.java.net/~thartmann/8029443/webrev.02/
> >>>>>>> 
> >>>>>>> Very nice!
> >>>>>> 
> >>>>>> Yes, I agree.  Impressive.
> >>>>>> 
> >>>>>> The refactoring in nmethod.cpp looks good to me.  I have no further
> >>>>>> comments.
> >>>>>> Thanks!
> >>>>>> Coleen
> >>>>>> 
> >>>>>>>> @Vladimir: The test shows why we should only clean the ICs but not
> >>>>>>>> unload the nmethod if possible. The method ' doWork'
> >>>>>>>> is still valid after WorkerClass was unloaded and depending on the
> >>>>>>>> complexity of the method we should avoid unloading it.
> >>>>>>> 
> >>>>>>> Make sense.
> >>>>>>> 
> >>>>>>>> On Sparc my patch fixes the bug and leads to the nmethod not being
> >>>>>>>> unloaded. The compiled version is therefore used even
> >>>>>>>> after WorkerClass is unloaded.
> >>>>>>>> 
> >>>>>>>> On x86 the nmethod is unloaded anyway because of a dead oop.
> >>>>>>>> This is
> >>>>>>>> probably due to a slightly different implementation
> >>>>>>>> of the ICs. I'll have a closer look to see if we can improve that.
> >>>>>>> 
> >>>>>>> Thanks,
> >>>>>>> Vladimir
> >>>>>>> 
> >>>>>>>> Thanks,
> >>>>>>>> Tobias
> >>>>>>>> 
> >>>>>>>> On 16.07.2014 10:36, Tobias Hartmann wrote:
> >>>>>>>>> Sorry, forgot to answer this question:
> >>>>>>>>>> Were you able to create a small test case for it that would be
> >>>>>>>>>> useful to add?
> >>>>>>>>> 
> >>>>>>>>> Unfortunately I was not able to create a test. The bug only
> >>>>>>>>> reproduces on a particular system with a > 30 minute run
> >>>>>>>>> of runThese.
> >>>>>>>>> 
> >>>>>>>>> Best,
> >>>>>>>>> Tobias
> >>>>>>>>> 
> >>>>>>>>> On 16.07.2014 09:54, Tobias Hartmann wrote:
> >>>>>>>>>> Hi Coleen,
> >>>>>>>>>> 
> >>>>>>>>>> thanks for the review.
> >>>>>>>>>> 
> >>>>>>>>>>> *+           if (csc->is_call_to_interpreted() &&
> >>>>>>>>>>> stub_contains_dead_metadata(is_alive, csc->destination())) {*
> >>>>>>>>>>> *+             csc->set_to_clean();*
> >>>>>>>>>>> *+           }*
> >>>>>>>>>>> 
> >>>>>>>>>>> This appears in each case.  Can you fold it and the new
> >>>>>>>>>>> function
> >>>>>>>>>>> into a function like
> >>>>>>>>>>> clean_call_to_interpreted_stub(is_alive, csc)?
> >>>>>>>>>> 
> >>>>>>>>>> I folded it into the function
> >>>>>>>>>> clean_call_to_interpreter_stub(..).
> >>>>>>>>>> 
> >>>>>>>>>> New webrev:
> >>>>>>>>>> http://cr.openjdk.java.net/~thartmann/8029443/webrev.01/
> >>>>>>>>>> 
> >>>>>>>>>> Thanks,
> >>>>>>>>>> Tobias
> >>>>>>>>>> 
> >>>>>>>>>>> Thanks,
> >>>>>>>>>>> Coleen
> >>>>>>>>>>> 
> >>>>>>>>>>>> So before the permgen removal embedded method* were oops and
> >>>>>>>>>>>> they
> >>>>>>>>>>>> were processed in relocInfo::oop_type loop.
> >>>>>>>>>>>> 
> >>>>>>>>>>>> May be instead of specializing opt_virtual_call_type and
> >>>>>>>>>>>> static_call_type call site you can simple add a loop for
> >>>>>>>>>>>> relocInfo::metadata_type (similar to oop_type loop)?
> >>>>>>>>>>>> 
> >>>>>>>>>>>> Thanks,
> >>>>>>>>>>>> Vladimir
> >>>>>>>>>>>> 
> >>>>>>>>>>>> On 7/14/14 4:56 AM, Tobias Hartmann wrote:
> >>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>> 
> >>>>>>>>>>>>> please review the following patch for JDK-8029443.
> >>>>>>>>>>>>> 
> >>>>>>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8029443
> >>>>>>>>>>>>> Webrev:
> >>>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/8029443/webrev.00/
> >>>>>>>>>>>>> 
> >>>>>>>>>>>>> *Problem*
> >>>>>>>>>>>>> After the tracing/marking phase of GC,
> >>>>>>>>>>>>> nmethod::do_unloading(..)
> >>>>>>>>>>>>> checks
> >>>>>>>>>>>>> if a nmethod can be unloaded because it contains dead
> >>>>>>>>>>>>> oops. If
> >>>>>>>>>>>>> class
> >>>>>>>>>>>>> unloading occurred we additionally clear all ICs where the
> >>>>>>>>>>>>> cached
> >>>>>>>>>>>>> metadata refers to an unloaded klass or method. If the
> >>>>>>>>>>>>> nmethod
> >>>>>>>>>>>>> is not
> >>>>>>>>>>>>> unloaded, nmethod::verify_metadata_loaders(..) finally
> >>>>>>>>>>>>> checks if
> >>>>>>>>>>>>> all
> >>>>>>>>>>>>> metadata is alive. The assert in CheckClass::check_class
> >>>>>>>>>>>>> fails
> >>>>>>>>>>>>> because
> >>>>>>>>>>>>> the nmethod contains Method* metadata corresponding to a dead
> >>>>>>>>>>>>> Klass.
> >>>>>>>>>>>>> The Method* belongs to a to-interpreter stub [1] of an
> >>>>>>>>>>>>> optimized
> >>>>>>>>>>>>> compiled IC. Normally we clear those stubs prior to
> >>>>>>>>>>>>> verification to
> >>>>>>>>>>>>> avoid dangling references to Method* [2], but only if the
> >>>>>>>>>>>>> stub
> >>>>>>>>>>>>> is not in
> >>>>>>>>>>>>> use, i.e. if the IC is not in to-interpreted mode. In this
> >>>>>>>>>>>>> case the
> >>>>>>>>>>>>> to-interpreter stub may be executed and hand a stale Method*
> >>>>>>>>>>>>> to the
> >>>>>>>>>>>>> interpreter.
> >>>>>>>>>>>>> 
> >>>>>>>>>>>>> *Solution
> >>>>>>>>>>>>> *The implementation of nmethod::do_unloading(..) is
> >>>>>>>>>>>>> changed to
> >>>>>>>>>>>>> clean
> >>>>>>>>>>>>> compiled ICs and compiled static calls if they call into a
> >>>>>>>>>>>>> to-interpreter stub that references dead Method* metadata.
> >>>>>>>>>>>>> 
> >>>>>>>>>>>>> The patch was affected by the G1 class unloading changes
> >>>>>>>>>>>>> (JDK-8048248)
> >>>>>>>>>>>>> because the method nmethod::do_unloading_parallel(..) was
> >>>>>>>>>>>>> added. I
> >>>>>>>>>>>>> adapted the implementation as well.
> >>>>>>>>>>>>> *
> >>>>>>>>>>>>> Testing
> >>>>>>>>>>>>> *Failing test (runThese)
> >>>>>>>>>>>>> JPRT
> >>>>>>>>>>>>> 
> >>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>> Tobias
> >>>>>>>>>>>>> 
> >>>>>>>>>>>>> [1] see CompiledStaticCall::emit_to_interp_stub(..)
> >>>>>>>>>>>>> [2] see nmethod::verify_metadata_loaders(..),
> >>>>>>>>>>>>> static_stub_reloc()->clear_inline_cache() clears the stub



More information about the hotspot-dev mailing list