[9] RFR(S): 8029443: 'assert(klass->is_loader_alive(_is_alive)) failed: must be alive' during VM_CollectForMetadataAllocation

Vladimir Kozlov vladimir.kozlov at oracle.com
Tue Aug 5 18:03:55 UTC 2014


Changes looks ggod. Thank you for solving this hard problem.

Thanks,
Vladimir

On 8/5/14 1:56 AM, Tobias Hartmann wrote:
> Mikael, Vladimir,
>
> thanks for the review. Please see comments inline.
>
> On 25.07.2014 16:13, Mikael Gerdin wrote:
>> There shouldn't be anything stopping the Klass* from being emitted as
>> an immediate just as long as the appropriate relocation entry is
>> created. We don't need to update the immediate in the instruction
>> stream for the Klass* since we don't move klasses any more.
>
> Right, I changed the implementation of 'loadConP_no_oop_cheap' to create
> a relocation entry if the loaded pointer references (Klass) metadata.
>
> On 25.07.2014 19:27, Vladimir Kozlov wrote:
>> Tobias, you also need to check for general metadata and not just
>> klassptr - is_metadataptr(). Unfortunately they not inherited from
>> one-another in C2 types.
>
> As Mikael mentioned we can emit the Klass* as an immediate and do not
> have to use 'loadConP_load'. I removed the changes to
> 'nmethod::do_unloading' and 'nmethod::do_unloading_parallel' because we
> now unload the nmethod anyway.
>
> New webrev: http://cr.openjdk.java.net/~thartmann/8029443/webrev.03/
>
> As I said, the bug rarely occurs on x86-32 as well, which means that
> there probably is a similar bug on x86. Unfortunately I'm not able to
> reproduce it. I'd suggest we fix the Sparc related problem with this bug
> and open a new one for the x86 related failures.
>
> Thanks,
> Tobias
>
>>
>> Thanks,
>> Vladimir
>>
>> On 7/25/14 6:54 AM, Tobias Hartmann wrote:
>>> Mikael, Vladimir, thanks for the review.
>>>
>>> The problem is indeed caused by a missing check for a metadata pointer
>>> in sparc.ad.
>>>
>>> Adding 'n->bottom_type()->isa_klassptr()' checks to immP_load() and
>>> immP_no_oop_cheap() fixes the problem. The Klass pointer is then loaded
>>> from the constant table (loadConP_load()) and a metadata relocation is
>>> added by Compile::ConstantTable::emit().
>>>
>>> I had a look at the Aurora chessboard and it looks like as if the bug
>>> recently occured on x86_32 as well. I was not yet able to reproduce it
>>> but will try again next week.
>>>
>>> Thanks,
>>> Tobias
>>>
>>> On 22.07.2014 20:04, Vladimir Kozlov wrote:
>>>> I agree with Mikael, the case without compressed oops is incorrect.
>>>> The problem is immP_load() and immP_no_oop_cheap() operands miss the
>>>> check for metadata pointer, they only check for oop. For class loading
>>>> loadConP_load() should be used instead of loadConP_no_oop_cheap() and
>>>> it should check relocation type and do what loadConP_set() does.
>>>>
>>>>
>>>> X64 seems fine. X86_64.ad use $$$emit32$src$$constant; in such case
>>>> (load_immP31) which is expanded to the same code as load_immP:
>>>>
>>>> if ( opnd_array(1)->constant_reloc() != relocInfo::none ) {
>>>>   emit_d32_reloc(cbuf, opnd_array(1)->constant(),
>>>> opnd_array(1)->constant_reloc(), 0);
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> On 7/22/14 2:22 AM, Tobias Hartmann wrote:
>>>>> On 21.07.2014 20:59, Vladimir Kozlov wrote:
>>>>>> On 7/21/14 1:44 AM, Tobias Hartmann wrote:
>>>>>>> Vladimir, Coleen, thanks for the reviews!
>>>>>>>
>>>>>>> On 18.07.2014 20:09, Vladimir Kozlov wrote:
>>>>>>>> On 7/18/14 11:02 AM, Coleen Phillimore wrote:
>>>>>>>>>
>>>>>>>>> On 7/18/14, 11:06 AM, Vladimir Kozlov wrote:
>>>>>>>>>> On 7/18/14 4:38 AM, Tobias Hartmann wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I spend some more days and was finally able to implement a test
>>>>>>>>>>> that
>>>>>>>>>>> deterministically triggers the bug:
>>>>>>>>>>
>>>>>>>>>> Why do you need to switch off compressed oops? Do you need to
>>>>>>>>>> switch
>>>>>>>>>> off compressed klass pointers too (UseCompressedClassPointers)?
>>>>>>>>>
>>>>>>>>> CompressedOops when off turns off CompressedClassPointers.
>>>>>>>>
>>>>>>>> You are right, I forgot that. Still the question is why switch off
>>>>>>>> coop?
>>>>>>>
>>>>>>> I'm only able to reproduce the bug without compressed oops. The
>>>>>>> original
>>>>>>> bug also only reproduces with -XX:-UseCompressedOops. I tried to
>>>>>>> figure
>>>>>>> out why (on Sparc):
>>>>>>>
>>>>>>> With compressed oops enabled, Method* metadata referencing
>>>>>>> 'WorkerClass'
>>>>>>> is added to 'doWork' in MacroAssembler::set_narrow_klass(..). In
>>>>>>> CodeBuffer::finalize_oop_references(..) the metadata is processed
>>>>>>> and an
>>>>>>> oop to the class loader 'URLClassLoader' is added. This oop leads
>>>>>>> to the
>>>>>>> unloading of 'doWork', hence the verification code is never
>>>>>>> executed.
>>>>>>>
>>>>>>> I'm not sure what set_narrow_klass(..) is used for in this case. I
>>>>>>> assume it stores a 'WorkerClass' Klass* in a register as part of an
>>>>>>> optimization? Because 'doWork' potentially works on any class.
>>>>>>> Apparently this optimization is not performed without compressed
>>>>>>> oops.
>>>>>>
>>>>>> I would suggest to compare 'doWork' assembler
>>>>>> (-XX:CompileCommand=print,TestMethodUnloading::doWork) with coop and
>>>>>> without it. Usually loaded into register class is used for klass
>>>>>> compare do guard inlining code. Or to initialize new object.
>>>>>>
>>>>>> I don't see loading (constructing) uncompressed (whole) klass pointer
>>>>>> from constant in sparc.ad. It could be the reason for different
>>>>>> behavior. It could be loaded from constants section. But constants
>>>>>> section should have metadata relocation info in such case.
>>>>>
>>>>> I did as you suggested and found the following:
>>>>>
>>>>> During the profiling phase the class given to 'doWork' always is
>>>>> 'WorkerClass'. The C2 compiler therefore optimizes the compiled
>>>>> version
>>>>> to expect a 'WorkerClass'. The branch that instantiates a new
>>>>> object is
>>>>> guarded by an uncommon trap (class_check). The difference between the
>>>>> two versions (with and without compressed oops) is the loading of the
>>>>> 'WorkerClass' Klass to check if the given class is equal:
>>>>>
>>>>> With compressed oops:
>>>>>     SET    narrowklass: precise klass WorkerClass:
>>>>> 0x00000001004a0d40:Constant:exact *,R_L1 ! compressed klass ptr
>>>>>     CWBne  R_L2,R_L1,B8      ! compressed ptr  P=0.000001 C=-1.000000
>>>>>
>>>>> Without:
>>>>>     SET    precise klass WorkerClass:
>>>>> 0x00000001004aeab0:Constant:exact
>>>>> *,R_L1      ! non-oop ptr
>>>>>     CXBpne R_L2,R_L1,B8     ! ptr  P=0.000001 C=-1.000000
>>>>>
>>>>> R_L2: class given as parameter
>>>>> B8: location of uncommon trap
>>>>>
>>>>> In the first case, the Klass is loaded by a 'loadConNKlass'
>>>>> instruction
>>>>> that calls MacroAssembler::set_narrow_klass(..) which then creates a
>>>>> metadata_Relocation for the 'WorkerClass'. This metada_Relocation is
>>>>> processed by CodeBuffer::finalize_oop_references(..) and an oop to
>>>>> 'WorkerClass' is added. This oop causes the unloading of the method.
>>>>>
>>>>> In the second case, the Klass is loaded by a 'loadConP_no_oop_cheap'
>>>>> instruction that does not create a metadata_Relocation.
>>>>>
>>>>> I don't understand why the metadata_Relocation in the first case is
>>>>> needed? As the test shows it is better to only unload the method if we
>>>>> hit the uncommon trap because we could still use other (potentially
>>>>> complex) branches of the method.
>>>>>
>>>>> Thanks,
>>>>> Tobias
>>>>>
>>>>>>
>>>>>> thanks,
>>>>>> Vladimir
>>>>>>
>>>>>>>
>>>>>>> Best,
>>>>>>> Tobias
>>>>>>>
>>>>>>>>
>>>>>>>> Vladimir
>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/8029443/webrev.02/
>>>>>>>>>>
>>>>>>>>>> Very nice!
>>>>>>>>>
>>>>>>>>> Yes, I agree.  Impressive.
>>>>>>>>>
>>>>>>>>> The refactoring in nmethod.cpp looks good to me.  I have no
>>>>>>>>> further
>>>>>>>>> comments.
>>>>>>>>> Thanks!
>>>>>>>>> Coleen
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> @Vladimir: The test shows why we should only clean the ICs
>>>>>>>>>>> but not
>>>>>>>>>>> unload the nmethod if possible. The method ' doWork'
>>>>>>>>>>> is still valid after WorkerClass was unloaded and depending
>>>>>>>>>>> on the
>>>>>>>>>>> complexity of the method we should avoid unloading it.
>>>>>>>>>>
>>>>>>>>>> Make sense.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Sparc my patch fixes the bug and leads to the nmethod not
>>>>>>>>>>> being
>>>>>>>>>>> unloaded. The compiled version is therefore used even
>>>>>>>>>>> after WorkerClass is unloaded.
>>>>>>>>>>>
>>>>>>>>>>> On x86 the nmethod is unloaded anyway because of a dead oop.
>>>>>>>>>>> This is
>>>>>>>>>>> probably due to a slightly different implementation
>>>>>>>>>>> of the ICs. I'll have a closer look to see if we can improve
>>>>>>>>>>> that.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Vladimir
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Tobias
>>>>>>>>>>>
>>>>>>>>>>> On 16.07.2014 10:36, Tobias Hartmann wrote:
>>>>>>>>>>>> Sorry, forgot to answer this question:
>>>>>>>>>>>>> Were you able to create a small test case for it that would be
>>>>>>>>>>>>> useful to add?
>>>>>>>>>>>> Unfortunately I was not able to create a test. The bug only
>>>>>>>>>>>> reproduces on a particular system with a > 30 minute run
>>>>>>>>>>>> of runThese.
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Tobias
>>>>>>>>>>>>
>>>>>>>>>>>> On 16.07.2014 09:54, Tobias Hartmann wrote:
>>>>>>>>>>>>> Hi Coleen,
>>>>>>>>>>>>>
>>>>>>>>>>>>> thanks for the review.
>>>>>>>>>>>>>> *+           if (csc->is_call_to_interpreted() &&
>>>>>>>>>>>>>> stub_contains_dead_metadata(is_alive, csc->destination())) {*
>>>>>>>>>>>>>> *+             csc->set_to_clean();*
>>>>>>>>>>>>>> *+           }*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This appears in each case.  Can you fold it and the new
>>>>>>>>>>>>>> function
>>>>>>>>>>>>>> into a function like
>>>>>>>>>>>>>> clean_call_to_interpreted_stub(is_alive, csc)?
>>>>>>>>>>>>>
>>>>>>>>>>>>> I folded it into the function
>>>>>>>>>>>>> clean_call_to_interpreter_stub(..).
>>>>>>>>>>>>>
>>>>>>>>>>>>> New webrev:
>>>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/8029443/webrev.01/
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Tobias
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Coleen
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> So before the permgen removal embedded method* were oops and
>>>>>>>>>>>>>>> they
>>>>>>>>>>>>>>> were processed in relocInfo::oop_type loop.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> May be instead of specializing opt_virtual_call_type and
>>>>>>>>>>>>>>> static_call_type call site you can simple add a loop for
>>>>>>>>>>>>>>> relocInfo::metadata_type (similar to oop_type loop)?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Vladimir
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 7/14/14 4:56 AM, Tobias Hartmann wrote:
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> please review the following patch for JDK-8029443.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8029443
>>>>>>>>>>>>>>>> Webrev:
>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/8029443/webrev.00/
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> *Problem*
>>>>>>>>>>>>>>>> After the tracing/marking phase of GC,
>>>>>>>>>>>>>>>> nmethod::do_unloading(..)
>>>>>>>>>>>>>>>> checks
>>>>>>>>>>>>>>>> if a nmethod can be unloaded because it contains dead
>>>>>>>>>>>>>>>> oops. If
>>>>>>>>>>>>>>>> class
>>>>>>>>>>>>>>>> unloading occurred we additionally clear all ICs where the
>>>>>>>>>>>>>>>> cached
>>>>>>>>>>>>>>>> metadata refers to an unloaded klass or method. If the
>>>>>>>>>>>>>>>> nmethod
>>>>>>>>>>>>>>>> is not
>>>>>>>>>>>>>>>> unloaded, nmethod::verify_metadata_loaders(..) finally
>>>>>>>>>>>>>>>> checks if
>>>>>>>>>>>>>>>> all
>>>>>>>>>>>>>>>> metadata is alive. The assert in CheckClass::check_class
>>>>>>>>>>>>>>>> fails
>>>>>>>>>>>>>>>> because
>>>>>>>>>>>>>>>> the nmethod contains Method* metadata corresponding to a
>>>>>>>>>>>>>>>> dead
>>>>>>>>>>>>>>>> Klass.
>>>>>>>>>>>>>>>> The Method* belongs to a to-interpreter stub [1] of an
>>>>>>>>>>>>>>>> optimized
>>>>>>>>>>>>>>>> compiled IC. Normally we clear those stubs prior to
>>>>>>>>>>>>>>>> verification to
>>>>>>>>>>>>>>>> avoid dangling references to Method* [2], but only if the
>>>>>>>>>>>>>>>> stub
>>>>>>>>>>>>>>>> is not in
>>>>>>>>>>>>>>>> use, i.e. if the IC is not in to-interpreted mode. In this
>>>>>>>>>>>>>>>> case the
>>>>>>>>>>>>>>>> to-interpreter stub may be executed and hand a stale
>>>>>>>>>>>>>>>> Method*
>>>>>>>>>>>>>>>> to the
>>>>>>>>>>>>>>>> interpreter.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> *Solution
>>>>>>>>>>>>>>>> *The implementation of nmethod::do_unloading(..) is
>>>>>>>>>>>>>>>> changed to
>>>>>>>>>>>>>>>> clean
>>>>>>>>>>>>>>>> compiled ICs and compiled static calls if they call into a
>>>>>>>>>>>>>>>> to-interpreter stub that references dead Method* metadata.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The patch was affected by the G1 class unloading changes
>>>>>>>>>>>>>>>> (JDK-8048248)
>>>>>>>>>>>>>>>> because the method nmethod::do_unloading_parallel(..) was
>>>>>>>>>>>>>>>> added. I
>>>>>>>>>>>>>>>> adapted the implementation as well.
>>>>>>>>>>>>>>>> *
>>>>>>>>>>>>>>>> Testing
>>>>>>>>>>>>>>>> *Failing test (runThese)
>>>>>>>>>>>>>>>> JPRT
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Tobias
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> [1] see CompiledStaticCall::emit_to_interp_stub(..)
>>>>>>>>>>>>>>>> [2] see nmethod::verify_metadata_loaders(..),
>>>>>>>>>>>>>>>> static_stub_reloc()->clear_inline_cache() clears the stub
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>
>>>
>


More information about the hotspot-dev mailing list