[9] RFR(S): 8029443: 'assert(klass->is_loader_alive(_is_alive)) failed: must be alive' during VM_CollectForMetadataAllocation

Tobias Hartmann tobias.hartmann at oracle.com
Tue Aug 5 08:56:24 UTC 2014


Mikael, Vladimir,

thanks for the review. Please see comments inline.

On 25.07.2014 16:13, Mikael Gerdin wrote:
> There shouldn't be anything stopping the Klass* from being emitted as 
> an immediate just as long as the appropriate relocation entry is 
> created. We don't need to update the immediate in the instruction 
> stream for the Klass* since we don't move klasses any more. 

Right, I changed the implementation of 'loadConP_no_oop_cheap' to create 
a relocation entry if the loaded pointer references (Klass) metadata.

On 25.07.2014 19:27, Vladimir Kozlov wrote:
> Tobias, you also need to check for general metadata and not just 
> klassptr - is_metadataptr(). Unfortunately they not inherited from 
> one-another in C2 types.

As Mikael mentioned we can emit the Klass* as an immediate and do not 
have to use 'loadConP_load'. I removed the changes to 
'nmethod::do_unloading' and 'nmethod::do_unloading_parallel' because we 
now unload the nmethod anyway.

New webrev: http://cr.openjdk.java.net/~thartmann/8029443/webrev.03/

As I said, the bug rarely occurs on x86-32 as well, which means that 
there probably is a similar bug on x86. Unfortunately I'm not able to 
reproduce it. I'd suggest we fix the Sparc related problem with this bug 
and open a new one for the x86 related failures.

Thanks,
Tobias

>
> Thanks,
> Vladimir
>
> On 7/25/14 6:54 AM, Tobias Hartmann wrote:
>> Mikael, Vladimir, thanks for the review.
>>
>> The problem is indeed caused by a missing check for a metadata pointer
>> in sparc.ad.
>>
>> Adding 'n->bottom_type()->isa_klassptr()' checks to immP_load() and
>> immP_no_oop_cheap() fixes the problem. The Klass pointer is then loaded
>> from the constant table (loadConP_load()) and a metadata relocation is
>> added by Compile::ConstantTable::emit().
>>
>> I had a look at the Aurora chessboard and it looks like as if the bug
>> recently occured on x86_32 as well. I was not yet able to reproduce it
>> but will try again next week.
>>
>> Thanks,
>> Tobias
>>
>> On 22.07.2014 20:04, Vladimir Kozlov wrote:
>>> I agree with Mikael, the case without compressed oops is incorrect.
>>> The problem is immP_load() and immP_no_oop_cheap() operands miss the
>>> check for metadata pointer, they only check for oop. For class loading
>>> loadConP_load() should be used instead of loadConP_no_oop_cheap() and
>>> it should check relocation type and do what loadConP_set() does.
>>>
>>>
>>> X64 seems fine. X86_64.ad use $$$emit32$src$$constant; in such case
>>> (load_immP31) which is expanded to the same code as load_immP:
>>>
>>> if ( opnd_array(1)->constant_reloc() != relocInfo::none ) {
>>>   emit_d32_reloc(cbuf, opnd_array(1)->constant(),
>>> opnd_array(1)->constant_reloc(), 0);
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 7/22/14 2:22 AM, Tobias Hartmann wrote:
>>>> On 21.07.2014 20:59, Vladimir Kozlov wrote:
>>>>> On 7/21/14 1:44 AM, Tobias Hartmann wrote:
>>>>>> Vladimir, Coleen, thanks for the reviews!
>>>>>>
>>>>>> On 18.07.2014 20:09, Vladimir Kozlov wrote:
>>>>>>> On 7/18/14 11:02 AM, Coleen Phillimore wrote:
>>>>>>>>
>>>>>>>> On 7/18/14, 11:06 AM, Vladimir Kozlov wrote:
>>>>>>>>> On 7/18/14 4:38 AM, Tobias Hartmann wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I spend some more days and was finally able to implement a test
>>>>>>>>>> that
>>>>>>>>>> deterministically triggers the bug:
>>>>>>>>>
>>>>>>>>> Why do you need to switch off compressed oops? Do you need to
>>>>>>>>> switch
>>>>>>>>> off compressed klass pointers too (UseCompressedClassPointers)?
>>>>>>>>
>>>>>>>> CompressedOops when off turns off CompressedClassPointers.
>>>>>>>
>>>>>>> You are right, I forgot that. Still the question is why switch off
>>>>>>> coop?
>>>>>>
>>>>>> I'm only able to reproduce the bug without compressed oops. The
>>>>>> original
>>>>>> bug also only reproduces with -XX:-UseCompressedOops. I tried to
>>>>>> figure
>>>>>> out why (on Sparc):
>>>>>>
>>>>>> With compressed oops enabled, Method* metadata referencing
>>>>>> 'WorkerClass'
>>>>>> is added to 'doWork' in MacroAssembler::set_narrow_klass(..). In
>>>>>> CodeBuffer::finalize_oop_references(..) the metadata is processed
>>>>>> and an
>>>>>> oop to the class loader 'URLClassLoader' is added. This oop leads
>>>>>> to the
>>>>>> unloading of 'doWork', hence the verification code is never 
>>>>>> executed.
>>>>>>
>>>>>> I'm not sure what set_narrow_klass(..) is used for in this case. I
>>>>>> assume it stores a 'WorkerClass' Klass* in a register as part of an
>>>>>> optimization? Because 'doWork' potentially works on any class.
>>>>>> Apparently this optimization is not performed without compressed 
>>>>>> oops.
>>>>>
>>>>> I would suggest to compare 'doWork' assembler
>>>>> (-XX:CompileCommand=print,TestMethodUnloading::doWork) with coop and
>>>>> without it. Usually loaded into register class is used for klass
>>>>> compare do guard inlining code. Or to initialize new object.
>>>>>
>>>>> I don't see loading (constructing) uncompressed (whole) klass pointer
>>>>> from constant in sparc.ad. It could be the reason for different
>>>>> behavior. It could be loaded from constants section. But constants
>>>>> section should have metadata relocation info in such case.
>>>>
>>>> I did as you suggested and found the following:
>>>>
>>>> During the profiling phase the class given to 'doWork' always is
>>>> 'WorkerClass'. The C2 compiler therefore optimizes the compiled 
>>>> version
>>>> to expect a 'WorkerClass'. The branch that instantiates a new 
>>>> object is
>>>> guarded by an uncommon trap (class_check). The difference between the
>>>> two versions (with and without compressed oops) is the loading of the
>>>> 'WorkerClass' Klass to check if the given class is equal:
>>>>
>>>> With compressed oops:
>>>>     SET    narrowklass: precise klass WorkerClass:
>>>> 0x00000001004a0d40:Constant:exact *,R_L1 ! compressed klass ptr
>>>>     CWBne  R_L2,R_L1,B8      ! compressed ptr  P=0.000001 C=-1.000000
>>>>
>>>> Without:
>>>>     SET    precise klass WorkerClass: 
>>>> 0x00000001004aeab0:Constant:exact
>>>> *,R_L1      ! non-oop ptr
>>>>     CXBpne R_L2,R_L1,B8     ! ptr  P=0.000001 C=-1.000000
>>>>
>>>> R_L2: class given as parameter
>>>> B8: location of uncommon trap
>>>>
>>>> In the first case, the Klass is loaded by a 'loadConNKlass' 
>>>> instruction
>>>> that calls MacroAssembler::set_narrow_klass(..) which then creates a
>>>> metadata_Relocation for the 'WorkerClass'. This metada_Relocation is
>>>> processed by CodeBuffer::finalize_oop_references(..) and an oop to
>>>> 'WorkerClass' is added. This oop causes the unloading of the method.
>>>>
>>>> In the second case, the Klass is loaded by a 'loadConP_no_oop_cheap'
>>>> instruction that does not create a metadata_Relocation.
>>>>
>>>> I don't understand why the metadata_Relocation in the first case is
>>>> needed? As the test shows it is better to only unload the method if we
>>>> hit the uncommon trap because we could still use other (potentially
>>>> complex) branches of the method.
>>>>
>>>> Thanks,
>>>> Tobias
>>>>
>>>>>
>>>>> thanks,
>>>>> Vladimir
>>>>>
>>>>>>
>>>>>> Best,
>>>>>> Tobias
>>>>>>
>>>>>>>
>>>>>>> Vladimir
>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/8029443/webrev.02/
>>>>>>>>>
>>>>>>>>> Very nice!
>>>>>>>>
>>>>>>>> Yes, I agree.  Impressive.
>>>>>>>>
>>>>>>>> The refactoring in nmethod.cpp looks good to me.  I have no 
>>>>>>>> further
>>>>>>>> comments.
>>>>>>>> Thanks!
>>>>>>>> Coleen
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> @Vladimir: The test shows why we should only clean the ICs 
>>>>>>>>>> but not
>>>>>>>>>> unload the nmethod if possible. The method ' doWork'
>>>>>>>>>> is still valid after WorkerClass was unloaded and depending 
>>>>>>>>>> on the
>>>>>>>>>> complexity of the method we should avoid unloading it.
>>>>>>>>>
>>>>>>>>> Make sense.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Sparc my patch fixes the bug and leads to the nmethod not 
>>>>>>>>>> being
>>>>>>>>>> unloaded. The compiled version is therefore used even
>>>>>>>>>> after WorkerClass is unloaded.
>>>>>>>>>>
>>>>>>>>>> On x86 the nmethod is unloaded anyway because of a dead oop.
>>>>>>>>>> This is
>>>>>>>>>> probably due to a slightly different implementation
>>>>>>>>>> of the ICs. I'll have a closer look to see if we can improve 
>>>>>>>>>> that.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Vladimir
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Tobias
>>>>>>>>>>
>>>>>>>>>> On 16.07.2014 10:36, Tobias Hartmann wrote:
>>>>>>>>>>> Sorry, forgot to answer this question:
>>>>>>>>>>>> Were you able to create a small test case for it that would be
>>>>>>>>>>>> useful to add?
>>>>>>>>>>> Unfortunately I was not able to create a test. The bug only
>>>>>>>>>>> reproduces on a particular system with a > 30 minute run
>>>>>>>>>>> of runThese.
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> Tobias
>>>>>>>>>>>
>>>>>>>>>>> On 16.07.2014 09:54, Tobias Hartmann wrote:
>>>>>>>>>>>> Hi Coleen,
>>>>>>>>>>>>
>>>>>>>>>>>> thanks for the review.
>>>>>>>>>>>>> *+           if (csc->is_call_to_interpreted() &&
>>>>>>>>>>>>> stub_contains_dead_metadata(is_alive, csc->destination())) {*
>>>>>>>>>>>>> *+             csc->set_to_clean();*
>>>>>>>>>>>>> *+           }*
>>>>>>>>>>>>>
>>>>>>>>>>>>> This appears in each case.  Can you fold it and the new
>>>>>>>>>>>>> function
>>>>>>>>>>>>> into a function like
>>>>>>>>>>>>> clean_call_to_interpreted_stub(is_alive, csc)?
>>>>>>>>>>>>
>>>>>>>>>>>> I folded it into the function
>>>>>>>>>>>> clean_call_to_interpreter_stub(..).
>>>>>>>>>>>>
>>>>>>>>>>>> New webrev:
>>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/8029443/webrev.01/
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Tobias
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Coleen
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So before the permgen removal embedded method* were oops and
>>>>>>>>>>>>>> they
>>>>>>>>>>>>>> were processed in relocInfo::oop_type loop.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> May be instead of specializing opt_virtual_call_type and
>>>>>>>>>>>>>> static_call_type call site you can simple add a loop for
>>>>>>>>>>>>>> relocInfo::metadata_type (similar to oop_type loop)?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Vladimir
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 7/14/14 4:56 AM, Tobias Hartmann wrote:
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> please review the following patch for JDK-8029443.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8029443
>>>>>>>>>>>>>>> Webrev:
>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/8029443/webrev.00/
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> *Problem*
>>>>>>>>>>>>>>> After the tracing/marking phase of GC,
>>>>>>>>>>>>>>> nmethod::do_unloading(..)
>>>>>>>>>>>>>>> checks
>>>>>>>>>>>>>>> if a nmethod can be unloaded because it contains dead
>>>>>>>>>>>>>>> oops. If
>>>>>>>>>>>>>>> class
>>>>>>>>>>>>>>> unloading occurred we additionally clear all ICs where the
>>>>>>>>>>>>>>> cached
>>>>>>>>>>>>>>> metadata refers to an unloaded klass or method. If the
>>>>>>>>>>>>>>> nmethod
>>>>>>>>>>>>>>> is not
>>>>>>>>>>>>>>> unloaded, nmethod::verify_metadata_loaders(..) finally
>>>>>>>>>>>>>>> checks if
>>>>>>>>>>>>>>> all
>>>>>>>>>>>>>>> metadata is alive. The assert in CheckClass::check_class
>>>>>>>>>>>>>>> fails
>>>>>>>>>>>>>>> because
>>>>>>>>>>>>>>> the nmethod contains Method* metadata corresponding to a 
>>>>>>>>>>>>>>> dead
>>>>>>>>>>>>>>> Klass.
>>>>>>>>>>>>>>> The Method* belongs to a to-interpreter stub [1] of an
>>>>>>>>>>>>>>> optimized
>>>>>>>>>>>>>>> compiled IC. Normally we clear those stubs prior to
>>>>>>>>>>>>>>> verification to
>>>>>>>>>>>>>>> avoid dangling references to Method* [2], but only if the
>>>>>>>>>>>>>>> stub
>>>>>>>>>>>>>>> is not in
>>>>>>>>>>>>>>> use, i.e. if the IC is not in to-interpreted mode. In this
>>>>>>>>>>>>>>> case the
>>>>>>>>>>>>>>> to-interpreter stub may be executed and hand a stale 
>>>>>>>>>>>>>>> Method*
>>>>>>>>>>>>>>> to the
>>>>>>>>>>>>>>> interpreter.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> *Solution
>>>>>>>>>>>>>>> *The implementation of nmethod::do_unloading(..) is
>>>>>>>>>>>>>>> changed to
>>>>>>>>>>>>>>> clean
>>>>>>>>>>>>>>> compiled ICs and compiled static calls if they call into a
>>>>>>>>>>>>>>> to-interpreter stub that references dead Method* metadata.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The patch was affected by the G1 class unloading changes
>>>>>>>>>>>>>>> (JDK-8048248)
>>>>>>>>>>>>>>> because the method nmethod::do_unloading_parallel(..) was
>>>>>>>>>>>>>>> added. I
>>>>>>>>>>>>>>> adapted the implementation as well.
>>>>>>>>>>>>>>> *
>>>>>>>>>>>>>>> Testing
>>>>>>>>>>>>>>> *Failing test (runThese)
>>>>>>>>>>>>>>> JPRT
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Tobias
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [1] see CompiledStaticCall::emit_to_interp_stub(..)
>>>>>>>>>>>>>>> [2] see nmethod::verify_metadata_loaders(..),
>>>>>>>>>>>>>>> static_stub_reloc()->clear_inline_cache() clears the stub
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>



More information about the hotspot-dev mailing list