[9] RFR(S): 8029443: 'assert(klass->is_loader_alive(_is_alive)) failed: must be alive' during VM_CollectForMetadataAllocation

Tobias Hartmann tobias.hartmann at oracle.com
Fri Jul 25 13:54:38 UTC 2014


Mikael, Vladimir, thanks for the review.

The problem is indeed caused by a missing check for a metadata pointer 
in sparc.ad.

Adding 'n->bottom_type()->isa_klassptr()' checks to immP_load() and 
immP_no_oop_cheap() fixes the problem. The Klass pointer is then loaded 
from the constant table (loadConP_load()) and a metadata relocation is 
added by Compile::ConstantTable::emit().

I had a look at the Aurora chessboard and it looks like as if the bug 
recently occured on x86_32 as well. I was not yet able to reproduce it 
but will try again next week.

Thanks,
Tobias

On 22.07.2014 20:04, Vladimir Kozlov wrote:
> I agree with Mikael, the case without compressed oops is incorrect.
> The problem is immP_load() and immP_no_oop_cheap() operands miss the 
> check for metadata pointer, they only check for oop. For class loading 
> loadConP_load() should be used instead of loadConP_no_oop_cheap() and 
> it should check relocation type and do what loadConP_set() does.
>
>
> X64 seems fine. X86_64.ad use $$$emit32$src$$constant; in such case 
> (load_immP31) which is expanded to the same code as load_immP:
>
> if ( opnd_array(1)->constant_reloc() != relocInfo::none ) {
>   emit_d32_reloc(cbuf, opnd_array(1)->constant(), 
> opnd_array(1)->constant_reloc(), 0);
>
> Thanks,
> Vladimir
>
> On 7/22/14 2:22 AM, Tobias Hartmann wrote:
>> On 21.07.2014 20:59, Vladimir Kozlov wrote:
>>> On 7/21/14 1:44 AM, Tobias Hartmann wrote:
>>>> Vladimir, Coleen, thanks for the reviews!
>>>>
>>>> On 18.07.2014 20:09, Vladimir Kozlov wrote:
>>>>> On 7/18/14 11:02 AM, Coleen Phillimore wrote:
>>>>>>
>>>>>> On 7/18/14, 11:06 AM, Vladimir Kozlov wrote:
>>>>>>> On 7/18/14 4:38 AM, Tobias Hartmann wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I spend some more days and was finally able to implement a test 
>>>>>>>> that
>>>>>>>> deterministically triggers the bug:
>>>>>>>
>>>>>>> Why do you need to switch off compressed oops? Do you need to 
>>>>>>> switch
>>>>>>> off compressed klass pointers too (UseCompressedClassPointers)?
>>>>>>
>>>>>> CompressedOops when off turns off CompressedClassPointers.
>>>>>
>>>>> You are right, I forgot that. Still the question is why switch off
>>>>> coop?
>>>>
>>>> I'm only able to reproduce the bug without compressed oops. The 
>>>> original
>>>> bug also only reproduces with -XX:-UseCompressedOops. I tried to 
>>>> figure
>>>> out why (on Sparc):
>>>>
>>>> With compressed oops enabled, Method* metadata referencing 
>>>> 'WorkerClass'
>>>> is added to 'doWork' in MacroAssembler::set_narrow_klass(..). In
>>>> CodeBuffer::finalize_oop_references(..) the metadata is processed 
>>>> and an
>>>> oop to the class loader 'URLClassLoader' is added. This oop leads 
>>>> to the
>>>> unloading of 'doWork', hence the verification code is never executed.
>>>>
>>>> I'm not sure what set_narrow_klass(..) is used for in this case. I
>>>> assume it stores a 'WorkerClass' Klass* in a register as part of an
>>>> optimization? Because 'doWork' potentially works on any class.
>>>> Apparently this optimization is not performed without compressed oops.
>>>
>>> I would suggest to compare 'doWork' assembler
>>> (-XX:CompileCommand=print,TestMethodUnloading::doWork) with coop and
>>> without it. Usually loaded into register class is used for klass
>>> compare do guard inlining code. Or to initialize new object.
>>>
>>> I don't see loading (constructing) uncompressed (whole) klass pointer
>>> from constant in sparc.ad. It could be the reason for different
>>> behavior. It could be loaded from constants section. But constants
>>> section should have metadata relocation info in such case.
>>
>> I did as you suggested and found the following:
>>
>> During the profiling phase the class given to 'doWork' always is
>> 'WorkerClass'. The C2 compiler therefore optimizes the compiled version
>> to expect a 'WorkerClass'. The branch that instantiates a new object is
>> guarded by an uncommon trap (class_check). The difference between the
>> two versions (with and without compressed oops) is the loading of the
>> 'WorkerClass' Klass to check if the given class is equal:
>>
>> With compressed oops:
>>     SET    narrowklass: precise klass WorkerClass:
>> 0x00000001004a0d40:Constant:exact *,R_L1 ! compressed klass ptr
>>     CWBne  R_L2,R_L1,B8      ! compressed ptr  P=0.000001 C=-1.000000
>>
>> Without:
>>     SET    precise klass WorkerClass: 0x00000001004aeab0:Constant:exact
>> *,R_L1      ! non-oop ptr
>>     CXBpne R_L2,R_L1,B8     ! ptr  P=0.000001 C=-1.000000
>>
>> R_L2: class given as parameter
>> B8: location of uncommon trap
>>
>> In the first case, the Klass is loaded by a 'loadConNKlass' instruction
>> that calls MacroAssembler::set_narrow_klass(..) which then creates a
>> metadata_Relocation for the 'WorkerClass'. This metada_Relocation is
>> processed by CodeBuffer::finalize_oop_references(..) and an oop to
>> 'WorkerClass' is added. This oop causes the unloading of the method.
>>
>> In the second case, the Klass is loaded by a 'loadConP_no_oop_cheap'
>> instruction that does not create a metadata_Relocation.
>>
>> I don't understand why the metadata_Relocation in the first case is
>> needed? As the test shows it is better to only unload the method if we
>> hit the uncommon trap because we could still use other (potentially
>> complex) branches of the method.
>>
>> Thanks,
>> Tobias
>>
>>>
>>> thanks,
>>> Vladimir
>>>
>>>>
>>>> Best,
>>>> Tobias
>>>>
>>>>>
>>>>> Vladimir
>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~thartmann/8029443/webrev.02/
>>>>>>>
>>>>>>> Very nice!
>>>>>>
>>>>>> Yes, I agree.  Impressive.
>>>>>>
>>>>>> The refactoring in nmethod.cpp looks good to me.  I have no further
>>>>>> comments.
>>>>>> Thanks!
>>>>>> Coleen
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> @Vladimir: The test shows why we should only clean the ICs but not
>>>>>>>> unload the nmethod if possible. The method ' doWork'
>>>>>>>> is still valid after WorkerClass was unloaded and depending on the
>>>>>>>> complexity of the method we should avoid unloading it.
>>>>>>>
>>>>>>> Make sense.
>>>>>>>
>>>>>>>>
>>>>>>>> On Sparc my patch fixes the bug and leads to the nmethod not being
>>>>>>>> unloaded. The compiled version is therefore used even
>>>>>>>> after WorkerClass is unloaded.
>>>>>>>>
>>>>>>>> On x86 the nmethod is unloaded anyway because of a dead oop. 
>>>>>>>> This is
>>>>>>>> probably due to a slightly different implementation
>>>>>>>> of the ICs. I'll have a closer look to see if we can improve that.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Vladimir
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Tobias
>>>>>>>>
>>>>>>>> On 16.07.2014 10:36, Tobias Hartmann wrote:
>>>>>>>>> Sorry, forgot to answer this question:
>>>>>>>>>> Were you able to create a small test case for it that would be
>>>>>>>>>> useful to add?
>>>>>>>>> Unfortunately I was not able to create a test. The bug only
>>>>>>>>> reproduces on a particular system with a > 30 minute run
>>>>>>>>> of runThese.
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Tobias
>>>>>>>>>
>>>>>>>>> On 16.07.2014 09:54, Tobias Hartmann wrote:
>>>>>>>>>> Hi Coleen,
>>>>>>>>>>
>>>>>>>>>> thanks for the review.
>>>>>>>>>>> *+           if (csc->is_call_to_interpreted() &&
>>>>>>>>>>> stub_contains_dead_metadata(is_alive, csc->destination())) {*
>>>>>>>>>>> *+             csc->set_to_clean();*
>>>>>>>>>>> *+           }*
>>>>>>>>>>>
>>>>>>>>>>> This appears in each case.  Can you fold it and the new 
>>>>>>>>>>> function
>>>>>>>>>>> into a function like
>>>>>>>>>>> clean_call_to_interpreted_stub(is_alive, csc)?
>>>>>>>>>>
>>>>>>>>>> I folded it into the function 
>>>>>>>>>> clean_call_to_interpreter_stub(..).
>>>>>>>>>>
>>>>>>>>>> New webrev:
>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/8029443/webrev.01/
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Tobias
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Coleen
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> So before the permgen removal embedded method* were oops and
>>>>>>>>>>>> they
>>>>>>>>>>>> were processed in relocInfo::oop_type loop.
>>>>>>>>>>>>
>>>>>>>>>>>> May be instead of specializing opt_virtual_call_type and
>>>>>>>>>>>> static_call_type call site you can simple add a loop for
>>>>>>>>>>>> relocInfo::metadata_type (similar to oop_type loop)?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Vladimir
>>>>>>>>>>>>
>>>>>>>>>>>> On 7/14/14 4:56 AM, Tobias Hartmann wrote:
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> please review the following patch for JDK-8029443.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8029443
>>>>>>>>>>>>> Webrev:
>>>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/8029443/webrev.00/
>>>>>>>>>>>>>
>>>>>>>>>>>>> *Problem*
>>>>>>>>>>>>> After the tracing/marking phase of GC,
>>>>>>>>>>>>> nmethod::do_unloading(..)
>>>>>>>>>>>>> checks
>>>>>>>>>>>>> if a nmethod can be unloaded because it contains dead 
>>>>>>>>>>>>> oops. If
>>>>>>>>>>>>> class
>>>>>>>>>>>>> unloading occurred we additionally clear all ICs where the
>>>>>>>>>>>>> cached
>>>>>>>>>>>>> metadata refers to an unloaded klass or method. If the 
>>>>>>>>>>>>> nmethod
>>>>>>>>>>>>> is not
>>>>>>>>>>>>> unloaded, nmethod::verify_metadata_loaders(..) finally
>>>>>>>>>>>>> checks if
>>>>>>>>>>>>> all
>>>>>>>>>>>>> metadata is alive. The assert in CheckClass::check_class 
>>>>>>>>>>>>> fails
>>>>>>>>>>>>> because
>>>>>>>>>>>>> the nmethod contains Method* metadata corresponding to a dead
>>>>>>>>>>>>> Klass.
>>>>>>>>>>>>> The Method* belongs to a to-interpreter stub [1] of an
>>>>>>>>>>>>> optimized
>>>>>>>>>>>>> compiled IC. Normally we clear those stubs prior to
>>>>>>>>>>>>> verification to
>>>>>>>>>>>>> avoid dangling references to Method* [2], but only if the 
>>>>>>>>>>>>> stub
>>>>>>>>>>>>> is not in
>>>>>>>>>>>>> use, i.e. if the IC is not in to-interpreted mode. In this
>>>>>>>>>>>>> case the
>>>>>>>>>>>>> to-interpreter stub may be executed and hand a stale Method*
>>>>>>>>>>>>> to the
>>>>>>>>>>>>> interpreter.
>>>>>>>>>>>>>
>>>>>>>>>>>>> *Solution
>>>>>>>>>>>>> *The implementation of nmethod::do_unloading(..) is 
>>>>>>>>>>>>> changed to
>>>>>>>>>>>>> clean
>>>>>>>>>>>>> compiled ICs and compiled static calls if they call into a
>>>>>>>>>>>>> to-interpreter stub that references dead Method* metadata.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The patch was affected by the G1 class unloading changes
>>>>>>>>>>>>> (JDK-8048248)
>>>>>>>>>>>>> because the method nmethod::do_unloading_parallel(..) was
>>>>>>>>>>>>> added. I
>>>>>>>>>>>>> adapted the implementation as well.
>>>>>>>>>>>>> *
>>>>>>>>>>>>> Testing
>>>>>>>>>>>>> *Failing test (runThese)
>>>>>>>>>>>>> JPRT
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Tobias
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1] see CompiledStaticCall::emit_to_interp_stub(..)
>>>>>>>>>>>>> [2] see nmethod::verify_metadata_loaders(..),
>>>>>>>>>>>>> static_stub_reloc()->clear_inline_cache() clears the stub
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>



More information about the hotspot-dev mailing list