[9] RFR(S): 8029443: 'assert(klass->is_loader_alive(_is_alive)) failed: must be alive' during VM_CollectForMetadataAllocation

Vladimir Kozlov vladimir.kozlov at oracle.com
Tue Jul 22 18:04:15 UTC 2014


I agree with Mikael, the case without compressed oops is incorrect.
The problem is immP_load() and immP_no_oop_cheap() operands miss the 
check for metadata pointer, they only check for oop. For class loading 
loadConP_load() should be used instead of loadConP_no_oop_cheap() and it 
should check relocation type and do what loadConP_set() does.


X64 seems fine. X86_64.ad use $$$emit32$src$$constant; in such case 
(load_immP31) which is expanded to the same code as load_immP:

if ( opnd_array(1)->constant_reloc() != relocInfo::none ) {
   emit_d32_reloc(cbuf, opnd_array(1)->constant(), 
opnd_array(1)->constant_reloc(), 0);

Thanks,
Vladimir

On 7/22/14 2:22 AM, Tobias Hartmann wrote:
> On 21.07.2014 20:59, Vladimir Kozlov wrote:
>> On 7/21/14 1:44 AM, Tobias Hartmann wrote:
>>> Vladimir, Coleen, thanks for the reviews!
>>>
>>> On 18.07.2014 20:09, Vladimir Kozlov wrote:
>>>> On 7/18/14 11:02 AM, Coleen Phillimore wrote:
>>>>>
>>>>> On 7/18/14, 11:06 AM, Vladimir Kozlov wrote:
>>>>>> On 7/18/14 4:38 AM, Tobias Hartmann wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I spend some more days and was finally able to implement a test that
>>>>>>> deterministically triggers the bug:
>>>>>>
>>>>>> Why do you need to switch off compressed oops? Do you need to switch
>>>>>> off compressed klass pointers too (UseCompressedClassPointers)?
>>>>>
>>>>> CompressedOops when off turns off CompressedClassPointers.
>>>>
>>>> You are right, I forgot that. Still the question is why switch off
>>>> coop?
>>>
>>> I'm only able to reproduce the bug without compressed oops. The original
>>> bug also only reproduces with -XX:-UseCompressedOops. I tried to figure
>>> out why (on Sparc):
>>>
>>> With compressed oops enabled, Method* metadata referencing 'WorkerClass'
>>> is added to 'doWork' in MacroAssembler::set_narrow_klass(..). In
>>> CodeBuffer::finalize_oop_references(..) the metadata is processed and an
>>> oop to the class loader 'URLClassLoader' is added. This oop leads to the
>>> unloading of 'doWork', hence the verification code is never executed.
>>>
>>> I'm not sure what set_narrow_klass(..) is used for in this case. I
>>> assume it stores a 'WorkerClass' Klass* in a register as part of an
>>> optimization? Because 'doWork' potentially works on any class.
>>> Apparently this optimization is not performed without compressed oops.
>>
>> I would suggest to compare 'doWork' assembler
>> (-XX:CompileCommand=print,TestMethodUnloading::doWork) with coop and
>> without it. Usually loaded into register class is used for klass
>> compare do guard inlining code. Or to initialize new object.
>>
>> I don't see loading (constructing) uncompressed (whole) klass pointer
>> from constant in sparc.ad. It could be the reason for different
>> behavior. It could be loaded from constants section. But constants
>> section should have metadata relocation info in such case.
>
> I did as you suggested and found the following:
>
> During the profiling phase the class given to 'doWork' always is
> 'WorkerClass'. The C2 compiler therefore optimizes the compiled version
> to expect a 'WorkerClass'. The branch that instantiates a new object is
> guarded by an uncommon trap (class_check). The difference between the
> two versions (with and without compressed oops) is the loading of the
> 'WorkerClass' Klass to check if the given class is equal:
>
> With compressed oops:
>     SET    narrowklass: precise klass WorkerClass:
> 0x00000001004a0d40:Constant:exact *,R_L1 ! compressed klass ptr
>     CWBne  R_L2,R_L1,B8      ! compressed ptr  P=0.000001 C=-1.000000
>
> Without:
>     SET    precise klass WorkerClass: 0x00000001004aeab0:Constant:exact
> *,R_L1      ! non-oop ptr
>     CXBpne R_L2,R_L1,B8     ! ptr  P=0.000001 C=-1.000000
>
> R_L2: class given as parameter
> B8: location of uncommon trap
>
> In the first case, the Klass is loaded by a 'loadConNKlass' instruction
> that calls MacroAssembler::set_narrow_klass(..) which then creates a
> metadata_Relocation for the 'WorkerClass'. This metada_Relocation is
> processed by CodeBuffer::finalize_oop_references(..) and an oop to
> 'WorkerClass' is added. This oop causes the unloading of the method.
>
> In the second case, the Klass is loaded by a 'loadConP_no_oop_cheap'
> instruction that does not create a metadata_Relocation.
>
> I don't understand why the metadata_Relocation in the first case is
> needed? As the test shows it is better to only unload the method if we
> hit the uncommon trap because we could still use other (potentially
> complex) branches of the method.
>
> Thanks,
> Tobias
>
>>
>> thanks,
>> Vladimir
>>
>>>
>>> Best,
>>> Tobias
>>>
>>>>
>>>> Vladimir
>>>>
>>>>>>
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~thartmann/8029443/webrev.02/
>>>>>>
>>>>>> Very nice!
>>>>>
>>>>> Yes, I agree.  Impressive.
>>>>>
>>>>> The refactoring in nmethod.cpp looks good to me.  I have no further
>>>>> comments.
>>>>> Thanks!
>>>>> Coleen
>>>>>
>>>>>>
>>>>>>>
>>>>>>> @Vladimir: The test shows why we should only clean the ICs but not
>>>>>>> unload the nmethod if possible. The method ' doWork'
>>>>>>> is still valid after WorkerClass was unloaded and depending on the
>>>>>>> complexity of the method we should avoid unloading it.
>>>>>>
>>>>>> Make sense.
>>>>>>
>>>>>>>
>>>>>>> On Sparc my patch fixes the bug and leads to the nmethod not being
>>>>>>> unloaded. The compiled version is therefore used even
>>>>>>> after WorkerClass is unloaded.
>>>>>>>
>>>>>>> On x86 the nmethod is unloaded anyway because of a dead oop. This is
>>>>>>> probably due to a slightly different implementation
>>>>>>> of the ICs. I'll have a closer look to see if we can improve that.
>>>>>>
>>>>>> Thanks,
>>>>>> Vladimir
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Tobias
>>>>>>>
>>>>>>> On 16.07.2014 10:36, Tobias Hartmann wrote:
>>>>>>>> Sorry, forgot to answer this question:
>>>>>>>>> Were you able to create a small test case for it that would be
>>>>>>>>> useful to add?
>>>>>>>> Unfortunately I was not able to create a test. The bug only
>>>>>>>> reproduces on a particular system with a > 30 minute run
>>>>>>>> of runThese.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Tobias
>>>>>>>>
>>>>>>>> On 16.07.2014 09:54, Tobias Hartmann wrote:
>>>>>>>>> Hi Coleen,
>>>>>>>>>
>>>>>>>>> thanks for the review.
>>>>>>>>>> *+           if (csc->is_call_to_interpreted() &&
>>>>>>>>>> stub_contains_dead_metadata(is_alive, csc->destination())) {*
>>>>>>>>>> *+             csc->set_to_clean();*
>>>>>>>>>> *+           }*
>>>>>>>>>>
>>>>>>>>>> This appears in each case.  Can you fold it and the new function
>>>>>>>>>> into a function like
>>>>>>>>>> clean_call_to_interpreted_stub(is_alive, csc)?
>>>>>>>>>
>>>>>>>>> I folded it into the function clean_call_to_interpreter_stub(..).
>>>>>>>>>
>>>>>>>>> New webrev:
>>>>>>>>> http://cr.openjdk.java.net/~thartmann/8029443/webrev.01/
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Tobias
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Coleen
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> So before the permgen removal embedded method* were oops and
>>>>>>>>>>> they
>>>>>>>>>>> were processed in relocInfo::oop_type loop.
>>>>>>>>>>>
>>>>>>>>>>> May be instead of specializing opt_virtual_call_type and
>>>>>>>>>>> static_call_type call site you can simple add a loop for
>>>>>>>>>>> relocInfo::metadata_type (similar to oop_type loop)?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Vladimir
>>>>>>>>>>>
>>>>>>>>>>> On 7/14/14 4:56 AM, Tobias Hartmann wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> please review the following patch for JDK-8029443.
>>>>>>>>>>>>
>>>>>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8029443
>>>>>>>>>>>> Webrev:
>>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/8029443/webrev.00/
>>>>>>>>>>>>
>>>>>>>>>>>> *Problem*
>>>>>>>>>>>> After the tracing/marking phase of GC,
>>>>>>>>>>>> nmethod::do_unloading(..)
>>>>>>>>>>>> checks
>>>>>>>>>>>> if a nmethod can be unloaded because it contains dead oops. If
>>>>>>>>>>>> class
>>>>>>>>>>>> unloading occurred we additionally clear all ICs where the
>>>>>>>>>>>> cached
>>>>>>>>>>>> metadata refers to an unloaded klass or method. If the nmethod
>>>>>>>>>>>> is not
>>>>>>>>>>>> unloaded, nmethod::verify_metadata_loaders(..) finally
>>>>>>>>>>>> checks if
>>>>>>>>>>>> all
>>>>>>>>>>>> metadata is alive. The assert in CheckClass::check_class fails
>>>>>>>>>>>> because
>>>>>>>>>>>> the nmethod contains Method* metadata corresponding to a dead
>>>>>>>>>>>> Klass.
>>>>>>>>>>>> The Method* belongs to a to-interpreter stub [1] of an
>>>>>>>>>>>> optimized
>>>>>>>>>>>> compiled IC. Normally we clear those stubs prior to
>>>>>>>>>>>> verification to
>>>>>>>>>>>> avoid dangling references to Method* [2], but only if the stub
>>>>>>>>>>>> is not in
>>>>>>>>>>>> use, i.e. if the IC is not in to-interpreted mode. In this
>>>>>>>>>>>> case the
>>>>>>>>>>>> to-interpreter stub may be executed and hand a stale Method*
>>>>>>>>>>>> to the
>>>>>>>>>>>> interpreter.
>>>>>>>>>>>>
>>>>>>>>>>>> *Solution
>>>>>>>>>>>> *The implementation of nmethod::do_unloading(..) is changed to
>>>>>>>>>>>> clean
>>>>>>>>>>>> compiled ICs and compiled static calls if they call into a
>>>>>>>>>>>> to-interpreter stub that references dead Method* metadata.
>>>>>>>>>>>>
>>>>>>>>>>>> The patch was affected by the G1 class unloading changes
>>>>>>>>>>>> (JDK-8048248)
>>>>>>>>>>>> because the method nmethod::do_unloading_parallel(..) was
>>>>>>>>>>>> added. I
>>>>>>>>>>>> adapted the implementation as well.
>>>>>>>>>>>> *
>>>>>>>>>>>> Testing
>>>>>>>>>>>> *Failing test (runThese)
>>>>>>>>>>>> JPRT
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Tobias
>>>>>>>>>>>>
>>>>>>>>>>>> [1] see CompiledStaticCall::emit_to_interp_stub(..)
>>>>>>>>>>>> [2] see nmethod::verify_metadata_loaders(..),
>>>>>>>>>>>> static_stub_reloc()->clear_inline_cache() clears the stub
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>
>


More information about the hotspot-dev mailing list