RFR(M) JDK-8014555: G1: Memory ordering problem with Conc refinement and card marking

David Holmes david.holmes at oracle.com
Fri Oct 4 05:32:59 PDT 2013


Hi Mikael,

Thanks for the explanation. See inline ...

On 4/10/2013 10:17 PM, Mikael Gerdin wrote:
> David,
>
> On 10/04/2013 11:15 AM, David Holmes wrote:
>> Hi Mikael,
>>
>> src/cpu/sparc/vm/c1_Runtime1_sparc.cpp:
>>
>> I don't quite follow the logic for this:
>>
>>            __ set(rs, cardtable);         // cardtable := <card table
>> base>
>>            __ ldub(addr, cardtable, tmp); // tmp := [addr + cardtable]
>>
>> +         __ cmp_and_br_short(tmp,
>> G1SATBCardTableModRefBS::g1_young_card_val(), Assembler::equal,
>> Assembler::pt, young_card);
>> +
>> +         __ membar(Assembler::Membar_mask_bits(Assembler::StoreLoad));
>> +         __ ldub(addr, cardtable, tmp); // tmp := [addr + cardtable]
>>
>> You load the cardtable with no barrier, checks its value and if it seems
>> "wrong" (?) you then issue the storeload barrier and try again. Why not
>> just insert the storeload barrier between the store and the load?
>>
>> Similar comment for src/cpu/sparc/vm/macroAssembler_sparc.cpp.
>>

I can see how what you explained fits in here but ...

>> ---
>>
>> src/share/vm/gc_implementation/g1/g1SATBCardTableModRefBS.cpp:
>>
>>    void
>>    G1SATBCardTableLoggingModRefBS::write_ref_field_work(void* field,
>>                                                         oop new_val) {
>> !   volatile jbyte* byte = byte_for(field);
>> !   if (*byte == g1_young_gen) {
>> !     return;
>> !   }
>> !   OrderAccess::storeload();
>>
>> where is the store that you are ordering with?

not here - where is the store?

>> Similarly:
>>
>> !       for (; byte <= last_byte; byte++) {
>> !         if (*byte == g1_young_gen) {
>> !           continue;
>> !         }
>> !         OrderAccess::storeload();
>>            if (*byte != dirty_card) {
>>              *byte = dirty_card;
>>              jt->dirty_card_queue().enqueue(byte);
>>            }
>>          }
>>
>> the store seems to be after the barrier ?

Nor here - which store and load are we ordering?

Thanks,
David
-----

>
> Let me take one step back and describe the situation more thouroghly.
> The post-barrier for G1 has the following structure before my change
> (o is a java object, f is a field in o and x is another java object)
>
> o.f = x
> if (*card_for(o) == dirty) then goto done
> *card_for(o) = dirty
> enqueue(card_for(o))
> done:
>
> The memory ordering problem can occur between the store
> o.f = x and the read of *card_for(o)
> This is because the concurrent refinement thread will set a card to
> "clean" before it starts scanning the card for references, but that
> store is not guaranteed to be visible to the thread executing the
> post-barrier.
> So if the CPU re-orders the load from card_for(o) to before the write of
> o.f the refinement thread may have scanned past the address of o.f but
> the application thread sees the stale "dirty" value and skips the barrier.
>
>
> The simple fix for this race is then to insert a storeload between the
> store and the load:
> o.f = x
> #StoreLoad
> if (*card_for(o) == dirty) then goto done
> *card_for(o) = dirty
> enqueue(card_for(o))
> done:
>
> Inserting the storeload is too expensive since this code is in the
> object field write fast-path. So I introduced a way to exclude some
> writes by relying on a new card value which I know is not suspect to
> concurrent updates:
>
> o.f = x
> if (*card_for(o) == g1_young) then goto done
> #StoreLoad (because I can only rely on the value "dirty" if the store
> was committed before the following read)
> if (*card_for(o) == dirty) then goto done
> *card_for(o) = dirty
> enqueue(card_for(o))
> done:
>
> I hope this clarifies the issue and the reasoning behind my changes.
>
> /Mikael
>
>>
>> Cheers,
>> David
>>
>> On 2/10/2013 10:28 PM, Mikael Gerdin wrote:
>>> Hi
>>>
>>> Please review my fix for the issue discussed in the "G1 question:
>>> concurrent cleaning of dirty cards" thread on the hotspot-gc-dev mailing
>>> list.
>>>
>>> I'd like someone from the compiler (and runtime? the interpreter uses
>>> macroAssembler_*, right?) teams to at least look at the changes to:
>>> macroAssembler_*.cpp
>>> c1_Runtime_*.cpp
>>> graphKit.cpp
>>>
>>> Problem description:
>>> G1 has a race where the concurrent refinement thread may miss object
>>> references in a dirty card.
>>> The problem arises if the CPU re-orders the load of the old card value
>>> (which G1 checks to determine if it can skip the barrier)
>>> before the store to the actual object.
>>> If that occurs the concurrent refinement thread may have set the card to
>>> "clean" and proceeded to scan the card but the java thread may have seen
>>> the "dirty" value and skipped the post barrier.
>>>
>>> Suggested fix:
>>> * Add a memory fence between the store to a java object and the reading
>>> of the previous card value.
>>> * Modify the code for handling young regions so that all writes to young
>>> regions can skip the fence (since it will never be needed for such
>>> writes). This introduces a new value in the card table for G1 which
>>> indicates a young region.
>>>
>>> Performance impact:
>>> * This fix has a negative throughput performance impact of 1-1.5%
>>> (tested on x86-AMD x86-Intel and SPARC).
>>> * We may want to try to get rid of this race at some point by
>>> redesigning G1's post barrier but there is not enough time to do that
>>> for JDK8.
>>>
>>> Performance numbers for x86 platforms can be seen here:
>>> http://cr.openjdk.java.net/~mgerdin/8014555/perf.txt
>>>
>>> Unfortunately the JIRA issue is not externally visible, but the major
>>> parts of the discussions about this are present in the mailing list
>>> thread. The bug mostly contains my analysis of the crashes which seems
>>> to have been caused by this bug.
>>>
>>> Bug link: https://bugs.openjdk.java.net/browse/JDK-8014555
>>>
>>> Webrev: http://cr.openjdk.java.net/~mgerdin/8014555/webrev.0
>>>
>


More information about the hotspot-dev mailing list