RFR(M) JDK-8014555: G1: Memory ordering problem with Conc refinement and card marking
Vitaly Davidovich
vitalyd at gmail.com
Fri Oct 4 06:32:22 PDT 2013
Mikael,
What if you were to read the card value before the store and then check if
it's clean and mark it dirty + enqueue? This would be a loadstore fence
which can be noop in some archs. Or am I missing something?
Thanks
Sent from my phone
On Oct 4, 2013 8:18 AM, "Mikael Gerdin" <mikael.gerdin at oracle.com> wrote:
> David,
>
> On 10/04/2013 11:15 AM, David Holmes wrote:
>
>> Hi Mikael,
>>
>> src/cpu/sparc/vm/c1_Runtime1_**sparc.cpp:
>>
>> I don't quite follow the logic for this:
>>
>> __ set(rs, cardtable); // cardtable := <card table
>> base>
>> __ ldub(addr, cardtable, tmp); // tmp := [addr + cardtable]
>>
>> + __ cmp_and_br_short(tmp,
>> G1SATBCardTableModRefBS::g1_**young_card_val(), Assembler::equal,
>> Assembler::pt, young_card);
>> +
>> + __ membar(Assembler::Membar_mask_**bits(Assembler::StoreLoad));
>> + __ ldub(addr, cardtable, tmp); // tmp := [addr + cardtable]
>>
>> You load the cardtable with no barrier, checks its value and if it seems
>> "wrong" (?) you then issue the storeload barrier and try again. Why not
>> just insert the storeload barrier between the store and the load?
>>
>> Similar comment for src/cpu/sparc/vm/**macroAssembler_sparc.cpp.
>>
>> ---
>>
>> src/share/vm/gc_**implementation/g1/**g1SATBCardTableModRefBS.cpp:
>>
>> void
>> G1SATBCardTableLoggingModRefBS**::write_ref_field_work(void* field,
>> oop new_val) {
>> ! volatile jbyte* byte = byte_for(field);
>> ! if (*byte == g1_young_gen) {
>> ! return;
>> ! }
>> ! OrderAccess::storeload();
>>
>> where is the store that you are ordering with?
>>
>> Similarly:
>>
>> ! for (; byte <= last_byte; byte++) {
>> ! if (*byte == g1_young_gen) {
>> ! continue;
>> ! }
>> ! OrderAccess::storeload();
>> if (*byte != dirty_card) {
>> *byte = dirty_card;
>> jt->dirty_card_queue().**enqueue(byte);
>> }
>> }
>>
>> the store seems to be after the barrier ?
>>
>
>
> Let me take one step back and describe the situation more thouroghly.
> The post-barrier for G1 has the following structure before my change
> (o is a java object, f is a field in o and x is another java object)
>
> o.f = x
> if (*card_for(o) == dirty) then goto done
> *card_for(o) = dirty
> enqueue(card_for(o))
> done:
>
> The memory ordering problem can occur between the store
> o.f = x and the read of *card_for(o)
> This is because the concurrent refinement thread will set a card to
> "clean" before it starts scanning the card for references, but that store
> is not guaranteed to be visible to the thread executing the post-barrier.
> So if the CPU re-orders the load from card_for(o) to before the write of
> o.f the refinement thread may have scanned past the address of o.f but the
> application thread sees the stale "dirty" value and skips the barrier.
>
>
> The simple fix for this race is then to insert a storeload between the
> store and the load:
> o.f = x
> #StoreLoad
> if (*card_for(o) == dirty) then goto done
> *card_for(o) = dirty
> enqueue(card_for(o))
> done:
>
> Inserting the storeload is too expensive since this code is in the object
> field write fast-path. So I introduced a way to exclude some writes by
> relying on a new card value which I know is not suspect to concurrent
> updates:
>
> o.f = x
> if (*card_for(o) == g1_young) then goto done
> #StoreLoad (because I can only rely on the value "dirty" if the store was
> committed before the following read)
> if (*card_for(o) == dirty) then goto done
> *card_for(o) = dirty
> enqueue(card_for(o))
> done:
>
> I hope this clarifies the issue and the reasoning behind my changes.
>
> /Mikael
>
>
>> Cheers,
>> David
>>
>> On 2/10/2013 10:28 PM, Mikael Gerdin wrote:
>>
>>> Hi
>>>
>>> Please review my fix for the issue discussed in the "G1 question:
>>> concurrent cleaning of dirty cards" thread on the hotspot-gc-dev mailing
>>> list.
>>>
>>> I'd like someone from the compiler (and runtime? the interpreter uses
>>> macroAssembler_*, right?) teams to at least look at the changes to:
>>> macroAssembler_*.cpp
>>> c1_Runtime_*.cpp
>>> graphKit.cpp
>>>
>>> Problem description:
>>> G1 has a race where the concurrent refinement thread may miss object
>>> references in a dirty card.
>>> The problem arises if the CPU re-orders the load of the old card value
>>> (which G1 checks to determine if it can skip the barrier)
>>> before the store to the actual object.
>>> If that occurs the concurrent refinement thread may have set the card to
>>> "clean" and proceeded to scan the card but the java thread may have seen
>>> the "dirty" value and skipped the post barrier.
>>>
>>> Suggested fix:
>>> * Add a memory fence between the store to a java object and the reading
>>> of the previous card value.
>>> * Modify the code for handling young regions so that all writes to young
>>> regions can skip the fence (since it will never be needed for such
>>> writes). This introduces a new value in the card table for G1 which
>>> indicates a young region.
>>>
>>> Performance impact:
>>> * This fix has a negative throughput performance impact of 1-1.5%
>>> (tested on x86-AMD x86-Intel and SPARC).
>>> * We may want to try to get rid of this race at some point by
>>> redesigning G1's post barrier but there is not enough time to do that
>>> for JDK8.
>>>
>>> Performance numbers for x86 platforms can be seen here:
>>> http://cr.openjdk.java.net/~**mgerdin/8014555/perf.txt<http://cr.openjdk.java.net/~mgerdin/8014555/perf.txt>
>>>
>>> Unfortunately the JIRA issue is not externally visible, but the major
>>> parts of the discussions about this are present in the mailing list
>>> thread. The bug mostly contains my analysis of the crashes which seems
>>> to have been caused by this bug.
>>>
>>> Bug link: https://bugs.openjdk.java.net/**browse/JDK-8014555<https://bugs.openjdk.java.net/browse/JDK-8014555>
>>>
>>> Webrev: http://cr.openjdk.java.net/~**mgerdin/8014555/webrev.0<http://cr.openjdk.java.net/~mgerdin/8014555/webrev.0>
>>>
>>>
>
More information about the hotspot-dev
mailing list