RFR(M) JDK-8014555: G1: Memory ordering problem with Conc refinement and card marking
Vitaly Davidovich
vitalyd at gmail.com
Mon Oct 7 17:08:24 PDT 2013
Got it, thanks Mikael - sorry for the noise.
Sent from my phone
On Oct 7, 2013 8:25 AM, "Mikael Gerdin" <mikael.gerdin at oracle.com> wrote:
> Vitaly,
>
> On 10/04/2013 03:32 PM, Vitaly Davidovich wrote:
>
>> Mikael,
>>
>> What if you were to read the card value before the store and then check
>> if it's clean, and mark it dirty + enqueue? That would only need a loadstore
>> fence, which can be a no-op on some archs. Or am I missing something?
>>
>
> The problem is that we need to make sure that the refinement thread finds
> our reference when it cleans our card.
> In order to determine that, we need to read the card value from memory
> _after_ our store to the java object is known to be visible.
> That way we know that the refinement thread has not started scanning the
> card and therefore won't miss our reference.
>
> If we read the card value before the store, found that it was already
> dirty and skipped enqueuing, then we would enlarge the window in which the
> refinement thread may start scanning the card and scan past the address
> where we would store to the java object.
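>
> To make the ordering concrete, here is a rough standalone sketch of the
> two sides (illustrative C++ only, not the HotSpot code; the fence on the
> refinement side is part of the model, and the card encodings are made up).
> With both fences in place it cannot happen that the mutator misses the
> "clean" store *and* the refinement scan misses the field store:
>
> #include <atomic>
>
> std::atomic<int>  field{0};     // stands in for o.f
> std::atomic<char> card{'D'};    // 'D' = dirty, 'C' = clean
>
> void mutator_write(int x) {
>   field.store(x, std::memory_order_relaxed);            // o.f = x
>   std::atomic_thread_fence(std::memory_order_seq_cst);  // #StoreLoad
>   if (card.load(std::memory_order_relaxed) == 'D') {
>     return;  // a stale 'D' here implies the refinement scan sees our store
>   }
>   card.store('D', std::memory_order_relaxed);
>   // enqueue(&card) would go here
> }
>
> void refinement_clean_and_scan() {
>   card.store('C', std::memory_order_relaxed);           // clean the card...
>   std::atomic_thread_fence(std::memory_order_seq_cst);  // ...make it visible...
>   int seen = field.load(std::memory_order_relaxed);     // ...then scan it
>   (void)seen;
> }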
>
> /Mikael
>
>
>> Thanks
>>
>> Sent from my phone
>>
>> On Oct 4, 2013 8:18 AM, "Mikael Gerdin" <mikael.gerdin at oracle.com> wrote:
>>
>> David,
>>
>> On 10/04/2013 11:15 AM, David Holmes wrote:
>>
>> Hi Mikael,
>>
>> src/cpu/sparc/vm/c1_Runtime1_sparc.cpp:
>>
>> I don't quite follow the logic for this:
>>
>>          __ set(rs, cardtable);         // cardtable := <card table base>
>>          __ ldub(addr, cardtable, tmp); // tmp := [addr + cardtable]
>>
>> +        __ cmp_and_br_short(tmp, G1SATBCardTableModRefBS::g1_young_card_val(), Assembler::equal, Assembler::pt, young_card);
>> +
>> +        __ membar(Assembler::Membar_mask_bits(Assembler::StoreLoad));
>> +        __ ldub(addr, cardtable, tmp); // tmp := [addr + cardtable]
>>
>> You load the cardtable with no barrier, check its value and if
>> it seems
>> "wrong" (?) you then issue the storeload barrier and try again.
>> Why not
>> just insert the storeload barrier between the store and the load?
>>
>> Similar comment for src/cpu/sparc/vm/macroAssembler_sparc.cpp.
>>
>> ---
>>
>> src/share/vm/gc_implementation/g1/g1SATBCardTableModRefBS.cpp:
>>
>> void
>> G1SATBCardTableLoggingModRefBS::write_ref_field_work(void* field,
>>                                                      oop new_val) {
>> !   volatile jbyte* byte = byte_for(field);
>> !   if (*byte == g1_young_gen) {
>> !     return;
>> !   }
>> !   OrderAccess::storeload();
>>
>> where is the store that you are ordering with?
>>
>> Similarly:
>>
>> !   for (; byte <= last_byte; byte++) {
>> !     if (*byte == g1_young_gen) {
>> !       continue;
>> !     }
>> !     OrderAccess::storeload();
>>       if (*byte != dirty_card) {
>>         *byte = dirty_card;
>>         jt->dirty_card_queue().enqueue(byte);
>>       }
>>     }
>>
>> the store seems to be after the barrier?
>>
>>
>>
>> Let me take one step back and describe the situation more thoroughly.
>> The post-barrier for G1 has the following structure before my change
>> (o is a java object, f is a field in o and x is another java object):
>>
>> o.f = x
>> if (*card_for(o) == dirty) then goto done
>> *card_for(o) = dirty
>> enqueue(card_for(o))
>> done:
>>
>> The memory ordering problem can occur between the store o.f = x and the
>> read of *card_for(o).
>> This is because the concurrent refinement thread will set a card to
>> "clean" before it starts scanning the card for references, but that
>> store is not guaranteed to be visible to the thread executing the
>> post-barrier.
>> So if the CPU re-orders the load from card_for(o) to before the write of
>> o.f, the refinement thread may have scanned past the address of o.f
>> while the application thread still sees the stale "dirty" value and
>> skips the barrier.
>>
>>
>> The simple fix for this race is then to insert a storeload between
>> the store and the load:
>> o.f = x
>> #StoreLoad
>> if (*card_for(o) == dirty) then goto done
>> *card_for(o) = dirty
>> enqueue(card_for(o))
>> done:
>>
>> Inserting the storeload is too expensive since this code is in the
>> object field write fast-path. So I introduced a way to exclude some
>> writes by relying on a new card value which I know is not subject to
>> concurrent updates:
>>
>> o.f = x
>> if (*card_for(o) == g1_young) then goto done
>> #StoreLoad (because I can only rely on the value "dirty" if the
>> store was committed before the following read)
>> if (*card_for(o) == dirty) then goto done
>> *card_for(o) = dirty
>> enqueue(card_for(o))
>> done:
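>>
>> For reference, a rough standalone C++ sketch of that final barrier shape
>> (names and encodings are illustrative, not the actual webrev code;
>> card_for() and enqueue() are assumed hooks):
>>
>> #include <atomic>
>>
>> enum CardVal : signed char { clean = 0, dirty = 1, g1_young = 2 };  // made-up encodings
>>
>> std::atomic<CardVal>* card_for(void* field);  // assumed: field -> card mapping
>> void enqueue(std::atomic<CardVal>* card);     // assumed: dirty card queue hook
>>
>> void post_barrier(void* field) {              // runs after "o.f = x"
>>   std::atomic<CardVal>* card = card_for(field);
>>   if (card->load(std::memory_order_relaxed) == g1_young) {
>>     return;   // young cards are never cleaned concurrently, no fence needed
>>   }
>>   std::atomic_thread_fence(std::memory_order_seq_cst);  // #StoreLoad
>>   if (card->load(std::memory_order_relaxed) != dirty) {
>>     card->store(dirty, std::memory_order_relaxed);
>>     enqueue(card);
>>   }
>> }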
>>
>> I hope this clarifies the issue and the reasoning behind my changes.
>>
>> /Mikael
>>
>>
>> Cheers,
>> David
>>
>> On 2/10/2013 10:28 PM, Mikael Gerdin wrote:
>>
>> Hi
>>
>> Please review my fix for the issue discussed in the "G1
>> question:
>> concurrent cleaning of dirty cards" thread on the
>> hotspot-gc-dev mailing
>> list.
>>
>> I'd like someone from the compiler (and runtime? the
>> interpreter uses
>> macroAssembler_*, right?) teams to at least look at the
>> changes to:
>> macroAssembler_*.cpp
>> c1_Runtime_*.cpp
>> graphKit.cpp
>>
>> Problem description:
>> G1 has a race where the concurrent refinement thread may
>> miss object
>> references in a dirty card.
>> The problem arises if the CPU re-orders the load of the old
>> card value
>> (which G1 checks to determine if it can skip the barrier)
>> before the store to the actual object.
>> If that occurs, the concurrent refinement thread may have set
>> the card to
>> "clean" and proceeded to scan the card, but the java thread
>> may have seen
>> the "dirty" value and skipped the post barrier.
>>
>> Suggested fix:
>> * Add a memory fence between the store to a java object and
>> the reading
>> of the previous card value.
>> * Modify the code for handling young regions so that all
>> writes to young
>> regions can skip the fence (since it will never be needed
>> for such
>> writes). This introduces a new value in the card table for
>> G1 which
>> indicates a young region.
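>>
>> As a rough illustration of the second point (made-up names, not the
>> actual patch): when a region is handed out for young allocation its
>> cards would be pre-set to the young value, which is what lets the
>> post-barrier take the fence-free early exit for writes into it.
>>
>> #include <cstddef>
>>
>> const signed char g1_young_card = 2;   // assumed encoding for the new value
>>
>> // Pre-mark every card covering a freshly allocated young region; these
>> // cards are never cleaned by concurrent refinement, so mutator writes
>> // into the region can skip both the StoreLoad fence and the enqueue.
>> void mark_region_cards_young(signed char* card_table,
>>                              size_t first_card, size_t last_card) {
>>   for (size_t i = first_card; i <= last_card; i++) {
>>     card_table[i] = g1_young_card;
>>   }
>> }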
>>
>> Performance impact:
>> * This fix has a negative throughput impact of 1-1.5%
>> (tested on x86 AMD, x86 Intel and SPARC).
>> * We may want to try to get rid of this race at some point by
>> redesigning G1's post barrier, but there is not enough time
>> to do that for JDK8.
>>
>> Performance numbers for x86 platforms can be seen here:
>> http://cr.openjdk.java.net/~mgerdin/8014555/perf.txt
>>
>> Unfortunately the JIRA issue is not externally visible, but
>> the major
>> parts of the discussion about this are present in the
>> mailing list
>> thread. The bug report mostly contains my analysis of the
>> crashes, which seem
>> to have been caused by this bug.
>>
>> Bug link: https://bugs.openjdk.java.net/browse/JDK-8014555
>>
>> Webrev:
>> http://cr.openjdk.java.net/~mgerdin/8014555/webrev.0
>>
>>
>>
>