RFR(M) JDK-8014555: G1: Memory ordering problem with Conc refinement and card marking

Mikael Gerdin mikael.gerdin at oracle.com
Mon Oct 7 05:25:18 PDT 2013


Vitaly,

On 10/04/2013 03:32 PM, Vitaly Davidovich wrote:
> Mikael,
>
> What if you were to read the card value before the store and then check
> if it's clean and mark it dirty + enqueue? This would be a loadstore
> fence which can be noop in some archs.  Or am I missing something?

The problem is that we need to make sure that the refinement thread 
finds our reference when it cleans our card.
In order to determine that we need to read the card value from memory 
_after_ our store to the java object is known to be visible.
That way we know that the refinement has not started scanning the card 
and therefore won't miss our reference.

If we read the card value before the store, found that it was already 
dirty and skipped enqueuing then we are enlarging the window for when 
the refinement thread may start scanning the card and scan past the 
address where we would store to the java object.

/Mikael

>
> Thanks
>
> Sent from my phone
>
> On Oct 4, 2013 8:18 AM, "Mikael Gerdin" <mikael.gerdin at oracle.com
> <mailto:mikael.gerdin at oracle.com>> wrote:
>
>     David,
>
>     On 10/04/2013 11:15 AM, David Holmes wrote:
>
>         Hi Mikael,
>
>         src/cpu/sparc/vm/c1_Runtime1___sparc.cpp:
>
>         I don't quite follow the logic for this:
>
>                     __ set(rs, cardtable);         // cardtable := <card
>         table base>
>                     __ ldub(addr, cardtable, tmp); // tmp := [addr +
>         cardtable]
>
>         +         __ cmp_and_br_short(tmp,
>         G1SATBCardTableModRefBS::g1___young_card_val(), Assembler::equal,
>         Assembler::pt, young_card);
>         +
>         +         __
>         membar(Assembler::Membar_mask___bits(Assembler::StoreLoad));
>         +         __ ldub(addr, cardtable, tmp); // tmp := [addr +
>         cardtable]
>
>         You load the cardtable with no barrier, checks its value and if
>         it seems
>         "wrong" (?) you then issue the storeload barrier and try again.
>         Why not
>         just insert the storeload barrier between the store and the load?
>
>         Similar comment for src/cpu/sparc/vm/__macroAssembler_sparc.cpp.
>
>         ---
>
>         src/share/vm/gc___implementation/g1/__g1SATBCardTableModRefBS.cpp:
>
>             void
>             G1SATBCardTableLoggingModRefBS__::write_ref_field_work(void*
>         field,
>                                                                  oop
>         new_val) {
>         !   volatile jbyte* byte = byte_for(field);
>         !   if (*byte == g1_young_gen) {
>         !     return;
>         !   }
>         !   OrderAccess::storeload();
>
>         where is the store that you are ordering with?
>
>         Similarly:
>
>         !       for (; byte <= last_byte; byte++) {
>         !         if (*byte == g1_young_gen) {
>         !           continue;
>         !         }
>         !         OrderAccess::storeload();
>                     if (*byte != dirty_card) {
>                       *byte = dirty_card;
>                       jt->dirty_card_queue().__enqueue(byte);
>                     }
>                   }
>
>         the store seems to be after the barrier ?
>
>
>
>     Let me take one step back and describe the situation more thouroghly.
>     The post-barrier for G1 has the following structure before my change
>     (o is a java object, f is a field in o and x is another java object)
>
>     o.f = x
>     if (*card_for(o) == dirty) then goto done
>     *card_for(o) = dirty
>     enqueue(card_for(o))
>     done:
>
>     The memory ordering problem can occur between the store
>     o.f = x and the read of *card_for(o)
>     This is because the concurrent refinement thread will set a card to
>     "clean" before it starts scanning the card for references, but that
>     store is not guaranteed to be visible to the thread executing the
>     post-barrier.
>     So if the CPU re-orders the load from card_for(o) to before the
>     write of o.f the refinement thread may have scanned past the address
>     of o.f but the application thread sees the stale "dirty" value and
>     skips the barrier.
>
>
>     The simple fix for this race is then to insert a storeload between
>     the store and the load:
>     o.f = x
>     #StoreLoad
>     if (*card_for(o) == dirty) then goto done
>     *card_for(o) = dirty
>     enqueue(card_for(o))
>     done:
>
>     Inserting the storeload is too expensive since this code is in the
>     object field write fast-path. So I introduced a way to exclude some
>     writes by relying on a new card value which I know is not suspect to
>     concurrent updates:
>
>     o.f = x
>     if (*card_for(o) == g1_young) then goto done
>     #StoreLoad (because I can only rely on the value "dirty" if the
>     store was committed before the following read)
>     if (*card_for(o) == dirty) then goto done
>     *card_for(o) = dirty
>     enqueue(card_for(o))
>     done:
>
>     I hope this clarifies the issue and the reasoning behind my changes.
>
>     /Mikael
>
>
>         Cheers,
>         David
>
>         On 2/10/2013 10:28 PM, Mikael Gerdin wrote:
>
>             Hi
>
>             Please review my fix for the issue discussed in the "G1
>             question:
>             concurrent cleaning of dirty cards" thread on the
>             hotspot-gc-dev mailing
>             list.
>
>             I'd like someone from the compiler (and runtime? the
>             interpreter uses
>             macroAssembler_*, right?) teams to at least look at the
>             changes to:
>             macroAssembler_*.cpp
>             c1_Runtime_*.cpp
>             graphKit.cpp
>
>             Problem description:
>             G1 has a race where the concurrent refinement thread may
>             miss object
>             references in a dirty card.
>             The problem arises if the CPU re-orders the load of the old
>             card value
>             (which G1 checks to determine if it can skip the barrier)
>             before the store to the actual object.
>             If that occurs the concurrent refinement thread may have set
>             the card to
>             "clean" and proceeded to scan the card but the java thread
>             may have seen
>             the "dirty" value and skipped the post barrier.
>
>             Suggested fix:
>             * Add a memory fence between the store to a java object and
>             the reading
>             of the previous card value.
>             * Modify the code for handling young regions so that all
>             writes to young
>             regions can skip the fence (since it will never be needed
>             for such
>             writes). This introduces a new value in the card table for
>             G1 which
>             indicates a young region.
>
>             Performance impact:
>             * This fix has a negative throughput performance impact of
>             1-1.5%
>             (tested on x86-AMD x86-Intel and SPARC).
>             * We may want to try to get rid of this race at some point by
>             redesigning G1's post barrier but there is not enough time
>             to do that
>             for JDK8.
>
>             Performance numbers for x86 platforms can be seen here:
>             http://cr.openjdk.java.net/~__mgerdin/8014555/perf.txt
>             <http://cr.openjdk.java.net/~mgerdin/8014555/perf.txt>
>
>             Unfortunately the JIRA issue is not externally visible, but
>             the major
>             parts of the discussions about this are present in the
>             mailing list
>             thread. The bug mostly contains my analysis of the crashes
>             which seems
>             to have been caused by this bug.
>
>             Bug link: https://bugs.openjdk.java.net/__browse/JDK-8014555
>             <https://bugs.openjdk.java.net/browse/JDK-8014555>
>
>             Webrev:
>             http://cr.openjdk.java.net/~__mgerdin/8014555/webrev.0
>             <http://cr.openjdk.java.net/~mgerdin/8014555/webrev.0>
>
>



More information about the hotspot-dev mailing list