RFR(M) JDK-8014555: G1: Memory ordering problem with Conc refinement and card marking

Vitaly Davidovich vitalyd at gmail.com
Mon Oct 7 17:08:24 PDT 2013


Got it, thanks Mikael - sorry for the noise.

Sent from my phone
On Oct 7, 2013 8:25 AM, "Mikael Gerdin" <mikael.gerdin at oracle.com> wrote:

> Vitaly,
>
> On 10/04/2013 03:32 PM, Vitaly Davidovich wrote:
>
>> Mikael,
>>
>> What if you were to read the card value before the store and then check
>> if it's clean and mark it dirty + enqueue? This would be a loadstore
>> fence which can be noop in some archs.  Or am I missing something?
>>
>
> The problem is that we need to make sure that the refinement thread finds
> our reference when it cleans our card.
> In order to determine that we need to read the card value from memory
> _after_ our store to the java object is known to be visible.
> That way we know that the refinement has not started scanning the card and
> therefore won't miss our reference.
>
> If we read the card value before the store, found that it was already
> dirty and skipped enqueuing then we are enlarging the window for when the
> refinement thread may start scanning the card and scan past the address
> where we would store to the java object.
>
> /Mikael
>
>
>> Thanks
>>
>> Sent from my phone
>>
>> On Oct 4, 2013 8:18 AM, "Mikael Gerdin" <mikael.gerdin at oracle.com
>> <mailto:mikael.gerdin at oracle.**com <mikael.gerdin at oracle.com>>> wrote:
>>
>>     David,
>>
>>     On 10/04/2013 11:15 AM, David Holmes wrote:
>>
>>         Hi Mikael,
>>
>>         src/cpu/sparc/vm/c1_Runtime1__**_sparc.cpp:
>>
>>         I don't quite follow the logic for this:
>>
>>                     __ set(rs, cardtable);         // cardtable := <card
>>         table base>
>>                     __ ldub(addr, cardtable, tmp); // tmp := [addr +
>>         cardtable]
>>
>>         +         __ cmp_and_br_short(tmp,
>>         G1SATBCardTableModRefBS::g1___**young_card_val(),
>> Assembler::equal,
>>         Assembler::pt, young_card);
>>         +
>>         +         __
>>         membar(Assembler::Membar_mask_**__bits(Assembler::StoreLoad));
>>         +         __ ldub(addr, cardtable, tmp); // tmp := [addr +
>>         cardtable]
>>
>>         You load the cardtable with no barrier, checks its value and if
>>         it seems
>>         "wrong" (?) you then issue the storeload barrier and try again.
>>         Why not
>>         just insert the storeload barrier between the store and the load?
>>
>>         Similar comment for src/cpu/sparc/vm/__**
>> macroAssembler_sparc.cpp.
>>
>>         ---
>>
>>         src/share/vm/gc___**implementation/g1/__**
>> g1SATBCardTableModRefBS.cpp:
>>
>>             void
>>             G1SATBCardTableLoggingModRefBS**
>> __::write_ref_field_work(void*
>>         field,
>>                                                                  oop
>>         new_val) {
>>         !   volatile jbyte* byte = byte_for(field);
>>         !   if (*byte == g1_young_gen) {
>>         !     return;
>>         !   }
>>         !   OrderAccess::storeload();
>>
>>         where is the store that you are ordering with?
>>
>>         Similarly:
>>
>>         !       for (; byte <= last_byte; byte++) {
>>         !         if (*byte == g1_young_gen) {
>>         !           continue;
>>         !         }
>>         !         OrderAccess::storeload();
>>                     if (*byte != dirty_card) {
>>                       *byte = dirty_card;
>>                       jt->dirty_card_queue().__**enqueue(byte);
>>                     }
>>                   }
>>
>>         the store seems to be after the barrier ?
>>
>>
>>
>>     Let me take one step back and describe the situation more thouroghly.
>>     The post-barrier for G1 has the following structure before my change
>>     (o is a java object, f is a field in o and x is another java object)
>>
>>     o.f = x
>>     if (*card_for(o) == dirty) then goto done
>>     *card_for(o) = dirty
>>     enqueue(card_for(o))
>>     done:
>>
>>     The memory ordering problem can occur between the store
>>     o.f = x and the read of *card_for(o)
>>     This is because the concurrent refinement thread will set a card to
>>     "clean" before it starts scanning the card for references, but that
>>     store is not guaranteed to be visible to the thread executing the
>>     post-barrier.
>>     So if the CPU re-orders the load from card_for(o) to before the
>>     write of o.f the refinement thread may have scanned past the address
>>     of o.f but the application thread sees the stale "dirty" value and
>>     skips the barrier.
>>
>>
>>     The simple fix for this race is then to insert a storeload between
>>     the store and the load:
>>     o.f = x
>>     #StoreLoad
>>     if (*card_for(o) == dirty) then goto done
>>     *card_for(o) = dirty
>>     enqueue(card_for(o))
>>     done:
>>
>>     Inserting the storeload is too expensive since this code is in the
>>     object field write fast-path. So I introduced a way to exclude some
>>     writes by relying on a new card value which I know is not suspect to
>>     concurrent updates:
>>
>>     o.f = x
>>     if (*card_for(o) == g1_young) then goto done
>>     #StoreLoad (because I can only rely on the value "dirty" if the
>>     store was committed before the following read)
>>     if (*card_for(o) == dirty) then goto done
>>     *card_for(o) = dirty
>>     enqueue(card_for(o))
>>     done:
>>
>>     I hope this clarifies the issue and the reasoning behind my changes.
>>
>>     /Mikael
>>
>>
>>         Cheers,
>>         David
>>
>>         On 2/10/2013 10:28 PM, Mikael Gerdin wrote:
>>
>>             Hi
>>
>>             Please review my fix for the issue discussed in the "G1
>>             question:
>>             concurrent cleaning of dirty cards" thread on the
>>             hotspot-gc-dev mailing
>>             list.
>>
>>             I'd like someone from the compiler (and runtime? the
>>             interpreter uses
>>             macroAssembler_*, right?) teams to at least look at the
>>             changes to:
>>             macroAssembler_*.cpp
>>             c1_Runtime_*.cpp
>>             graphKit.cpp
>>
>>             Problem description:
>>             G1 has a race where the concurrent refinement thread may
>>             miss object
>>             references in a dirty card.
>>             The problem arises if the CPU re-orders the load of the old
>>             card value
>>             (which G1 checks to determine if it can skip the barrier)
>>             before the store to the actual object.
>>             If that occurs the concurrent refinement thread may have set
>>             the card to
>>             "clean" and proceeded to scan the card but the java thread
>>             may have seen
>>             the "dirty" value and skipped the post barrier.
>>
>>             Suggested fix:
>>             * Add a memory fence between the store to a java object and
>>             the reading
>>             of the previous card value.
>>             * Modify the code for handling young regions so that all
>>             writes to young
>>             regions can skip the fence (since it will never be needed
>>             for such
>>             writes). This introduces a new value in the card table for
>>             G1 which
>>             indicates a young region.
>>
>>             Performance impact:
>>             * This fix has a negative throughput performance impact of
>>             1-1.5%
>>             (tested on x86-AMD x86-Intel and SPARC).
>>             * We may want to try to get rid of this race at some point by
>>             redesigning G1's post barrier but there is not enough time
>>             to do that
>>             for JDK8.
>>
>>             Performance numbers for x86 platforms can be seen here:
>>             http://cr.openjdk.java.net/~__**mgerdin/8014555/perf.txt<http://cr.openjdk.java.net/~__mgerdin/8014555/perf.txt>
>>             <http://cr.openjdk.java.net/~**mgerdin/8014555/perf.txt<http://cr.openjdk.java.net/~mgerdin/8014555/perf.txt>
>> >
>>
>>             Unfortunately the JIRA issue is not externally visible, but
>>             the major
>>             parts of the discussions about this are present in the
>>             mailing list
>>             thread. The bug mostly contains my analysis of the crashes
>>             which seems
>>             to have been caused by this bug.
>>
>>             Bug link: https://bugs.openjdk.java.net/**
>> __browse/JDK-8014555 <https://bugs.openjdk.java.net/__browse/JDK-8014555>
>>             <https://bugs.openjdk.java.**net/browse/JDK-8014555<https://bugs.openjdk.java.net/browse/JDK-8014555>
>> >
>>
>>             Webrev:
>>             http://cr.openjdk.java.net/~__**mgerdin/8014555/webrev.0<http://cr.openjdk.java.net/~__mgerdin/8014555/webrev.0>
>>             <http://cr.openjdk.java.net/~**mgerdin/8014555/webrev.0<http://cr.openjdk.java.net/~mgerdin/8014555/webrev.0>
>> >
>>
>>
>>
>


More information about the hotspot-dev mailing list