RFR: Scan remembered

Nilsen, Kelvin kdnilsen at amazon.com
Wed Jan 13 02:49:43 UTC 2021


Thank you again for the prompt review of my prior pull request.  I
have attempted to respond to the various issues and have pushed a new
commit to the pull request.

On 1/11/21, 2:59 AM, "shenandoah-dev on behalf of Roman Kennke"
<shenandoah-dev-retn at openjdk.java.net on behalf of
rkennke at openjdk.java.net> wrote:

>src/hotspot/share/gc/shenandoah/shenandoahBufferWithSATBRememberedSet.hpp
>line 1:

>> 1: //

>New file lacks copyright notice.

Fixed.

>src/hotspot/share/gc/shenandoah/shenandoahBufferWithSATBRememberedSet.hpp
>line 17:

>> 15: //      a) Just blindly dirty each card that is overwritten
>  with a pointer value, regardless of the written value, as with
>> 16: //         the implementation of traditional direct card
>  marking.  When this card's memory region is eventually scanned,
>  the
>> 17: //         the implementation of remembered set scanning
>  will clear the page if it no longer holds references to
>  young-gen

>Minor: double- 'the'

Fixed.

>src/hotspot/share/gc/shenandoah/shenandoahBufferWithSATBRememberedSet.hpp
>line 61:

>> 59: // existing SATB pre-write barrier to maintain remembered
>  set information rather than using unconditional direct card
>  marking in
>> 60: // a post-write barrier.
>> 61: class ShenandoahBufferWithSATBRememberedSet: public
>  CHeapObj<mtGC> {

>So, if I understand correctly, you intend to piggy-back the
>field-address onto the SATB queue, is that right? It would mean
>that we need to double the size of the items in the buffer. And it
>would also mean that SATB needs to be turned on all the time, not
>just during marking. Is that correct? If that is so, I wonder if
>it may be better to do like G1 does and use a separate queue and
>barrier for this? It looks to me that it is doing exactly what we
>need here.

I sent a separate email to explain my thoughts regarding this.  This
is an "experimental" feature to be implemented if and when time
permits.

>src/hotspot/share/gc/shenandoah/shenandoahBufferWithSATBRememberedSet.inline.hpp
>line 1:

>> 1:

>Missing copyright header too.
>  And header #ifndef-#define-#endif ;-)

I've removed this source file.  Will add it back in once implemented.

>src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 1159:

>> 1157:     ShenandoahEvacuateUpdateRootsClosure<> cl;
>> 1158:     MarkingCodeBlobClosure blobsCl(&cl,
>> CodeBlobToOopClosure::FixRelocations);
>> 1159:

>Extra whitespace

I gather you are referring to the extra blank line here.  I've removed
it.  I don't see any other extra whitespace.

>src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 2730:

>> 2728:         //       phase do not need to be scanned by
>  update-refs because the to-space invariant is in force
>> 2729:         //       during evacuation and this will assure
>  that any objects residing between
>> 2730:         //       r->get_update_watermark() and r->top()
>  hold no pointers to from-space.

>We do need to scan objects that have been allocated by evacuation,
>because they may hold from-space refs. For this reason, we bump-up
>UWM when allocating for evac (in ShFreeSet, iirc).

Thank you for pointing this out.  I did not fully understand how this
works.  I've studied this a bit more and I have rewritten the comment.
I would appreciate if you could confirm that my new understanding as
documented in the updated comment is valid.


>src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 2720:

>> 2718:         // an old-gen region as part of an old-gen
>  collection.  Otherwise, a subseqent update-refs scan of
>> 2719:         // the same region will see stale pointers and
>  crash.
>> 2720:         //

>This is probably a bad idea. Unreachable objects are not
>guaranteed to have their Klass* intact, and may crash when
>accessed. Why would it crash during subsequent scans?

This aspect of the implementation has evolved a bit from what is
captured in the current pull request.  

Here's an overview description of the "problem" that can arise:

1. A global (or old-gen) collection discovers that certain objects
within an old-gen card range are dead, but it does not choose the
enclosing heap region for evacuation (it is not part of the collection
set).

2. Since update refs during the global (old-gen) collection is guided
by the mark bitmap, references contained within dead objects are not
updated. 

3. Assume that at some future time, a live object contained within the
same card range is overwritten with a reference to young-gen.  This
has the effect of marking the card. 

4. Thereafter, at the subsequent young-gen collection, remembered set
scanning examines all of the objects contained within this card.  It
does not "know" to skip over the dead objects contained within this
card.  References contained within the dead objects, which were not
updated during the previous old-gen collection, now contain garbage
values and scanning of these references may result in a crash.

As we incrementally upstream the results of our developments efforts,
you will eventually see an improved solution that coalesces and fills
dead objects found between marked objects so that any dead (we call
them zombie) objects that happen to reside within a card's memory
range  will not result in problems when we scan the remembered set.

We are also exploring an additional improvement to this algorithm.
The idea here is to preserve the mark bitmap from the most recently
completed global (old-gen) collection and use this to guide remembered
set scanning.

>src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 2741:

>> 2739:         // allocate from within a free region.
>> 2740:         //
>> 2741:         // Reality is that this code is likely to be
>> replaced with JVM-292 code before we ever get around to

>What is 'JVM-292' code? ;-)

I have replaced this comment to avoid reference to internal ticket number.

>src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp line 71:

>> 69: class VMStructs;
>> 70:
>> 71:

>Extra WS

I have removed one of the two blank lines.

>src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp line 40:

>> 38: #include "gc/shenandoah/shenandoahScanRemembered.hpp"
>> 39: #include "memory/metaspace.hpp"
>> 40: #include "gc/shenandoah/shenandoahScanRemembered.hpp"

>You're including the same header 2x

Thanks for catching this.  I think this came in through a manual merge
conflict.  I've removed one of the includes.

>src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp line 49:

>> 47: class ShenandoahCollectorPolicy;
>> 48: class ShenandoahControlThread;
>> 49: class ShenandoahDirectCardMarkRememberedSet;

>This looks unused.

Agreed.  Removed.

>src/hotspot/share/gc/shenandoah/shenandoahReferenceProcessor.hpp
>line 30:

>> 28:
>> 29: #include "gc/shared/referenceDiscoverer.hpp"
>> 30: #include "gc/shared/referencePolicy.hpp"

>This looks not needed or, if it is actually needed, not related to
>this change.

The dependency on these #includes was introduced by one of the other
commits pulled into the genshen branch prior to this pull request.
Without these #includes, the code does not compile.

I have moved the two includes to precede the include of
shenandoahReferenceProcessor.hpp within shenandoahScanRemembered.hpp.


>src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.cpp line
>2:

>> 1: /*
>> 2:  * Copyright (c) Amazon.com, Inc. or its affiliates.  All
>> rights reserved.

>You do have the copyright header here, good! :-) But it's lacking
>the year, like so:
> * Copyright (c) 2021, Amazon.com, Inc. or its affiliates. All
>rights reserved.

Amazon legal/corporate is advising us to omit the date from copyright
notices.  Is it ok to keep these as is?

>src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.hpp line
>2:

>> 1: /*
>> 2:  * Copyright (c) Amazon.com, Inc. or its affiliates.  All
>> rights reserved.

>Missing copyright year, as above.

See above.

>src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.cpp line
>103:

>> 101: }
>> 102:
>> 103: // Implementation is not correct.
>> ShenandoahBufferWithSATBRememberedSet is a placeholder for
>> future planned improvements.

>This should go into its own .cpp file I think. Actually, I am not
>sure that we should add empty&unused code at all. Maybe add it
>when you also fill the implementation and use?

I have removed mention of ShenandoahBufferWithSATBRememberedSet from
within shenandoahScanRemembered.cpp, removed the
shenandoahBufferWithSATBRememberedSet.hpp and
shenandoahBufferWithSATBRememberedSet.inline.hpp source files, and
removed the #include directives for these two files from the files
shenandoahScanRemembered.hpp and shenandoahScanRemembered.inline.hpp.
I have also edited comments in shenandoahScanRemembered.hpp that make
mention of ShenandoahBufferWithSATBRememberedSet.

>src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.hpp line
>90:

>> 88:
>> 89: // The typical parallel remembered set scanning effort
>> consists of the
>> 90: // following steps, all of which are performed during a JVM
>> safetpoint:

>Typo: safetpoint -> safepoint.

Fixed.

>src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.hpp line
>212:

>> 210: // into ShenandoahRootEvacuator::roots_do() before the
>  invocation of
>> 211: // _serial_roots.oops_do(oops, worker_id).
>> 212:

>This is not quite clear to me. Above you said that all of the work
>is done during safepoints. Here you mention 'concurrent
>scanning' ... what is true?

Thanks for calling this out.  I had written this comment early in the
development process, before I had a full understanding of how the code
would integrate within the existing context.

The remembered set scanning abstractions are designed to support
concurrent remembered set scanning.  However, our initial
implementation of genshen does remembered set scanning during a
safepoint. 

I have revised the comments to better reflect the current reality. 

>src/hotspot/share/gc/shenandoah/shenandoahBufferWithSATBRememberedSet.hpp
>line 61:

>> 59: // existing SATB pre-write barrier to maintain remembered
>  set information rather than using unconditional direct card
>  marking in
>> 60: // a post-write barrier.
>> 61: class ShenandoahBufferWithSATBRememberedSet: public
>  CHeapObj<mtGC> {

>It's also lacking the usual #ifndef - #define - #endif sequence
>that is needed by headers.

I have removed this source file from the commit.

>src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.hpp line
>26:

>> 24:  */
>> 25:
>> 26: // Terminology used within this source file:

>We usually place big comments like this below #includes and just
>above class declarations.

Ok.  Moved.

>src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.hpp line
>787:

>> 785:   // What card number corresponds to old-gen heap addresss
>  p.  (If p
>> 786:   // does not refer to old-gen memory, the returned value
>  is undefined.)
>> 787:   uint32_t cardNoForAddr(HeapWord *p);

>Should be named card_no_for_addr(HeapWord* p) to match style of
>most other Hotspot code.

I have removed the cardNoForAddr prototype.  It was accidentally
leftover from previous editing.  It has been replaced with
card_index_for_addr(), already present in the same source file.


>src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.inline.hpp
>line 2:

>> 1: /*
>> 2:  * Copyright (c) Amazon.com, Inc. or its affiliates.  All
>> rights reserved.

>Missing copyright year.

Intentional.  See above.

>src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.cpp line
>26:

>> 24:  */
>> 25:
>> 26:

>Make sure you're adding #include "precompiled.hpp" and make sure
>it builds with and without --disable-precompiled-headers

I've added "precompiled.hpp" and have successfully built both ways.

>-------------

>Changes requested by rkennke (Reviewer).

>PR: https://git.openjdk.java.net/shenandoah/pull/12



More information about the shenandoah-dev mailing list