RFR: Scan remembered

Roman Kennke rkennke at redhat.com
Wed Jan 13 11:42:45 UTC 2021


Hi Kelvin,

I am getting a little confused now, because you replied to the review 
email, but it's not reflected in GitHub. Well, anyway, let me reply here:


>> src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 1159:
> 
>>> 1157:     ShenandoahEvacuateUpdateRootsClosure<> cl;
>>> 1158:     MarkingCodeBlobClosure blobsCl(&cl,
>>> CodeBlobToOopClosure::FixRelocations);
>>> 1159:
> 
>> Extra whitespace
> 
> I gather you are referring to the extra blank line here.  I've removed
> it.  I don't see any other extra whitespace.

Yes, this is what I meant. Thanks!

>> src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 2730:
> 
>>> 2728:         //       phase do not need to be scanned by
>>   update-refs because the to-space invariant is in force
>>> 2729:         //       during evacuation and this will assure
>>   that any objects residing between
>>> 2730:         //       r->get_update_watermark() and r->top()
>>   hold no pointers to from-space.
> 
>> We do need to scan objects that have been allocated by evacuation,
>> because they may hold from-space refs. For this reason, we bump-up
>> UWM when allocating for evac (in ShFreeSet, iirc).
> 
> Thank you for pointing this out.  I did not fully understand how this
> works.  I've studied this a bit more and I have rewritten the comment.
> I would appreciate if you could confirm that my new understanding as
> documented in the updated comment is valid.
> 

Almost. You said:
         //       Note that incrementing watermark to
         //       account for objects newly evacuated into the region 
may result in otherwise unnecessary updating
         //       of references contained within newly allocated objects 
that happen to be located between the
         //       initial value of watermark and the updated value of 
watermark.

This should not happen, because we don't mix evac-allocations with 
regular allocations in the same region.
>> src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 2720:
> 
>>> 2718:         // an old-gen region as part of an old-gen
>>   collection.  Otherwise, a subseqent update-refs scan of
>>> 2719:         // the same region will see stale pointers and
>>   crash.
>>> 2720:         //
> 
>> This is probably a bad idea. Unreachable objects are not
>> guaranteed to have their Klass* intact, and may crash when
>> accessed. Why would it crash during subsequent scans?
> 
> This aspect of the implementation has evolved a bit from what is
> captured in the current pull request.
> 
> Here's an overview description of the "problem" that can arise:
> 
> 1. A global (or old-gen) collection discovers that certain objects
> within an old-gen card range are dead, but it does not choose the
> enclosing heap region for evacuation (it is not part of the collection
> set).
> 
> 2. Since update refs during the global (old-gen) collection is guided
> by the mark bitmap, references contained within dead objects are not
> updated.
> 
> 3. Assume that at some future time, a live object contained within the
> same card range is overwritten with a reference to young-gen.  This
> has the effect of marking the card.
> 
> 4. Thereafter, at the subsequent young-gen collection, remembered set
> scanning examines all of the objects contained within this card.  It
> does not "know" to skip over the dead objects contained within this
> card.  References contained within the dead objects, which were not
> updated during the previous old-gen collection, now contain garbage
> values and scanning of these references may result in a crash.
> 
> As we incrementally upstream the results of our developments efforts,
> you will eventually see an improved solution that coalesces and fills
> dead objects found between marked objects so that any dead (we call
> them zombie) objects that happen to reside within a card's memory
> range  will not result in problems when we scan the remembered set.
> 
> We are also exploring an additional improvement to this algorithm.
> The idea here is to preserve the mark bitmap from the most recently
> completed global (old-gen) collection and use this to guide remembered
> set scanning.

Ok. A long time ago, we did actually preserve the previous bitmap to 
guide with iterations between cycles and during marking, for example for 
supporting heap-dumps. I believe G1 also does this, and uses the 
previous bitmap to guide old-gen-card-scanning.


>> src/hotspot/share/gc/shenandoah/shenandoahReferenceProcessor.hpp
>> line 30:
> 
>>> 28:
>>> 29: #include "gc/shared/referenceDiscoverer.hpp"
>>> 30: #include "gc/shared/referencePolicy.hpp"
> 
>> This looks not needed or, if it is actually needed, not related to
>> this change.
> 
> The dependency on these #includes was introduced by one of the other
> commits pulled into the genshen branch prior to this pull request.
> Without these #includes, the code does not compile.
> 
> I have moved the two includes to precede the include of
> shenandoahReferenceProcessor.hpp within shenandoahScanRemembered.hpp.

It should really be added in shenandoahReferenceProcessor.hpp as 
forwarding declaration, and maybe in shenandoahReferenceProcessor.cpp as 
include. Maybe one of you guy can handle this upstream? It's rather 
trivial and should give you some commit-right-creds ;-)

>> src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.cpp line
>> 2:
> 
>>> 1: /*
>>> 2:  * Copyright (c) Amazon.com, Inc. or its affiliates.  All
>>> rights reserved.
> 
>> You do have the copyright header here, good! :-) But it's lacking
>> the year, like so:
>> * Copyright (c) 2021, Amazon.com, Inc. or its affiliates. All
>> rights reserved.
> 
> Amazon legal/corporate is advising us to omit the date from copyright
> notices.  Is it ok to keep these as is?

I don't think so. All files in OpenJDK have a year in their copyright 
header, and if I understand correctly, it is a legal requirement by US law.

https://en.wikipedia.org/wiki/Copyright_notice#Form_of_notice_for_visually_perceptible_copies

IANAL though. If your lawyers insist, then maybe we should bring it up 
elsewhere in the OpenJDK project?

>> src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.cpp line
>> 103:
> 
>>> 101: }
>>> 102:
>>> 103: // Implementation is not correct.
>>> ShenandoahBufferWithSATBRememberedSet is a placeholder for
>>> future planned improvements.
> 
>> This should go into its own .cpp file I think. Actually, I am not
>> sure that we should add empty&unused code at all. Maybe add it
>> when you also fill the implementation and use?
> 
> I have removed mention of ShenandoahBufferWithSATBRememberedSet from
> within shenandoahScanRemembered.cpp, removed the
> shenandoahBufferWithSATBRememberedSet.hpp and
> shenandoahBufferWithSATBRememberedSet.inline.hpp source files, and
> removed the #include directives for these two files from the files
> shenandoahScanRemembered.hpp and shenandoahScanRemembered.inline.hpp.
> I have also edited comments in shenandoahScanRemembered.hpp that make
> mention of ShenandoahBufferWithSATBRememberedSet.

Ok, good.

>> src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.hpp line
>> 212:
> 
>>> 210: // into ShenandoahRootEvacuator::roots_do() before the
>>   invocation of
>>> 211: // _serial_roots.oops_do(oops, worker_id).
>>> 212:
> 
>> This is not quite clear to me. Above you said that all of the work
>> is done during safepoints. Here you mention 'concurrent
>> scanning' ... what is true?
> 
> Thanks for calling this out.  I had written this comment early in the
> development process, before I had a full understanding of how the code
> would integrate within the existing context.
> 
> The remembered set scanning abstractions are designed to support
> concurrent remembered set scanning.  However, our initial
> implementation of genshen does remembered set scanning during a
> safepoint.
> 
> I have revised the comments to better reflect the current reality.

Ok, good!

Alright, I will do another review-pass in Github.

In general, I find it very useful to do all review interactions in 
Github. The advantage is that all comments get closely associated to the 
code, and especially for large reviews Github makes it much easier to 
keep track of everything. If you think you 'resolved' a comment, then 
you can mark it as such, and it would not show up again on my side.

Thanks, Roman



More information about the shenandoah-dev mailing list