RFR: 8166607: G1 needs klass_or_null_acquire

Fri Sep 30 23:46:20 UTC 2016

> On Sep 30, 2016, at 8:00 AM, Thomas Schatzl <thomas.schatzl at oracle.com> wrote:
> On Thu, 2016-09-29 at 18:49 -0400, Kim Barrett wrote:
>> Please review this change to G1 to use klass_or_null_acquire where
>> needed.
>> 
>> [Reviewing on hotspot-dev; only GC code is involved, but some non-GC
>> folks have expressed interest.]
>> 
>> G1 BOT uses of klass_or_null have all been changed to instead use
>> klass_or_null_acquire.  This isn't necessary for the call in the
>> revised version of oops_on_card_seq_iterate_careful (discussed
>> below), but there can be (and are) other callers.  The performance
>> impact of unnecessary barriers in oops_on_card_xxx will hopefully not
>> be too bad on non-TSO systems; the number reduces to one as the BOT
>> gets filled in.  Rather than splitting or otherwise conditionalizing,
>> for now we just make it safe.
> 
> Could you add a comment to JDK-8162928?

Done.

>>   [The CollectedHeap block_start &etc API needs some TLC in the face
>> of collectors that use klass_or_null
> 
> TLC?

TLC = Tender Loving Care

The API around CollectedHeap::block_start and friends doesn't deal
very well with the possibility of querying a location within an
in-progress allocation. This is clearly an issue for CMS and G1. CMS
even goes so far as having its own specialized variants of parts that
API, and JDK-8162928 suggests doing similarly for G1.  But even for a
non-concurrent collector it could be an issue; consider
os::print_location, which could be applied to any address at any time.

Descriptions of these functions make no mention of that problem. Some
callers know how to cope, generally when dealing with a specific
collector. Not sure how to cope in generic code; the API might need
rework.

I suppose I should file that as an RFE. 

>> […]
>> Note that some of the complexity in refine_card / oops_on_card_xxx
>> arises from the possibility of "stale" marked cards, especially
>> during concurrent refinement.  These are cards in regions that have
>> been freed and then reallocated.  At present the only source of stale
>> cards in the concurrent case seems to be HCC eviction.  (Eager
>> reclaim of humongous objects flushes relevant remembered sets
>> (possibly containing stale entries) for reconsideration during that
>> GC, but in-GC card clipping to scan_top deals with that.)  Doing HCC
>> cleanup when freeing regions might remove the need for klass_or_null
>> checking in the humongous case for concurrent refinement, so might be
>> worth looking into later.
> 
> CR for that? :)

https://bugs.openjdk.java.net/browse/JDK-8166995
Consider removing stale cards from HCC during cleanup

>> 
>> CR:
>> https://bugs.openjdk.java.net/browse/JDK-8166607
>> 
>> Webrev:
>> http://cr.openjdk.java.net/~kbarrett/8166607/webrev.01/
>> 
>> Testing:
>> Nightly test equiv. via rbt. (in progress)
>> Local specjbb2015 (non-perf).
> 
> Just mentioning: this change already implements some changes suggested
> in JDK-8162928. So it will improve performance a little.
> 
>> Local jtreg hotspot_all.
>> 
> 
> Some comments:
> 
> - g1BlockOffsetTable.cpp:230: please add braces around the return.

Present code is similar to other nearby code.

> - in the new static method do_oops_on_card_in_humongous(), do you think
> it would be useful to avoid the if-cascading a little by doing early
> returns?

I wrote it somewhat like that earlier, but prefer how it reads now.

> - I think the comment at heapRegion.cpp:411 seems out of place (now?).
> I would rather put it somewhere around line 444.

Good point.  Done.

> - what does "Non-humongous objects only reach the old-gen during GC."
> mean in the comment? Probably this means that "G1 only allocates into
> old gen during GC" which seems much more clear to me.

Good point.  Reworded a bit.

  Non-humongous objects are only allocated in the old-gen during GC.

> Looks good otherwise.

Thanks.