RFR(S): 8170409: CMS: Crash in CardTableModRefBSForCTRS::process_chunk_boundaries

Thomas Schatzl thomas.schatzl at oracle.com
Tue Nov 29 10:43:30 UTC 2016


Hi,

On Mon, 2016-11-28 at 19:33 +0100, Volker Simonis wrote:
> Hi,
> 
> can I please have a review and sponsor for the following fix
> submitted
> by gunter.haug at sap.com:
> 
> http://cr.openjdk.java.net/~simonis/webrevs/2016/8170409/
> https://bugs.openjdk.java.net/browse/JDK-8170409
> 
> We've observed a crash (see bug report for a stack trace) in
> CardTableModRefBSForCTRS::process_chunk_boundaries() from time to
> time since several years now, but only on non TSO platforms:
> 
> - It only happens in opt builds.
> - Analysis of the assembly code revealed the actual crash site to be
> an array store to a pointer (_lowest_non_clean) which is an argument
> to process_chunk_boundaries()
> - The pointer is actually calculated in
> CardTableModRefBS::get_LNC_array_for_space() and passed as argument
> to
> CardTableModRefBSForCTRS::process_chunk_boundaries()
> - CardTableModRefBS::get_LNC_array_for_space() doesn't enforce TSO on
> _last_LNC_resizing_collection[i] so the pointer to an uninitialized
> structure (i.e._lowest_non_clean) could become visible to other
> threads before the value of _last_LNC_resizing_collection[i].
> 
> Solution:
> 
> Use OrderAccess::load_acquire and OrderAccess::release_store for
> accessing _last_LNC_resizing_collection[i] in
> CardTableModRefBSForCTRS::get_LNC_array_for_space()
> 

  looks good to me. Nice catch.

During my review I did look at other uses of
_last_LNC_resizing_collection and friends, aparently no issues there.
I also looked at what parallel gc does, and while it implements the
same idea of slicing the card table in chunks for parallel card table
scan, it does not need or use the LNC tables (of course). That
implementation also does not seem to use any similar shared data
structure, so it should be good (in
CardTableExtension::scavenge_contents_parallel).

One comment: maybe the "Therefore use acquire/release to guarantee this
in non TSO architectures." sentence in the added comment could be
removed.

Adding appropriate code so that collectors work on supported
architectures seems to be an obvious thing to do at this point, and
should not need to be argued for.

However you can leave that sentence in if you really want. I do not
need a re-review if you removed it before pushing either.

Thanks,
  Thomas




More information about the hotspot-gc-dev mailing list