RFR: 8225716: G1 GC: Undefined behaviour in G1BlockOffsetTablePart::block_at_or_preceding

Thomas Schatzl thomas.schatzl at oracle.com
Fri Jun 14 12:06:22 UTC 2019


Hi,

On Thu, 2019-06-13 at 16:21 +0100, Andrew Haley wrote:
> The Block Offset Table is accessed racily in many places. This is UB,
> but I suspect the authors of G1 assumed that the UB was "benign". In

Not diluting your point (it is a bug/UB), I think the code simply
assumes volatile access to the offset table given your description
below.

> fact it is not, and there is a narrow race condition which causes G1
> to crash at times of heavy stress.
> 
> The bug has only ever been observed on 86-32, but it isn't a
> target-specific bug. It's probably not specific to G1 either.

The other generational collector use similar code that apparently also
assume volatile access in
BlockOffsetArrayNonContigSpace::block_start_unsafe. While this is not
issues for Serial obviously, it may be for Parallel/CMS and needs to be
looked at too.

> 
> The problem occurs when the BOT is updated by one thread while it's
> being read by another. The BOT isn't volatile, so a C++ compiler is
> entitled to transform this code:
> 
[...]
> But if someone is modifying _offset_array then this code is no longer
> correct. We first read the offset, see that we have to slide back N
> cards, and then we read the (changed) offset and slide back the wrong
> (and huge) nuber of cards, and a crash is the inevitable result. It's
> a matter of luck that this hasn't happened before now.

Great analysis.

> 
> Making the BOT volatile would probably fix this, but IMO it makes
> more sense to make the access explicit:
> 

In addition to Kim's comments I would suggest to do both to make the
intent clear looking at the variable too.

> u_char G1BlockOffsetTable::offset_array(size_t index) const {
>   check_index(index, "index out of range");
>   return Atomic::load(&_offset_array[index]);
> }
> 
> void set_offset_array_raw(size_t index, u_char offset) {
>   Atomic::store(offset, &_offset_array[index]);
> }
> 
> http://cr.openjdk.java.net/~aph/8225716/
> 

Btw, please push the final fix into the jdk/jdk13 repos (I assume we
will manage reviewing this in time ;)). In any case this is serious
enough to get it into the product asap.

Thanks,
  Thomas





More information about the hotspot-gc-dev mailing list