RFR: 8225716: G1 GC: Undefined behaviour in G1BlockOffsetTablePart::block_at_or_preceding

Andrew Haley aph at redhat.com
Fri Jun 14 13:44:25 UTC 2019


On 6/14/19 1:06 PM, Thomas Schatzl wrote:
> Hi,
> 
> On Thu, 2019-06-13 at 16:21 +0100, Andrew Haley wrote:
>> The Block Offset Table is accessed racily in many places. This is UB,
>> but I suspect the authors of G1 assumed that the UB was "benign". In
> 
> Not diluting your point (it is a bug/UB), I think the code simply
> assumes volatile access to the offset table given your description
> below.

Yes, I think it does.

>> fact it is not, and there is a narrow race condition which causes G1
>> to crash at times of heavy stress.
>>
>> The bug has only ever been observed on 86-32, but it isn't a
>> target-specific bug. It's probably not specific to G1 either.
> 
> The other generational collector use similar code that apparently also
> assume volatile access in
> BlockOffsetArrayNonContigSpace::block_start_unsafe. While this is not
> issues for Serial obviously, it may be for Parallel/CMS and needs to be
> looked at too.

Right.

>> The problem occurs when the BOT is updated by one thread while it's
>> being read by another. The BOT isn't volatile, so a C++ compiler is
>> entitled to transform this code:
>>
> [...]
>> But if someone is modifying _offset_array then this code is no longer
>> correct. We first read the offset, see that we have to slide back N
>> cards, and then we read the (changed) offset and slide back the wrong
>> (and huge) nuber of cards, and a crash is the inevitable result. It's
>> a matter of luck that this hasn't happened before now.
> 
> Great analysis.

Thank you. I've been looking at this code as a background task for
several weeks.

I figured out some time ago that racy access to the BOT needed fixing
and that doing so made the crash go away, but I still could not be
sure that fixing the race really fixed the root cause. Maybe all that
the change (volatile access to te BOT) did was to make the bug happen
less frequently? There was no way to know for sure without finding the
bad code.

>> Making the BOT volatile would probably fix this, but IMO it makes
>> more sense to make the access explicit:
> 
> In addition to Kim's comments I would suggest to do both to make the
> intent clear looking at the variable too.

Sure. It wouldn't hurt.

> Btw, please push the final fix into the jdk/jdk13 repos (I assume we
> will manage reviewing this in time ;)). In any case this is serious
> enough to get it into the product asap.

OK.

-- 
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671



More information about the hotspot-gc-dev mailing list