RFR: 8308766: TLAB initialization may cause div by zero

Wed May 31 19:28:25 UTC 2023

On Wed, 24 May 2023 11:50:02 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote:

> Hi all,
> 
>   can I have reviews for this change that fixes a floating-point division by zero?
> 
> In `ThreadLocalAllocBuffer::initialize()` we initialize the TLAB using the TLAB capacity currently available to the thread. In G1, this can be zero in some situations, leading to the division by zero (see the CR for the crash observed when adding an assert).
> The suggested fix is to simply not sample at this point. TLAB resizing will fix up the TLAB size later.
> 
> Only G1 seems to be affected, as it appears to be the only GC that uses a dynamic value for the capacity available for TLAB allocation. The other GCs seem to use either total heap capacity (Z, Shenandoah) or eden capacity (Serial, Parallel).
> I am not sure that would actually be better, and I think it would not result in the expected behavior either (every thread should refill its TLAB `target_refills()` times between two GCs). Since even with G1 eden is at its maximum at TLAB resizing time, this hiccup at initialization does not seem too bad.
> 
> This may also be the cause of the behavior observed in https://bugs.openjdk.org/browse/JDK-8264798.
> 
> Testing: GHA
> 
> Thanks,
>   Thomas
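A minimal sketch of the fix described in the quoted message (skip the initial sample when the reported capacity is zero), under stated assumptions: it assumes the initial sample has roughly the shape "desired size × target refills / capacity" and feeds a running average. `WeightedAverage`, `sample_initial_fraction` and the 0.35 weight are hypothetical names and values for illustration; this is not the HotSpot source.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical stand-in for the running average of a thread's allocation
// fraction (its share of TLAB capacity allocated between GCs).
// Simplified model; not HotSpot's actual averaging class.
struct WeightedAverage {
  float avg = 0.0f;
  bool has_sample = false;
  float weight = 0.35f;  // hypothetical weight given to each new sample

  void sample(float value) {
    if (!has_sample) {
      avg = value;  // the first sample seeds the average
      has_sample = true;
    } else {
      avg = (1.0f - weight) * avg + weight * value;
    }
  }
};

// Hypothetical initial sampling step: derive a fraction from the desired
// TLAB size, the refill target and the GC-reported capacity, which G1 may
// report as 0. Returns true if a sample was recorded.
bool sample_initial_fraction(WeightedAverage& frac, std::size_t desired_size_words,
                             unsigned target_refills, std::size_t capacity_words) {
  if (capacity_words == 0) {
    // Do not sample: dividing by zero would yield inf/NaN and poison the
    // average. Later TLAB resizing corrects the size.
    return false;
  }
  float alloc_frac = static_cast<float>(desired_size_words) *
                     static_cast<float>(target_refills) /
                     static_cast<float>(capacity_words);
  assert(std::isfinite(alloc_frac));  // guaranteed by the guard above
  frac.sample(alloc_frac);
  return true;
}
```

With a zero capacity the call records nothing and leaves the average untouched. With 512 desired words, 50 refills and 2^20 words of capacity, it records a fraction of 25600 / 1048576.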

Thanks to Thomas' explanation, I now understand why it tracks the ratio instead of the actual allocation amount. (Eden) capacity determines the distance between two GC pauses (in an STW GC), and the allocation amount is roughly proportional to that distance. Therefore, the ratio more or less reflects the allocation rate, which can be used to predict the allocation amount until the next GC pause.

However, maintaining a constant number of refills between GC pauses seems an odd objective; that is a preexisting issue, though.
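To make the ratio argument concrete, here is a hedged sketch of how an average allocation fraction could translate into a TLAB size. It assumes the thread's predicted allocation until the next GC is fraction × capacity, and that this amount is split across `target_refills` TLABs and then clamped. `desired_tlab_words` and its clamp bounds are hypothetical. This is a simplified model, not the HotSpot source.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model: a thread expected to allocate `alloc_fraction` of the
// TLAB capacity before the next GC gets TLABs sized so that it refills about
// `target_refills` times in that interval.
std::size_t desired_tlab_words(float alloc_fraction, std::size_t capacity_words,
                               unsigned target_refills,
                               std::size_t min_words, std::size_t max_words) {
  assert(min_words <= max_words);
  // Predicted allocation until the next GC pause, in words.
  std::size_t predicted = static_cast<std::size_t>(
      alloc_fraction * static_cast<float>(capacity_words));
  // Spread the prediction over the refill target (guard against 0).
  std::size_t words = predicted / (target_refills == 0 ? 1u : target_refills);
  // Clamp to the allowed TLAB size range.
  if (words < min_words) return min_words;
  if (words > max_words) return max_words;
  return words;
}
```

In this model, halving the capacity halves the TLAB size, so the number of refills per GC interval stays at `target_refills`. That is the constant-refill objective questioned above.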

-------------

Marked as reviewed by ayang (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/14121#pullrequestreview-1454004997