RFR: 8308766: TLAB initialization may cause div by zero

Thu May 25 14:14:01 UTC 2023

On Wed, 24 May 2023 11:50:02 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote:

> Hi all,
> 
>   can I have reviews for this change that fixes an FP div by zero?
> 
> In `ThreadLocalAllocBuffer::initialize()` we initialize the TLAB using current available TLAB capacity for the thread. In G1, this can be zero in some situations, leading to that div by zero (see the CR for the crash when adding an assert).
> The suggested fix is to just not sample at this point. TLAB resizing will fix TLAB sizing up.
> 
> Only G1 seems to be affected as it seems to be the only gc that uses a dynamic value for the capacity available for TLAB allocation. Other GCs seem to just use total heap capacity (Z, Shenandoah) or eden capacity (Serial, Parallel).
> Not sure if that is actually better and I think won't result in the expected behavior (every thread should reload TLABs `target_refills()` times per mutator time); since even with G1 at TLAB resizing time eden is maximal, this hiccup at initialization does not seem too bad.
> 
> This may also be the cause for the behavior observed in https://bugs.openjdk.org/browse/JDK-8264798. 
> 
> Testing: gha
> 
> Thanks,
>   Thomas

> Looking at where `_allocation_fraction` is accessed, wouldn't a variable capacity cause the alloc-amount to be miscalculated? I'd expect `capacity` to be const to more accurately track/predict #alloc-bytes.
> 
> ```c++
>   // ** sampling place ** //
>   size_t capacity = Universe::heap()->tlab_capacity(thread()) / HeapWordSize;
>   float alloc_frac = desired_size() * target_refills() / (float)capacity;
>   _allocation_fraction.sample(alloc_frac);
> 
>   // ** where it's used ** //
>   // Compute the next tlab size using expected allocation amount
>   size_t alloc = (size_t)(_allocation_fraction.average() *
>                           (Universe::heap()->tlab_capacity(thread()) / HeapWordSize));
> ```

Where the capacity is used, during the GC pause, in G1 `Universe::heap()->tlab_capacity` is effectively a constant, reflecting the eden size for the next mutator phase.

There is a problem what to do during TLAB initialization when attaching a random thread: eden can be partially exhausted as it can happen at any time when the mutator is running: do you want to have `target_refills()` reloads until eden is exhausted, or as if the thread ran since the start of the mutator phase or something completely different.

Serial and parallel calculate it as if eden were empty, Shenandoah and Z seem to use total heap capacity (they're single-generational), and G1 uses the remaining eden capacity, with different effects.

(Fwiw, if there is an issue with that logic, it is pre-existing).

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14121#issuecomment-1562981802