RFR: 8359947: GenShen: use smaller TLABs by default

Wed Jun 18 16:32:08 UTC 2025

On Fri, 23 May 2025 20:13:21 GMT, Kelvin Nilsen <kdnilsen at openjdk.org> wrote:

> We have found with certain workloads that the initial and maximum TLAB sizes result in very high latencies for the first few invocations of particular methods on certain threads.  The root cause is that TLABs are too large, which depletes allocatable memory too quickly.  When large numbers of threads try to start up at the same time, some of them end up with no TLABs or very small TLABs, and their work runs hundreds of times slower than that of the threads that were able to grab very large TLABs.
> 
> This PR reduces the maximum TLAB size and adjusts the initial TLAB size in order to reduce the impact of this problem.
> 
> This PR also changes the value of TLABAllocationWeight from 90 to 35 when we are running in generational mode.  35 is the default value used for G1 GC, which is also generational.  The default value of 90 was established years ago for non-generational Shenandoah because it tends to have less frequent GC cycles than generational collectors.
> 
> We have exercised this PR with three different workloads, which we identify as small, medium, and huge.  We have also exercised it in two different configurations: with and without a 30s warmup before latency measurements are taken.  Finally, we have applied this PR both to tip and to a development branch identified as adaptive-evac-with-surge.
> 
> The initial motivation for this PR was identified during testing of the adaptive-evac-with-surge branch.  That branch runs more aggressive GCs (larger evacuation workloads, with delayed, slightly riskier triggers).  The objectives of this branch are to make GCs more efficient and to reduce CPU consumption.
> 
> We report 6 results for each experiment.  We sort these according to P100 latencies, and average results from the bottom four (best performing) samples, tossing out the two high outliers from the averages.  Workload results are subject to noise from elastic computing and operating system interference.
> 
> The benefits of this PR are most notable in the p99.999 and p100 latencies for the small configuration of adaptive-evac-with-surge and the huge configuration of tip:
> 
> ![image](https://github.com/user-attachments/assets/def49a3c-4142-48f7-a946-33527e6985d0)
> 
> ![image](https://github.com/user-attachments/assets/b0df27b3-f7b0-4fd2-82c3-ac84b0ad380e)
> 
> ![image](https://github.com/user-attachments/assets/471c1292-96dc-46c1-9bcc-b851be07867d)
> 
> Note also the degradation in p50 and other lower percentile latencies.  The effect of this PR is to require each mutator threa...
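
The aggregation described in the quoted text (six runs per experiment, sorted by P100 latency, with the best four averaged and the two worst discarded) can be sketched roughly as follows. This is only an illustration of that procedure, not the script actually used for the PR: the class name `BestRunsAverage`, the method `averageBestRuns`, the column layout, and all sample values are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helper; not part of the PR or of any benchmark harness.
class BestRunsAverage {

    // Each row is one run; each column is one latency percentile.
    // Rows are sorted ascending by sortColumn (e.g. the P100 column),
    // then every column is averaged over the first `keep` rows.
    static double[] averageBestRuns(double[][] runs, int sortColumn, int keep) {
        if (keep <= 0 || keep > runs.length) {
            throw new IllegalArgumentException("keep must be in 1..runs.length");
        }
        double[][] sorted = runs.clone(); // shallow copy; rows are not modified
        Arrays.sort(sorted, Comparator.comparingDouble(r -> r[sortColumn]));

        int columns = sorted[0].length;
        double[] averages = new double[columns];
        for (int i = 0; i < keep; i++) {
            for (int c = 0; c < columns; c++) {
                averages[c] += sorted[i][c];
            }
        }
        for (int c = 0; c < columns; c++) {
            averages[c] /= keep;
        }
        return averages;
    }

    public static void main(String[] args) {
        // Made-up samples: column 0 = p50, column 1 = p100 (milliseconds).
        double[][] runs = {
            {1.0, 12.0}, {1.2, 9.0}, {3.0, 48.0},
            {1.1, 10.0}, {0.9, 11.0}, {5.0, 95.0}
        };
        double[] avg = averageBestRuns(runs, 1, 4);
        System.out.println("avg p50 = " + avg[0] + ", avg p100 = " + avg[1]);
    }
}
```

With the made-up samples above, the two runs with P100 values of 48 and 95 are dropped, and the remaining four runs contribute to every reported percentile, not just P100.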

Leaving this in draft while I prepare details for review.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25423#issuecomment-2905742286