RFR: JDK-8260332: ParallelGC: Cooperative pretouch for oldgen expansion [v2]
Amit Pawar
github.com+71302734+amitdpawar at openjdk.java.net
Tue Mar 16 08:35:08 UTC 2021
On Mon, 15 Mar 2021 02:13:46 GMT, Kim Barrett <kbarrett at openjdk.org> wrote:
>> Amit Pawar has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Fixed build issues for some targets and updated with suggested changes.
>
> Changes requested by kbarrett (Reviewer).
>
>
> _Mailing list message from [Kim Barrett](mailto:kim.barrett at oracle.com) on [hotspot-dev](mailto:hotspot-dev at openjdk.java.net):_
>
> > On Mar 12, 2021, at 3:01 PM, Amit Pawar <github.com+71302734+amitdpawar at openjdk.java.net> wrote:
> > In case of ParallelGC, oldgen expansion can happen during promotion. Expanding thread will touch the pages and can't request for task execution as this GC thread is already executing a task. The expanding thread holds the lock on "ExpandHeap_lock" to resize the oldgen and other threads may wait for their turn. This is a blocking call.
> > This patch changes this behavior by adding another constructor in "MutexLocker" class to enable non blocking or try_lock operation. This way one thread will acquire the lock and other threads can join pretouch work. Threads failed to acquire the lock will join pretouch only when task is marked ready by expanding thread.
> > Following minimum expansion size are seen during expansion.
> > 1. 512KB without largepages and without UseNUMA.
> > 2. 64MB without largepages and with UseNUMA,
> > 3. 2MB (on x86) with large pages and without UseNUMA,
> > 4. 64MB without large pages and with UseNUMA.
> > When Oldgen is expanding repeatedly with smaller size then this change wont help. For such cases, resize size should adapt to application demand to make use of this change. For example if application nature triggers 100 expansion with smaller sizes in same GC then it is better to increase the expansion size during each resize to reduce the number of resizes. If this patch is accepted then will plan to fix this case in another patch.
>
> Sorry, but a change like this needs better motivation. What you say
> above suggests this change doesn't actually help.
>
> It's intentional that oldgen expansions aren't generally large, as the
> oldgen shouldn't be grown unnecessarily. There are already parameters
> such as MinHeapDeltaBytes to control and manipulate this.
>
> It is also preferable to complete an expansion request quickly to make
> the additional space available to other threads in the main allocation
> path, rather than making them go to the expand path. Making expansions
> larger could force more threads to take the slower expand path, which
> doesn't seem like a win even if they then help with the pretouch part
> of another thread's expansion. (And that also assumes UsePreTouch is
> even enabled.)
>
> So the followup change that you say is needed to make this one
> profitable seems questionable.
>
> The proposed change is also surprisingly large and intrusive for
> something that seems like it should be very localized.
>
> > Jtreg all test passed.
>
> A change like this needs a lot more testing than that, both functionally
> and performance.
[https://bugs.openjdk.java.net/browse/JDK-8254699](JDK-8254699) contains test results in XL file to show PreTouchParallelChunkSize was recently changed from 1GB to 4MB on Linux after testing various sizes. I have downloaded the same XL file and same is updated for Oldgen case during resize and it gives some rough idea about the improvement for this fix and follow up fix. Please check "PretouchOldgenDuringResize" sheet for "Co-operative Fix" and "Adaptive Resize Fix" columns.
[PreTouchParallelChunkSize_TestResults_UpdatedForOldGenCase.xlsx](https://github.com/openjdk/jdk/files/6147180/PreTouchParallelChunkSize_TestResults_UpdatedForOldGenCase.xlsx)
Running SPECJbb composite shows 30-40% reduction in GC pause time when old-gen expands upto ~1GB for UseNUMA case (I tested for minimum oldgen size to trigger the resize). Non UseNUMA case will show improvement only when resize expands more than minimum "PreTouchParallelChunkSize" size to let other thread participate in pretouch work. Two cases 1 & 3 (without UseNUMA and with/without hugpages) above didnt shows any improvement because of "PreTouchParallelChunkSize" limit and that is the reason why I suggested another fix.
The "Adaptive Resize Fix" column in the sheet is for next suggested fix and may possibly help to improve further. For server JVM, expansion size of 512KB, 2MB (hugepages) and 64MB looks good for first resize but later needs some attention I think. JVM flag "MinHeapDeltaBytes" needs to be known by the user and need to set it upfront. I think this can be consider for first resize in every GC and later dynamically go for higher size like double the previous size to adopt to application nature. This way it may help to reduce the GC pause time during the expansion. I thought to share my observation and my understanding could be wrong. So please check and suggest.
Again, thanks for your feedback.
-------------
PR: https://git.openjdk.java.net/jdk/pull/2976
More information about the hotspot-dev
mailing list