[jdk17u-dev] RFR: 8312182: THPs cause huge RSS due to thread start timing issue

Thomas Stuefe stuefe at openjdk.org
Fri Aug 25 05:44:25 UTC 2023


On Mon, 21 Aug 2023 12:47:15 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

> Unclean composite backport to jdk17u. Fixes JDK-8312182 - "THPs cause huge RSS due to thread start timing issue" (https://bugs.openjdk.org/browse/JDK-8312182)
> 
> Problem:
> 
> On a machine with transparent huge pages (THP) unconditionally enabled (/sys/kernel/mm/transparent_hugepage/enabled = "always"), the JVM may show a huge memory footprint (RSS) and degraded thread start performance.
> 
> The following factors make the problem more severe and more likely:
> - thread stack size of 2M (on arm64 or x64) or larger
> - many threads, or high thread creation churn
> - a slow or overloaded machine (since part of the problem is timing-dependent)
> 
> For a detailed discussion of the underlying problem, please see https://github.com/openjdk/jdk/pull/14919.
> 
> ----------------
> 
> In JDK head, the issue was fixed with a sequence of patches:
> 
> - [JDK-8303215](https://bugs.openjdk.org/browse/JDK-8303215) "Make thread stacks not use huge pages"
> - [JDK-8312182](https://bugs.openjdk.org/browse/JDK-8312182) "THPs cause huge RSS due to thread start timing"
> 
> However, JDK-8312182 itself needed one preparatory fix:
> - [JDK-8310233](https://bugs.openjdk.org/browse/JDK-8310233) "Fix THP detection on Linux"
> 
> and then we had several corner-case test problems, which were fixed with:
> - [JDK-8312394](https://bugs.openjdk.org/browse/JDK-8312394) "[linux] SIGSEGV if kernel was built without hugepage support"
> - [JDK-8312620](https://bugs.openjdk.org/browse/JDK-8312620) "WSL Linux build crashes after JDK-8310233"
> - [JDK-8314139](https://bugs.openjdk.org/browse/JDK-8314139) "TEST_BUG: runtime/os/THPsInThreadStackPreventionTest.java could fail on machine with large number of cores"
> 
> and finally, a last patch renamed the switch that turns off the THP mitigation:
> - [JDK-8312585](https://bugs.openjdk.org/browse/JDK-8312585) "Rename DisableTHPStackMitigation flag to THPStackMitigation"
> 
> 
> Instead of downporting these 7 patches verbatim, I prepared a composite patch containing only the necessary mitigation and mitigation tests.
> 
> This is similar to the [jdk11u downport](https://github.com/openjdk/jdk11u-dev/pull/2086), but in jdk17u, [JDK-8303215](https://bugs.openjdk.org/browse/JDK-8303215) had already been backported, so there are some minor differences.
> 
> This patch does:
> - make sure that all thread stacks have at least one glibc guard page to prevent clustering of adjacent thread stacks into one VMA
> - change the default size of stacks to be not aligned to 2MB to prevent intra-stack...
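
For readers who want to check whether a machine matches the conditions in the quoted description, here is a minimal sketch. It is not part of the patch and does not reflect how HotSpot detects THP. It only reads the sysfs file named above and tests whether a given stack size is an exact multiple of 2M. The helper names and the 2M huge page size are assumptions made for illustration.

```python
import re
from pathlib import Path

# Path taken from the problem description above.
THP_ENABLED_PATH = Path("/sys/kernel/mm/transparent_hugepage/enabled")

# Assumption: 2M huge pages, as on x64 (and arm64 with 4K base pages).
HUGE_PAGE_SIZE = 2 * 1024 * 1024


def parse_thp_mode(text):
    """Return the active mode from text such as 'always [madvise] never'.

    The kernel marks the active setting with square brackets.
    Returns None if no bracketed word is found.
    """
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else None


def read_thp_mode(path=THP_ENABLED_PATH):
    """Read the active THP mode, or None if the file is missing or unreadable
    (for example on a kernel built without hugepage support)."""
    try:
        return parse_thp_mode(path.read_text())
    except OSError:
        return None


def is_multiple_of_huge_page(stack_size_bytes, huge_page_size=HUGE_PAGE_SIZE):
    """True if the stack size is a positive, exact multiple of the huge page size."""
    return stack_size_bytes > 0 and stack_size_bytes % huge_page_size == 0


if __name__ == "__main__":
    mode = read_thp_mode()
    stack_size = 2 * 1024 * 1024  # hypothetical thread stack size to check
    print("THP mode:", mode)
    print("stack size is a multiple of 2M:", is_multiple_of_huge_page(stack_size))
```

A mode of `always` together with a stack size that is a multiple of 2M corresponds to the configuration the description calls out as most exposed. The check is only a rough indicator, since part of the problem is timing-dependent.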

okay, I'll withdraw. Let's do it piece by piece as usual.

-------------

PR Comment: https://git.openjdk.org/jdk17u-dev/pull/1679#issuecomment-1692791488

