RFR: JDK-8312182: THPs cause huge RSS due to thread start timing issue [v2]
David Holmes
dholmes at openjdk.org
Thu Jul 20 06:14:47 UTC 2023
On Wed, 19 Jul 2023 16:52:02 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:
>> If Transparent Huge Pages are unconditionally enabled (`/sys/kernel/mm/transparent_hugepage/enabled` contains `[always]`), Java applications that use many threads may see a huge Resident Set Size. That RSS is caused by thread stacks being mostly paged in. This page-in is caused by thread stack memory being transformed into huge pages by `khugepaged`; later, those huge pages usually shatter into small pages when Java guard pages are established at thread start, but the resulting small pages stay paged in.
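To check whether a given machine is in the affected mode, one can read the sysfs file named above; the kernel marks the active mode with brackets (e.g. `always [madvise] never`). A minimal sketch, assuming a Linux kernel built with THP support; `selected_thp_mode`, `read_thp_mode` and `thp_always_enabled` are hypothetical helper names, not HotSpot functions:

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Returns the bracketed (active) entry, e.g. "always" for "[always] madvise never".
// Returns an empty string if no bracketed entry is present.
std::string selected_thp_mode(const std::string& sysfs_contents) {
  const auto open = sysfs_contents.find('[');
  if (open == std::string::npos) return "";
  const auto close = sysfs_contents.find(']', open + 1);
  if (close == std::string::npos) return "";
  return sysfs_contents.substr(open + 1, close - open - 1);
}

// Reads the system-wide THP setting; returns "" if the file cannot be read
// (for example on a kernel without THP support).
std::string read_thp_mode(
    const char* path = "/sys/kernel/mm/transparent_hugepage/enabled") {
  std::ifstream in(path);
  if (!in) return "";
  std::stringstream buf;
  buf << in.rdbuf();
  return selected_thp_mode(buf.str());
}

// True only in the mode this issue is about.
bool thp_always_enabled() { return read_thp_mode() == "always"; }
```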
>>
>> [JDK-8303215](https://bugs.openjdk.org/browse/JDK-8303215) attempted to fix this problem by making it unlikely that thread stack boundaries are aligned to the THP page size. Unfortunately, that was not sufficient. We still see JVMs with huge footprints, especially if they created many Java threads in rapid succession.
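Why alignment matters: for anonymous memory, `khugepaged` only collapses huge-page-aligned, huge-page-sized ranges that lie entirely inside one mapping. A 2 MB stack that does not start on a 2 MB boundary therefore cannot be collapsed as a whole, while a larger (or merged) mapping still can be. A sketch of that check, assuming x86-64 with 2 MB PMD-sized THPs; the helper names are hypothetical:

```cpp
#include <cstdint>

// Assumes x86-64 with PMD-sized (2 MiB) transparent huge pages.
constexpr std::uintptr_t kHugePageSize = std::uintptr_t{2} * 1024 * 1024;

// Both helpers require `a` to be a power of two.
constexpr std::uintptr_t align_up(std::uintptr_t v, std::uintptr_t a) {
  return (v + a - 1) & ~(a - 1);
}
constexpr std::uintptr_t align_down(std::uintptr_t v, std::uintptr_t a) {
  return v & ~(a - 1);
}

// True if the mapping [start, end) fully contains at least one
// huge-page-aligned, huge-page-sized range, i.e. a range that could be
// collapsed into a THP.
constexpr bool may_hold_huge_page(std::uintptr_t start, std::uintptr_t end,
                                  std::uintptr_t huge = kHugePageSize) {
  return align_up(start, huge) < align_down(end, huge);
}

// A 2 MiB stack that happens to be 2 MiB-aligned can become one huge page...
static_assert(may_hold_huge_page(0x40000000, 0x40200000), "aligned 2 MiB stack");
// ...while the same stack shifted by one 4 KiB page cannot.
static_assert(!may_hold_huge_page(0x40001000, 0x40201000), "misaligned 2 MiB stack");
```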
>>
>> Note that this effect is independent of any JVM switches; in particular, it happens regardless of `-XX:+UseTransparentHugePages` or `-XX:+UseLargePages`.
>>
>> Update: tests show that the interference of `khugepaged` also costs performance when starting threads, and this patch addresses both footprint and performance problems.
>>
>> #### Demonstration:
>>
>> Linux 5.15 on x64, glibc 2.31: 10000 idle threads with a 100 MB pre-touched Java heap and `-Xss2M` will consume:
>>
>> A) Baseline (THP disabled on system): *369 MB*
>> B) THP="always", JDK-8303215 present: *1.5 GB .. >2 GB* (very wobbly)
>> C) THP="always", JDK-8303215 present, artificial delay after thread start: **20.6 GB** (!).
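A back-of-the-envelope upper bound for these numbers, assuming every 2 MiB stack ends up fully resident: 10000 × 2 MiB = 20,971,520,000 bytes (about 19.5 GiB). That is in the same range as case C, which suggests that with the artificial delay nearly every stack page was paged in. The helper name is hypothetical:

```cpp
#include <cstdint>

// Upper bound for stack RSS if every thread stack were fully paged in.
constexpr std::uint64_t worst_case_stack_bytes(std::uint64_t threads,
                                               std::uint64_t stack_bytes) {
  return threads * stack_bytes;
}

constexpr std::uint64_t kMiB = 1024 * 1024;

// The demonstration setup: 10000 threads with -Xss2M (2 MiB stacks).
constexpr std::uint64_t kDemoWorstCase = worst_case_stack_bytes(10000, 2 * kMiB);
static_assert(kDemoWorstCase == 20'971'520'000ULL, "10000 x 2 MiB");
```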
>>
>>
>> #### Cause:
>>
>> The problem is caused by timing. When we create multiple Java threads, the following sequence of actions happens:
>>
>> In the parent thread:
>> - the parent thread calls `pthread_create(3)`
>> - `pthread_create(3)` creates the thread stack by calling `mmap(2)`
>> - `pthread_create(3)` calls `clone(2)` to start the child thread
>> - repeat to start more threads
>>
>> Each child thread:
>> - queries its stack dimensions
>> - handshakes with the parent to signal liveness
>> - establishes guard pages at the low end of the stack
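A simplified sketch of the start sequence described above, using plain pthreads rather than HotSpot code. The parent only waits for the "alive" handshake, so the stack is already mapped while the child has yet to place its guard page; that gap is the window in question. `ChildStart`, `child_main` and `start_threads` are hypothetical names; `pthread_getattr_np` is a glibc extension. The guard is placed one page above the reported low end, so glibc's own guard page is never touched regardless of how a given glibc version reports the stack bounds:

```cpp
#include <pthread.h>
#include <sys/mman.h>
#include <unistd.h>

#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <vector>

// Per-child handshake state; owned by the parent until the child is joined.
struct ChildStart {
  std::mutex m;
  std::condition_variable cv;
  bool alive = false;
  bool guard_ok = false;  // written by child, read by parent after join
};

static void* child_main(void* arg) {
  auto* cs = static_cast<ChildStart*>(arg);
  const auto page = static_cast<std::uintptr_t>(sysconf(_SC_PAGESIZE));

  // 1. Query own stack dimensions (glibc extension).
  void* low = nullptr;
  std::size_t size = 0;
  pthread_attr_t attr;
  if (pthread_getattr_np(pthread_self(), &attr) == 0) {
    pthread_attr_getstack(&attr, &low, &size);
    pthread_attr_destroy(&attr);
  }

  // 2. Handshake: signal liveness; the parent may now start the next thread.
  {
    std::lock_guard<std::mutex> lock(cs->m);
    cs->alive = true;
  }
  cs->cv.notify_one();

  // 3. Establish a guard page near the low end of the stack.
  if (low != nullptr && size > 4 * page) {
    const std::uintptr_t aligned =
        (reinterpret_cast<std::uintptr_t>(low) + page - 1) & ~(page - 1);
    void* guard = reinterpret_cast<void*>(aligned + page);
    if (mprotect(guard, page, PROT_NONE) == 0) {
      cs->guard_ok = true;
      // ... the thread's real work would run here ...
      // Undo before exiting: glibc may cache and reuse this stack.
      mprotect(guard, page, PROT_READ | PROT_WRITE);
    }
  }
  return nullptr;
}

// Parent: start n threads back to back. pthread_create() maps each stack and
// clones; we wait only for the handshake, not for the guard page.
int start_threads(int n, std::size_t stack_size) {
  std::vector<std::unique_ptr<ChildStart>> states;
  std::vector<pthread_t> tids;
  pthread_attr_t attr;
  pthread_attr_init(&attr);
  pthread_attr_setstacksize(&attr, stack_size);
  for (int i = 0; i < n; i++) {
    auto cs = std::make_unique<ChildStart>();
    pthread_t tid;
    if (pthread_create(&tid, &attr, child_main, cs.get()) != 0) break;
    {
      std::unique_lock<std::mutex> lock(cs->m);
      cs->cv.wait(lock, [&] { return cs->alive; });
    }
    tids.push_back(tid);
    states.push_back(std::move(cs));
  }
  pthread_attr_destroy(&attr);
  int guarded = 0;  // number of children that managed to place a guard page
  for (std::size_t i = 0; i < tids.size(); i++) {
    pthread_join(tids[i], nullptr);
    if (states[i]->guard_ok) guarded++;
  }
  return guarded;
}
```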
>>
>> The thread stack mapping is established in the parent thread; the guard pages are placed by the child threads. There is a time window in which the thread stack is already mapped into address space, but guard pages still need to be placed.
>>
>> If the parent is faster than the children, it will have created mappings faster than the children can place guard pages on them.
>>
>> For the kernel, these t...
>
> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:
>
> Review feedback
src/hotspot/os/linux/globals_linux.hpp line 94:
> 92: product(bool, PreventTHPsForThreadStacks, true, EXPERIMENTAL, \
> 93: "If true, the JVM will attempt to prevent formation of " \
> 94: "transparent huge pages in thread stacks.") \
Can we expand this to state that it can be turned off by ergonomics? Or, better yet, perhaps this should be off by default and potentially turned on by ergonomics?
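One possible shape for the "off by default, enabled by ergonomics" variant, as a standalone sketch rather than HotSpot code (in HotSpot this would more likely be written with `FLAG_IS_DEFAULT` and `FLAG_SET_ERGO`). The function name and parameters are hypothetical, and it assumes the mitigation is only needed when the THP mode is `always`:

```cpp
#include <string>

// Hypothetical ergonomic decision for PreventTHPsForThreadStacks.
// explicitly_set: the user passed -XX:[+|-]PreventTHPsForThreadStacks.
// thp_mode: the bracketed entry from /sys/kernel/mm/transparent_hugepage/enabled.
bool ergo_prevent_thps_for_thread_stacks(bool explicitly_set, bool user_value,
                                         const std::string& thp_mode) {
  if (explicitly_set) return user_value;  // never override an explicit setting
  // In "madvise" or "never" mode khugepaged leaves un-advised thread stacks
  // alone, so the mitigation is only switched on for "always".
  return thp_mode == "always";
}
```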
src/hotspot/os/linux/os_linux.cpp line 921:
> 919: if (PreventTHPsForThreadStacks) {
> 920: // In addition to the glibc guard page that prevents inter-thread-stack hugepage
> 921: // coalescation (see comment in os::Linux::default_guard_size()), we also make
s/coalescation/coalescing/
src/hotspot/os/linux/os_linux.cpp line 3104:
> 3102: //
> 3103: // Yes, this means we have two guard sections - the glibc and the JVM one - per thread. But the
> 3104: // cost for that one extra protected page is dwarfed a large win in performance and memory that
s/dwarfed a/dwarfed by a/
src/hotspot/os/linux/os_linux.cpp line 3105:
> 3103: // Yes, this means we have two guard sections - the glibc and the JVM one - per thread. But the
> 3104: // cost for that one extra protected page is dwarfed a large win in performance and memory that
> 3105: // avoiding interference by khugepaged buys us.
s/by/from/
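To put "one extra protected page" in numbers for the demonstration setup, assuming 4 KiB pages and that the protected pages are never touched (so they cost address space, not RSS): 10000 threads give up 40,960,000 bytes (about 39 MiB) of virtual address space, against a stack RSS that reached gigabytes. The helper name is hypothetical:

```cpp
#include <cstdint>

// Virtual address space given up by one extra PROT_NONE page per thread.
// Untouched PROT_NONE pages are not backed by physical memory.
constexpr std::uint64_t extra_guard_bytes(std::uint64_t threads,
                                          std::uint64_t page_size) {
  return threads * page_size;
}

// 10000 threads with 4 KiB pages.
static_assert(extra_guard_bytes(10000, 4096) == 40'960'000ULL,
              "about 39 MiB of address space");
```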
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/14919#discussion_r1268975606
PR Review Comment: https://git.openjdk.org/jdk/pull/14919#discussion_r1268969342
PR Review Comment: https://git.openjdk.org/jdk/pull/14919#discussion_r1268972516
PR Review Comment: https://git.openjdk.org/jdk/pull/14919#discussion_r1268972590