[jdk17u-dev] RFR: 8312182: THPs cause huge RSS due to thread start timing issue

Thomas Stuefe stuefe at openjdk.org
Thu Aug 24 16:15:28 UTC 2023


On Thu, 24 Aug 2023 13:44:39 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> Friendly ping. It would be good to get this fixed in time for the next CPU.
>
>> Friendly ping. It would be good to get this fixed in time for the next CPU.
> 
> I have concerns about this, for several reasons:
> 1. The fix has a considerable bugtail _in the product code_, which does not inspire confidence there would not be more soon;
> 2. The fix is enabled by default, which means any bugs in it would be exposed to users immediately;
> 3. The review is hard, because it cobbles 7 commits together, so it would take a while to disentangle if any mistakes crept in (note: the usual way to deal with this problem is to propose the PR with a commit per issue, so we can review individual commits);
> 
> This also does not look like a recent regression, but rather a long-standing bug, right?
> So, is there a reason to rush it for 17.0.9 in October?

@shipilev Thanks for looking at this. I debated with myself for a long time whether this was the right approach. I did not choose to build a composite patch out of laziness (if anything, downporting the issues separately and verbatim would have been much simpler, albeit slower).

By providing a minimal (not cobbled together but carefully selected) patch I minimize the problem surface because I leave out code that have nothing to do with the goal of this patch: the static hugepage detection of [JDK-8310233](https://bugs.openjdk.org/browse/JDK-8310233) "Fix THP detection on Linux").

> * The fix has a considerable bugtail _in the product code_, which does not inspire confidence there will not be more soon;

we have two trailing bugs in product code:
a) [JDK-8312394](https://bugs.openjdk.org/browse/JDK-8312394) "[linux] SIGSEGV if kernel was built without hugepage support"
b) [JDK-8312620](https://bugs.openjdk.org/browse/JDK-8312620) "WSL Linux build crashes after JDK-8310233"

(a) JDK-8312394 is of no concern since it only affects code I explicitly left out from the patch (it does not affect THPs). It would be a concern were I to downport patches individually and verbatim.

That leaves (b), which is a bug in a super obscure context (building on Windows with WSL that carries an arguably broken Linux kernel). One bug is not a very long tail.

> 
> * The fix is enabled by default, which means any bugs in it would be exposed to users immediately;
> 

I think this patch is just not that risky.

> * The review is hard, because it cobbles 7 commits together, so it would take a while to disentangle if any mistakes crept in (note: the usual way to deal with this problem is to propose the PR with a commit per issue, so we can review individual commits);

But that process would carry more risk since I would have to downport unnecessary parts and have time windows with unfixed bugs.

This also shows a shortcoming of the review process. If I downport stuff verbatim, reviews are simple since they are mostly mechanical mental diffing; but that is not the most ideal patch, which would be one that is small and confined.

> 
> This also does not look like a recent regression, but rather a long-standing bug, right? So, is there a reason to rush it for 17.0.9 in October?

We have customers running with THP enabled always and unwilling or unable to change that; they would be happy about a fix. 

I'm fine with postponing this patch (not that I have any choice since I lack reviews and the window is almost closed). But the whole discussion leaves me dissatisfied with the practice of downporting whole patch trees to get a single issue fixed. We recently had similar discussions when downporting https://github.com/openjdk/jdk11u-dev/pull/2035 which ended up a far bigger change than necessary, carrying a lot of code for the sole purpose of keeping a low delta to upstream.

-------------

PR Comment: https://git.openjdk.org/jdk17u-dev/pull/1679#issuecomment-1691994196


More information about the jdk-updates-dev mailing list