RFR: 8324580: SIGFPE on THP initialization on kernels < 4.10 [v4]
Thomas Stuefe
stuefe at openjdk.org
Tue Feb 6 07:32:54 UTC 2024
On Mon, 5 Feb 2024 10:44:09 GMT, Stefan Johansson <sjohanss at openjdk.org> wrote:
>> MacOS error unrelated.
>> @kstefanj could you take a look? This one is rather simple. Thanks!
>
> Thanks for the ping @tstuefe =)
>
> I think the approach to use the default large page size is good, but I think we should try to add the logic to `THPSupport::scan_os()` so it returns with a reasonable page size set. This will make the logging show the correct page size as well. If I'm not mistaken, right now the log will still render 0 as the THP page size:
>
> [0.005s][info][pagesize] Transparent hugepage (THP) support:
> [0.005s][info][pagesize] THP mode: always
> [0.005s][info][pagesize] THP pagesize: 0B
>
>
> If we update `THPSupport::scan_os()` to call `scan_default_hugepagesize()` if reading `hpage_pmd_size` fails we should be good. We can add a check there to limit the "default size" to 16M (which sounds reasonable) and maybe log that we did this limiting. I'm not certain what we should do if reading the default hugepage size fail (returns 0), but I'm leaning towards having 2M as a general fallback. It will only be used if THP mode is configured to `madvide` or `always` and in those cases we likely have at least a 2M page size (and I'm not sure this should ever happen). We could also add code to `validate_thps_configured()` to return false if the thp page size is 0.
>
> An alternative to how we log this is to add a tag to the "THP pagesize" line, something like:
>
> [0.005s][info][pagesize] THP pagesize: 0B (configured/default/limited/fallback)
>
> But I'm not sure how useful that is compared to have an additional log-line for the limited and fallback case.
>
> If we go with this approach we should also update the testing code to do the same dance.
>
> What do you think?
Hi @kstefanj ,
> Thanks for the ping @tstuefe =)
>
> I think the approach to use the default large page size is good, but I think we should try to add the logic to `THPSupport::scan_os()` so it returns with a reasonable page size set. This will make the logging show the correct page size as well. If I'm not mistaken, right now the log will still render 0 as the THP page size:
>
> ```
> [0.005s][info][pagesize] Transparent hugepage (THP) support:
> [0.005s][info][pagesize] THP mode: always
> [0.005s][info][pagesize] THP pagesize: 0B
> ```
>
I'm unsure about this. I originally wanted the stuff in huge pages.cpp/hpp to not "lie" - as in not making assumptions. It was supposed to truthfully reflect the system's settings without making assumptions. So, if information is missing, print 0. The assumption-making should happen in the code above, in os_linux.cpp.
That strict separation was born out of working with the code before, which had been a big ball of tangled yarn with respect to who calls what and who lies about what. But maybe I am too religious about it now. But whatever we decide, I would like the information that we *assume* a THP size instead of just reading it from the OS to be reflected somewhere in the log and printed settings.
> If we update `THPSupport::scan_os()` to call `scan_default_hugepagesize()` if reading `hpage_pmd_size` fails we should be good. We can add a check there to limit the "default size" to 16M (which sounds reasonable) and maybe log that we did this limiting. I'm not certain what we should do if reading the default hugepage size fail (returns 0), but I'm leaning towards having 2M as a general fallback. It will only be used if THP mode is configured to `madvide` or `always` and in those cases we likely have at least a 2M page size (and I'm not sure this should ever happen). We could also add code to `validate_thps_configured()` to return false if the thp page size is 0.
Sure, we can do that. Though I believe THPs are implemented atop of explicit huge pages, so I would assume them to be always available.
>
> An alternative to how we log this is to add a tag to the "THP pagesize" line, something like:
>
> ```
> [0.005s][info][pagesize] THP pagesize: 0B (configured/default/limited/fallback)
> ```
>
> But I'm not sure how useful that is compared to have an additional log-line for the limited and fallback case.
>
> If we go with this approach we should also update the testing code to do the same dance.
>
> What do you think?
See above. I can do both. If we move THP pagesize recognition, including assumptions, down into hugepages.cpp (that was @zzambers first attempt too), I would like the information that we made an assumption reflected in the printout.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/17545#issuecomment-1928927634
More information about the hotspot-runtime-dev
mailing list