Unconditional messages on large page reservation errors

Sat Apr 24 12:31:32 UTC 2021

Hi Thomas,

Sorry for the late reply.

On 2021-04-17 06:11, Thomas Stüfe wrote:
> Hi all,
> 
> In os::reserve_memory_special, we print unconditional warnings to stderr in
> case large page reservation fails. Unconditional printouts like these can
> interfere with parsers parsing VM output, and can accumulate and cause high
> memory footprint (see e.g. https://bugs.openjdk.java.net/browse/JDK-8265332
> ).

I see unconditional warnings only in the shm case. In the hugetlbfs case 
we use warn_on_commit_special_failure(), which only prints the warning if 
large pages were explicitly requested. Still, we can only end up here if 
large pages are enabled, and that has to be done explicitly, so the 
warnings are kind of unconditional.
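
To make the "warn only if explicitly requested" policy concrete, here is 
a minimal standalone sketch. It is not the HotSpot code: the Flag struct 
and both function names are hypothetical stand-ins for the VM flag 
machinery, and the message text is made up for illustration.

```cpp
#include <cstdio>
#include <string>

// Hypothetical stand-in for a VM flag; not the HotSpot flag machinery.
struct Flag {
    bool value;                // current setting of the flag
    bool set_on_command_line;  // true if the user set it explicitly
};

// Warn only when large pages are enabled AND the user asked for them.
bool should_warn_on_large_page_failure(const Flag& use_large_pages) {
    return use_large_pages.value && use_large_pages.set_on_command_line;
}

// Prints a warning to stderr if the policy allows it.
// Returns true if a warning was printed, false if it was suppressed.
bool report_large_page_failure(const Flag& use_large_pages,
                               const std::string& detail) {
    if (!should_warn_on_large_page_failure(use_large_pages)) {
        return false;
    }
    std::fprintf(stderr, "warning: failed to reserve large pages (%s)\n",
                 detail.c_str());
    return true;
}
```

If the flag can only ever be true when the user set it, the two 
conditions collapse into one, which is why the warning ends up being 
effectively unconditional.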

> 
> Large page reservations may fail any time os::reserve_memory_special()
> is called, e.g. because the large page pool is temporarily
> exhausted. And os::reserve_memory_special() is a general purpose function,
> not only used for the heap. Running out of large pages is not fatal, since
> the caller can just fall back to normal page allocation. Which is what we
> do when reserving the java heap. I think unconditional printouts should
> only happen in case of fatal errors, when the VM is about to die.
> 

I don't really agree here since the performance implications of not 
using large pages are quite big. I think it is fair to issue the warning 
since in most cases it signals that there is an environment 
configuration problem. For testing we have the possibility to use 
-XX:-PrintWarnings and I saw you used that to fix the issue mentioned above.

Also, what would the use case for warning() be if not for printing 
information in non-fatal but problematic situations (which I think this 
is)?

> The unconditional warning probably made sense in the context of reserving
> the java heap, if the user explicitly specified UseLargePages. I propose to
> change this to either
> - if large page allocation for the heap fails, trace with info level
> and fall back to small pages. Leave it up to the user to increase UL and
> monitor log output to find out about this. This is what we usually do when
> system APIs fail.
> - continue printing the message with error level, but exit the VM. If
> it's serious enough to unconditionally notify the user, it's serious enough
> to stop the VM.
> 
> I prefer the former. What do you think?

As I said above, I see value in warning if you don't get what you are 
requesting. But I know others who think exiting is a better strategy. 
If I'm not mistaken, ZGC won't start if it can't reserve enough large 
pages. One thing that I would like to change in this area is for these 
warnings to be converted to use UL. That way we could turn off warnings 
at a much finer granularity and we wouldn't have to use jio_snprintf to 
compose messages.
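
As a rough illustration of what finer granularity buys us, here is a 
small standalone sketch of per-tag log levels. It is not the UL API: 
TagLogger, Level and the "pagesize" and "gc" tag strings used below are 
assumptions made up for this example. The logger formats the message 
itself, so callers do not need to compose it in a buffer first.

```cpp
#include <cstdarg>
#include <cstdio>
#include <map>
#include <string>

// Hypothetical log levels; higher values are more verbose.
enum class Level { Off = 0, Warning = 1, Info = 2, Debug = 3 };

// Hypothetical per-tag logger; a sketch, not HotSpot's unified logging.
class TagLogger {
public:
    // Set the most verbose level that is emitted for the given tag.
    void configure(const std::string& tag, Level level) {
        levels_[tag] = level;
    }

    // Tags that were never configured default to Warning.
    bool is_enabled(const std::string& tag, Level level) const {
        auto it = levels_.find(tag);
        Level configured = (it == levels_.end()) ? Level::Warning : it->second;
        return level != Level::Off &&
               static_cast<int>(level) <= static_cast<int>(configured);
    }

    // printf-style logging to stderr. Returns true if the message was
    // emitted, false if it was filtered out by the tag's level.
    bool log(const std::string& tag, Level level, const char* fmt, ...) const {
        if (!is_enabled(tag, level)) {
            return false;
        }
        std::fprintf(stderr, "[%s] ", tag.c_str());
        va_list args;
        va_start(args, fmt);
        std::vfprintf(stderr, fmt, args);
        va_end(args);
        std::fputc('\n', stderr);
        return true;
    }

private:
    std::map<std::string, Level> levels_;
};
```

With something like this, calling configure("pagesize", Level::Off) 
silences only the large page messages, while warnings for every other 
tag stay on. A single global switch cannot do that.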

Thanks for bringing this up for discussion,
StefanJ

> 
> Thanks and best Regards,
> 
> Thomas
>