RFR: 8261401: Add sanity check for UseSHM large pages similar to the one used with hugetlb large pages
Thomas Stuefe
stuefe at openjdk.java.net
Wed Feb 10 14:44:40 UTC 2021
On Tue, 9 Feb 2021 20:50:25 GMT, Stefan Johansson <sjohanss at openjdk.org> wrote:
> When large pages are enabled on Linux (using -XX:+UseLargePages), both UseHugeTLBFS and UseSHM can be used. We prefer to use HugeTLBFS and first do a sanity check to see if this kind of large pages are available and if so we disable UseSHM.
>
> The problematic part is when HugeTLBFS pages are not available, then we disable this flag and without doing any sanity check for UseSHM, we mark large pages as enabled using SHM. One big problem with this is that SHM also requires the same type of explicitly allocated huge pages as HugeTLBFS and also privileges to lock memory. So it is likely that in the case of not being able to use HugeTLBFS we probably can't use SHM either.
>
> A fix for this would be to do a similar sanity check as currently done for HugeTLBFS and if it fails disable UseLargePages since we will always fail such allocation attempts anyways.
>
> The proposed sanity check consists of two parts. The first just tries to create a shared memory segment using `shmget()` with SHM_HUGETLB to use large pages. If this fails there is no point in trying to use SHM to get large pages.
>
> The second part checks if the process has privileges to lock memory or if there will be a limit for the SHM usage. I think this would be a nice addition since it will notify the user about the limit and explain why large page mappings fail. The implementation parses `/proc/self/status` to make sure the needed capability is available.
>
> This change needs two tests to be updated to handle that large pages can now be disabled even when run with +UseLargePages. One of these tests is also updated in [PR#2486](https://github.com/openjdk/jdk/pull/2486) and I plan to get that integrated before this one.
Hi Stefan,
Testing UseSHM definitely makes sense. Since SysV shm has more restrictions than the mmap API, I am not sure the former would work where the latter does not.
About your patch: the shmget test is good, but I am not sure about the rest.
You first attempt to `shmget(IPC_PRIVATE, page_size, SHM_HUGETLB)`. Then you separately scan for CAP_IPC_LOCK. But would shmget(SHM_HUGETLB) not fail with EPERM if CAP_IPC_LOCK is missing? man shmget says:
    EPERM  The SHM_HUGETLB flag was specified, but the caller was not
           privileged (did not have the CAP_IPC_LOCK capability).
So I think checking for EPERM after shmget should be sufficient? If CAP_IPC_LOCK were missing, the shmget would have failed and we would never reach can_lock_memory().
About the rlimit test: assuming the man page is wrong and shmget succeeds even without the capability, you only print this limit if can_lock_memory() returned false. But as I understand it, the limit can still be the limiting factor even with those capabilities in place.
I think it would make more sense to extend the error reporting if later real "shmget" calls fail. Note that later reserve calls can fail for a number of reasons (huge page pool exhausted, RLIMIT_MEMLOCK reached, SHMMAX or SHMALL reached, commit charge reached its limit...), which means that reporting only on RLIMIT_MEMLOCK would be arbitrary. Whether or not more intelligent error reporting makes sense also depends on whether we think we still need the SysV path at all. I personally doubt that it still does.
Cheers, Thomas
src/hotspot/os/linux/os_linux.cpp line 3569:
> 3567: // The capability needed to lock memory CAP_IPC_LOCK
> 3568: #define CAP_IPC_LOCK 14
> 3569: #define CAP_IPC_LOCK_BIT (1 << CAP_IPC_LOCK)
Can we not just include linux/capabilities.h ?
-------------
PR: https://git.openjdk.java.net/jdk/pull/2488
More information about the hotspot-dev
mailing list