RFR: 8261401: Add sanity check for UseSHM large pages similar to the one used with hugetlb large pages

Wed Feb 10 17:53:36 UTC 2021

On Wed, 10 Feb 2021 14:41:23 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

> Hi Stefan,
> 
> Testing UseSHM makes definitely sense. Since SysV shm has more restrictions than the mmap API I am not sure that the former works where the latter does not.
> 

Nice to hear that you agree Thomas :)

> About your patch: The shmget test is good, I am not sure about the rest.
> 
> You first attempt to `shmget(IPC_PRIVATE, page_size, SHM_HUGETLB)`. Then you separately scan for CAP_IPC_LOCK. But would shmget(SHM_HUGETLB) not fail with EPERM if CAP_IPC_LOCK is missing? man shmget says:
> 
> ```
>        EPERM  The SHM_HUGETLB flag was specified, but the caller was not
>               privileged (did not have the CAP_IPC_LOCK capability).
> ```
> 
> So I think querying for EPERM after shmget should be sufficient? If CAP_IPC_LOCK was missing, the shmget should have failed and we should never get to the point of can_lock_memory().
> 
This was my initial though as well, but it turns out that the capability is only needed to "lock" more memory than the MEMLOCK-limit defines. So the first `shmget()` will succeed as long as there is at least one large page available (and the MEMLOCK-limit is not smaller than the large page size).

I could of course design the other check to once again call `shmget()`, but with a size larger than the limit. If that check fails with `EPERM` we don't have the needed capability. 

> About the rlimit test: Assuming the man page is wrong and shmget succeeds even without the capability, you only print this limit if can_lock_memory() returned false. The way I understand this is that the limit can still be the limiting factor even with those capabilities in place.
> 
In old kernels this was the case but according to `man setrlimit` this is no longer the case:
RLIMIT_MEMLOCK
  The maximum number of bytes of memory that may be locked into RAM ...
  In Linux kernels before 2.6.9, this limit controlled the amount of memory that could be
  locked by a privileged process. Since Linux 2.6.9, no limits are placed on the amount of 
  memory that a privileged process may lock, and this limit instead governs the amount 
  of memory that an unprivileged process may lock. 

So if we have the capability the limit will never be a limiting factor. 

> I think it would make more sense to extend the error reporting if later real "shmget" calls fail. Note that later reserve calls can fail for a number of reasons (huge page pool exhausted, RLIMIT_MEMLOCK reached, SHMMAX or SHMALL reached, commit charge reached its limit...) which means that just reporting on RLIMIT_MEMLOCK would be arbitrary. Whether or not more intelligent error reporting makes sense depends also on whether we think we still need the SysV path at all. I personally doubt that this still makes sense.
> 

I agree that this warning is not covering all bases, but I though it was a good addition compared to just be silent  about it. But I like your idea about just doing the simplest sanity check here and then when a mapping goes bade try to figure out why it did.

In that case I would only include the first `shmget()` in the sanity check and file a separate issue for improving the warning.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2488