RFR: 8324781: runtime/Thread/TestAlwaysPreTouchStacks.java failed with Expected a higher ratio between stack committed and reserved

Stefan Karlsson stefank at openjdk.org
Wed Apr 3 10:17:10 UTC 2024


On Wed, 3 Apr 2024 08:12:22 GMT, Liming Liu <duke at openjdk.org> wrote:

> The testcase failed on Oracle CI since JDK-8315923. The root cause is that Oracle CI runs Linux-5.4.17-UEK where the value of MADV_POPULATE_WRITE (23) is used as MADV_DONTEXEC which is not supported by upstream. This PR solves the testcase failure by checking versions of kernels first, and checking the availability of MADV_POPULATE_WRITE when they are not older than 5.14.

I have looked at this a bit and have a few comments.

src/hotspot/os/linux/os_linux.cpp line 3066:

> 3064:                         p2i(first), len, MADV_POPULATE_WRITE, os::strerror(err),
> 3065:                         err, (int)(HugePages::thp_mode() == THPMode::always),
> 3066:                         (int)UseTransparentHugePages);

This function has become quite messy: It checks UseMadvPopulateWrite tree times. It mutates the input variable. It logs even when no failure occurred (This might be a bug).

I'd much prefer if we could straighten out the code into something like this (I've not tried to compile this):

size_t os::pd_pretouch_memory(void* first, void* last, size_t page_size) {
  if (HugePages::thp_mode() != THPMode::always && !UseTransparentHugePages) {
    // No THP. Use the platform-independent pretouch memory code.
    return page_size;
  }

  if (!UseMadvPopulateWrite) {
    // Use small pages with the platform-independent pretouch memory code.
    // When using THP we need to always pre-touch using small pages as the
    // OS will initially always use small pages.
    return os::vm_page_size();
  }

  const size_t len = pointer_delta(last, first, sizeof(char)) + page_size;
  if (::madvise(first, len, MADV_POPULATE_WRITE) != -1) {
    // Succeeded
    // 0 signals that the platform-independent pretouch memory code should not run.
    return 0;
  }

  // madvise failed
  int err = errno;
  
  log_debug(gc, os)("Called madvise(" PTR_FORMAT ", " SIZE_FORMAT ", %d):"
                    " error='%s' (errno=%d), when THPMode::always=%d and"
                    " UseTransparentHugePages=%d",
                    p2i(first), len, MADV_POPULATE_WRITE, os::strerror(err),
                    err, (int)(HugePages::thp_mode() == THPMode::always),
                    (int)UseTransparentHugePages);

  if (err == EINVAL) {
    // Not supported. When using THP we need to always pre-touch using
    // small pages as the OS will initially always use small pages.
    return os::vm_page_size();
  }

  assert(false, "Unexpected  error='%s' (errno=%d)",os::strerror(err), err);
  return page_size;
}


It is unclear to me why a failure with EINVAL should return small pages but other failures should return another page size. Why would we get EINVAL if we already performed the correct checks when setting up UseMadvPopulateWrite? Or is the problem that we skip performing the necessary checks if the user forcefully sets UseMadvPopulateWrite on the command line? Maybe we shouldn't allow that?

src/hotspot/os/linux/os_linux.cpp line 4547:

> 4545: 
> 4546:   check_pax();
> 4547:   Linux::version_init();

check_pax() and Linux::version_init() are unrelated things. Please add a blankline between them.

src/hotspot/os/linux/os_linux.cpp line 4826:

> 4824: 
> 4825:   // Check the availability of MADV_POPULATE_WRITE.
> 4826:   if (FLAG_IS_DEFAULT(UseMadvPopulateWrite) && UseMadvPopulateWrite) {

Should we really allow the user to forcefully set this and trigger a path that is not going to be correct on old kernels?

src/hotspot/os/linux/os_linux.cpp line 4837:

> 4835:     }
> 4836:   }
> 4837:   log_debug(gc, os)("UseMadvPopulateWrite=%d", (int)(UseMadvPopulateWrite ? 1 : 0));

Is this temporary debugging code? We have -XX:+PrintFlagsFinal to check the resulting flag values.

-------------

Changes requested by stefank (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/18592#pullrequestreview-1976268929
PR Review Comment: https://git.openjdk.org/jdk/pull/18592#discussion_r1549407678
PR Review Comment: https://git.openjdk.org/jdk/pull/18592#discussion_r1549413960
PR Review Comment: https://git.openjdk.org/jdk/pull/18592#discussion_r1549416240
PR Review Comment: https://git.openjdk.org/jdk/pull/18592#discussion_r1549412980


More information about the hotspot-runtime-dev mailing list