RFR: 8280056: gtest/LargePageGtests.java#use-large-pages failed "os.release_one_mapping_multi_commits_vm" [v4]

Gerard Ziemski gziemski at openjdk.org
Mon Nov 20 17:02:23 UTC 2023


On Fri, 17 Nov 2023 16:15:19 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> There is no guarantee that the attempt will always be successful. By 'attempt', we mean that failing to reserve is also expected.
>
> Yes, something does not match up.
> 
> The original test did this:
> 
> 1) reserve a stripy region
> 2) release the stripy region
> 3) re-reserve in the same area of the stripy region. If the re-reserve worked, that was proof that the release worked too, because we now have an address hole.
> 
> The problem with the original test was that (3) could fail because some other concurrent mmap got placed into that just-released address space hole. That would have been okay, but we cannot tell two cases apart: A) the release in step (2) failed and the original reservation still occupies that address hole, which would be an error; or B) the release in step (2) succeeded and some concurrent thread has since grabbed the hole, which would probably be okay.
> 
> There are two ways to fix this:
> 
> - we either remove the (3) re-reservation. This means essentially giving up, which is okay in my eyes considering how much time this has cost already.
> - my proposal was to place border regions at both ends of the stripy region, before releasing it:
> 
> 1) reserve a stripy region
> 1.5) place a border region before and after the stripy region
> 2) release the stripy region. Now the address hole is bounded on both sides, which drastically reduces the chance of concurrent allocations getting into that hole, especially since you reduced the stripe size, which is good.
> 3) re-reserve in the same area of the stripy region. That should hopefully work often enough that we can consider the test safe.
> 
> You did both - placed a border region (only one), but also removed the re-reservation. If you remove the re-reservation, you don't need the border region.
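
For reference, a minimal sketch of the bordered variant (steps 1, 1.5, 2, 3), written against plain POSIX mmap/munmap rather than the os:: layer the real test goes through. This is not the gtest code: the names reserve, release_then_rereserve and Outcome, and the 4-page stripe size, are made up for illustration, and the borders are carved out of one larger reservation, which is a simplification of step 1.5.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>

// Hypothetical result type for this sketch only.
enum class Outcome { Rereserved, NotAtSameAddress, Error };

// Reserve address space only (PROT_NONE, nothing committed).
// 'hint' is a placement request, not MAP_FIXED, so an occupied
// range is never silently replaced.
static char* reserve(void* hint, size_t size) {
  void* p = mmap(hint, size, PROT_NONE,
                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
  return p == MAP_FAILED ? nullptr : static_cast<char*>(p);
}

static Outcome release_then_rereserve(size_t stripes) {
  assert(stripes > 0);
  const size_t page   = static_cast<size_t>(sysconf(_SC_PAGESIZE));
  const size_t stripe = 4 * page;          // assumed stripe size
  const size_t border = page;              // assumed border size
  const size_t middle = stripes * stripe;
  const size_t total  = border + middle + border;

  // Steps 1 and 1.5: one reservation covering border + region + border,
  // so both borders are guaranteed to be adjacent to the region.
  char* base = reserve(nullptr, total);
  if (base == nullptr) return Outcome::Error;
  char* region = base + border;

  // Make the region "stripy": commit every other stripe.
  for (size_t i = 0; i < stripes; i += 2) {
    if (mprotect(region + i * stripe, stripe, PROT_READ | PROT_WRITE) != 0) {
      munmap(base, total);
      return Outcome::Error;
    }
  }

  // Step 2: release only the region; both borders stay mapped.
  if (munmap(region, middle) != 0) {
    munmap(base, total);
    return Outcome::Error;
  }

  // Step 3: ask for the same range again.
  char* again = reserve(region, middle);
  Outcome result = Outcome::Error;
  if (again == region) {
    result = Outcome::Rereserved;
  } else if (again != nullptr) {
    // Either something else took the hole or the release did not take
    // effect; this sketch cannot tell the two cases apart.
    result = Outcome::NotAtSameAddress;
  }

  // Clean up whatever is still mapped.
  if (again != nullptr) munmap(again, middle);
  munmap(base, border);
  munmap(region + middle, border);
  return result;
}
```

In a quiet process, release_then_rereserve(4) would normally be expected to return Outcome::Rereserved, but the address is only a hint to the kernel, so NotAtSameAddress has to be tolerated rather than treated as a failure.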

Looking at this I am starting to wonder whether we really need this test at all.

We seem to be testing whether the underlying OS does what we want it to, as opposed to our VM code, which in this case is just a thin convenience layer.

I can see this test having been useful when we were first implementing/using these features, but what would we need to get wrong nowadays to cause a regression in this test, short of someone doing a complete rewrite and getting something wrong?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16240#discussion_r1399496933


More information about the hotspot-runtime-dev mailing list