RFR: 8280056: gtest/LargePageGtests.java#use-large-pages failed "os.release_one_mapping_multi_commits_vm" [v2]

Fri Oct 20 08:32:36 UTC 2023

On Wed, 18 Oct 2023 14:58:15 GMT, Afshin Zafari <azafari at openjdk.org> wrote:

>> The `attempt_allocate_memory_at(addr, size)` may fail for some reasons that `::mmap`  will report. To find out why an attempt failed, an `int *` is passed down deep to the `::mmap` to return the `ERRNO` back to the callers.
>> This error code is used in gTest test case to show a proper message.
>> 
>> The changed test case never failed after 10K+ repetitions, but in case it happens in future the reason is also printed out.
>
> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision:
> 
>   some pd_xxx calls were missed.

> In my previous comment, I meant undoing all my changes and just change the gtest such that returning `nullptr` is an EXPECTED and also ACCEPTED case. It is not guaranteed that _any attempt to allocate memory at a requested address always returns a valid address that is the same as the requested one_.

`os::attempt_reserve_at` always returns either memory at the requested address or null, that is guaranteed by its contract.

> The **attempt** part in the method's name explicitly says so. My suggestion is only one change like this `if (returned_address != nullptr) ASSERT_EQ(returned_address, requested_address);`
> 

That is not necessary. Or in other words, it should always be the case unless `os::attempt_reserve_at` is broken, and if it is, it is not this test's business.

By changing the error condition to one that should always be true you effectively disable the second portion of the test, this would be the third point I mentioned as "Finally, just remove the re-reserve part of the test and give up." That way we will never see problems from this test anymore, but won't know if multi release works or not. 

Its a valid route to go I guess if we feel like giving up.

> If we know why this case may happen (concurrent same requests or ...), or don't care why it happened, then there would be no need to return `ERRNO` from `::mmap` the long way up to the gtest and/or logging it. Improving the test or replacing it can be considered as other long term choices.

I think the point is: we want it to generally succeed but once in a blue moon it may fail for valid reasons. But we would like to know if it always fails.

Listen, one simple possibility would be just to move the whole of the test into one sub function, and re-attempt the test ~20 times, each time counting the failed re-reserve not as an ASSERT but as an error. Because what should happen if someone grabbed the memory concurrently, the next test execution would then repeat the experiment at a different address. If all 20 attempts fail, that would be a reason to worry.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16240#issuecomment-1772306640