RFR: 8351040: [REDO] Protection zone for easier detection of accidental zero-nKlass use [v2]
Ioi Lam
iklam at openjdk.org
Fri Mar 14 06:46:57 UTC 2025
On Wed, 12 Mar 2025 07:02:22 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:
>> Please consider this second attempt at fixing https://bugs.openjdk.org/browse/JDK-8330174.
>>
>> JDK-8330174 broke Windows and AIX (see breakage issue, https://bugs.openjdk.org/browse/JDK-8350768). The Windows issue happened in `MetaspaceShared::map_archives` for ArchiveRelocationMode=0 or ArchiveRelocationMode=2 (use_requested_addr=true). In those cases, we (A) delete the initial combined mapping for the CDS archive and then (B) mmap the individual archive regions separately into their respective, now vacated, address spaces. The protection zone is also part of the combined CDS archive mapping, so it gets released at (A). Since the protection zone is not part of the archive, it is not reinstated like the other regions at step (B).
>> Happily, that caused the canary assertion whose purpose was to catch such errors to segfault, so we noticed. Without assert, since the mapping is released, the OS may at some later time put another mapping into that region. So we have to make sure the mapping for the protection zone gets re-reserved after being released at (A).
>>
>> The fix for the windows error is in commit https://github.com/openjdk/jdk/pull/23912/commits/504931d745d483edc8662e51f7bb3c321ceac9a3 .
>>
>> The AIX error, in comparison, is easy. On AIX we cannot mprotect System V shared memory (or better, we cannot mprotect 64K pages, @JoKern65 or @TheRealMDoerr ?). Using 64K pages for such frequently accessed memory as CDS and class space is more beneficial than protecting the zero nklass page. As a fallback, on AIX, we still leave the page, but we fill it with a marker value ('P', 0x50). Now, if you accidentally dereference a zero nKlass, you will not crash immediately. But at least later crashes will probably contain register values like '0x5050505050505050', so it is a hint.
>>
>> Tests:
>> - Local tests on Linux x64, Mac aarch64, Windows x64, (simulated) AIX paths
>> - SAP reports all tests green (they had reported errors with the previous version)
>> - Oracle Tests ongoing
>> - GHAs green
>
> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:
>
> skip test if we have no COH archive
LGTM
-------------
Marked as reviewed by iklam (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/23912#pullrequestreview-2684475218
More information about the hotspot-dev
mailing list