RFR: 8351040: [REDO] Protection zone for easier detection of accidental zero-nKlass use [v3]
Matthias Baesken
mbaesken at openjdk.org
Thu Mar 27 13:51:17 UTC 2025
On Fri, 14 Mar 2025 09:20:40 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:
>> Please consider this second attempt at fixing https://bugs.openjdk.org/browse/JDK-8330174.
>>
>> JDK-8330174 broke Windows and AIX (see breakage issue, https://bugs.openjdk.org/browse/JDK-8350768). The Windows issue happened in `MetaspaceShared::map_archives` for ArchiveRelocationMode=0 or ArchiveRelocationMode=2 (use_requested_addr=true). In those cases, we (A) delete the initial combined mapping for the CDS archive and then (B) mmap the individual archive regions separately into their respective, now vacated, address spaces. The protection zone is also part of the combined CDS archive mapping, so it gets released at (A). Since the protection zone is not part of the archive, it is not reinstated like the other regions at step (B).
>> Happily, that caused the canary assertion whose purpose was to catch such errors to segfault, so we noticed. Without assert, since the mapping is released, the OS may at some later time put another mapping into that region. So we have to make sure the mapping for the protection zone gets re-reserved after being released at (A).
>>
>> The fix for the windows error is in commit https://github.com/openjdk/jdk/pull/23912/commits/504931d745d483edc8662e51f7bb3c321ceac9a3 .
>>
>> The AIX error, in comparison, is easy. On AIX we cannot mprotect System V shared memory (or better, we cannot mprotect 64K pages, @JoKern65 or @TheRealMDoerr ?). Using 64K pages for such frequently accessed memory as CDS and class space is more beneficial than protecting the zero nklass page. As a fallback, on AIX, we still leave the page, but we fill it with a marker value ('P', 0x50). Now, if you accidentally dereference a zero nKlass, you will not crash immediately. But at least later crashes will probably contain register values like '0x5050505050505050', so it is a hint.
>>
>> Tests:
>> - Local tests on Linux x64, Mac aarch64, Windows x64, (simulated) AIX paths
>> - SAP reports all tests green (they had reported errors with the previous version)
>> - Oracle Tests ongoing
>> - GHAs green
>
> Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision:
>
> - Merge branch 'openjdk:master' into JDK-8351040-REDO-Protection-zone-for-easier-detection-of-accidental-zero-nKlass-use
> - skip test if we have no COH archive
> - Merge branch 'openjdk:master' into JDK-8351040-REDO-Protection-zone-for-easier-detection-of-accidental-zero-nKlass-use
> - aix fix
> - test and aix exclusion
> - Fix windows when ArchiveRelocationMode=0 or 2
> - original
src/hotspot/share/cds/metaspaceShared.cpp line 1342:
> 1340: archive_space_rs = {};
> 1341: // The protection zone is part of the archive:
> 1342: // See comment above, the windows way of loading CDS is to mmap the individual
If it is about MS Windows, better write 'Windows' not windows
src/hotspot/share/cds/metaspaceShared.cpp line 1343:
> 1341: // The protection zone is part of the archive:
> 1342: // See comment above, the windows way of loading CDS is to mmap the individual
> 1343: // parts of the archive into the adress region we just vacated. The protection
Typo? adress -> address
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/23912#discussion_r2016695632
PR Review Comment: https://git.openjdk.org/jdk/pull/23912#discussion_r2016682159
More information about the hotspot-dev
mailing list