RFR: 8351040: [REDO] Protection zone for easier detection of accidental zero-nKlass use [v3]

Thomas Stuefe stuefe at openjdk.org
Fri Mar 21 10:06:17 UTC 2025


On Fri, 14 Mar 2025 09:20:40 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> Please consider this second attempt at fixing https://bugs.openjdk.org/browse/JDK-8330174.
>> 
>> JDK-8330174 broke Windows and AIX (see breakage issue, https://bugs.openjdk.org/browse/JDK-8350768). The Windows issue happened in `MetaspaceShared::map_archives` for ArchiveRelocationMode=0 or ArchiveRelocationMode=2 (use_requested_addr=true). In those cases, we (A) delete the initial combined mapping for the CDS archive and then (B) mmap the individual archive regions separately into their respective, now vacated, address spaces. The protection zone is also part of the combined CDS archive mapping, so it gets released at (A). Since the protection zone is not part of the archive, it is not reinstated like the other regions at step (B). 
>> Happily, that caused the canary assertion whose purpose was to catch such errors to segfault, so we noticed. Without assert, since the mapping is released, the OS may at some later time put another mapping into that region. So we have to make sure the mapping for the protection zone gets re-reserved after being released at (A).
>> 
>> The fix for the windows error is in commit https://github.com/openjdk/jdk/pull/23912/commits/504931d745d483edc8662e51f7bb3c321ceac9a3 .
>> 
>> The AIX error, in comparison, is easy. On AIX we cannot mprotect System V shared memory (or better, we cannot mprotect 64K pages, @JoKern65 or @TheRealMDoerr ?). Using 64K pages for such frequently accessed memory as CDS and class space is more beneficial than protecting the zero nklass page. As a fallback, on AIX, we still leave the page, but we fill it with a marker value ('P', 0x50). Now, if you accidentally dereference a zero nKlass, you will not crash immediately. But at least later crashes will probably contain register values like '0x5050505050505050', so it is a hint.
>> 
>> Tests:
>> - Local tests on Linux x64, Mac aarch64, Windows x64, (simulated) AIX paths
>> - SAP reports all tests green (they had reported errors with the previous version)
>> - Oracle Tests ongoing
>> - GHAs green
>
> Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision:
> 
>  - Merge branch 'openjdk:master' into JDK-8351040-REDO-Protection-zone-for-easier-detection-of-accidental-zero-nKlass-use
>  - skip test if we have no COH archive
>  - Merge branch 'openjdk:master' into JDK-8351040-REDO-Protection-zone-for-easier-detection-of-accidental-zero-nKlass-use
>  - aix fix
>  - test and aix exclusion
>  - Fix windows when ArchiveRelocationMode=0 or 2
>  - original

friendly ping

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23912#issuecomment-2742891097


More information about the hotspot-dev mailing list