RFR: 8351040: [REDO] Protection zone for easier detection of accidental zero-nKlass use

Thomas Stuefe stuefe at openjdk.org
Wed Mar 12 07:02:22 UTC 2025


On Wed, 5 Mar 2025 06:34:14 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

> Please consider this second attempt at fixing https://bugs.openjdk.org/browse/JDK-8330174.
> 
> JDK-8330174 broke Windows and AIX (see breakage issue, https://bugs.openjdk.org/browse/JDK-8350768). The Windows issue happened in `MetaspaceShared::map_archives` for ArchiveRelocationMode=0 or ArchiveRelocationMode=2 (use_requested_addr=true). In those cases, we (A) delete the initial combined mapping for the CDS archive and then (B) mmap the individual archive regions separately into their respective, now vacated, address spaces. The protection zone is also part of the combined CDS archive mapping, so it gets released at (A). Since the protection zone is not part of the archive, it is not reinstated like the other regions at step (B). 
> Happily, that caused the canary assertion whose purpose was to catch such errors to segfault, so we noticed. Without assert, since the mapping is released, the OS may at some later time put another mapping into that region. So we have to make sure the mapping for the protection zone gets re-reserved after being released at (A).
> 
> The fix for the windows error is in commit https://github.com/openjdk/jdk/pull/23912/commits/504931d745d483edc8662e51f7bb3c321ceac9a3 .
> 
> The AIX error, in comparison, is easy. On AIX we cannot mprotect System V shared memory (or better, we cannot mprotect 64K pages, @JoKern65 or @TheRealMDoerr ?). Using 64K pages for such frequently accessed memory as CDS and class space is more beneficial than protecting the zero nklass page. As a fallback, on AIX, we still leave the page, but we fill it with a marker value ('P', 0x50). Now, if you accidentally dereference a zero nKlass, you will not crash immediately. But at least later crashes will probably contain register values like '0x5050505050505050', so it is a hint.
> 
> Tests:
> - Local tests on Linux x64, Mac aarch64, Windows x64, (simulated) AIX paths
> - SAP reports all tests green (they had reported errors with the previous version)
> - Oracle Tests ongoing
> - GHAs green

Both SAP and Oracle report no errors. Oracle tests tripped over the missing COH archives in Oracle builds; I amended to test to skip in case COH archives are missing.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23912#issuecomment-2716761388


More information about the hotspot-dev mailing list