RFR: 8351040: [REDO] Protection zone for easier detection of accidental zero-nKlass use
Joachim Kern
jkern at openjdk.org
Fri Mar 21 12:07:14 UTC 2025
On Wed, 5 Mar 2025 11:43:15 GMT, Joachim Kern <jkern at openjdk.org> wrote:
>> Please consider this second attempt at fixing https://bugs.openjdk.org/browse/JDK-8330174.
>>
>> JDK-8330174 broke Windows and AIX (see breakage issue, https://bugs.openjdk.org/browse/JDK-8350768). The Windows issue happened in `MetaspaceShared::map_archives` for ArchiveRelocationMode=0 or ArchiveRelocationMode=2 (use_requested_addr=true). In those cases, we (A) delete the initial combined mapping for the CDS archive and then (B) mmap the individual archive regions separately into their respective, now vacated, address spaces. The protection zone is also part of the combined CDS archive mapping, so it gets released at (A). Since the protection zone is not part of the archive, it is not reinstated like the other regions at step (B).
>> Happily, that caused the canary assertion, whose purpose is to catch such errors, to segfault, so we noticed. Without the assert, since the mapping is released, the OS may later place another mapping into that region. So we have to make sure the mapping for the protection zone gets re-reserved after being released at (A).
>>
>> The fix for the Windows error is in commit https://github.com/openjdk/jdk/pull/23912/commits/504931d745d483edc8662e51f7bb3c321ceac9a3 .
>>
>> The AIX error, in comparison, is easy. On AIX we cannot mprotect System V shared memory (or rather, we cannot mprotect 64K pages, @JoKern65 or @TheRealMDoerr ?). Using 64K pages for such frequently accessed memory as CDS and class space is more beneficial than protecting the zero nKlass page. As a fallback, on AIX, we still leave the page, but we fill it with a marker value ('P', 0x50). Now, if you accidentally dereference a zero nKlass, you will not crash immediately. But at least later crashes will probably contain register values like '0x5050505050505050', which is a hint.
>>
>> Tests:
>> - Local tests on Linux x64, Mac aarch64, Windows x64, (simulated) AIX paths
>> - SAP reports all tests green (they had reported errors with the previous version)
>> - Oracle tests ongoing
>> - GHAs green
>
> Hi Thomas,
> mprotect supports System V shared memory, but only if running in an environment where the MPROTECT_SHM=ON environment variable is defined, which is not the case in the JDK. So we can fairly say System V shared memory cannot be mprotected by us.
>
> The documentation says:
> _The mprotect subroutine can only be used on shared memory regions backed with 4 KB or 64 KB pages;_
> So we can mprotect 64K pages, and mmap supports 64K pages beginning with AIX 7.3 TL1.
> With JDK-8334371 we favor the use of mmap 64K pages over System V shared memory if running on a system with AIX 7.3 TL1 or higher. But as long as we allow lower OS versions, the System V shared memory is still in place, and the mprotect restriction stays valid.
> I haven't seen test errors with this new version. @JoKern65, @MBaesken: Are you aware of any problems?
No, I'm not aware of any problems.
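
For readers following the thread, here is a rough, standalone sketch of the two ideas discussed above: re-reserving an inaccessible protection zone after its mapping was released, and the marker-fill fallback for platforms where the page cannot be protected. This is not the HotSpot code from the PR. All function names are hypothetical, it uses plain POSIX `mmap`/`munmap`, and it assumes the address hint is honored once the range is free, which a real implementation would have to handle more carefully.

```cpp
// Illustrative sketch only; not HotSpot code. All names are hypothetical.
#include <sys/mman.h>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Marker byte taken from the PR description ('P', 0x50).
constexpr unsigned char kMarker = 0x50;

// Reserve `size` bytes of inaccessible address space, optionally at `hint`.
// Returns nullptr on failure.
inline void* reserve_no_access(void* hint, std::size_t size) {
  void* p = mmap(hint, size, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  return p == MAP_FAILED ? nullptr : p;
}

// Mimics step (A) followed by re-reservation, for the protection zone only.
// Returns true if the zone is inaccessible again at the same address.
inline bool release_and_rereserve(void* zone, std::size_t size) {
  if (munmap(zone, size) != 0) return false;   // (A): the range is now free
  void* p = reserve_no_access(zone, size);     // address is a hint, not MAP_FIXED
  if (p == nullptr) return false;
  if (p != zone) {                             // hint not honored: give up
    munmap(p, size);
    return false;
  }
  return true;
}

// Fallback when the page cannot be protected: keep it readable and fill it
// with the marker byte so stray loads yield a recognizable pattern.
inline void* make_marker_page(std::size_t size) {
  void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED) return nullptr;
  std::memset(p, kMarker, size);
  return p;
}

// Load one 64-bit word, as an accidental dereference of the page might.
inline std::uint64_t load_word(const void* p) {
  assert(p != nullptr);
  std::uint64_t w;
  std::memcpy(&w, p, sizeof w);
  return w;
}
```

With the fallback, `load_word(make_marker_page(page_size))` yields `0x5050505050505050` instead of faulting, which matches the register pattern mentioned in the PR description. With the protected variant, any access to the zone faults immediately.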
-------------
PR Comment: https://git.openjdk.org/jdk/pull/23912#issuecomment-2743175041
More information about the hotspot-dev
mailing list