RFR: 8330174: Protection zone for easier detection of accidental zero-nKlass use [v5]

Thu Feb 13 16:10:18 UTC 2025

On Thu, 23 Jan 2025 06:59:09 GMT, Ioi Lam <iklam at openjdk.org> wrote:

>> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   fix whitespace error
>
> Changes requested by iklam (Reviewer).

@iklam 

This is surprisingly complex.

So, let's say I use _mapping_offset as suggested. With that solution, we would set the archive requested base to 0x800000000, the first region mapped at e.g., 0x801000000 (assuming a 16MB protzone size for a moment), and so on. 

The problems I ran into were caused by all the file global data for which offsets were precalculated in the header. These offsets refer to the start of the mapped archive base, and there are implicit assumptions that the mapped archive base equals the start of the dump time staging buffer. However, since the staging buffer should not contain the protzone (we don't want to save that), this assumption is incorrect. The offsets calculated at dump time must, at runtime, refer to the start of the mapping, which will include the protection zone, while at dump time, they refer to the start of the staging buffer, which does not include the protection zone.

Whichever way one tries to solve this, one gets domino effects. If I try to fix `ArchiveBuilder::any_to_addr` to make the offsets match up at runtime, they won't match at dump time to the correct position in the staging area and I get dump time asserts. If I let the staging buffer include the protection zone at dump time but then just don't write that part (so start writing after the end of the protection zone in the staging buffer), the file offsets at runtime won't match. And so on, tried a couple of different ways and ultimately just gave up.

Therefore, I settled for an "easier" less invasive way, which is to precede the mapped archive space with the protection zone. In this proposal, the requested base equals the the archive base as before, so the point where the first region gets mapped and to which the global offsets refer too. However, that mapping is preceded by the protection zone, and we must make now sure to initialize narrow klass decoding with the start of the protection zone as the encoding base, not the start of the mapping archive. We also must ensure that the encoding base address is still optimized for narrow Klass encoding (aligned correctly etc). And we must precalculate at dump time the narrow Klass IDs with the future protection zone address, not the future mapping start, as the encoding base. Still, this was easier, since outside of precalculating narrow Klass IDs, nothing in CDS relies really on the encoding base.

The annoying part now is that "SharedBaseAddress" changes its meaning. Before, it meant "start of mapping aka encoding base". Now it means "start of mapping" but encoding base is SharedBaseAddress - protzone size. 

So, not sure. 

You can find the current "SharedBaseAddress points to after the protection zone" approach (a) here:
https://github.com/tstuefe/jdk/tree/nklass-protzone-take3

In case you are interested, you can find the first "_mapping_offset" approach, aborted (b), here:
https://github.com/tstuefe/jdk/tree/nklass-protzone-take2

So, compared to my original patch, is (a) the "right way"? Not sure. Yes, we don't pay 16KB of additional memory for the jsa file. However, one could argue that a protzone filled with zeros would just compress to nothing anyway, so I am not sure how much we really gain. In any case, I am unsure which course to follow here. I would hate it, however, if we just abandoned this work, since I have been bitten by nKlass=0 dereferences one too many times in the past.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23190#issuecomment-2657076306