RFR: 8330181: Move PcDesc cache from nmethod header [v2]
John R Rose
jrose at openjdk.org
Thu Apr 25 21:23:41 UTC 2024
On Wed, 24 Apr 2024 15:33:42 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:
>> Currently PcDescCache (32 bytes in a 64-bit VM: PcDesc* _pc_descs[4]) is allocated in the `nmethod` header.
>>
>> Moved PcDescContainer (which includes the cache) to the C heap, similarly to ExceptionCache, to reduce the size of the `nmethod` header and to remove the WXWrite transition when we update the cache in `PcDescCache::add_pc_desc()`.
>>
>> Removed the `PcDescSearch` class, which was a leftover from the `CompiledMethod` days.
>>
>> Tested tier1-4, stress, xcomp, and performance.
>
> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision:
>
> Remove unneeded ThreadWXEnable
Reviewed. Good work getting mutable data out of code space. Let’s keep chipping away at it, so someday the code cache is a cache filled mostly with … code.
Some post-review musings follow…
I wrote some of that stuff 1/4 century ago; some of the math-geekery is mine. Those data structures and algorithms could probably use a fresh look at some point. Today I would do it differently than the me back then did it. My fingers itch slightly as I look at that code.
As a top-level goal, I hope some day soon we will get all the metadata out of code space, both mutable (as in this case) and immutable. By immutable metadata I mean all the oddly encoded non-code sections in the nmethod layout. In a world with Leyden, it is best to put immutable metadata in a read-only, memory-mapped part of the CDS archive, rather than in either the malloc heap or inside the nmethod itself.
There is an interesting question: Why did we mix metadata into nmethod blocks in the first place?
Answer: There are two reasons, IIRC.
First, putting everything into one block in the code cache, although ugly, minimizes the number of storage allocation transactions associated with that block. If we side-allocate stuff in metaspace or malloc heap, we have more moving parts to worry about. In the early days of HotSpot we were just learning how to write concurrent code, and having a concurrent insert and delete in the code cache that would also correctly insert and delete side data seemed uncomfortably complex. (At least, that is my memory.)
Second, back in the day, we didn’t really trust malloc to do jobs like this. Not all implementations of malloc were performant, nor were they all multithread safe. (…Hey there, Solaris!) This also pushed us towards using our own stuff. Nowadays I think we are more willing to reach for malloc. (But malloc still does not fully integrate with HotSpot’s Native Memory Tracking, so that might be an issue.)
If I were redesigning this now, I’d rigorously separate three kinds of storage: code, mutable memory (caches or link state), and immutable memory (debug info, PC descs, dependencies, etc.). Those items would be linked by pointers, not put in adjacent memory blocks as today. Over time CDS would learn how to adroitly manage the different components (code, mutable, immutable).
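To make that separation concrete, here is a rough sketch (hypothetical names and layout, not proposed HotSpot code) of a compiled-method header that merely points at its three kinds of storage instead of embedding them:

```cpp
#include <cstddef>

// Rough sketch only (hypothetical types): a compiled-method header that
// points at its three kinds of storage rather than embedding them in one
// code-cache block.
struct MutableSideData;     // caches, link state -- writable, e.g. C heap
struct ImmutableSideData;   // debug info, PC descs, dependencies -- read-only

struct CompiledCodeHandle {
  // 1. Code proper: lives in the code cache, executable.
  void*   _code_begin;
  size_t  _code_size;

  // 2. Mutable metadata: updating it never touches the code mapping,
  //    so no WXWrite transition is needed.
  MutableSideData* _mutable_data;

  // 3. Immutable metadata: could be memory-mapped read-only,
  //    for example from a CDS archive section.
  const ImmutableSideData* _immutable_data;
};
```

The point of the split is that the writable piece can be updated without touching the code mapping at all, and the read-only piece can be shared or mapped straight from an archive.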
As a further investment, I’d replace ad hoc compressed data (which is always hard to maintain) with uniformly compressed data, using Unsigned5 (from Pack200). I’d pick that compressed format, rather than a better, fancier one, because it decompresses in registers at memory speeds. That is how about half of the immutable streams in HotSpot are already compressed, and there’s no reason I know of(*) to do it another way.
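For the curious, the core of that format is tiny. Here is a minimal decoding sketch of the UNSIGNED5 scheme (Pack200-style, with the constants HotSpot uses: 192 terminating "low" codes, 64 continuing "high" codes, at most five bytes per value); the VM’s actual stream classes differ in detail:

```cpp
#include <cstdint>
#include <cstddef>

// Minimal sketch of UNSIGNED5 decoding.  A value occupies 1..5 bytes; any
// byte below L ends it, so small numbers cost one byte and the whole loop
// stays in registers.
static const uint32_t lg_H = 6;            // log2(H)
static const uint32_t H    = 1u << lg_H;   // 64 "high" codes continue a value
static const uint32_t L    = 256 - H;      // 192 "low" codes terminate it
static const int      MAX_BYTES = 5;

inline uint32_t unsigned5_read(const uint8_t* buf, size_t& pos) {
  uint32_t b0 = buf[pos++];
  if (b0 < L) return b0;                   // common one-byte case
  uint32_t sum   = b0;
  uint32_t shift = lg_H;                   // later bytes are scaled by H, H^2, ...
  for (int i = 1; i < MAX_BYTES; i++) {
    uint32_t bi = buf[pos++];
    sum += bi << shift;
    if (bi < L) break;                     // a "low" byte ends the value
    shift += lg_H;
  }
  return sum;
}
```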
For an example of such an investment, see https://github.com/rose00/jdk/tree/compress-zeroes, which addresses BellSoft’s observation that the compressed debug-info streams (camping out in huge swathes of code cache) could be compressed better. The PC desc mechanism requires random access, which might seem to require fixed-stride arrays (as today), but that is easily addressed by any one of several indexing tactics. Today everybody knows you can do random access into compressed streams of data, with a little extra care.
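One such indexing tactic, as a rough sketch (hypothetical names, and not necessarily what that branch does): keep a sparse side table of checkpoints that maps every Nth record’s key to its byte offset in the compressed stream, binary-search the checkpoints, and then decode forward linearly from the nearest one.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstddef>
#include <vector>

// Hypothetical checkpoint index over a compressed stream of (pc_offset, ...)
// records.  Every Nth record we remember where it starts; a query binary-
// searches the checkpoints and then decodes forward at most N records.
struct Checkpoint {
  uint32_t pc_offset;      // key of the record starting at this point
  uint32_t stream_offset;  // byte position of that record in the stream
};

struct IndexedStream {
  std::vector<uint8_t>    stream;       // concatenated compressed records
  std::vector<Checkpoint> checkpoints;  // sparse, e.g. every 16th record

  // Byte offset from which a linear scan for 'pc_offset' should begin.
  // The scan itself decodes record by record (e.g. with the UNSIGNED5
  // sketch above) until it reaches or passes the requested pc_offset.
  size_t scan_start_for(uint32_t pc_offset) const {
    Checkpoint key = { pc_offset, 0 };
    auto it = std::upper_bound(
        checkpoints.begin(), checkpoints.end(), key,
        [](const Checkpoint& a, const Checkpoint& b) {
          return a.pc_offset < b.pc_offset;
        });
    if (it == checkpoints.begin()) return 0;  // before the first checkpoint
    return (--it)->stream_offset;             // last checkpoint at or before key
  }
};
```

With a checkpoint every N records, a lookup costs one binary search plus at most N record decodes, while the bulk of the data stays compressed.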
(*Except for things like JAR files, where better compression is desirable, at the cost of slower decompression. Even then, as Pack200 taught us, a first pass with a fast/cheap compressor often synergizes with an optional post-pass of something really nice like zstd or deflate.)
-------------
Marked as reviewed by jrose (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/18895#pullrequestreview-2023615600