RFR: JDK-8293114: GC should trim the native heap [v9]

Wed Feb 8 11:05:44 UTC 2023

On Wed, 8 Feb 2023 10:10:47 GMT, Robbin Ehn <rehn at openjdk.org> wrote:

> > I will repeat the tests. What benchmark would you recommend? I also plan to run a regular benchmark, but with an injected agent library that does concurrent malloc spikes, to mimic third-party code doing mallocs on the side.
> 
> Internally we have seen workloads where class loading drives RSS growth (thought to be Symbols; we are looking into that now). But I don't know whether there is a benchmark that does class loading during the "performance cycle", or which one it would be.

I have mass class-loading tests for JEP 387 lying around. I can run them.

That ties into another question I have wondered about recently: why are so many substructures of MetaspaceObj allocated on the C heap? Ideally, one would not need the various release_C_heap_structures() functions. Since the lifetime of these substructures is identical to that of their parent in Metaspace (e.g. InstanceKlass), they could live in Metaspace too and would need no explicit deallocation. I understand that explicit release is needed for Symbol etc.
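To illustrate what I mean, here is a toy sketch, assuming a bump-pointer arena as a stand-in for a class loader's Metaspace arena. `ToyArena`, `ToyKlass` and `make_klass` are hypothetical names, not HotSpot code: the parent and its substructure are carved from the same arena, so both go away when the arena is released, much like Metaspace is released en bloc when its class loader is unloaded.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <vector>

// Toy bump-pointer arena standing in for a Metaspace arena (hypothetical).
// Individual allocations are never freed; all chunks go away with the arena.
class ToyArena {
 public:
  explicit ToyArena(std::size_t chunk_size) : chunk_size_(chunk_size) {}
  ToyArena(const ToyArena&) = delete;
  ToyArena& operator=(const ToyArena&) = delete;

  void* allocate(std::size_t bytes) {
    bytes = align_up(bytes);
    assert(bytes <= chunk_size_);  // toy limitation: no oversized blocks
    if (chunks_.empty() || used_ + bytes > chunk_size_) {
      chunks_.push_back(std::make_unique<std::byte[]>(chunk_size_));
      used_ = 0;
    }
    void* p = chunks_.back().get() + used_;
    used_ += bytes;
    return p;
  }

  std::size_t chunk_count() const { return chunks_.size(); }

 private:
  static std::size_t align_up(std::size_t n) {
    constexpr std::size_t a = alignof(std::max_align_t);
    return (n + a - 1) & ~(a - 1);
  }

  std::size_t chunk_size_;
  std::size_t used_ = 0;
  std::vector<std::unique_ptr<std::byte[]>> chunks_;
};

// Hypothetical parent object whose substructure (a small id table) lives in
// the same arena, so there is no per-object release function to call.
struct ToyKlass {
  int* method_ids;
  std::size_t method_count;
};

ToyKlass* make_klass(ToyArena& arena, std::size_t methods) {
  ToyKlass* k = new (arena.allocate(sizeof(ToyKlass))) ToyKlass{nullptr, 0};
  int* ids = static_cast<int*>(arena.allocate(methods * sizeof(int)));
  for (std::size_t i = 0; i < methods; ++i) {
    new (ids + i) int(static_cast<int>(i));
  }
  k->method_ids = ids;
  k->method_count = methods;
  return k;
}
```

Destroying the arena frees the klass and its id table together, with nothing like release_C_heap_structures() involved. The trade-off is that a substructure cannot be freed before its parent.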

> 
> Another observation is that, if we get unlucky, it seems we could be trimming (NativeTrimmerThread) while cleaning, e.g., the StringTable (via ServiceThread), which calls free a lot; not sure if that is problematic. Another case is a user with only very few logical cores: if ServiceThread and NativeTrimmerThread run at the same time, the user may experience some additional hiccups.

Interesting. One possibility would be to use os::malloc/os::free calls as a signal, e.g. to reset the trim timer. I did not do this because HotSpot's own mallocs/frees are normally not that hot; I expected more malloc traffic from outside JNI libraries in our process, and those I cannot control or monitor anyway, so why bother.
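A minimal sketch of the timer-reset idea, assuming the trimmer thread polls a policy object and that instrumented os::malloc/os::free paths report activity to it. `ToyTrimScheduler`, `note_activity()` and the millisecond parameters are hypothetical; they are not part of the PR or of HotSpot.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical trim policy: instrumented malloc/free paths call
// note_activity(), which "resets the timer" by postponing the next trim
// until the process has been quiet for quiet_ms. Assumes a monotonic clock.
class ToyTrimScheduler {
 public:
  ToyTrimScheduler(std::uint64_t interval_ms, std::uint64_t quiet_ms)
      : interval_ms_(interval_ms), quiet_ms_(quiet_ms) {
    assert(interval_ms > 0);
  }

  // Hot path: a single relaxed store, callable from any thread.
  void note_activity(std::uint64_t now_ms) {
    last_activity_ms_.store(now_ms, std::memory_order_relaxed);
  }

  // Polled by the trimmer thread: trim only if one is due and we are quiet.
  bool should_trim(std::uint64_t now_ms) const {
    const bool due = now_ms - last_trim_ms_ >= interval_ms_;
    const std::uint64_t last = last_activity_ms_.load(std::memory_order_relaxed);
    const bool quiet = now_ms >= last && now_ms - last >= quiet_ms_;
    return due && quiet;
  }

  // Called by the trimmer thread after each trim.
  void trimmed(std::uint64_t now_ms) { last_trim_ms_ = now_ms; }

 private:
  const std::uint64_t interval_ms_;
  const std::uint64_t quiet_ms_;
  std::uint64_t last_trim_ms_ = 0;  // touched by the trimmer thread only
  std::atomic<std::uint64_t> last_activity_ms_{0};
};
```

It only sees allocations routed through the instrumented path. A JNI library calling malloc directly never reaches note_activity(), which is why I did not consider it worth the effort.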

Note that allocations done via Arenas (RA or CompileArenas) are clustered into larger chunks, and arenas have a built-in free delay, so they already filter out high-frequency malloc/free jitter. That is why, for now, I disregarded the compiler instead of pausing the trimmer thread during compilation.
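To show why arena traffic is already damped, here is a toy chunk pool, assuming the free delay comes from caching released chunks for reuse and handing them back to free() only in a periodic clean. `ToyChunkPool` is hypothetical and deliberately simplified (single chunk size, no locking); it is not HotSpot's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Toy chunk pool (hypothetical): arenas get their chunks here and return them
// here, and memory only goes back to free() when clean() runs.
class ToyChunkPool {
 public:
  explicit ToyChunkPool(std::size_t chunk_size) : chunk_size_(chunk_size) {}
  ToyChunkPool(const ToyChunkPool&) = delete;
  ToyChunkPool& operator=(const ToyChunkPool&) = delete;
  ~ToyChunkPool() { clean(); }

  void* take() {
    if (!cached_.empty()) {  // reuse a cached chunk: no malloc traffic
      void* chunk = cached_.back();
      cached_.pop_back();
      return chunk;
    }
    void* chunk = std::malloc(chunk_size_);
    assert(chunk != nullptr);
    ++mallocs_;
    return chunk;
  }

  // The arena is done with the chunk; keep it instead of freeing it.
  void give_back(void* chunk) { cached_.push_back(chunk); }

  // Run periodically: the only place memory is handed back to the C heap.
  void clean() {
    for (void* chunk : cached_) {
      std::free(chunk);
      ++frees_;
    }
    cached_.clear();
  }

  std::size_t mallocs() const { return mallocs_; }
  std::size_t frees() const { return frees_; }

 private:
  std::size_t chunk_size_;
  std::vector<void*> cached_;
  std::size_t mallocs_ = 0;
  std::size_t frees_ = 0;
};
```

A thousand take/give_back cycles cost one malloc and, until clean() runs, no free at all, so the C heap never sees the churn.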

Before making it more complex, though, I'd like to run more benchmarks. Maybe it's not even needed. After all, it's an experimental opt-in switch for now and can be refined later. Let's see.

> 
> So maybe we want to use the ServiceThread, at the cost of sometimes delaying trimming due to other work?

The problem with this, and the reason I did not integrate it directly into, e.g., the Shenandoah Control Thread, is that trim time cannot be predicted, and a trim cannot be interrupted once it has started. Its duration is a surprise: it depends on the retained size and on how fast the trim can proceed. Note that concurrent mallocs/frees are usually not blocked while trimming if they are satisfied from the local arena.
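For reference, a minimal sketch of the dedicated-thread approach, assuming glibc's malloc_trim(3) as the trim primitive (it is glibc-specific, so the sketch falls back to a no-op elsewhere). `ToyNativeTrimmer` is a hypothetical name. The point is that only this thread waits for the uninterruptible, unpredictable trim, and it records the longest trim it has seen.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdlib>  // pulls in the libc feature macros, e.g. __GLIBC__
#include <mutex>
#include <thread>

#if defined(__GLIBC__)
#include <malloc.h>
// glibc's malloc_trim(3): returns free pages to the OS; once called, it runs
// to completion and its duration depends on how much memory is retained.
static void trim_native_heap() { ::malloc_trim(0); }
#else
static void trim_native_heap() {}  // no malloc_trim(3) on this libc
#endif

// Hypothetical dedicated trimmer thread: it alone pays for the trim time,
// so no other VM thread (e.g. a GC control thread) is held up by it.
class ToyNativeTrimmer {
 public:
  explicit ToyNativeTrimmer(std::chrono::milliseconds interval) : interval_(interval) {
    assert(interval.count() > 0);
    thread_ = std::thread([this] { run(); });  // start after members are ready
  }
  ~ToyNativeTrimmer() { stop(); }

  void stop() {
    {
      std::lock_guard<std::mutex> guard(mutex_);
      stop_ = true;
    }
    cv_.notify_all();
    if (thread_.joinable()) thread_.join();
  }

  unsigned trims() const { return trims_.load(); }
  std::chrono::microseconds longest_trim() const {
    return std::chrono::microseconds(longest_us_.load());
  }

 private:
  void run() {
    std::unique_lock<std::mutex> lock(mutex_);
    // wait_for returns false on timeout (a trim is due), true once stop_ is set.
    while (!cv_.wait_for(lock, interval_, [this] { return stop_; })) {
      lock.unlock();  // never hold the lock across the uninterruptible trim
      const auto start = std::chrono::steady_clock::now();
      trim_native_heap();
      const long long us = std::chrono::duration_cast<std::chrono::microseconds>(
                               std::chrono::steady_clock::now() - start).count();
      if (us > longest_us_.load()) longest_us_.store(us);
      trims_.fetch_add(1);
      lock.lock();
    }
  }

  std::chrono::milliseconds interval_;
  std::mutex mutex_;
  std::condition_variable cv_;
  bool stop_ = false;
  std::atomic<unsigned> trims_{0};
  std::atomic<long long> longest_us_{0};
  std::thread thread_;
};
```

Stopping only takes effect between trims; a trim already in progress is waited out, which is the same limitation that makes sharing a thread with other work unattractive.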

GC-induced Java heap uncommitting is different; there we know much more. E.g., G1 uncommits piece by piece and can interrupt the process when its time is up.
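By contrast, the incremental pattern can be sketched as a loop of small, bounded steps with a deadline check between them. This is a hypothetical illustration of the idea, not G1's uncommit code: `uncommit_incrementally`, the per-region `committed` flags and the injectable clock are my assumptions, with the clock injected so the behavior is easy to test.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

using Clock = std::chrono::steady_clock;

// Uncommits committed regions one at a time, checking the deadline between
// steps, so the caller can stop when its time slice is used up and resume
// later. Returns the number of regions uncommitted by this call.
std::size_t uncommit_incrementally(std::vector<bool>& committed,
                                   const std::function<void(std::size_t)>& uncommit_region,
                                   const std::function<Clock::time_point()>& now,
                                   Clock::time_point deadline) {
  assert(uncommit_region && now);
  std::size_t done = 0;
  for (std::size_t i = 0; i < committed.size(); ++i) {
    if (!committed[i]) continue;       // already uncommitted by an earlier call
    if (now() >= deadline) break;      // time's up: stop between steps
    uncommit_region(i);                // one small, bounded step
    committed[i] = false;
    ++done;
  }
  return done;
}
```

A call that runs out of time simply returns; the next call skips the regions already uncommitted and resumes where the previous one stopped. malloc_trim offers no such step boundary.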

So using the ServiceThread would only be fine if we can accept it being blocked for prolonged periods.

-------------

PR: https://git.openjdk.org/jdk/pull/10085