RFR: 8290025: Remove the Sweeper [v8]

Thu Aug 11 08:31:37 UTC 2022

On Thu, 11 Aug 2022 06:23:12 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

> I ran this patch through our CI. We now see Codecache OOMs in internal tests.
> 
> These tests were explicitly started with small code caches to see regressions in code cache usage. Unfortunately, these tests are internal, so I cannot share them.
> 
> Do tests in the open jtreg suite exist that test for code cache size regressions?

Thank you for taking the patch for a spin in your internal testing. Unfortunately, I'm not entirely sure what to do with this information, as I can't reproduce it locally and hence can't reason about whether the test or the new code should be changed.
However, it's worth mentioning that nmethods must go uncalled for a minimum of 2 GCs before they are unloaded due to being cold (cf. CodeCache::update_cold_gc_count). Allowing just a single GC felt like a big risk, because then the entire code cache could easily get unloaded, and I don't think I want that to ever happen. It might be that removing that restriction would make your test pass. But if it did, it's far from obvious that doing so would be the right thing. Let me explain why.

CodeCache exhaustion is something we can recover from. The implication of hitting it is that you have to run some things in the interpreter for a while and calm down a bit until code cache space can be freed. But aggressively nuking the code cache to make room for JIT-compiled code, by reaping very recently used compiled code, also means running in the interpreter a *lot*, and on top of that having all the JIT compilers spin recompiling things that were recently used. While doing that might make the test pass (again, I can't actually test my hypothesis), I'm really not sure it's the right thing to do. I get the feeling that if you are this close to exhaustion, you really should have a larger code cache.
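To make the minimum-of-2-GCs rule concrete, here is a minimal sketch of how such a floor on the coldness threshold could look. It is not the code in the patch: `CompiledMethod`, `kMinColdGcCount`, `effective_cold_gc_count` and `is_cold` are hypothetical names for illustration, and the exact comparison (idle GCs >= threshold) is an assumption.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical stand-in for a compiled method; NOT HotSpot's nmethod class.
struct CompiledMethod {
    std::uint64_t last_used_gc;  // GC cycle number at which the method was last called
};

// Assumed floor: a method is never treated as cold after a single GC.
constexpr std::uint64_t kMinColdGcCount = 2;

// Clamp whatever threshold a heuristic proposes to the floor.
std::uint64_t effective_cold_gc_count(std::uint64_t proposed) {
    return std::max(proposed, kMinColdGcCount);
}

// A method counts as cold once it has gone uncalled for at least the
// (clamped) number of GC cycles.
bool is_cold(const CompiledMethod& m,
             std::uint64_t current_gc,
             std::uint64_t proposed_cold_gc_count) {
    if (current_gc < m.last_used_gc) {
        return false;  // defensive: inconsistent counters are treated as "recently used"
    }
    const std::uint64_t idle_gcs = current_gc - m.last_used_gc;
    return idle_gcs >= effective_cold_gc_count(proposed_cold_gc_count);
}
```

With the clamp in place, even a heuristic that proposes a threshold of 0 or 1 cannot get a method unloaded that was called one GC ago, which is the "don't nuke the whole code cache" property argued for above.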

Could you share the -Xlog:codecache log output, so that we can see whether the heuristics gave it their best shot or were slacking? The latter would at least be a bug.

-------------

PR: https://git.openjdk.org/jdk/pull/9741