RFR: 8232365: Implementation for JEP 363: Remove the Concurrent Mark Sweep (CMS) Garbage Collector

Leo Korinth leo.korinth at oracle.com
Tue Oct 22 15:17:09 UTC 2019


On 21/10/2019 23:48, Kim Barrett wrote:
>> On Oct 18, 2019, at 4:20 AM, Leo Korinth <leo.korinth at oracle.com> wrote:
>>
>> Hi,
>>
>> Here is a patch that removes the CMS GC.
>>
>> I have neither tested arm nor ppc; I hope my changes to those .ad files are correct, if someone can test those architectures, that would be great.
>>
>> Please take an extra look at CollectedHeap::check_for_non_bad_heap_word_value, it was buggy before (but never called), It is now called (and hopefully correct).
>>
>> I have tried to remove most parts of CMS. I have not made it a goal to remove all traces of CMS. I guess there are much more to cleanup, and suggestions of more to remove are welcomed. I think more complicated cleanups should be dealt with in separate enhancements.
>>
>> Not fully addressed in code, but an issue that has to be dealt with, how do I obsolete -Xconcgc and -Xnoconcgc? I believe the option should be obsoleted, though I do not know if we have any precedence obsoleting -X options.
>>
>> My patch prints:
>>
>> $ java -Xconcgc -jar Notepad.jar
>> Java HotSpot(TM) 64-Bit Server VM warning: -Xconcgc uses UseConcMarkSweepGC
>>
>> I guess that is not enough for being obsolete, compare with:
>>
>> $ java -XX:UseConcMarkSweepGC -jar Notepad.jar
>> Java HotSpot(TM) 64-Bit Server VM warning: Ignoring option UseConcMarkSweepGC; support was removed in 14.0
>>
>> Bug:
>>   https://bugs.openjdk.java.net/browse/JDK-8232365
>>
>> Webrev:
>>   http://cr.openjdk.java.net/~lkorinth/8232365/00
>>
>> Testing:
>>   tier 1-5.
>>
>> Thanks,
>> Leo
> 
> Looks pretty good.
> 
> A fair number of comments below, but no major problems, just tidying
> things up.
> 
> I'm sure there are more lingering CMS-isms, but we can clean those up
> in followups. (After all, there are still lingering permgen bits here
> and there, at least in comments.)
> 
> Regarding obsoleting -X options, you might look at what was done for
> -Xincgc (JEP 214 - JDK-8044022, JDK-8072921).

Thanks.

> Regarding CollectedHeap::check_for_non_bad_heap_word_value, bleh!  The
> first couple of comments below are about it.
> 
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/shared/collectedHeap.cpp
> 336 void CollectedHeap::check_for_non_bad_heap_word_value(HeapWord* addr, size_t size) {
> ...
> 338     // please not mismatch between size (in 32/64 bit words), and ju_addr that always point to a 32 bit word
> 
> s/not/note/
> 
> Also, the assert on 340 looks indented 3 rather than 2.

Will fix both.

> ------------------------------------------------------------------------------
> src/hotspot/share/gc/shared/collectedHeap.cpp
> 336 void CollectedHeap::check_for_non_bad_heap_word_value(HeapWord* addr, size_t size) {
> 
>> Please take an extra look at CollectedHeap::check_for_non_bad_heap_word_value,
>> it was buggy before (but never called), It is now called (and hopefully
>> correct).
> 
> I don't understand "but never called" as it looks like it has been
> called all along from MemAllocator::allocate_outside_tlab. Except
> that's only when UseTLAB is false, and it always defaults true if any
> of C1, C2, or JVMCI are included. We do have some -UseTLAB tests
> though, and they aren't all with CMS, so I'm not sure how a problem
> here hasn't already appeared. >
> Because I don't understand "but never called" I'm failing to
> understand how this hasn't shown up as a problem before. Seems like it
> should never have worked and there would be test failures at least.
> 

Sorry, I was sloppy in my description, it has not been called from our 
/test suite/. Special command line option (CheckMemoryInitialization && 
ZapUnusedHeapArea) needs to be given to the JVM for the check to be 
done. -XX:+CheckMemoryInitialization is only used in one test that hard 
codes serial gc. That -XX:+CheckMemoryInitialization flag looks quite 
unused might be a problem in itself, I do not know.

> Part of the problem here is that Copy::fill_to_words takes a juint
> fill value that it duplicates in high/low on 64bit platforms.  That
> seems wrong; shouldn't it take a platform-word sized value and just
> store that?  But making any such change is going to require some care.
> 
> And it results in the CollectedHeap implementation's use of intptr_t
> on 64bit platforms comparing badHeapWord (a juint, so 32bits) with
> 64bit (((julong)badHeapWord << 32) | badHeapWord). So how did this
> ever manage to pass testing?
> 
> However, on a 64bit platform the new code (and the old
> GenCollectedHeap code) is only checking every other 32bit value. The
> step distance is 8 bytes but the value read at each step is only 4
> bytes.

The old code did not work, but what is wrong with the new code? 
++ju_addr increments /four/ bytes at a time (ju_addr is a pointer to 
uint32_t), and (if I am not mistaken) the full memory range is checked 
as the sum of reinterpret_cast<juint*>(addr + size) is done in words 
(not juints) and /then/ casted to juint*.

Or did I miss something?

> 
> There is also a (pre-existing) !PRODUCT vs ASSERT mismatch here. The
> checker is !PRODUCT, but the filling is debug-only (see
> SpaceMangler::mangle_region). So I think it's still broken in
> "optimized" builds. (It is periodically suggested that we get rid of
> that configuration (JDK-8183287, for example) but there's always been
> some folks who really want to keep it.)

I have filed a bug: https://bugs.openjdk.java.net/browse/JDK-8232792

> ------------------------------------------------------------------------------
> src/hotspot/cpu/arm/globals_arm.hpp
> 66 // GC Ergo Flags
> 
> I think that comment only applied to the now removed CMSYoungGenPerWorker.
> It now _appears_ to apply to TypeProfileLevel, which isn't a GC flag.
> Suggest just removing the comment.
> 
> Similarly in (all?) the other globals_<cpu>.hpp files:
> src/hotspot/cpu/aarch64/globals_aarch64.hpp
> src/hotspot/cpu/ppc/globals_ppc.hpp
> src/hotspot/cpu/s390/globals_s390.hpp
> src/hotspot/cpu/sparc/globals_sparc.hpp
> src/hotspot/cpu/x86/globals_x86.hpp
> src/hotspot/cpu/zero/globals_zero.hpp
> 

Will fix.

> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp
> 301  * in the nursery, this would happen for humongous objects. This is similar to
> 302  * how CMS was required to handle this case, see the comments for the method
> 303  * CollectedHeap::new_deferred_store_barrier and OptoRuntime::new_deferred_store_barrier.
> 
> I would prefer this didn't mention CMS at all anymore, e.g. drop the
> phase "This is similar to how CMS was required to handle this case".

Will fix.

> 
> But while exploring this I discovered that the functions being
> referred to here no longer exist, at least not by that name.  That is,
> there are no occurrences of "new_deferred_store_barrier" other than in
> some comments.  I haven't tried to figure out what's happened beyond
> noticing that...
> 
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/parallel/psPromotionManager.hpp
> 105   // need to scan; this is basically how ParNew did partial array
> 106   // scanning too). To be able to distinguish between reference
> 
> I would prefer this didn't mention ParNew at all anymore, e.g. drop
> everything after the semicolon.

Will fix.

> ------------------------------------------------------------------------------
> src/hotspot/share/gc/shared/blockOffsetTable.hpp
>   44 //     - BlockOffsetArrayNonContigSpace
> 
> Doesn't exist anymore.
> 

Will fix.

> ------------------------------------------------------------------------------
> src/hotspot/share/gc/shared/cardTableBarrierSet.cpp
> 117 //     but, like in CMS, because of the presence of concurrent refinement
> 118 //     (much like CMS' precleaning), must strictly follow the oop-store.
> 119 //     Thus, using the same protocol for maintaining the intended
> 120 //     invariants turns out, serendepitously, to be the same for both
> 121 //     G1 and CMS.
> 
> I'd prefer this didn't mention CMS anymore, e.g.
> 
> s/, like in CMS,//
> s/(much like CMS' precleaning)//
> s/Thus, ...//

Will fix.

> ------------------------------------------------------------------------------
> src/hotspot/share/gc/shared/jvmFlagConstraintsGC.cpp
>   68 // As ConcGCThreads should be smaller than ParallelGCThreads,
>   69 // we need constraint function.
>   70 JVMFlag::Error ConcGCThreadsConstraintFunc(uint value, bool verbose) {
>   71   // G1 GC use ConcGCThreads.
>   72   if (GCConfig::is_gc_selected(CollectedHeap::G1) && (value > ParallelGCThreads)) {
> 
> Other collectors (Shenandoah and Z) also use ConcGCThreads.  I don't
> know if ConcGCThreads is supposed to be not greater than
> ParallelGCThreads for those collectors.  It might be a bug that this
> code was checking for specific GCs here (formerly G1 and CMS, now just
> G1).  (I _think_ this code might pre-date both Shenandoah and Z, at
> least in open code.)
> 
> There are some other constraint functions that are checking for
> specific GCs that maybe ought not to be.
> 
> Looking into this and doing any further changes should be deferred to
> a separate RFE rather than trying to include in this CMS removal
> change.

I have filed: https://bugs.openjdk.java.net/browse/JDK-8232794
> 
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/shared/referenceProcessor.hpp
> 220   // of an oop. It is currently initialized to NULL for all
> 221   // collectors except for G1.
> 
> It might be better to leave out the last sentence.  Shenandoah uses
> _is_alive_non_header too, though initialized to NULL and manipulated
> where needed using ReferenceProcessorIsAliveMutator.

Will fix.

> 
> ------------------------------------------------------------------------------
> src/hotspot/share/oops/markWord.hpp
> 130   static const int unused_gap_bits                = LP64_ONLY(1) NOT_LP64(0);
> ...
> 138   static const int unused_gap_shift               = age_shift + age_bits;
> 139   static const int hash_shift                     = unused_gap_shift + unused_gap_bits;
> 
> For this change, I completely agree with this renaming of CMS bits to
> "unused_gap".  Moving header bits around seems likely to have lots of
> "interesting" effects.  I assume there will be a followup RFE to clean
> this up?

I have filed: https://bugs.openjdk.java.net/browse/JDK-8232795

> 
> ------------------------------------------------------------------------------
> src/hotspot/share/runtime/arguments.cpp
> 
> 73 product options bite the dust!

:-)

> ------------------------------------------------------------------------------
> src/hotspot/share/runtime/arguments.cpp
> 4177     // Non NUMA-aware collectors such as G1 and Serial-GC on
> 
> FYI, there will be a minor merge conflict here between you and
> Sangheon's G1 NUMA changes.

OK

> 
> ------------------------------------------------------------------------------
> 
> I only skimmed the Serviceability Agent changes.  They look ok, but
> I'm not very familiar with that code, so don't count me as having
> reviewed that part.
> 
> ------------------------------------------------------------------------------
> src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.hotspot/src/org/graalvm/compiler/hotspot/HotSpotGraalRuntime.java
> 
> 234         Parallel(true, "UseParallelGC", "UseParallelOldGC"),
> 
> You removed UseParNewGC from here.  Seems like that use was just
> wrong.  -XX:+UseParNewGC alone was ParNew + SerialOld (so nothing to
> do with "ParallelGC"), and that combination and usage was deprecated
> in JDK 8 (JEP 173) and removed in JDK 9 (JEP 214).  The UseParNewGC
> doesn't even exist anymore.
> 
> So good find!  Could have been a separate bug, but doesn't seem worth
> the effort now.

Agree that it is not worth the effort, it is also related to removal of 
ParNew in CMS (even though the "Use" flag was removed prior to this).

> ------------------------------------------------------------------------------
> test/hotspot/jtreg/runtime/7167069/PrintAsFlag.java
> 
> Removal of -XX:CMSSmallCoalSurplusPercent=1.05 leaves no floating
> point flags in the command, so that case is no longer covered.
> 
> ------------------------------------------------------------------------------
> test/hotspot/jtreg/runtime/testlibrary/ClassUnloadCommon.java
>   65         allocateMemory(16 * 1024); // yg size is 8m with cms[[keep?]], force young collection
> 
> This looks like a lingering TODO for this change.

Yes, I thought I forgot one, but could not find it, thanks!

> 
> ------------------------------------------------------------------------------
> test/hotspot/jtreg/vmTestbase/nsk/jvmti/scenarios/general_functions/GF08/gf08t001/TestDriver.java
> 
> 48  *    is enabled and G1 is enbled. If ExplicitGCInvokesConcurrent and
> 49  *    G1 is enbled the test searches for 'GC' string in output.
> 
> [pre-existing, but since you are touching these lines anyway...]
> 
> s/enbled/enabled/ -- two occurrences

Will fix.

> 
> [and another pre-existing problem nearby]
> This line:
> 47  *    searched for 'Full GC' string, unless ExplicitGCInvokesConcurrent
> 
> doesn't match the code here:
> 86             keyPhrase = "Pause Full";

Will fix.

> 
> ------------------------------------------------------------------------------
> test/jdk/com/sun/management/HotSpotDiagnosticMXBean/CheckOrigin.java
> Removed:
>   28  * @comment Graal does not support CMS
>   29  * @requires !vm.graal.enabled
> 
> This @requires was disabling this test for all collectors when graal
> was enabled, not just for CMS. Has this been run with graal and all
> other GCs?

What I tried to do was replacing CMS with G1 in this test as both are 
setting MaxNewSize. G1 is working with graal, right? I have not run all 
test configurations, but tier 1-5 did work. If I understand, this test 
case will not run with any GC other than G1 as it is hard coded as an 
argument to the process builder.

> 
> ------------------------------------------------------------------------------
> test/jdk/com/sun/management/HotSpotDiagnosticMXBean/CheckOrigin.java
>   66                     "-XX:+UseConcMarkSweepGC",  // this will cause MaxNewSize to be FLAG_SET_ERGO
> =>
>   64                     "-XX:+UseConcG1GC",  // this will cause MaxNewSize to be FLAG_SET_ERGO
> 
> Seems like this should be using +UseG1GC.  (Not relying on this bogus
> option being ignored and defaulting to G1, and not having anything to
> do with the comment.)

Yes! Bad from my side, embarrassing, will fix. G1 sets the same 
MaxNewSize so the comment is still correct, and the flag is checked 
later in the test case: checkOrigin("MaxNewSize", Origin.ERGONOMIC);

I just removed "Conc" part that I failed to write over. Is that okay?

> 
> ------------------------------------------------------------------------------
> test/jdk/java/lang/management/MemoryMXBean/PendingAllGC.sh
> 
> Maybe this test should be updated to test with other collectors than
> just the default (now G1) and Parallel.  File an RFE?

Okay: https://bugs.openjdk.java.net/browse/JDK-8232798

> 
> ------------------------------------------------------------------------------
> test/jdk/jdk/jfr/event/gc/collection/GCEventAll.java
> 243         if (GCHelper.gcG1Old.equals(oldCollector)) {
> 244             // ConcurrentMarkSweep mixes old and new collections. Not same values as in MXBean.
> 245             // MXBean does not report old collections for G1Old, so we have nothing to compare with.
> 246             return;
> 247         }
> 
> Comment also needed to be updated.
> 

I removed the comment line:    // ConcurrentMarkSweep mixes old and new 
collections. Not same values as in MXBean.

> ------------------------------------------------------------------------------
> test/jdk/jdk/jfr/event/oldobject/TestMetadataRetention.java
>   90                 // System.gc() will trigger class unloading if -XX:+ExplicitGCInvokesConcurrent
>   91                 // is NOT set. If this flag is set G1 will never unload classes on System.gc().
> 
> I think this comment is no longer true.
> 
> If this test doesn't work with -ExplicitGCInvokesConcurrent then that
> should be filtered in an @requires check.  I don't know if it doesn't
> work though. >
> I don't know what this part of the comment means though:
>   92                 // As far as the "jfr" key guarantees no VM flags are set from the
>   93                 // outside it should be enough with System.gc().
> I assume it's referring to the "@key jfr" line in the @test description.
> But I don't know what that does.
> 
> Similarly in these:
> test/jdk/jdk/jfr/event/runtime/TestClassLoadingStatisticsEvent.java
> test/jdk/jdk/jfr/event/runtime/TestClassUnloadEvent.java

I guess this will be handled by: 
https://bugs.openjdk.java.net/browse/JDK-8232244

> ------------------------------------------------------------------------------
> 

Thanks for looking through all of this!

/Leo


More information about the hotspot-dev mailing list