SoftMaxHeapSize has no effect with Cassandra

Kornel Pal kornelpal at gmail.com
Tue Jan 12 23:06:45 UTC 2021


Hi,

Thank you for getting back to me and for filing a bug.

Aleksey Shipilev wrote:
> Please post the full list of GC flags you are using? And maybe the excerpts from the GC logs that show unexpected (with ShenandoahSoftMaxHeapSize setting) and expected (with ShenandoahMinFreeThreshold setting) behaviors?
> 
> This is only to confirm that we are seeing the same issue.

I removed the default GC settings of Cassandra, and I was using these two very basic configurations with 8G heap, so the 4G soft heap and the 50% min free threshold should have similar results:

SoftMaxHeapSize test:
-XX:+UseShenandoahGC
-XX:ShenandoahSoftMaxHeapSize=4G
-XX:+PrintGCDetails
-Xloggc:...

MinFreeThreshold test:
-XX:+UseShenandoahGC
-XX:+UnlockExperimentalVMOptions
-XX:ShenandoahMinFreeThreshold=50
-XX:+PrintGCDetails
-Xloggc:...

Logs using ShenandoahSoftMaxHeapSize=4G with 8G heap:

This is an example with an "Adaptive CSet Selection" step:
Trigger: Free (818M) is below minimum threshold (819M)
Free: 818M, Max: 4096K regular, 796M humongous, Frag: 1% external, 2% internal; Reserve: 412M, Max: 4096K
295.759: [Concurrent reset, start]
     Using 1 of 2 workers for concurrent reset
     Pacer for Reset. Non-Taxable: 8192M
295.781: [Concurrent reset, 22.063 ms]
295.782: [Pause Init Mark, start]
     Using 2 of 2 workers for init marking
     Pacer for Mark. Expected Live: 42067K, Free: 817M, Non-Taxable: 83700K, Alloc Tax Rate: 0.1x
295.785: [Pause Init Mark, 3.170 ms]
295.785: [Concurrent marking, start]
     Using 1 of 2 workers for concurrent marking
295.858: [Concurrent marking, 72.770 ms]
295.859: [Pause Final Mark, start]
     Using 2 of 2 workers for final marking
     Adaptive CSet Selection. Target Free: 1160M, Actual Free: 7441M, Max CSet: 341M, Min Garbage: 0B
     Collectable Garbage: 6917M (99%), Immediate: 6233M (90%), CSet: 683M (9%)
     Pacer for Evacuation. Used CSet: 692M, Free: 7042M, Non-Taxable: 704M, Alloc Tax Rate: 1.1x
295.870: [Pause Final Mark, 11.155 ms]
295.871: [Concurrent cleanup, start]
295.871: [Concurrent cleanup 6970M->741M(8192M), 0.212 ms]
Free: 7042M, Max: 4096K regular, 800M humongous, Frag: 89% external, 0% internal; Reserve: 408M, Max: 4096K
295.871: [Concurrent evacuation, start]
     Using 1 of 2 workers for concurrent evacuation
295.896: [Concurrent evacuation, 25.222 ms]
295.897: [Pause Init Update Refs, start]
     Pacer for Update Refs. Used: 751M, Free: 7039M, Non-Taxable: 703M, Alloc Tax Rate: 1.1x
295.898: [Pause Init Update Refs, 0.362 ms]
295.898: [Concurrent update references, start]
     Using 1 of 2 workers for concurrent reference update
295.948: [Concurrent update references, 50.645 ms]
295.953: [Pause Final Update Refs, start]
     Using 2 of 2 workers for final reference update
295.954: [Pause Final Update Refs, 1.184 ms]
295.954: [Concurrent cleanup, start]
295.954: [Concurrent cleanup 756M->66144K(8192M), 0.075 ms]
Free: 7715M, Max: 4096K regular, 6540M humongous, Frag: 16% external, 0% internal; Reserve: 412M, Max: 4096K

This seems to have very similar values but has no "Adaptive CSet Selection" step logged (from the same run as the above):
Trigger: Free (818M) is below minimum threshold (819M)
Free: 818M, Max: 4096K regular, 796M humongous, Frag: 1% external, 3% internal; Reserve: 412M, Max: 4096K
376.147: [Concurrent reset, start]
     Using 1 of 2 workers for concurrent reset
     Pacer for Reset. Non-Taxable: 8192M
376.157: [Concurrent reset, 10.150 ms]
376.158: [Pause Init Mark, start]
     Using 2 of 2 workers for init marking
     Pacer for Mark. Expected Live: 40091K, Free: 818M, Non-Taxable: 83802K, Alloc Tax Rate: 0.1x
376.160: [Pause Init Mark, 2.054 ms]
376.160: [Concurrent marking, start]
     Using 1 of 2 workers for concurrent marking
376.193: [Concurrent marking, 33.004 ms]
376.194: [Pause Final Mark, start]
     Using 2 of 2 workers for final marking
     Collectable Garbage: 6352M (91%), Immediate: 6352M (91%), CSet: 0B (0%)
376.194: [Pause Final Mark, 0.362 ms]
376.195: [Concurrent cleanup, start]
376.195: [Concurrent cleanup 6966M->618M(8192M), 0.208 ms]
Free: 7161M, Max: 4096K regular, 796M humongous, Frag: 89% external, 0% internal; Reserve: 412M, Max: 4096K
Free: 7161M, Max: 4096K regular, 796M humongous, Frag: 89% external, 0% internal; Reserve: 412M, Max: 4096K

Even in the case when there is no "Adaptive CSet Selection" step logged, it seems to end up collecting the garbage, so the issue I am experiencing might have a different root cause than I originally thought.

The free value in the trigger log seems to have the soft tail is subtracted, so that might explain why it is lower than 4G. On the other hand the garbage size is indicated to be more than the 4G soft heap size, so the free size might be the full value, including the soft tail.

ShenandoahMinFreeThreshold=50 on the other hand seems to trigger when the free size goes slightly below 4G, and the garbage values are less than 4G too that seems to be more consistent:
Trigger: Free (4095M) is below minimum threshold (4095M)
Free: 4095M, Max: 4096K regular, 4064M humongous, Frag: 1% external, 0% internal; Reserve: 412M, Max: 4096K
146.861: [Concurrent reset, start]
     Using 1 of 2 workers for concurrent reset
     Pacer for Reset. Non-Taxable: 8192M
146.863: [Concurrent reset, 2.192 ms]
146.864: [Pause Init Mark, start]
     Using 2 of 2 workers for init marking
     Pacer for Mark. Expected Live: 81972K, Free: 4095M, Non-Taxable: 409M, Alloc Tax Rate: 0.0x
146.865: [Pause Init Mark, 1.769 ms]
146.865: [Concurrent marking, start]
     Using 1 of 2 workers for concurrent marking
146.927: [Concurrent marking, 61.843 ms]
146.934: [Pause Final Mark, start]
     Using 2 of 2 workers for final marking
     Adaptive CSet Selection. Target Free: 4437M, Actual Free: 7407M, Max CSet: 341M, Min Garbage: 0B
     Collectable Garbage: 3630M (99%), Immediate: 2931M (80%), CSet: 698M (19%)
     Pacer for Evacuation. Used CSet: 706M, Free: 7010M, Non-Taxable: 701M, Alloc Tax Rate: 1.1x
146.949: [Pause Final Mark, 14.292 ms]
146.949: [Concurrent cleanup, start]
146.949: [Concurrent cleanup 3700M->773M(8192M), 0.126 ms]
Free: 7007M, Max: 4096K regular, 4064M humongous, Frag: 42% external, 0% internal; Reserve: 409M, Max: 4096K
146.949: [Concurrent evacuation, start]
     Using 1 of 2 workers for concurrent evacuation
146.960: [Concurrent evacuation, 10.771 ms]
146.960: [Pause Init Update Refs, start]
     Pacer for Update Refs. Used: 783M, Free: 7005M, Non-Taxable: 700M, Alloc Tax Rate: 1.1x
146.960: [Pause Init Update Refs, 0.045 ms]
146.961: [Concurrent update references, start]
     Using 1 of 2 workers for concurrent reference update
147.018: [Concurrent update references, 57.464 ms]
147.018: [Pause Final Update Refs, start]
     Using 2 of 2 workers for final reference update
147.020: [Pause Final Update Refs, 1.149 ms]
147.020: [Concurrent cleanup, start]
147.020: [Concurrent cleanup 798M->93730K(8192M), 0.081 ms]
Free: 7688M, Max: 4096K regular, 4072M humongous, Frag: 47% external, 0% internal; Reserve: 412M, Max: 4096K

I also found this near the end of the log, even when using ShenandoahSoftMaxHeapSize=4G:
Cancelling GC: Stopping VM
Heap
Shenandoah Heap
  8192M max, 8192M soft max, 8192M committed, 3986M used
  2048 x 4096K regions
Status: cancelled

Is it possible that the initial ShenandoahSoftMaxHeapSize value is never set for the heap? I could only find it being set on run-time value changes (JDK8), but there was no such run-time change logged.

Thank you,
Kornel


More information about the shenandoah-dev mailing list