Shenandoah experimental flags

Thu Apr 30 08:37:21 UTC 2020

Hi Cristian,

On 4/29/20 12:42 AM, Cristian Velazquez wrote:
> Hi everyone:
> 
> I'm testing out shenandoah on https://github.com/gaul/java-httpbin. I'm
> using wrk to send a huge amount of traffic: wrk -d 300 -t 1 -c 32
> http://127.0.0.1:8080/gzip. I'm using the default flags with 2G heap. 

Does not seem to reproduce on current head. What build/version are you using?

> I'm getting pause times from 100-300ms due to the time spent on finish queues.

Please show the part of GC log where you see this?

> I tried the following (I used as reference this
> https://mail.openjdk.java.net/pipermail/shenandoah-dev/2018-July/006630.html
> ):
> - Reduce ShenandoahSATBBufferFlushInterval to 1 ms. This didn't make any
> difference.
> - Reduce ShenandoahSATBBufferSize to 32. This fixed the issue and now see
> better pause times.

Really! That is odd. The periodic SATB buffer is supposed to flush anything leftover before you hit
this problem.

> I have run it for quite a while and I haven't seen any issue with this
> configuration. Although, I would like to know what is the recommendation in
> this scenario? is it safe to decrease the buffer size? what is the impact
> of doing so?

Mostly throughput performance. SATB buffer is the thing that accumulates the old field values before
each store to maintain marking invariants. Without this buffer, every reference store would need to
call into VM, which would be costly. So the buffer is deliberately quite large to keep application
from calling into VM too often, and do that only for buffer flushes.

But, that matters only when marking is running. In my quick tests, your workload seems to be almost
fully young:

[93.298s][info][gc] Trigger: Free (204M) is below minimum threshold (204M)
[93.298s][info][gc] GC(12) Concurrent reset 0.218ms
[93.299s][info][gc] GC(12) Pause Init Mark (unload classes) 0.191ms
[93.301s][info][gc] GC(12) Concurrent marking (unload classes) 1.918ms
[93.301s][info][gc] GC(12) Pause Final Mark (unload classes) 0.221ms
[93.301s][info][gc] GC(12) Concurrent weak roots 0.362ms
[93.302s][info][gc] GC(12) Concurrent cleanup 1766M->102M(2021M) 0.269ms
[93.303s][info][gc] GC(12) Concurrent class unloading 1.403ms

...so marking is way too short for you to notice this.

What I suspect is happening in your case is that marking is *so short* that buffer flush never
happens until Final Mark, where we discover some work to do. But it is puzzling why that work is
>100ms, when the workload does not have a lot of live data.

Need more info...

-- 
Thanks,
-Aleksey