G1 concurrent refinement regression

Mon Sep 29 09:14:26 UTC 2025

Hi again,

   here is a more comprehensive answer using some example application:

On 28.09.25 18:19, Brian S O'Neill wrote:
> I'm observing an interesting regression between build 26-ea+16-1649 and 
> build 26-ea+17-1764 with respect to concurrent refinement. I suspect the 
> cause is JEP 522.
> 
> I ran a stress test in which GC pauses are about 25ms in the steady 
> state. With 26-ea+16-1649, concurrent refinement is never initiated.
> 
> With 26-ea+17-1764 (JEP 522) concurrent refinement starts after the test 
> has run for a few minutes. Eventually, concurrent refinement runs back 
> to back forever, wasting a CPU core and affecting overall performance.

These concurrent refinement log messages were added with JEP 522. 
Previously there were no refinement logs at info level; just because 
there were no logs, it does not mean that no refinement happens.

Typically, JEP 522 reduces refinement work by a very significant amount, 
making your application (typically faster).

Let me show you on an example: using specjbb2015 with a fixed load as a 
test application (because I do not know yours), using 8 parallel gc 
threads and a 50ms pause time goal (to induce some more refinement 
activity).

I used a pre-JEP 522 build (just the one before, 
ca182912a305e1e226d97d9613c7baf8a3d22780), and latest tip 
(75269fdb49aeb9d37acbbc1502c446a822fd30e3). Further, I will show results 
with refinement disabled (using -XX:-G1UseConcRefinement).

Pause time stats:

pre-522:   Total: 16,062s, Avg: 50,32ms
post-522:  Total: 17,245s, Avg: 48,31ms
no-refine: Total: 24,940s, Avg: 50,80ms

Post-522 spends more time in GC (this may be due to heuristics changes 
since pushing it), the no-refinement build spends much more time due to 
more work.

Additionally I obtained CPU usage per thread type periodically every 5 
seconds (using "jcmd <pid> PerfCounter.print"). Let's discuss the 
results for the 50th occurence (~250s into the run):

CPU time values are in s (I just added a comma - originally they were in 
ns), first is cpu time for concurrent mark, then concurrent refinement, 
then the parallel workers (~gc pause cpu time) and then the service thread.

pre-jep522:

sun.threads.cpu_time.gc_conc_mark=29,602801037
sun.threads.cpu_time.gc_conc_refine=85,135331304
sun.threads.cpu_time.gc_parallel_workers=130,768855722
sun.threads.cpu_time.gc_service=0,059843947

post-jep522:

sun.threads.cpu_time.gc_conc_mark=29,524970498
sun.threads.cpu_time.gc_conc_refine=53,489763400
sun.threads.cpu_time.gc_conc_refine_control=0,777702253
sun.threads.cpu_time.gc_parallel_workers=139,873904419
sun.threads.cpu_time.gc_service=0,256793143

The first observation is that, the post-jep522 build split refinement 
cpu time into "conc_refine" and "conc_refine_control" threads, but 
that's an implementation detail - just add them together for comparison.

CPU spent in concurrent marking is the same, as expected. This is a 
fixed load test after all.

There is a significant difference in time spent in refinement between 
these builds. Pre-JEP 522 around 85s are spent there, around 54.5s 
post-JEP 522.

So actually post-JEP 522 builds spend ~36% _less_ time in concurrent 
refinement.

Post-JEP 522 spends around 7% more in garbage collection.

However, summing up all components, post-JEP 522 spends around 9.7% less 
time in GC related activities. So it seems a win.

no-refine (post-jep522 build):

sun.threads.cpu_time.gc_conc_mark=29,373202848
sun.threads.cpu_time.gc_conc_refine=0
sun.threads.cpu_time.gc_conc_refine_control=0
sun.threads.cpu_time.gc_parallel_workers=152,750078728
sun.threads.cpu_time.gc_service=0,237132193

When disabling concurrent refinement as -XX:-G1UseConcRefinement does, 
more work is moved into the pauses. From a throughput POV it is a win, 
spending only ~182s in total in GC threads, vs. 223s (post-JEP 522) and 
245s (pre-JEP 522).

G1 then spends more time in pauses though, potentially affecting 
latencies more.

> 
> [244.479s][info][gc,start    ] GC(60) Pause Young (Normal) (G1 
> Evacuation Pause)
 > [...]> [403.761s][info][gc,refine      ] Concurrent Refine Complete 
Work 0.022ms
> [403.761s][info][gc,refine      ] Concurrent Refine Cycle 160.341ms
> 
> The problem goes away when I set -XX:MaxGCPauseMillis=500 or higher. 
> Considering that the pauses are always well below the 200ms default, I'm 
> surprised that concurrent refinement occurs at all.

As already mentioned in the other email, by default G1 allocates 10% of 
pause time for refinement in the pause. The rest is (attempted to be) 
moved out to the concurrent phase. By increasing the maximum pause time 
you increase that allowance.

Concurrent work adds some inefficiencies depending on your application, 
but typically reduces time spent in pause where the application comes to 
a complete halt.

G1 tries to provide a balance between pauses and concurrent work even 
when nominally it is allowed to consume 200ms per pause; the (fixed) 
value of -XX:G1RSetUpdatingPauseTimePercent is also likely not optimal.

Still nowadays people do not really want a 200ms pause even if they 
implicitly allow it (by not specifying any pause time goal), so by 
default some concurrent refinement is enabled and often actually performed.

(Fwiw, by default G1 is allowed a pause time goal of 200ms within any 
interval of 201ms - it is unlikely that people really want that).

It would also be nice if you verified that in your test application G1 
really does not do any refinement. If it really does not, it would be 
fairly surprising, and we would be interested in investigated this more.

The logging I asked for (gc+refine=debug and gc+phases=debug) will help 
us to analyze what is happening in your case.

(If you enabled more detailed refinement logging in the pre-JEP 522 
build, I think one could also see refinement activity, i.e. use 
gc+refine=debug).

Hth,
   Thomas