Generational ZGC regression with H2 database

Fri Nov 11 14:23:37 UTC 2022

Hi Brian,

I have a status update. It looks like some of the objects here are very sensitive to locality, especially when running without -XX:+UseLargePages. I suppose that when the GCs took longer, the application threads got to do more of the relocation themselves, so objects ended up re-ordered in access order, which has pretty neat locality effects. With generational ZGC finishing up quicker, it would seem that the GC beats the application in this race, resulting in worse locality.

I could reproduce the regression. But if I add a short sleep before the GC threads start relocating the objects in the young generation, then the entire regression goes away and the application starts running faster than it does with mainline ZGC. This behaviour is very consistent. It’s one of those hilarious and obscure cases where making the GC more efficient and faster makes the application run slower. D’oh.
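
To make the race described above concrete, here is a toy model in Java. It is only a sketch under loud assumptions, not ZGC code: the object count, the page capacity, the class name RelocationRaceSketch and the appShare parameter are all hypothetical, GC threads are assumed to copy survivors in their old address order, and application threads are assumed to copy objects in the order they touch them. The model then walks the objects in access order and counts how often the walk crosses a page boundary, as a crude stand-in for locality.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Toy model only: all names and constants here are hypothetical.
class RelocationRaceSketch {
    static final int OBJECTS = 4096;        // assumed number of live objects
    static final int OBJECTS_PER_PAGE = 64; // assumed page capacity

    /**
     * appShare is the fraction of objects (the earliest ones in access order)
     * that the application threads get to relocate before the GC threads do.
     * Returns the number of page boundary crossings seen when the objects are
     * visited in access order after relocation. Lower means better locality.
     */
    static int pageSwitches(double appShare, long seed) {
        // Object ids double as the old address order; the access order is a
        // random permutation of them.
        List<Integer> accessOrder = new ArrayList<>();
        for (int id = 0; id < OBJECTS; id++) {
            accessOrder.add(id);
        }
        Collections.shuffle(accessOrder, new Random(seed));

        int[] newSlot = new int[OBJECTS];
        boolean[] moved = new boolean[OBJECTS];
        int nextSlot = 0;

        // Application threads win the race for this share: copy in access order.
        int appCount = (int) Math.round(appShare * OBJECTS);
        for (int i = 0; i < appCount; i++) {
            int id = accessOrder.get(i);
            newSlot[id] = nextSlot++;
            moved[id] = true;
        }

        // GC threads copy whatever is left, in old address order.
        for (int id = 0; id < OBJECTS; id++) {
            if (!moved[id]) {
                newSlot[id] = nextSlot++;
            }
        }

        // Walk in access order and count page boundary crossings.
        int switches = 0;
        int previousPage = newSlot[accessOrder.get(0)] / OBJECTS_PER_PAGE;
        for (int id : accessOrder) {
            int page = newSlot[id] / OBJECTS_PER_PAGE;
            if (page != previousPage) {
                switches++;
            }
            previousPage = page;
        }
        return switches;
    }

    public static void main(String[] args) {
        for (double share : new double[] {0.0, 0.5, 1.0}) {
            System.out.println("appShare=" + share
                    + " pageSwitches=" + pageSwitches(share, 42L));
        }
    }
}
```

In this model, sleeping before GC relocation corresponds to raising appShare. With appShare = 1.0 the walk crosses a page boundary only 63 times (4096 objects at 64 per page, minus one), and the count grows as the GC threads take a larger share. A real heap is of course far messier, since both sides relocate concurrently and access patterns are not a single fixed permutation.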

While we do have some tricks up our sleeves for the future that will likely remove this problem, I don’t know if we will fix it for the first release of generational ZGC. If it is an important use case to improve the locality for the application by delaying the GC, I suppose we could have a JVM flag to specify how long the GC should sleep before starting relocation. But I don’t know how I feel about that yet, or how useful that would really be.

Thought I should at least give you a status update on my findings on this issue so far. And thanks for the reproducer - it was very helpful!

/Erik

> On 19 Oct 2022, at 17:33, Brian S O'Neill <bronee at gmail.com> wrote:
> 
> I created a simple test which stresses a fresh H2 database by inserting a bunch of random rows into a simple key-value table. The H2 cache size is set to 4GB, and the max Java heap size is 12GB.
> 
> I ran the test using JDK-19 ZGC for 30 minutes, and the average throughput was 23,075 inserts per second. With generational ZGC, the average throughput dropped to 20,284 inserts per second (about 12% lower), and 11 allocation stalls were observed. The longest one lasted 724 milliseconds. Note that no allocation stalls were observed with non-generational ZGC.
> 
> Here's the code:
> 
> import java.sql.*;
> import java.util.*;
> 
> public class H2Perf {
>    public static void main(String[] args) throws Exception {
>        String path = args[0];
> 
>        long numToInsert = 100_000_000L;
>        long maxDurationMillis = 30L * 60 * 1000;
> 
>        Class.forName("org.h2.Driver");
>        String conString = "jdbc:h2:" + path + ";CACHE_SIZE=4000000";
> 
>        Connection con = DriverManager.getConnection(conString, "sa", "");
> 
>        try (Statement st = con.createStatement()) {
>            st.execute("create table test_tab (" +
>                       "name varchar(30) primary key," +
>                       "val varchar(30) )");
>        }
> 
>        PreparedStatement ps = con.prepareStatement
>            ("insert into test_tab (name, val) values (?,?)");
> 
>        var rnd = new Random(234234);
>        int total = 0;
>        long durationMillis = 0;
>        long start = System.currentTimeMillis();
> 
>        do {
>            long n = rnd.nextLong();
>            ps.setString(1, "key-" + n);
>            ps.setString(2, "value-" + n);
>            ps.execute();
> 
>            total++;
> 
>            if (total % 100_000 == 0) {
>                durationMillis = (System.currentTimeMillis() - start);
>                float rate = (float) ((total / (double) durationMillis) * 1000.0);
>                System.out.println("inserted: " + total + " @ " + rate + " per second");
>            }
>        } while (total < numToInsert && durationMillis < maxDurationMillis);
>    }
> }