RFR: 8352536: Add overloads to parse and build class files from/to MemorySegment [v5]
David M. Lloyd
duke at openjdk.org
Thu Mar 27 17:42:22 UTC 2025
On Thu, 27 Mar 2025 17:38:49 GMT, David M. Lloyd <duke at openjdk.org> wrote:
>> Provide method overloads to the ClassFile interface of the java.lang.classfile API which allow parsing of classes found in memory segments, as well as allowing built class files to be output to them.
>
> David M. Lloyd has updated the pull request incrementally with one additional commit since the last revision:
>
> Add a benchmark for class file emission
Here are the raw benchmark results against `AbstractMap` and `TreeMap`:
Benchmark                                  Mode  Cnt       Score      Error  Units
MemorySegmentBenchmark.emitWithCopy0      thrpt    5  198061.082 ± 2300.146  ops/s
MemorySegmentBenchmark.emitWithCopy1      thrpt    5   35352.167 ±  320.823  ops/s
MemorySegmentBenchmark.emitWithoutCopy0   thrpt    5  265208.111 ± 1416.120  ops/s
MemorySegmentBenchmark.emitWithoutCopy1   thrpt    5   53215.327 ±  354.228  ops/s
`0` is the smaller `AbstractMap` class bytes and `1` is the larger `TreeMap` class bytes. For case 0 we see an overall improvement of around 34%, and case 1 shows an improvement closer to 50% (which is expected, since larger classes mean copying more bytes as well as putting more pressure on the GC).
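For context, the two emission paths being compared look roughly like the sketch below. This is not the benchmark code from the PR: the class name, the trivial generated class, the pre-sized shared segment, and in particular the `buildTo`-style overload name are placeholders/assumptions, shown only to illustrate the "with copy" vs. "without copy" shapes.

```java
import java.lang.classfile.ClassFile;
import java.lang.constant.ClassDesc;
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)
public class EmitSketch {
    private final ClassFile cf = ClassFile.of();
    // Destination segment, sized generously for the classes used here (assumption).
    private final MemorySegment target = Arena.ofShared().allocate(64 * 1024);

    @Benchmark
    public MemorySegment emitWithCopy() {
        // Existing path: build into a fresh heap byte[] per operation,
        // then bulk-copy those bytes into the destination segment.
        byte[] bytes = cf.build(ClassDesc.of("example.Generated"), cb -> { });
        target.asSlice(0, bytes.length).copyFrom(MemorySegment.ofArray(bytes));
        return target;
    }

    @Benchmark
    public MemorySegment emitWithoutCopy() {
        // New path: emit directly into the segment using the overload added by
        // this PR, skipping the intermediate byte[]. The call below is a
        // placeholder name, not the actual API:
        // cf.buildTo(target, ClassDesc.of("example.Generated"), cb -> { });
        return target;
    }
}
```

The point of the comparison is that the "with copy" path pays for one large, short-lived byte[] per operation plus the bulk copy, while the "without copy" path writes the class file bytes into the caller-supplied segment directly.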
Here is the same benchmark with `-prof gc` enabled:
Benchmark                                                     Mode  Cnt       Score      Error   Units
MemorySegmentBenchmark.emitWithCopy0                         thrpt    5  197728.066 ± 3107.524   ops/s
MemorySegmentBenchmark.emitWithCopy0:gc.alloc.rate           thrpt    5    3900.963 ±   61.292  MB/sec
MemorySegmentBenchmark.emitWithCopy0:gc.alloc.rate.norm      thrpt    5   20688.004 ±    0.001    B/op
MemorySegmentBenchmark.emitWithCopy0:gc.count                thrpt    5     680.000             counts
MemorySegmentBenchmark.emitWithCopy0:gc.time                 thrpt    5     415.000                 ms
MemorySegmentBenchmark.emitWithCopy1                         thrpt    5   35504.531 ±  260.423   ops/s
MemorySegmentBenchmark.emitWithCopy1:gc.alloc.rate           thrpt    5    3512.621 ±   25.778  MB/sec
MemorySegmentBenchmark.emitWithCopy1:gc.alloc.rate.norm      thrpt    5  103744.020 ±    0.001    B/op
MemorySegmentBenchmark.emitWithCopy1:gc.count                thrpt    5     673.000             counts
MemorySegmentBenchmark.emitWithCopy1:gc.time                 thrpt    5     413.000                 ms
MemorySegmentBenchmark.emitWithoutCopy0                      thrpt    5  265533.600 ± 1707.914   ops/s
MemorySegmentBenchmark.emitWithoutCopy0:gc.alloc.rate        thrpt    5    3547.167 ±   22.811  MB/sec
MemorySegmentBenchmark.emitWithoutCopy0:gc.alloc.rate.norm   thrpt    5   14008.003 ±    0.001    B/op
MemorySegmentBenchmark.emitWithoutCopy0:gc.count             thrpt    5     651.000             counts
MemorySegmentBenchmark.emitWithoutCopy0:gc.time              thrpt    5     392.000                 ms
MemorySegmentBenchmark.emitWithoutCopy1                      thrpt    5   52727.917 ±  624.059   ops/s
MemorySegmentBenchmark.emitWithoutCopy1:gc.alloc.rate        thrpt    5    3531.104 ±   42.004  MB/sec
MemorySegmentBenchmark.emitWithoutCopy1:gc.alloc.rate.norm   thrpt    5   70224.013 ±    0.001    B/op
MemorySegmentBenchmark.emitWithoutCopy1:gc.count             thrpt    5     683.000             counts
MemorySegmentBenchmark.emitWithoutCopy1:gc.time              thrpt    5     412.000                 ms
You can see that in addition to the overhead of the copy itself, we also put a bit more pressure on the GC: we allocate roughly the same *number* of objects in either case, but the extra large array per operation fills the allocation regions more quickly, so a little more time is spent in GC on average.
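Reading the `gc.alloc.rate.norm` rows the same way (my arithmetic on the numbers above, not additional benchmark output): case 0 allocates 20688 − 14008 ≈ 6680 B/op more with the copy, and case 1 allocates 103744 − 70224 ≈ 33520 B/op more, which is consistent with one extra byte[] roughly the size of the emitted class per operation.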
-------------
PR Comment: https://git.openjdk.org/jdk/pull/24139#issuecomment-2758926428