RFR: 8261847: performace of java.lang.Record::toString should be improved [v4]
Claes Redestad
redestad at openjdk.java.net
Mon Nov 22 15:57:19 UTC 2021
On Sun, 21 Nov 2021 00:29:46 GMT, Vicente Romero <vromero at openjdk.org> wrote:
>> Please review this PR which aims to optimize the implementation of the `toString` method we provide for records. A benchmark comparing the implementation we are providing for records with lombok found out that lombok is much faster mainly because our implementation uses `String::format`. This fix is basically delegating on StringConcatFactory::makeConcatWithConstants which is faster.
>>
>> TIA
>>
>> This is the result of the benchmark comparing records to lombok with vanilla JDK:
>>
>> Benchmark Mode Cnt Score Error Units
>> MyBenchmark.base avgt 4 0.849 ± 0.111 ns/op
>> MyBenchmark.equals_record avgt 4 7.343 ± 2.740 ns/op
>> MyBenchmark.equals_value avgt 4 6.644 ± 1.920 ns/op
>> MyBenchmark.record_hash_code avgt 4 5.763 ± 3.882 ns/op
>> MyBenchmark.record_to_string avgt 4 262.626 ± 12.574 ns/op <------ Before
>> MyBenchmark.value_class_to_string avgt 4 30.325 ± 21.389 ns/op
>> MyBenchmark.value_hash_code avgt 4 5.048 ± 3.936 ns/op
>>
>>
>> after this patch:
>>
>> Benchmark Mode Cnt Score Error Units
>> MyBenchmark.base avgt 4 0.680 ± 0.185 ns/op
>> MyBenchmark.equals_record avgt 4 5.599 ± 1.348 ns/op
>> MyBenchmark.equals_value avgt 4 5.718 ± 4.633 ns/op
>> MyBenchmark.record_hash_code avgt 4 4.628 ± 4.368 ns/op
>> MyBenchmark.record_to_string avgt 4 26.791 ± 1.817 ns/op <------- After
>> MyBenchmark.value_class_to_string avgt 4 35.473 ± 2.626 ns/op
>> MyBenchmark.value_hash_code avgt 4 6.152 ± 5.101 ns/op
>
> Vicente Romero has updated the pull request incrementally with one additional commit since the last revision:
>
> setting max split size to 20
FWIW I did a few experiments trying to move the chunking to `SCF`. Most made things worse, but this is getting close: https://github.com/openjdk/jdk/compare/master...cl4es:scf_split?expand=1
The threshold for when the JIT fails to optimize seem to be pushed up a bit and I get a 4x gains when concatenating 100 Strings (though it takes a good while for the microbenchmark to stabilize). It also fails to make much of a difference when looking at 200 arguments (maybe 1.4x win). The experiment I have done so far are crude and aggregates the results into a String builder with no size estimation, so it adds a but of unnecessary allocation that could be improved. I think this needs way more work to be a good enhancement and that splitting up outside is going to remain better for now.
A "hidden" drawback with chunking is that you're likely going to be allocating more.
I also re-realized that it'll be pretty hard (or impossible) to get around having some MH slot limit: even though we chunk it up internally, the MH we return must have a type that matches the type given to the bootstrap method, and any operations on the tree that require some temporary arguments we need to pass around will require some argument slots. The current strategy doesn't use too many temporaries, so the real limit might be around 250, but it makes sense to give some room for alternative implementations that use more temporaries if that makes things more efficient.
-------------
PR: https://git.openjdk.java.net/jdk/pull/6403
More information about the core-libs-dev
mailing list