RFR: 8261847: performance of java.lang.Record::toString should be improved [v3]

Sun Nov 21 00:18:06 UTC 2021

On Fri, 19 Nov 2021 05:07:23 GMT, Vicente Romero <vromero at openjdk.org> wrote:

>> Please review this PR, which aims to optimize the implementation of the `toString` method we provide for records. A benchmark comparing our implementation for records with lombok found that lombok is much faster, mainly because our implementation uses `String::format`. This fix basically delegates to `StringConcatFactory::makeConcatWithConstants`, which is faster.
>> 
>> TIA
>> 
>> This is the result of the benchmark comparing records to lombok with vanilla JDK:
>> 
>> Benchmark                          Mode  Cnt    Score    Error  Units
>> MyBenchmark.base                   avgt    4    0.849 ±  0.111  ns/op
>> MyBenchmark.equals_record          avgt    4    7.343 ±  2.740  ns/op
>> MyBenchmark.equals_value           avgt    4    6.644 ±  1.920  ns/op
>> MyBenchmark.record_hash_code       avgt    4    5.763 ±  3.882  ns/op
>> MyBenchmark.record_to_string       avgt    4  262.626 ± 12.574  ns/op               <------ Before
>> MyBenchmark.value_class_to_string  avgt    4   30.325 ± 21.389  ns/op
>> MyBenchmark.value_hash_code        avgt    4    5.048 ±  3.936  ns/op
>> 
>> 
>> after this patch:
>> 
>> Benchmark                          Mode  Cnt   Score   Error  Units
>> MyBenchmark.base                   avgt    4   0.680 ± 0.185  ns/op
>> MyBenchmark.equals_record          avgt    4   5.599 ± 1.348  ns/op
>> MyBenchmark.equals_value           avgt    4   5.718 ± 4.633  ns/op
>> MyBenchmark.record_hash_code       avgt    4   4.628 ± 4.368  ns/op
>> MyBenchmark.record_to_string       avgt    4  26.791 ± 1.817  ns/op                 <------- After
>> MyBenchmark.value_class_to_string  avgt    4  35.473 ± 2.626  ns/op
>> MyBenchmark.value_hash_code        avgt    4   6.152 ± 5.101  ns/op
>
> Vicente Romero has updated the pull request incrementally with one additional commit since the last revision:
> 
>   adding the benchmark

Thanks guys for the discussion so far. I wanted to add some graphs I generated. I modified the code in `ObjectMethods` to split the getters into groups of a given slot size, trying slot sizes from 10 to 200 in increments of 10. For each slot size I ran the benchmark on records and lombok classes with `int` and `String` components. In every case the record and the lombok class had either all `int` or all `String` components / fields. I collected data for 1, 10, 100 and 254 components / fields. The graphs follow, one per component count, each with some comments analyzing the data for records.
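To make the "slot size" idea concrete, here is a minimal sketch of how a list of components could be cut into consecutive groups. It is only an illustration of the splitting arithmetic, not the code in `ObjectMethods`; the class name `SlotSplitSketch` and the method `split` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class SlotSplitSketch {
    /**
     * Splits component indices 0..count-1 into consecutive groups holding
     * at most slotSize components each. Every group is a half-open range
     * {from, to}. Hypothetical helper, for illustration only.
     */
    public static List<int[]> split(int count, int slotSize) {
        if (slotSize <= 0) {
            throw new IllegalArgumentException("slotSize must be positive");
        }
        List<int[]> groups = new ArrayList<>();
        for (int from = 0; from < count; from += slotSize) {
            int to = Math.min(from + slotSize, count);
            groups.add(new int[] {from, to});
        }
        return groups;
    }

    public static void main(String[] args) {
        // The component counts measured in this message, at a slot size of 20.
        for (int count : new int[] {1, 10, 100, 254}) {
            System.out.println(count + " components -> "
                    + split(count, 20).size() + " group(s)");
        }
    }
}
```

With a slot size of 20, a record with 254 components would end up with 13 groups, the last one covering components 240 to 253.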

### 1 component
As expected, the slot size doesn't matter much in this case, since we are dealing with only one component:
![1Component](https://user-images.githubusercontent.com/62155190/142744040-c1e9c72f-b93b-4b33-8db9-1e04b2e7661f.jpg)

### 10 components
Here the slot size doesn't make a big difference, as expected. The range is tight when all the components are ints. For the string case the range is also very tight, apart from a couple of data points, and we can safely say that the numbers are pretty good for slot sizes from 10 to 80:
![10Components](https://user-images.githubusercontent.com/62155190/142743997-8f17bc51-0522-4b09-ab86-5af00e1652c6.jpg)

### 100 components
Here we start seeing the number of slots playing a more important role. For 100 string components, the time grows quasi-logarithmically once the slot size is greater than 70. For the all-integers case we get pretty good numbers up to slot size 60:
![100Components](https://user-images.githubusercontent.com/62155190/142743998-af3a9222-3c78-4887-bc2a-9494fbec5e37.jpg)

### 254 components
The results for 254 integer components show a lot of variation. They are good for slot sizes 10 and 20, and then for most sizes from 70 to 170, with 150 as an exception. For the all-strings case we have good numbers up to slot size 60, and the time climbs from there:
![254Components](https://user-images.githubusercontent.com/62155190/142744000-2c90ff34-d780-442e-a604-78a78252f10a.jpg)

So these data seem to confirm that slot sizes from 10 to 30 are where we want to be. I will update the patch to use a slot size of 20 to split the getters. Thanks again for the discussion!
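For readers who want to experiment with the idea, below is a rough sketch of chunked concatenation through `StringConcatFactory::makeConcatWithConstants` with a slot size of 20. It is not the patch itself: the class `ChunkedConcatSketch` and its methods are hypothetical, all values are treated as `Object`, and the handles are rebuilt on every call, which a real implementation would avoid by creating them once.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.StringConcatFactory;
import java.util.Arrays;

public class ChunkedConcatSketch {
    // Slot size picked in this message; an assumption of the sketch.
    static final int SLOT_SIZE = 20;

    /** Builds a handle rendering "name=value, name=value" for names[from..to). */
    static MethodHandle groupHandle(String[] names, int from, int to) throws Exception {
        StringBuilder recipe = new StringBuilder();
        for (int i = from; i < to; i++) {
            if (i > from) recipe.append(", ");
            // '\1' marks the position of the next dynamic argument in the recipe.
            recipe.append(names[i]).append('=').append('\1');
        }
        Class<?>[] types = new Class<?>[to - from];
        Arrays.fill(types, Object.class);
        CallSite site = StringConcatFactory.makeConcatWithConstants(
                MethodHandles.lookup(), "group",
                MethodType.methodType(String.class, types), recipe.toString());
        return site.dynamicInvoker();
    }

    /** Renders "Type[a=1, b=2]" by concatenating one group of components at a time. */
    public static String render(String typeName, String[] names, Object[] values) {
        try {
            StringBuilder out = new StringBuilder(typeName).append('[');
            for (int from = 0; from < names.length; from += SLOT_SIZE) {
                int to = Math.min(from + SLOT_SIZE, names.length);
                if (from > 0) out.append(", ");
                Object[] slice = Arrays.copyOfRange(values, from, to);
                out.append((String) groupHandle(names, from, to).invokeWithArguments(slice));
            }
            return out.append(']').toString();
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        // prints "Point[x=1, y=2]"
        System.out.println(render("Point", new String[] {"x", "y"}, new Object[] {1, 2}));
    }
}
```

Keeping each call site to a small number of arguments is the point of the split: a single `makeConcatWithConstants` call accepts at most 200 argument slots, so a record with 254 components could not be handled by one call site anyway.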

-------------

PR: https://git.openjdk.java.net/jdk/pull/6403