RFR: 8221404: C2: Convert RegMask and IndexSet to use uintptr_t
Claes Redestad
redestad at openjdk.java.net
Sun Nov 8 20:47:04 UTC 2020
On Fri, 6 Nov 2020 21:55:47 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:
>> This patch refactors RegMask and IndexSet to use uintptr_t rather than int for storage, which may shorten some code paths and loops on 64-bit VMs. Making storage unsigned further allows for a few simplifications, e.g. in is_bound_set, where there was logic to deal with sign extension that can no longer happen.
>>
>> To evaluate the performance impact I created the included JMH microbenchmark, which uses the RepeatCompilation command to repeat the compilation of a few methods: one trivial (`trivialMath`), one "regular" (`mixHashCode`), and one largish (`largeMethod`) with a lot of locals. These are designed to put no stress, some stress, and quite a bit of stress on register allocation, respectively:
>>
>> Baseline:
>> Benchmark                                      Mode  Cnt     Score    Error  Units
>> SimpleRepeatCompilation.largeMethod_baseline     ss   10   168.919 ±  2.839  ms/op
>> SimpleRepeatCompilation.largeMethod_repeat       ss   10  8920.305 ± 40.531  ms/op
>> SimpleRepeatCompilation.largeMethod_repeat_c1    ss   10   153.961 ±  2.762  ms/op
>> SimpleRepeatCompilation.largeMethod_repeat_c2    ss   10  8242.061 ± 71.989  ms/op
>> SimpleRepeatCompilation.mixHashCode_baseline     ss   10    69.526 ±  7.098  ms/op
>> SimpleRepeatCompilation.mixHashCode_repeat       ss   10  6733.627 ± 63.689  ms/op
>> SimpleRepeatCompilation.mixHashCode_repeat_c1    ss   10   316.862 ± 29.682  ms/op
>> SimpleRepeatCompilation.mixHashCode_repeat_c2    ss   10  4544.604 ± 57.439  ms/op
>> SimpleRepeatCompilation.trivialMath_baseline     ss   10    21.757 ±  1.553  ms/op
>> SimpleRepeatCompilation.trivialMath_repeat       ss   10   499.214 ± 35.984  ms/op
>> SimpleRepeatCompilation.trivialMath_repeat_c1    ss   10   100.345 ±  2.168  ms/op
>> SimpleRepeatCompilation.trivialMath_repeat_c2    ss   10   398.528 ±  4.718  ms/op
>>
>> Patched:
>> Benchmark                                      Mode  Cnt     Score    Error  Units
>> SimpleRepeatCompilation.largeMethod_baseline     ss   10   164.355 ±  3.531  ms/op
>> SimpleRepeatCompilation.largeMethod_repeat       ss   10  8516.033 ± 22.408  ms/op
>> SimpleRepeatCompilation.largeMethod_repeat_c1    ss   10   151.181 ± 12.869  ms/op
>> SimpleRepeatCompilation.largeMethod_repeat_c2    ss   10  7857.373 ± 52.826  ms/op
>> SimpleRepeatCompilation.mixHashCode_baseline     ss   10    65.085 ±  5.643  ms/op
>> SimpleRepeatCompilation.mixHashCode_repeat       ss   10  6601.693 ± 57.898  ms/op
>> SimpleRepeatCompilation.mixHashCode_repeat_c1    ss   10   315.845 ± 27.474  ms/op
>> SimpleRepeatCompilation.mixHashCode_repeat_c2    ss   10  4456.847 ± 30.459  ms/op
>> SimpleRepeatCompilation.trivialMath_baseline     ss   10    21.273 ±  2.115  ms/op
>> SimpleRepeatCompilation.trivialMath_repeat       ss   10   506.873 ± 18.994  ms/op
>> SimpleRepeatCompilation.trivialMath_repeat_c1    ss   10   100.184 ±  3.008  ms/op
>> SimpleRepeatCompilation.trivialMath_repeat_c2    ss   10   397.010 ±  4.531  ms/op
>>
>> This shows that there's no significant change on `trivialMath`, while `mixHashCode` sees a small improvement (~2%) and `largeMethod` sees a larger improvement (~4-5%) on the C2 and Tiered tests with compiler repetition.
>>
>> Testing: tier 1-7 on all Oracle platforms, local testing and verification of linux-x86.
>
> Looks good in general.
> You may want to compare RA times from -XX:+LogCompilation to see clear difference.
Using -XX:+CITime to get a breakdown of Regalloc times for a sample run of largeMethod_repeat_c2. Baseline:
C2 Compile Time:          8.731 s
...
  Regalloc:               4.759 s
    Ctor Chaitin:         0.000 s
    Build IFG (virt):     0.190 s
    Build IFG (phys):     1.523 s
    Compute Liveness:     0.235 s
    Regalloc Split:       0.284 s
    Postalloc Copy Rem:   0.283 s
    Merge multidefs:      0.011 s
    Fixup Spills:         0.012 s
    Compact:              0.005 s
    Coalesce 1:           0.127 s
    Coalesce 2:           0.002 s
    Coalesce 3:           0.747 s
    Cache LRG:            0.005 s
    Simplify:             0.375 s
    Select:               0.423 s
    Other:                0.536 s
Patched:
C2 Compile Time:          8.317 s
...
  Regalloc:               4.340 s
    Ctor Chaitin:         0.000 s
    Build IFG (virt):     0.162 s
    Build IFG (phys):     1.344 s
    Compute Liveness:     0.237 s
    Regalloc Split:       0.284 s
    Postalloc Copy Rem:   0.279 s
    Merge multidefs:      0.011 s
    Fixup Spills:         0.012 s
    Compact:              0.004 s
    Coalesce 1:           0.121 s
    Coalesce 2:           0.002 s
    Coalesce 3:           0.680 s
    Cache LRG:            0.005 s
    Simplify:             0.345 s
    Select:               0.362 s
    Other:                0.490 s
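To make the per-phase differences between the two breakdowns above easier to read, here is a small sketch that computes the relative change for the phases that moved the most. The helper `percent_change`, the `Phase` struct and `print_changes` are names made up for this illustration; the timings are copied from the single sample run above, so the percentages are only as reliable as that one run.

```cpp
#include <cassert>
#include <cstdio>

// Relative change in percent from baseline to patched.
// A negative result means the patched run was faster.
double percent_change(double baseline_s, double patched_s) {
  return (patched_s - baseline_s) / baseline_s * 100.0;
}

// Hypothetical record type for this sketch only.
struct Phase {
  const char* name;
  double baseline_s;
  double patched_s;
};

// Timings in seconds, copied from the two CITime breakdowns above.
const Phase kPhases[] = {
  {"Regalloc",         4.759, 4.340},
  {"Build IFG (virt)", 0.190, 0.162},
  {"Build IFG (phys)", 1.523, 1.344},
  {"Coalesce 3",       0.747, 0.680},
  {"Simplify",         0.375, 0.345},
  {"Select",           0.423, 0.362},
};

// Prints one line per phase with its relative change.
void print_changes() {
  for (const Phase& p : kPhases) {
    std::printf("%-18s %6.1f%%\n", p.name,
                percent_change(p.baseline_s, p.patched_s));
  }
}
```

Calling `print_changes()` from a `main` function prints the table; phases not listed in `kPhases` changed by a few milliseconds at most in this run.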
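Timings appear fairly stable from run to run. Regalloc drops from 4.759 s to 4.340 s (about 9%), with the largest gains in Build IFG (phys), Coalesce 3 and Select. No significant change in other phases.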
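As a rough illustration of why wider storage can shorten loops in the register allocator, the sketch below models a fixed-size bit set whose word type is a template parameter. This is not HotSpot's RegMask or IndexSet; `BitMask`, `insert`, `member` and `overlaps` are hypothetical names chosen for this example. The point is only that a 512-bit mask takes 16 iterations per whole-mask operation with 32-bit words and 8 with 64-bit words, and that with an unsigned word type the shifts need no special handling for the top bit.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>

// Hypothetical illustration only; not HotSpot's RegMask.
// A fixed-size bit set stored in an array of unsigned words of type Word.
template <typename Word, std::size_t Bits>
struct BitMask {
  static constexpr std::size_t kWordBits = sizeof(Word) * CHAR_BIT;
  static constexpr std::size_t kWords = (Bits + kWordBits - 1) / kWordBits;

  Word words[kWords] = {};

  // Sets a single bit. Word is unsigned, so shifting into the top bit
  // is well defined.
  void insert(std::size_t bit) {
    words[bit / kWordBits] |= Word(1) << (bit % kWordBits);
  }

  // Tests a single bit. The right shift is logical, not arithmetic,
  // because Word is unsigned.
  bool member(std::size_t bit) const {
    return ((words[bit / kWordBits] >> (bit % kWordBits)) & Word(1)) != 0;
  }

  // Whole-mask operation: the loop runs kWords times, so a wider Word
  // means fewer iterations for the same number of bits.
  bool overlaps(const BitMask& other) const {
    Word acc = 0;
    for (std::size_t i = 0; i < kWords; i++) {
      acc |= words[i] & other.words[i];
    }
    return acc != 0;
  }
};
```

Instantiating it as `BitMask<std::uintptr_t, 512>` picks the native pointer width, which is the same idea as the patch: 64-bit words on 64-bit VMs, 32-bit words elsewhere.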
-------------
PR: https://git.openjdk.java.net/jdk/pull/1102