RFR: 8221404: C2: Convert RegMask and IndexSet to use uintptr_t
Vladimir Kozlov
kvn at openjdk.java.net
Sun Nov 8 20:47:04 UTC 2020
On Fri, 6 Nov 2020 21:21:56 GMT, Claes Redestad <redestad at openjdk.org> wrote:
> This patch refactors RegMask and IndexSet to use uintptr_t rather than int for storage, which may shorten some code paths and loops on 64-bit VMs. Making the storage unsigned also allows a few simplifications, e.g. in is_bound_set, where there was logic to deal with sign extension that can no longer happen.
>
> To evaluate the performance impact I created the included JMH microbenchmark, which uses the RepeatCompilation command to repeat the compilation of a few methods: one trivial (`trivialMath`), one "regular" (`mixHashCode`), and one largish (`largeMethod`) with a lot of locals. These are designed to put no stress, some stress, and quite a bit of stress on register allocation, respectively:
>
> Baseline:
> Benchmark                                      Mode  Cnt     Score    Error  Units
> SimpleRepeatCompilation.largeMethod_baseline     ss   10   168.919 ±  2.839  ms/op
> SimpleRepeatCompilation.largeMethod_repeat       ss   10  8920.305 ± 40.531  ms/op
> SimpleRepeatCompilation.largeMethod_repeat_c1    ss   10   153.961 ±  2.762  ms/op
> SimpleRepeatCompilation.largeMethod_repeat_c2    ss   10  8242.061 ± 71.989  ms/op
> SimpleRepeatCompilation.mixHashCode_baseline     ss   10    69.526 ±  7.098  ms/op
> SimpleRepeatCompilation.mixHashCode_repeat       ss   10  6733.627 ± 63.689  ms/op
> SimpleRepeatCompilation.mixHashCode_repeat_c1    ss   10   316.862 ± 29.682  ms/op
> SimpleRepeatCompilation.mixHashCode_repeat_c2    ss   10  4544.604 ± 57.439  ms/op
> SimpleRepeatCompilation.trivialMath_baseline     ss   10    21.757 ±  1.553  ms/op
> SimpleRepeatCompilation.trivialMath_repeat       ss   10   499.214 ± 35.984  ms/op
> SimpleRepeatCompilation.trivialMath_repeat_c1    ss   10   100.345 ±  2.168  ms/op
> SimpleRepeatCompilation.trivialMath_repeat_c2    ss   10   398.528 ±  4.718  ms/op
>
> Patched:
> Benchmark                                      Mode  Cnt     Score    Error  Units
> SimpleRepeatCompilation.largeMethod_baseline     ss   10   164.355 ±  3.531  ms/op
> SimpleRepeatCompilation.largeMethod_repeat       ss   10  8516.033 ± 22.408  ms/op
> SimpleRepeatCompilation.largeMethod_repeat_c1    ss   10   151.181 ± 12.869  ms/op
> SimpleRepeatCompilation.largeMethod_repeat_c2    ss   10  7857.373 ± 52.826  ms/op
> SimpleRepeatCompilation.mixHashCode_baseline     ss   10    65.085 ±  5.643  ms/op
> SimpleRepeatCompilation.mixHashCode_repeat       ss   10  6601.693 ± 57.898  ms/op
> SimpleRepeatCompilation.mixHashCode_repeat_c1    ss   10   315.845 ± 27.474  ms/op
> SimpleRepeatCompilation.mixHashCode_repeat_c2    ss   10  4456.847 ± 30.459  ms/op
> SimpleRepeatCompilation.trivialMath_baseline     ss   10    21.273 ±  2.115  ms/op
> SimpleRepeatCompilation.trivialMath_repeat       ss   10   506.873 ± 18.994  ms/op
> SimpleRepeatCompilation.trivialMath_repeat_c1    ss   10   100.184 ±  3.008  ms/op
> SimpleRepeatCompilation.trivialMath_repeat_c2    ss   10   397.010 ±  4.531  ms/op
>
> This shows no significant change on `trivialMath`, a small improvement (~2%) on `mixHashCode`, and a larger improvement (~4-5%) on `largeMethod` in the C2 and Tiered tests with compiler repetition.
>
> Testing: tier 1-7 on all Oracle platforms, local testing and verification of linux-x86.
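To illustrate why wider storage words can shorten loops, here is a minimal sketch. It is not the HotSpot RegMask class: `SimpleMask`, its members, and the 256-bit size are hypothetical, and the word type is a template parameter (instead of uintptr_t) so that the 32-bit and 64-bit cases can be compared on one machine.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a register mask (NOT HotSpot's RegMask).
// Word is the unsigned storage type; Bits is the number of bits tracked.
template <typename Word, size_t Bits>
struct SimpleMask {
  static constexpr size_t kBitsPerWord = sizeof(Word) * CHAR_BIT;
  static constexpr size_t kWords = (Bits + kBitsPerWord - 1) / kBitsPerWord;

  Word words[kWords] = {};

  // Set one bit; the unsigned 1 avoids shifting into a sign bit.
  void insert(size_t bit) {
    words[bit / kBitsPerWord] |= Word(1) << (bit % kBitsPerWord);
  }

  // Test one bit; a right shift on an unsigned word never sign-extends.
  bool member(size_t bit) const {
    return ((words[bit / kBitsPerWord] >> (bit % kBitsPerWord)) & Word(1)) != 0;
  }

  // Word-wise scan: half as many iterations with 64-bit words.
  bool is_empty() const {
    for (size_t i = 0; i < kWords; i++) {
      if (words[i] != 0) return false;
    }
    return true;
  }
};
```

With these assumptions, a 256-bit mask needs 8 words of 32 bits but only 4 words of 64 bits, so every word-wise loop over the mask runs half as many iterations.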
Looks good in general.
You may want to compare the register allocation (RA) times from -XX:+LogCompilation output to see a clear difference.
src/hotspot/share/opto/regmask.cpp line 85:
> 83: return SlotsPerVecA;
> 84: default:
> 85: // Op_VecS and the rest ideal registers.
Add an assert to make sure we see only the expected values here.
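One possible shape for such a check, as a sketch only: the enum, the function name `num_slots`, and the slot counts below are hypothetical and do not reproduce regmask.cpp. The idea is that the default branch asserts it only receives the single-slot kinds it was written for.

```cpp
#include <cassert>

// Hypothetical register kinds; illustrative only, not HotSpot's ideal registers.
enum IdealReg { kRegI, kRegL, kVecS, kVecD, kVecX };

// Returns the number of slots a register kind occupies (made-up values).
int num_slots(IdealReg r) {
  switch (r) {
    case kVecX: return 4;
    case kVecD: return 2;
    case kRegL: return 2;
    default:
      // Only single-slot kinds are expected here; a new multi-slot kind
      // that lacks its own case would trip this assert in debug builds.
      assert(r == kRegI || r == kVecS);
      return 1;
  }
}
```

The assert costs nothing in product builds and documents which values the fall-through case is meant to handle.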
src/hotspot/share/opto/indexSet.hpp line 105:
> 103: // access to IndexSet and IndexSetIterator.
> 104:
> 105: // A BitBlock is composed of some number of 64 bit words. When a BitBlock
64- or 32-bit words
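To show why the comment should not hard-code a word size, here is a small sketch of how an element index might split into block, word, and bit positions when the word width is derived from the storage type. `BlockLayout` and the 256-bit block size are assumptions for illustration, not the constants from indexSet.hpp.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>

// Hypothetical layout helper (NOT the IndexSet code).
template <typename Word>
struct BlockLayout {
  static constexpr size_t bits_per_word = sizeof(Word) * CHAR_BIT;  // 32 or 64
  static constexpr size_t bits_per_block = 256;                      // assumed block size
  static constexpr size_t words_per_block = bits_per_block / bits_per_word;

  // Which block holds the element.
  static size_t block_index(size_t element) { return element / bits_per_block; }
  // Which word inside that block holds the element.
  static size_t word_index(size_t element) {
    return (element % bits_per_block) / bits_per_word;
  }
  // Which bit inside that word represents the element.
  static size_t bit_index(size_t element) { return element % bits_per_word; }
};
```

Under these assumptions the same element lands in a different word and bit depending on the word width, which is why a comment that says "64 bit words" would be wrong on a 32-bit VM.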
-------------
PR: https://git.openjdk.java.net/jdk/pull/1102