RFR: 8221404: C2: Convert RegMask and IndexSet to use uintptr_t [v2]

Vladimir Kozlov kvn at openjdk.java.net
Mon Nov 9 17:04:59 UTC 2020


On Mon, 9 Nov 2020 14:17:16 GMT, Claes Redestad <redestad at openjdk.org> wrote:

>> This patch refactors RegMask and IndexSet to use uintptr_t rather than int for storage, which may shorten some code paths and loops on 64-bit VMs. Making storage unsigned further allows for a few simplifications, e.g. in is_bound_set, where there was logic to deal with sign extension that can no longer happen.
>> 
>> To evaluate the performance impact, I created the included JMH microbenchmark, which uses the RepeatCompilation command to repeat the compilation of a few methods: one trivial (`trivialMath`), one "regular" (`mixHashCode`), and one largish (`largeMethod`) with a lot of locals. These are designed to put no stress, some stress and quite a bit of stress on register allocation:
>> 
>> Baseline:
>> Benchmark                                      Mode  Cnt     Score    Error  Units
>> SimpleRepeatCompilation.largeMethod_baseline     ss   10   168.919 ±  2.839  ms/op
>> SimpleRepeatCompilation.largeMethod_repeat       ss   10  8920.305 ± 40.531  ms/op
>> SimpleRepeatCompilation.largeMethod_repeat_c1    ss   10   153.961 ±  2.762  ms/op
>> SimpleRepeatCompilation.largeMethod_repeat_c2    ss   10  8242.061 ± 71.989  ms/op
>> SimpleRepeatCompilation.mixHashCode_baseline     ss   10    69.526 ±  7.098  ms/op
>> SimpleRepeatCompilation.mixHashCode_repeat       ss   10  6733.627 ± 63.689  ms/op
>> SimpleRepeatCompilation.mixHashCode_repeat_c1    ss   10   316.862 ± 29.682  ms/op
>> SimpleRepeatCompilation.mixHashCode_repeat_c2    ss   10  4544.604 ± 57.439  ms/op
>> SimpleRepeatCompilation.trivialMath_baseline     ss   10    21.757 ±  1.553  ms/op
>> SimpleRepeatCompilation.trivialMath_repeat       ss   10   499.214 ± 35.984  ms/op
>> SimpleRepeatCompilation.trivialMath_repeat_c1    ss   10   100.345 ±  2.168  ms/op
>> SimpleRepeatCompilation.trivialMath_repeat_c2    ss   10   398.528 ±  4.718  ms/op
>> 
>> Patched:
>> Benchmark                                      Mode  Cnt     Score    Error  Units
>> SimpleRepeatCompilation.largeMethod_baseline     ss   10   164.355 ±  3.531  ms/op
>> SimpleRepeatCompilation.largeMethod_repeat       ss   10  8516.033 ± 22.408  ms/op
>> SimpleRepeatCompilation.largeMethod_repeat_c1    ss   10   151.181 ± 12.869  ms/op
>> SimpleRepeatCompilation.largeMethod_repeat_c2    ss   10  7857.373 ± 52.826  ms/op
>> SimpleRepeatCompilation.mixHashCode_baseline     ss   10    65.085 ±  5.643  ms/op
>> SimpleRepeatCompilation.mixHashCode_repeat       ss   10  6601.693 ± 57.898  ms/op
>> SimpleRepeatCompilation.mixHashCode_repeat_c1    ss   10   315.845 ± 27.474  ms/op
>> SimpleRepeatCompilation.mixHashCode_repeat_c2    ss   10  4456.847 ± 30.459  ms/op
>> SimpleRepeatCompilation.trivialMath_baseline     ss   10    21.273 ±  2.115  ms/op
>> SimpleRepeatCompilation.trivialMath_repeat       ss   10   506.873 ± 18.994  ms/op
>> SimpleRepeatCompilation.trivialMath_repeat_c1    ss   10   100.184 ±  3.008  ms/op
>> SimpleRepeatCompilation.trivialMath_repeat_c2    ss   10   397.010 ±  4.531  ms/op
>> 
>> This shows that there's no significant change on `trivialMath`, while `mixHashCode` sees a small improvement (~2%) and `largeMethod` sees a larger improvement (~4-5%) on the C2 and Tiered tests with compiler repetition.
>> 
>> Testing: tier 1-7 on all Oracle platforms, local testing and verification of linux-x86.
>
> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Avoid using ULL

Looks good.
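
For readers following along, the sign-extension hazard mentioned in the description can be illustrated with a minimal sketch. This is not code from the patch: `widen_signed` and `widen_unsigned` are hypothetical helpers, and the sketch only shows why a mask word stored in a signed type can pick up unwanted high bits when widened, while an unsigned word cannot.

```cpp
#include <cstdint>

// Hypothetical helper: widen a mask word held in a signed 32-bit type.
// Converting a negative value to an unsigned 64-bit type wraps modulo 2^64,
// so the upper 32 bits end up set (the word is sign-extended).
constexpr std::uint64_t widen_signed(std::int32_t word) {
  return static_cast<std::uint64_t>(word);
}

// Hypothetical helper: widen a mask word held in an unsigned 32-bit type.
// The value is zero-extended, so no extra bits appear.
constexpr std::uint64_t widen_unsigned(std::uint32_t word) {
  return static_cast<std::uint64_t>(word);
}

// Same bit pattern (only bit 31 set), different results after widening.
static_assert(widen_signed(INT32_MIN) == 0xFFFFFFFF80000000ULL,
              "signed storage sign-extends");
static_assert(widen_unsigned(0x80000000u) == 0x0000000080000000ULL,
              "unsigned storage zero-extends");
```

With unsigned storage the second case is the only one that can occur, which is why guards against the first become unnecessary.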

src/hotspot/share/opto/indexSet.hpp line 67:

> 65:          block_index_length = 8,
> 66:          // Split over 4 or 8 words depending on bitness
> 67:          word_index_length  = block_index_length - LogBitsPerWord,

Nice. I also thought about using ‘word’ definitions.
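
The "4 or 8 words" comment follows directly from the arithmetic. A small sketch, assuming a word is `uintptr_t`-sized; the `k...` constants and `words_per_block` are illustrative names, not HotSpot's actual definitions:

```cpp
#include <climits>
#include <cstdint>

// Integer log2 for powers of two (illustrative helper).
constexpr unsigned log2_of(unsigned n) {
  return n <= 1 ? 0 : 1 + log2_of(n / 2);
}

// Assumed stand-ins for BitsPerWord / LogBitsPerWord on the build platform.
constexpr unsigned kBitsPerWord    = sizeof(std::uintptr_t) * CHAR_BIT;
constexpr unsigned kLogBitsPerWord = log2_of(kBitsPerWord);

// block_index_length = 8 in the quoted code, i.e. 2^8 = 256 bits per block.
constexpr unsigned kBlockIndexLength = 8;
constexpr unsigned kWordIndexLength  = kBlockIndexLength - kLogBitsPerWord;
constexpr unsigned kWordsPerBlock    = 1u << kWordIndexLength;

// Same computation, parameterised on word size so both cases can be checked.
constexpr unsigned words_per_block(unsigned bits_per_word) {
  return 1u << (kBlockIndexLength - log2_of(bits_per_word));
}

static_assert(words_per_block(32) == 8, "32-bit: 8 - 5 = 3 index bits, 8 words");
static_assert(words_per_block(64) == 4, "64-bit: 8 - 6 = 2 index bits, 4 words");
static_assert(kWordsPerBlock * kBitsPerWord == 256, "a block always covers 256 bits");
```

Either way a block covers the same 256 bits; only the number of words needed to hold them changes.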

-------------

Marked as reviewed by kvn (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/1102

