RFR: 8257815: Replace global log2 functions with efficient implementations [v6]

Wed Dec 9 11:46:38 UTC 2020

On Tue, 8 Dec 2020 19:32:01 GMT, Claes Redestad <redestad at openjdk.org> wrote:

>> This patch replaces the log2 functions in globalDefinitions.hpp with more efficient counterparts in utilities/powerOfTwo.hpp
>> 
>> Naming is hard, but I think the following scheme is reasonable:
>> 
>> - log2i: any integral type. 0-hostile
>> - log2i_allow_zero: any integral type. gracefully handles zero (adds a branch)
>> - exact_log2i: any integral type. value must be a power of two
>> 
>> I chose log2i rather than log2 to stand out from the log2 functions defined in various standard libraries.
>> 
>> Going through all usage, quite a few uses of log2_long et.c. could be replaced by exact_log2i since they take something that has been checked to be a power of two. Most of the remaining usage seem to be able to use the 0-hostile variant, which avoids a branch.
>> 
>> To sanity check that calculating log2 using count_leading_zeros gives better performance I added a couple of trivial and short-running microbenchmarks to test_powerOfTwo. For small values (<= 1025) the new impl is ~5x faster, with a larger speed-up for larger integer values:
>> 
>> [ RUN      ] power_of_2.log2_long_micro
>> [       OK ] power_of_2.log2_long_micro (3581 ms)
>> [ RUN      ] power_of_2.log2_long_small_micro
>> [       OK ] power_of_2.log2_long_small_micro (549 ms)
>> [ RUN      ] power_of_2.log2i_micro
>> [       OK ] power_of_2.log2i_micro (259 ms)
>> [ RUN      ] power_of_2.log2i_small_micro
>> [       OK ] power_of_2.log2i_small_micro (113 ms)
>> 
>> I'm not sure if this naive microbenchmark carries its own weight, but it just adds a few seconds and can be useful for quickly checking this performance assumption on other H/W
>> 
>> (Intending this for 17)
>
> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:
> 
>   x->value

I didn't review the ppc or s390 changes, shenandoah changes, or x86_64.ad (I
don't understand the language here).

src/hotspot/share/utilities/powerOfTwo.hpp line 78:

> 76:                        "x must be a power of 2: " UINT64_FORMAT,
> 77:                        (uint64_t)static_cast<typename std::make_unsigned<T>::type>(value));
> 78:   return ilog2(value);

`count_trailing_zeros(value);` is more direct.

src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 3996:

> 3994:         /*is32*/false, (uint64_t)CompressedKlassPointers::base())) {
> 3995:     const uint64_t range_mask =
> 3996:       (1ULL << ilog2_graceful(CompressedKlassPointers::range())) - 1;

I think use of ilog2_graceful is misleading, as a zero argument is not actually permitted here.  If the argument is zero, that operation will return -1, which will be used as a shift quantity, which is UB.  And since the argument's type is size_t, the "graceful" handling of negative values isn't relevant.

src/hotspot/cpu/x86/vm_version_x86.cpp line 784:

> 782:               cpu_family(), _model, _stepping, os::cpu_microcode_revision());
> 783:   assert(res > 0, "not enough temporary space allocated");
> 784:   assert(exact_ilog2((uint64_t)CPU_MAX_FEATURE) + 1 == sizeof(_features_names) / sizeof(char*), "wrong size features_names");

Cast of CPU_MAX_FEATURE isn't needed.

-------------

Changes requested by kbarrett (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/1663