RFR: 8257815: Replace global log2 functions with efficient implementations [v6]
Stefan Karlsson
stefank at openjdk.java.net
Wed Dec 9 08:56:42 UTC 2020
On Tue, 8 Dec 2020 19:32:01 GMT, Claes Redestad <redestad at openjdk.org> wrote:
>> This patch replaces the log2 functions in globalDefinitions.hpp with more efficient counterparts in utilities/powerOfTwo.hpp
>>
>> Naming is hard, but I think the following scheme is reasonable:
>>
>> - log2i: any integral type. 0-hostile
>> - log2i_allow_zero: any integral type. gracefully handles zero (adds a branch)
>> - exact_log2i: any integral type. value must be a power of two
>>
>> I chose log2i rather than log2 to stand out from the log2 functions defined in various standard libraries.
>>
>> Going through all usage, quite a few uses of log2_long et.c. could be replaced by exact_log2i since they take something that has been checked to be a power of two. Most of the remaining usage seem to be able to use the 0-hostile variant, which avoids a branch.
>>
>> To sanity check that calculating log2 using count_leading_zeros gives better performance I added a couple of trivial and short-running microbenchmarks to test_powerOfTwo. For small values (<= 1025) the new impl is ~5x faster, with a larger speed-up for larger integer values:
>>
>> [ RUN ] power_of_2.log2_long_micro
>> [ OK ] power_of_2.log2_long_micro (3581 ms)
>> [ RUN ] power_of_2.log2_long_small_micro
>> [ OK ] power_of_2.log2_long_small_micro (549 ms)
>> [ RUN ] power_of_2.log2i_micro
>> [ OK ] power_of_2.log2i_micro (259 ms)
>> [ RUN ] power_of_2.log2i_small_micro
>> [ OK ] power_of_2.log2i_small_micro (113 ms)
>>
>> I'm not sure if this naive microbenchmark carries its own weight, but it just adds a few seconds and can be useful for quickly checking this performance assumption on other H/W
>>
>> (Intending this for 17)
>
> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:
>
> x->value
src/hotspot/cpu/ppc/ppc.ad line 9078:
> 9076: long src2 = $src2$$constant;
> 9077: long src3 = $src3$$constant;
> 9078: long maskbits = src3 + exact_ilog2((jlong) (julong) (juint) -src2);
I can't verify that this is correct. Someone else has to review this and similar changes below.
src/hotspot/cpu/x86/gc/z/zGlobals_x86.cpp line 142:
> 140: const size_t max_address_offset_bits = 44; // 16TB
> 141: const size_t address_offset = round_up_power_of_2(MaxHeapSize * ZVirtualToPhysicalRatio);
> 142: const size_t address_offset_bits = ilog2(address_offset);
exact_ilog2?
src/hotspot/share/compiler/compilerDefinitions.cpp line 120:
> 118: // one bit shorter then the value of the notification frequency. Set
> 119: // max_freq_bits accordingly.
> 120: int max_freq_bits = InvocationCounter::number_of_count_bits + 1;
Why was the type changed?
src/hotspot/share/gc/g1/g1BiasedArray.hpp line 85:
> 83: idx_t bias = (uintptr_t)bottom / mapping_granularity_in_bytes;
> 84: address base = create_new_base_array(num_target_elems, target_elem_size_in_bytes);
> 85: initialize_base(base, num_target_elems, bias, target_elem_size_in_bytes, ilog2(mapping_granularity_in_bytes));
exact_ilog2
src/hotspot/share/gc/g1/g1FreeIdSet.cpp line 47:
> 45: // 2^shift must be greater than size. Equal is not permitted, because
> 46: // size is the "end of list" value, and can be the index part of _head.
> 47: uint shift = ilog2(size) + 1;
It's not obvious that it's OK to change the type here. Could you change back so that we don't have to figure that out in this review?
src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.inline.hpp line 124:
> 122: array->oop_iterate_range(cl, 0, len);
> 123: } else {
> 124: int bits = ilog2_graceful(len);
ilog2 ? len should > ObjArrayMarkingStride*2, so must be positive.
src/hotspot/share/gc/z/zHeuristics.cpp line 48:
> 46: // Enable medium pages
> 47: ZPageSizeMedium = size;
> 48: ZPageSizeMediumShift = ilog2(ZPageSizeMedium);
exact_ilog2: Setup code is:
const size_t min = ZGranuleSize;
const size_t max = ZGranuleSize * 16;
const size_t unclamped = MaxHeapSize * 0.03125;
const size_t clamped = clamp(unclamped, min, max);
const size_t size = round_down_power_of_2(clamped);
if (size > ZPageSizeSmall) {
// Enable medium pages
ZPageSizeMedium = size;
src/hotspot/share/utilities/powerOfTwo.hpp line 59:
> 57:
> 58: // Log2 of positive, integral value, i.e., largest i such that 2^i <= value
> 59: // Returns ifZero if value is zero, defaulting to -1
Is this used anywhere? Should ifZero be if_zero?
test/hotspot/gtest/utilities/test_powerOfTwo.cpp line 268:
> 266: #define EXPECT_EQ_ILOG2(type) \
> 267: { \
> 268: int limit = sizeof (type) * BitsPerByte; \
sizeof (type) => sizeof(type)
test/hotspot/gtest/utilities/test_powerOfTwo.cpp line 266:
> 264: }
> 265:
> 266: #define EXPECT_EQ_ILOG2(type) \
Is there a reason why this is a macro and not a template function?
test/hotspot/gtest/utilities/test_powerOfTwo.cpp line 277:
> 275: { \
> 276: /* Test ilog2_graceful handles 0 input */ \
> 277: type var = 1; \
Unused?
test/hotspot/gtest/utilities/test_powerOfTwo.cpp line 374:
> 372: }
> 373: EXPECT_TRUE(value <= 15) << "value: " << value;
> 374: }
Will these micro benchmarks be removed before pushing?
-------------
PR: https://git.openjdk.java.net/jdk/pull/1663
More information about the hotspot-dev
mailing list