RFR: 8257815: Replace global log2 functions with efficient implementations [v6]

Stefan Karlsson stefank at openjdk.java.net
Wed Dec 9 08:56:42 UTC 2020


On Tue, 8 Dec 2020 19:32:01 GMT, Claes Redestad <redestad at openjdk.org> wrote:

>> This patch replaces the log2 functions in globalDefinitions.hpp with more efficient counterparts in utilities/powerOfTwo.hpp
>> 
>> Naming is hard, but I think the following scheme is reasonable:
>> 
>> - log2i: any integral type. 0-hostile
>> - log2i_allow_zero: any integral type. gracefully handles zero (adds a branch)
>> - exact_log2i: any integral type. value must be a power of two
>> 
>> I chose log2i rather than log2 to stand out from the log2 functions defined in various standard libraries.
>> 
>> Going through all usage, quite a few uses of log2_long et.c. could be replaced by exact_log2i since they take something that has been checked to be a power of two. Most of the remaining usage seem to be able to use the 0-hostile variant, which avoids a branch.
>> 
>> To sanity check that calculating log2 using count_leading_zeros gives better performance I added a couple of trivial and short-running microbenchmarks to test_powerOfTwo. For small values (<= 1025) the new impl is ~5x faster, with a larger speed-up for larger integer values:
>> 
>> [ RUN      ] power_of_2.log2_long_micro
>> [       OK ] power_of_2.log2_long_micro (3581 ms)
>> [ RUN      ] power_of_2.log2_long_small_micro
>> [       OK ] power_of_2.log2_long_small_micro (549 ms)
>> [ RUN      ] power_of_2.log2i_micro
>> [       OK ] power_of_2.log2i_micro (259 ms)
>> [ RUN      ] power_of_2.log2i_small_micro
>> [       OK ] power_of_2.log2i_small_micro (113 ms)
>> 
>> I'm not sure if this naive microbenchmark carries its own weight, but it just adds a few seconds and can be useful for quickly checking this performance assumption on other H/W
>> 
>> (Intending this for 17)
>
> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:
> 
>   x->value

src/hotspot/cpu/ppc/ppc.ad line 9078:

> 9076:     long src2      = $src2$$constant;
> 9077:     long src3      = $src3$$constant;
> 9078:     long maskbits  = src3 + exact_ilog2((jlong) (julong) (juint) -src2);

I can't verify that this is correct. Someone else has to review this and similar changes below.

src/hotspot/cpu/x86/gc/z/zGlobals_x86.cpp line 142:

> 140:   const size_t max_address_offset_bits = 44; // 16TB
> 141:   const size_t address_offset = round_up_power_of_2(MaxHeapSize * ZVirtualToPhysicalRatio);
> 142:   const size_t address_offset_bits = ilog2(address_offset);

exact_ilog2?

src/hotspot/share/compiler/compilerDefinitions.cpp line 120:

> 118:   // one bit shorter then the value of the notification frequency. Set
> 119:   // max_freq_bits accordingly.
> 120:   int max_freq_bits = InvocationCounter::number_of_count_bits + 1;

Why was the type changed?

src/hotspot/share/gc/g1/g1BiasedArray.hpp line 85:

> 83:     idx_t bias = (uintptr_t)bottom / mapping_granularity_in_bytes;
> 84:     address base = create_new_base_array(num_target_elems, target_elem_size_in_bytes);
> 85:     initialize_base(base, num_target_elems, bias, target_elem_size_in_bytes, ilog2(mapping_granularity_in_bytes));

exact_ilog2

src/hotspot/share/gc/g1/g1FreeIdSet.cpp line 47:

> 45:   // 2^shift must be greater than size. Equal is not permitted, because
> 46:   // size is the "end of list" value, and can be the index part of _head.
> 47:   uint shift = ilog2(size) + 1;

It's not obvious that it's OK to change the type here. Could you change back so that we don't have to figure that out in this review?

src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.inline.hpp line 124:

> 122:     array->oop_iterate_range(cl, 0, len);
> 123:   } else {
> 124:     int bits = ilog2_graceful(len);

ilog2 ? len should > ObjArrayMarkingStride*2, so must be positive.

src/hotspot/share/gc/z/zHeuristics.cpp line 48:

> 46:     // Enable medium pages
> 47:     ZPageSizeMedium             = size;
> 48:     ZPageSizeMediumShift        = ilog2(ZPageSizeMedium);

exact_ilog2: Setup code is:
  const size_t min = ZGranuleSize;
  const size_t max = ZGranuleSize * 16;
  const size_t unclamped = MaxHeapSize * 0.03125;
  const size_t clamped = clamp(unclamped, min, max);
  const size_t size = round_down_power_of_2(clamped);

  if (size > ZPageSizeSmall) {
    // Enable medium pages
    ZPageSizeMedium             = size;

src/hotspot/share/utilities/powerOfTwo.hpp line 59:

> 57: 
> 58: // Log2 of positive, integral value, i.e., largest i such that 2^i <= value
> 59: // Returns ifZero if value is zero, defaulting to -1

Is this used anywhere? Should ifZero be if_zero?

test/hotspot/gtest/utilities/test_powerOfTwo.cpp line 268:

> 266: #define EXPECT_EQ_ILOG2(type)                                      \
> 267: {                                                                  \
> 268:   int limit = sizeof (type) * BitsPerByte;                         \

sizeof (type) => sizeof(type)

test/hotspot/gtest/utilities/test_powerOfTwo.cpp line 266:

> 264: }
> 265: 
> 266: #define EXPECT_EQ_ILOG2(type)                                      \

Is there a reason why this is a macro and not a template function?

test/hotspot/gtest/utilities/test_powerOfTwo.cpp line 277:

> 275:   {                                                                \
> 276:     /* Test ilog2_graceful handles 0 input */                      \
> 277:     type var = 1;                                                  \

Unused?

test/hotspot/gtest/utilities/test_powerOfTwo.cpp line 374:

> 372:   }
> 373:   EXPECT_TRUE(value <= 15) << "value: " << value;
> 374: }

Will these micro benchmarks be removed before pushing?

-------------

PR: https://git.openjdk.java.net/jdk/pull/1663


More information about the hotspot-dev mailing list