RFR: 8155043: BitMap set operations assume clear bits beyond unaligned end
Stefan Karlsson
stefan.karlsson at oracle.com
Tue Aug 9 08:31:42 UTC 2016
Hi Kim,
On 2016-08-08 20:56, Kim Barrett wrote:
>> On Aug 5, 2016, at 9:27 AM, Stefan Karlsson <stefan.karlsson at oracle.com> wrote:
>>
>> Hi Kim,
>>
>> This looks mostly good to me. I have a few comments inlined:
> Thanks for looking at this.
>
> New webrevs:
> full: http://cr.openjdk.java.net/~kbarrett/8155043/webrev.01/
> incr: http://cr.openjdk.java.net/~kbarrett/8155043/webrev.01.inc/
Reviewed.
I still want to respond to some of the comments below:
>
> Responses inline.
>
>> On 2016-08-05 00:02, Kim Barrett wrote:
>>> Please review this change to the BitMap set operations to avoid
>>> depending on or modifying bits beyond the size of the BitMap when the
>>> size is not word aligned. This change also adds some unit tests for
>>> these operations.
>>>
>>> In addition:
>>>
>>> - Changed set_from to use Copy::disjoint_words. For other set
>>> operations, use a consistent iteration idiom for all.
>>>
>>> - In the set_xxx_with_result operations, eliminated conditionals in
>>> the accumulation of the result.
>> I guess this was done for performance reasons? Have you measured and seen any gains with this? The generated code becomes significantly larger when using the or/xor combination in your patch.
> It was supposed to be an optimization. Even if there is some register
> usage restriction on x86[_64] that inhibits the obvious (though small)
> improvement, the result should be no worse there, and should still be
> better on more regular architectures.
>
> Unfortunately, with the new change accumulation, gcc4.9 on x86_64
> *completely* loses its way, and instead generates very strange code.
> Yuck! The inner loop appears to be not obviously better or worse than
> with the old change accumulation, but there's a huge amount of
> additional setup and teardown code around that inner loop.
>
> So backing that out. Thanks for noticing that.
>
>>> - Removed unused BitMap::set_intersection_at_offset, rather than
>>> writing tests for it.
>>>
>>> CR:
>>> https://bugs.openjdk.java.net/browse/JDK-8155043
>>>
>>> Webrev:
>>> http://cr.openjdk.java.net/~kbarrett/8155043/webrev.00/
>> Some minor comments/suggestions that you might want to consider:
>>
>> ==========
>> http://cr.openjdk.java.net/~kbarrett/8155043/webrev.00/src/share/vm/utilities/bitMap.cpp.frames.html
>>
>> There are a few places where the tail calculations are slightly different compared to the calculations of the aligned part. Would it be good to use the same bitwise operations, just to get consistency between the two parts?
>>
>> For example, in BitMap::contains we have
>>
>> 408 bm_word_t value = dest_map[index];
>> 409 if (value != (value | other_map[index])) return false;
>>
>> vs
>>
>> 414 return (rest == 0) || tail_of_set(~dest_map[limit] & other_map[limit], rest) == 0;
>>
>> We could replace 408-409 with 'if (~dest_map[limit] & other_map[limit]) == 0) return false’.
> That one was a "mistake"; thanks for spotting it.
>
> Also regularized is_full.
>
>> ----------
>> I find that the introduction of the 'bm_word_t orig' variable just makes the code extra noisy, and would have preferred the calculations without it.
> I find the repeated use of "dest_map[index]" noisy and harder to read,
> and prefer to keep the orig variable.
Sure, keep it. I just want to point out that the added variable didn't
help the readability for me. It just forced med to have to do the
resolve orig = dest_map[index] in my head each time I saw it in the bit
fiddling calculations.
> One place I kept the repetition
> was in the assignment, since making orig a reference, e.g.
>
> bm_word_t& orig = ...; ... uses orig ...; orig = ...;
>
> seemed very unlike Hotspot style.
>
>> ----------
>>
>> The new code has a few functions named *_of_set. The terminology "set" is previously not used in the BitMap file, so I'm a bit reluctant to introduce it. Could we get rid of it by renaming the functions:
>> tail_of_set -> tail_of_map
>> merge_tail_of_set -> merge_tail_of_map
>>
>> or maybe:
>> tail_of_set -> tail_bits
>> merge_tail_of_set -> merge_tail_bits
> I like tail_of_set => tail_of_map. Done.
>
>> ==========
>> http://cr.openjdk.java.net/~kbarrett/8155043/webrev.00/test/native/utilities/test_bitMap_setops.cpp.html
>>
>> The file tests more than the set operations, so maybe the _setops suffix should be dropped?
> I think the only non-setop is set_from, and it's here because it has
> the same considerations.
I started to write a response stating that the functions is_same,
is_full, is_empty, contains, and intersects are not set_<operation>
functions, but now I realize that you mean "set [theory] operations" not
"operations that set the value of 'this'". So, I guess it's fine.
Thanks,
StefanK
> Full coverage of BitMap requires far more
> than what's here, and I don't think a single monolithic test file is
> necessary, or even good in a case like this class that has a lot of
> functionality.
> For example, I have a new test_bitMap_search.cpp
> (testing iterate and get_next_one_offset & friends) in an mq patch
> waiting for submission.
>> ----------
>> Some of the tests restore the previous bitmap value after all checks, but some don't. It would probably be good to make that more consistent.
> I've added WithBit{Set,Clear} scoped helpers and use them throughout.
> Also fixed some inconsistencies in use of ASSERT_xxx.
>
>> ----------
>> class BitMapMemory {
>> ...
>> private:
>> idx_t _words;
>> bm_word_t* _memory;
>> };
>>
>> Most of the classes in the HotSpot have the members at the top of the class.
> It seems I've just been continually surprised and mildly irritated by
> a lot of our class layouts, without twigging to it being a consistent
> "style". Now that you've mentioned it, I looked around and found
> that, while there is the occasional counter-example, that order is by
> far dominant.
>
> I'm surprised because it is contrary to every C++ style guide / best
> practice recommendation I've seen. [Variable declarations first seems
> to be common Java style, but this is C++ code.] Irritated because I
> agree with the rationale for those recommendations, e.g. usually one
> is looking at a class definition as a client interested in its public
> API, and having to scan / scroll down past private implementation
> details is wasting time in that case. Oh, well, when in Rome...
>
>> ----------
>> static bm_word_t make_even_bits() {
>> bm_word_t result = 1;
>> while (true) {
>> bm_word_t next = (result << 2) | 1;
>> if (next == result) return result;
>> result = next;
>> }
>> }
>>
>> I'd prefer if you moved down the return statement to a separate line and add braces.
> Done.
>
More information about the hotspot-dev
mailing list