RFR: 8213415: BitMap::word_index_round_up overflow problems

Fri Nov 29 08:03:40 UTC 2019

Hi Kim,

On Thu, Nov 28, 2019 at 11:01 PM Kim Barrett <kim.barrett at oracle.com> wrote:

> > On Nov 28, 2019, at 7:42 AM, Thomas Stüfe <thomas.stuefe at gmail.com>
> wrote:
> >
> > Hi Kim,
> >
> >
> http://cr.openjdk.java.net/~kbarrett/8213415/open.03/src/hotspot/share/utilities/bitMap.hpp.udiff.html
> >
> > +  // Limit max_size_in_bits so aligning up to a word never overflows.
> > +  static idx_t max_size_in_words() { return
> raw_to_words_align_down(~idx_t(0)); }
> > +  static idx_t max_size_in_bits() { return max_size_in_words() *
> BitsPerWord; }
> >
> > Could we have better comments? I first thought that max_size_in_words()
> means bitmap size, only the static specifier ticked me off.
>
> I'm not sure I understand your issue.  I modified the comment a bit to
> address what I think you are saying, but I'm not sure I've covered it.
>
>
I meant that you use "size" with different meanings - throughout most of
the class size is the size, in bits, of the BitMap object (_size). Here it
is something different, not sure yet what exactly, see below.

> > So, if I understand this correctly, we need this since we use the same
> type for bit and word indices and since we have a n:1 relationship between
> those two the max. bit index is necessary smaller than <type>_MAX?
>
> The point is to avoid overflow of the type used for bit indices when
> aligning a value up to a multiple of the word size. This doesn't
> really have anything to do with using the same types for bit indices
> and word indices, though using different types might affect the
> details of some of the calculations, and the range for the word type
> would need to be suitably chosen to accomodate the bit range.
>
>
I still in the dark. In your current version max_size_in_words() and
max_size_in_bits() there is an overflow, since both bit- and word indexes
use the same type. With 64bit I come to: FFFFFFFF.FFFFFFC0 for max word
index, 3FFFFFF.FFFFFFFF for max bit index. For 64bit types this does not
matter much, but if we ever were to use smaller types, e.g. uint16_t, it
would matter. Also, I find it surprising that max bit index is smaller than
max word index.

Side note, I was interested in using smaller types because long term I
would like to have a BitMap class in cases where I today use little hand
written bitmaps. As it is now, BitMap has a pointer and a size, which makes
it a 16byte structure on 64 bit, which is rather fat. The indirection is
also often unwanted. I would like to have a BitMap class which contains
directly the data as member(s), e.g. one where it just has a 16bit word or,
maybe, an array of multiple words. That would make this structure a lot
smaller and better suited to be included in space sensitive structures.

> +  // Assumes relevant validity checking for bit has already been done.
> > +  static idx_t raw_to_words_align_up(idx_t bit) {
> > +    return raw_to_words_align_down(bit + (BitsPerWord - 1));
> > +  }
> >
> > Interestingly, this could break should we ever have different types for
> word- and bit indices. Should we ever want to templatize the underlying
> types, eg. to support small bitmaps where 8 or 16 bits would be enough to
> encode a bitmap index. Then it could make sense to have a larger type for
> word indices than for bit indices and we could overflow here.
>
> I don't understand why one would need a larger type for word indices?
> The range for word indices is necessarily smaller than the range for
> bit indices.
>
> > (Just idle musings.. we have talked about using different types for bit-
> and word-indexes before. I have worked on a patch for this, but it got
> snowed under by other work. See:
> http://cr.openjdk.java.net/~stuefe/webrevs/bitmap-improvements/bitmap-better-types.
> What do you think, does this still make sense?)
>
> Making the types different obviously has some benefits for type
> safety, if implemented in a way that actually provides a distinction.
> Just using different typedefs for the same underlying type doesn't
> accomplish that though. I think a strong type for words could be
> hidden in the private implementation, and have thought about doing
> something like that, but not for this change.
>
>
Oh sure. This was just the first draft. My idea was - if only for test
reasons - to use a class type which wraps around a numeric and defines +/-
operations and assignments. I wonder though whether there is a simpler way
to make the compiler complain about assignments between word- and bit
indices.

But even with the same underlying typedef, using different types for word-
and bit indexes would make the code more readable and clearer.

> > +
> > +  // Verify size_in_bits does not exceed maximum size.
> > +  static void verify_size(idx_t size_in_bits) NOT_DEBUG_RETURN;
> >
> > "maximum size" is unclear. Can you make the comment more precise?
>
> I modified the comments for the verify functions to be more explicit.
>
>
Thank you

> > Also, could this function be private?
>
> I put the new verify functions near the pre-existing ones. There are a
> lot of protected functions in this class that seem like they ought to
> be private. I don't want to address that in this change.
>
> > +  // Verify bit is less than size.
> > +  void verify_index(idx_t bit) const NOT_DEBUG_RETURN;
> > +  // Verify bit is not greater than size.
> > +  void verify_limit(idx_t bit) const NOT_DEBUG_RETURN;
> > +  // Verify [beg,end) is a valid range, e.g. beg <= end <= size().
> > +  void verify_range(idx_t beg, idx_t end) const NOT_DEBUG_RETURN;
> >
> > I have a bit of a trouble understanding these variants and how they are
> used.
> >
> > I believe to understand that verify_limit() is to verify range sizes,
> since the valid range for a size type is always one larger than that for an
> index type, right? But it is used to test indices for validity, for example
> via to_words_align_down() -> word_addr(). Which would mean for a 64bit
> sized bitmap (1 word size) I can feed an invalid "64" as index to
> word_addr(), which verify_limit() would accept but would get me the invalid
> word index "1". I would have expected word_addr() to only return usable
> word adresses and to assert in this case.
>
> verify_limit checks that the argument is a valid index or
> one-past-the-last designator, e.g. it is a valid iteration limit.
> "limit" was the preferred term from discussion with tschatzl.
>
>
Okay, with "iteration limit" in mind I understand the naming better.

> There are uses of word_addr() to obtain a one-past-the-last pointer.
>
> > I also feel that if I got it right the naming is the wrong way around. I
> would have expected "verify_size()" to refer to the bitmap object map size,
> and "verify_limit()" to the type limit (similar to limits.h)..
>
> verify_size is about checking the validity of a value that will be
> used as the size of a bitmap.  Maybe the comment changes help?
>
> New webrevs:
> full: https://cr.openjdk.java.net/~kbarrett/8213415/open.04/
> incr: https://cr.openjdk.java.net/~kbarrett/8213415/open.04.inc/
>
>
>
The comments look better, thanks.

Cheers, Thomas