RFR: 8305734: BitSet.get(int, int) always returns the empty BitSet when the Integer.MAX_VALUE is set

Andy-Tatman duke at openjdk.org
Fri Sep 15 14:58:50 UTC 2023


On Sat, 22 Jul 2023 04:44:21 GMT, Stuart Marks <smarks at openjdk.org> wrote:

>> See https://bugs.java.com/bugdatabase/view_bug?bug_id=8305734 and https://bugs.java.com/bugdatabase/view_bug?bug_id=JDK-8311905
>
> Hi, thanks for the background and the discussion of the alternatives. I'm not sure that drafting the CSR is the right next step; there are a number of foibles about editing the specifications that come into play when preparing the CSR. However, what you've written above is a good design discussion about the different alternatives and how they affect the specification and the implementation.
> 
> One thing that we need to consider is compatibility. From an excessively pedantic standpoint, any change in behavior is a potential incompatibility (somebody might be relying on that bug!) but I think that fixing obviously incorrect behavior is reasonable. As much as possible I'd like to make sure that things that worked, at least partially, continue to work.
> 
> With this in mind I'm leaning toward allowing a BitSet to contain bits including the Integer.MAX_VALUE bit, and adjusting various specs accordingly. I think the best way to specify this is to say that length() returns an int in the range [0, 2^31] as an unsigned value. In practice of course this means it can return any Java `int` value in the range [0, MAX_VALUE] and also Integer.MIN_VALUE. A sufficiently careful programmer can use such values correctly, as long as they avoid doing certain things such as comparing against zero. (We might want to have a note in the spec to that effect.)
> 
> It's good that you analyzed the `valueOf` methods. It looks to me like the implementation will store an actual array potentially containing bits beyond the MAX_VALUE bit, and this will affect things like length() and size() and such bits won't be accessible via get() or set(). So, on the one hand, this behavior seems clearly broken and ought to be fixed by limiting the input array along the lines suggested by your three options.
> 
> On the other hand, it seems that from looking at the code, it's possible to create an "over-size" BitSet with valueOf(), and perform bulk bit manipulations on it and other BitSets using methods like and(), or(), and xor(). It also appears possible to read out the bits successfully using toByteArray() or toLongArray(). Thus an application might be able to manipulate bit arrays of up to about 2^37 bits long by using BitSet with long arrays or LongBuffers. Restricting the input of valueOf() to 2^31 bits would break such applications.
> 
> Also note that the specification says the bits are numbered by nonnegative integers (that is, zero to 2^31) which would seem to preclude longer bit arrays. However, if somebody...

Hi @stuart-marks,

Say we allow large-array BitSets. Then this would be my suggestion for the methods that remain supported on such large bitsets:
and(), andNot(), or(), xor(), clone(), hashCode(), toLongArray(), and toByteArray()*.
*: toByteArray() throws an exception if the bitset is too big to fit in a byte array, e.g.:

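// Largest word count whose bits are all addressable by a nonnegative int index (2^25 words).
static final int MAX_WIU = Integer.MAX_VALUE / 64 + 1;
long[] arr = new long[8 * MAX_WIU + 1]; // needs roughly 2 GiB of heap
arr[arr.length - 1] = 1; // a non-zero last word keeps valueOf() from trimming the array
BitSet broken = BitSet.valueOf(arr);
byte[] byteAr = broken.toByteArray(); // java.lang.NegativeArraySizeException: -2147483647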
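
The negative size in that exception can be reproduced without allocating the 2 GiB array. The following is a minimal sketch, assuming toByteArray() computes its result length as 8 * (wordsInUse - 1) plus one byte per non-zero-significant byte of the last word, in `int` arithmetic. The class and method names are hypothetical and are not part of java.util.BitSet.

```java
final class ByteLengthOverflow {
    // Same constant as in the snippet above: 2^25 words cover bit indices 0 .. 2^31 - 1.
    static final int MAX_WIU = Integer.MAX_VALUE / 64 + 1;

    // Byte length computed in long arithmetic, so it cannot overflow here.
    static long exactByteLength(long wordsInUse, long lastWord) {
        long len = 8L * (wordsInUse - 1);
        for (long x = lastWord; x != 0; x >>>= 8) {
            len++; // one more byte for each remaining significant byte
        }
        return len;
    }

    // The same computation in int arithmetic (assumed to mirror toByteArray()).
    static int intByteLength(int wordsInUse, long lastWord) {
        int len = 8 * (wordsInUse - 1); // wraps around once the product reaches 2^31
        for (long x = lastWord; x != 0; x >>>= 8) {
            len++;
        }
        return len;
    }

    public static void main(String[] args) {
        int words = 8 * MAX_WIU + 1; // 268435457 words, as in the example above
        System.out.println(exactByteLength(words, 1L)); // prints 2147483649
        System.out.println(intByteLength(words, 1L));   // prints -2147483647
    }
}
```

The exact length is 2^31 + 1 bytes, which is one more than twice what fits below Integer.MAX_VALUE + 2, so under this assumption the `int` result wraps to the value shown in the exception message.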

We could then state in the specifications of the valueOf(.. X ) methods (and clone(..)) that a BitSet created from an object X above a certain size supports only the methods listed above.

That being said, this concept is still risky. Say we pass a (large) bitset object to a client class or to another method with a bitset parameter. These classes and methods can then no longer rely on the object being a safe bitset that fulfills the spec (the class invariant plus all the method contracts). Alternatively, we could add to the specification of every other method how it should behave for such large bitsets, but such changes to the basic operations of the class are unlikely to be backwards compatible. Allowing large objects from valueOf(..) could just as easily break existing implementations: these client classes would previously have assumed that they only ever deal with 'normal' bitsets, and would probably not have been written with the large-object spec in mind.
It seems to me that this is at least as bad as banning large arrays in valueOf(..): the whole point of objects and methods with (Javadoc) specs is that you can trust that objects will always hold to these specs.
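
To make the "ban large arrays in valueOf(..)" alternative concrete, here is a minimal sketch of such a size check, written as a wrapper around the existing BitSet.valueOf(long[]). The class CheckedBitSets, its methods, and the choice of IllegalArgumentException are all hypothetical illustrations, not a proposed patch.

```java
import java.util.BitSet;

final class CheckedBitSets {
    // Largest word count whose bits are all addressable by a nonnegative int index.
    static final int MAX_WIU = Integer.MAX_VALUE / 64 + 1;

    // Number of words that remain after dropping trailing zero words,
    // which is what BitSet.valueOf(long[]) keeps.
    static int wordsInUse(long[] longs) {
        int n = longs.length;
        while (n > 0 && longs[n - 1] == 0) {
            n--;
        }
        return n;
    }

    // True if every bit of a bitset with this many words has an index <= Integer.MAX_VALUE.
    static boolean fitsIntIndexRange(int wordsInUse) {
        return wordsInUse <= MAX_WIU;
    }

    // Hypothetical guarded factory: rejects input that would yield an "over-size" BitSet.
    static BitSet checkedValueOf(long[] longs) {
        int n = wordsInUse(longs);
        if (!fitsIntIndexRange(n)) {
            throw new IllegalArgumentException(
                "array holds bits beyond index Integer.MAX_VALUE: " + n + " words in use");
        }
        return BitSet.valueOf(longs);
    }
}
```

With a check like this, every BitSet a client receives keeps all of its bits reachable through get() and set(), at the cost of rejecting the over-size inputs that valueOf(..) currently accepts.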

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13388#issuecomment-1721416793

