RFR: 8305734: BitSet.get(int, int) always returns the empty BitSet when the Integer.MAX VALUE is set

Andy-Tatman duke at openjdk.org
Wed Jul 19 13:40:50 UTC 2023


On Fri, 14 Apr 2023 15:53:41 GMT, Alan Bateman <alanb at openjdk.org> wrote:

>> @AlanBateman 
>> It is a known issue that size() may return a negative integer, see [JDK-8230557](https://bugs.openjdk.org/browse/JDK-8230557), and the accepted workaround is to interpret the returned integer as an unsigned value and convert the output to a long. This same workaround works if a user would call length() with Integer.MAX_VALUE set. Changing the specification to reject setting Integer.MAX_VALUE may end up breaking the implementation of clients who rely on setting Integer.MAX_VALUE and use this workaround.
>> 
>> Furthermore, the other methods (including ones that use length()) still function correctly whether or not the Integer.MAX_VALUE bit is set, except for get(int,int) as reported here. For example, clear(int, int) works as expected if Integer.MAX_VALUE is set as length() then is not called.
>> Changing the specification to reject setting Integer.MAX_VALUE may break user code that use this bit and/or users that rely on the above workaround.
>> 
>> So while changing the specifications is possible, it can potentially break existing clients. The change suggested in this pull request avoids this and instead fixes the internal bug of the get function locally, without affecting the other methods and without affecting existing clients.
>
>> So while changing the specifications is possible, it can potentially break existing clients. The change suggested in this pull request avoids this and instead fixes the internal bug of the get function locally, without affecting the other methods and without affecting existing clients.
> 
> I think it will require re-visting the spec, maybe deprecating and/or introducing new methods, it's unfortunate that this wasn't recognised in JDK-8230557.
> 
> Update: @stuart-marks has added a comment to JDK-8230557 on the workaround that someone added to that issue in 2019.

Hi everyone @AlanBateman @liach @stuart-marks  ,

Based on the article's two main options (either banning or still allowing the Integer.MAX_VALUE bit), I have prepared two different CSRs.
These address the changes in specification that are required for each choice, as well as the change in implementation needed in `valueOf(..)`.
I have lumped Solution and Specification together here. Once we decide what we want to actually submit as the CSR, then we can look at making the proper git patch files.

### Permit Integer.MAX_VALUE
**Summary**
While still permitting setting the Integer.MAX VALUE bit in a BitSet, change the specification of `length()` to clarify that it can return Integer.MIN VALUE. 
For the `valueOf(..)` methods, change the specification to prevent the internal value of wordsInUse in BitSet from overflowing and to prevent the bitset storing bits it cannot access

**Problem**
When the Integer.MAX_VALUE bit is set in a BitSet, `length()` returns Integer.MAX_VALUE+1 which overflows to Integer.MIN_VALUE. The specification of the method does not make this clear: The "logical size" of the BitSet is not logically a negative number. 
The `valueOf(..)` methods allow any long[], LongBuffer, byte[] or ByteBuffer to make a bitset. In doing this, wordsInUse gets set to the largest element of words in the new BitSet with at least 1 bit set. However, the internal BitSet code relies on wordsInUse <= Integer.MAX_VALUE / 64 + 1 (we abbreviate this to a constant MAX_WIU), and the bitset class cannot access bits stored in words[MAX_WIU] and above.  
If this is not the case, then the return value of `length()` is no longer reliable: it is determined by the element stored in the long[] or LongBuffer or ..., but these are bits that are not accessible by the BitSet itself. 
Technically, the expression BITS_PER_WORD * (wordsInUse - 1) will overflow, and `length()` will refer to a bit (with negative index or with incorrect positive index) that is not accessible by the other methods of the BitSet class. 

**Solution/Specification** 
`length()`: Specify that the method can return Integer.MIN_VALUE, and this can be interpreted as the UNsigned integer Integer.MAX_VALUE+1.
`valueOf(..)`: There are multiple options. We use `valueOf(long[] longs)` as an example, but equivalent would apply to the other 3 `valueOf(..)` methods.
Option 1: Raise an exception if longs.length is greater than MAX_WIU.
Option 2: Raise an exception if longs.length is greater than MAX_WIU AND there is at least one non-zero element in longs[MAX_WIU]-longs[ length-1 ]
Option 3: No exception, but only take the first MAX_WIU of longs for the bitset. 

**Risk**
Low.
The change in the `length()` specification matches existing behaviour.
The change in the `valueOf(..)` only involves exceptionally large objects. If these were used regularly, it is likely that the `valueOf(..)` bug would have been discovered earlier. 


### Forbid Int.MAX
*For the `valueOf(..)` methods, the same applies here as discussed in the other CSR. The only difference is mentioned in the Solution/Specification section.

**Summary**
Forbid interacting with the Integer.MAX_VALUE bit, preventing an overflow from occurring in `length()`. This involves single-parameter methods in the BitSet class raising an exception if the class tries to access Integer.MAX_VALUE.


**Problem**
When the Integer.MAX_VALUE bit is set, length() overflows to Integer.MIN_VALUE (Integer.MAX_VALUE+1). Moreover, a method such as `set(int bitIndex)` can set the Integer.MAX_VALUE bit, while `set(int fromIndex, int toIndex)` cannot, because the method sets up to but not including toIndex. 


**Solution/Specification** 
Ban accessing the Integer.MAX_VALUE bit. This includes the following changes to the specification (and implementation):
`set(int)`, `set(int, bool)`, `clear(int)`, `get(int)`, `flip(int)`: raise an exception if bitIndex = Integer.MAX_VALUE.

`valueOf(..)`: In addition to the changes described in the **'Permit Integer.MAX_VALUE'** CSR, the `valueOf(..)` methods need to raise an exception if the Integer.MAX_VALUE bit is set (Option 1 and 2) or ignore the bit stored at index Integer.MAX_VALUE, and ensure it is 0 (option 3).


**Risk**
Medium / high?
Forbidding access to the Integer.MAX_VALUE bit is a change to the specification of the essential methods of BitSet, and potentially affects actual, existing use cases. 
This does however require people to be using Integer.MAX_VALUE bits, which is an exceptionally large amount of bits.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13388#issuecomment-1642101764


More information about the core-libs-dev mailing list