RFR: 8338197: ubsan: ad_x86.hpp:6417:11: runtime error: shift exponent 100 is too large for 32-bit type 'unsigned int' [v2]

Fri Aug 22 21:40:53 UTC 2025

On Fri, 22 Aug 2025 19:18:09 GMT, Boris Ulasevich <bulasevich at openjdk.org> wrote:

>> This reworks the recent update https://github.com/openjdk/jdk/pull/24696 to fix a UBSan issue on aarch64. The problem now reproduces on x86_64 as well, which suggests the previous update was not optimal.
>> 
>> The issue reproduces with the HeapByteBufferTest jtreg test on a UBSan-enabled build. The actual trigger is the `-XX:+OptoScheduling` option used by the test (OptoScheduling is disabled by default on most x86 CPUs). With the option enabled, the failure can be reproduced with a simple `java -version` run.
>> 
>> This fix is in ADLC-generated code. For simplicity, the examples below show the generated fragments.
>> 
>> The problem is that the shift count `n` may be too large here:
>> 
>> class Pipeline_Use_Cycle_Mask {
>> protected:
>>   uint _mask;
>>   ..
>>   Pipeline_Use_Cycle_Mask& operator<<=(int n) {
>>     _mask <<= n;
>>     return *this;
>>   }
>> };
>> 
>> The recent change attempted to cap the shift amount at one call site:
>> 
>> class Pipeline_Use_Element {
>> protected:
>>   ..
>>   // Mask of specific used cycles
>>   Pipeline_Use_Cycle_Mask _mask;
>>   ..
>>   void step(uint cycles) {
>>     _used = 0;
>>     uint max_shift = 8 * sizeof(_mask) - 1;
>>     _mask <<= (cycles < max_shift) ? cycles : max_shift;
>>   }
>> };
>> 
>> However, there is another site where `Pipeline_Use_Cycle_Mask::operator<<=` can be called with a too-large shift count:
>> 
>> // The following two routines assume that the root Pipeline_Use entity
>> // consists of exactly 1 element for each functional unit
>> // start is relative to the current cycle; used for latency-based info
>> uint Pipeline_Use::full_latency(uint delay, const Pipeline_Use &pred) const {
>>   for (uint i = 0; i < pred._count; i++) {
>>     const Pipeline_Use_Element *predUse = pred.element(i);
>>     if (predUse->_multiple) {
>>       uint min_delay = 7;
>>       // Multiple possible functional units, choose first unused one
>>       for (uint j = predUse->_lb; j <= predUse->_ub; j++) {
>>         const Pipeline_Use_Element *currUse = element(j);
>>         uint curr_delay = delay;
>>         if (predUse->_used & currUse->_used) {
>>           Pipeline_Use_Cycle_Mask x = predUse->_mask;
>>           Pipeline_Use_Cycle_Mask y = currUse->_mask;
>> 
>>           for ( y <<= curr_delay; x.overlaps(y); curr_delay++ )
>>             y <<= 1;
>>         }
>>         if (min_delay > curr_delay)
>>           min_delay = curr_delay;
>>       }
>>       if (delay < min_delay)
>>         delay = min_delay;
>>     }
>>     else {
>>       for (uint j = predUse->_lb; j <= pre...
>
> Boris Ulasevich has updated the pull request incrementally with one additional commit since the last revision:
> 
>   remove redundant code

I didn't realize we already had code to handle masks for large shifts. So I think the main problem is that `_maxcycleused` is not being set to the max value of 100. There is a secondary problem: we don't really need values that high if the units are in pipeline stages.
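
For reference, here is a minimal sketch of what the UBSan report is about and of one way to make the shift well defined inside the operator itself rather than at each call site. It is not the ADLC-generated code: the class name `CycleMaskSketch`, the `mask()` accessor, and the choice to clear the mask when the count reaches the bit width are assumptions made for illustration only.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Hypothetical stand-in for a single-word Pipeline_Use_Cycle_Mask.
// Shifting a 32-bit value by 32 or more is undefined behavior in C++,
// which is what UBSan flags for a shift exponent of 100.
class CycleMaskSketch {
  uint32_t _mask;

public:
  explicit CycleMaskSketch(uint32_t mask) : _mask(mask) {}

  uint32_t mask() const { return _mask; }

  // Guarded shift: counts at or beyond the word width clear the mask,
  // since every used-cycle bit would have moved out of the word.
  CycleMaskSketch& operator<<=(int n) {
    assert(n >= 0 && "negative shift counts are not expected");
    const unsigned bits = sizeof(_mask) * CHAR_BIT;  // 32 for uint32_t
    _mask = (static_cast<unsigned>(n) < bits) ? (_mask << n) : 0u;
    return *this;
  }

  // Two masks overlap when they share at least one used cycle.
  bool overlaps(const CycleMaskSketch& other) const {
    return (_mask & other._mask) != 0;
  }
};
```

With a guard like this, `y <<= curr_delay` in `full_latency` would stay defined for any non-negative delay. Whether clearing the mask is the right semantics for the scheduler, as opposed to widening the mask, is a separate question.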

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26890#issuecomment-3215730192