RFR: 8338197: ubsan: ad_x86.hpp:6417:11: runtime error: shift exponent 100 is too large for 32-bit type 'unsigned int' [v3]

Mon Aug 25 14:17:14 UTC 2025

> This reworks the recent update https://github.com/openjdk/jdk/pull/24696, which addressed a UBSan issue on aarch64. The problem now reproduces on x86_64 as well, which suggests the previous update was incomplete.
> 
> The issue reproduces with the HeapByteBufferTest jtreg test on a UBSan-enabled build. The actual trigger is the `-XX:+OptoScheduling` option used by the test (OptoScheduling is disabled by default on most x86 CPUs). With the option enabled, the failure can be reproduced with a simple `java -version` run.
> 
> This fix is in ADLC-generated code. For simplicity, the examples below show the generated fragments.
> 
> The problem is that the shift count `n` may be too large here:
> 
> class Pipeline_Use_Cycle_Mask {
> protected:
>   uint _mask;
>   ..
>   Pipeline_Use_Cycle_Mask& operator<<=(int n) {
>     _mask <<= n;
>     return *this;
>   }
> };
> 
> The recent change attempted to cap the shift amount at one call site:
> 
> class Pipeline_Use_Element {
> protected:
>   ..
>   // Mask of specific used cycles
>   Pipeline_Use_Cycle_Mask _mask;
>   ..
>   void step(uint cycles) {
>     _used = 0;
>     uint max_shift = 8 * sizeof(_mask) - 1;
>     _mask <<= (cycles < max_shift) ? cycles : max_shift;
>   }
> };
> 
> However, there is another site where `Pipeline_Use_Cycle_Mask::operator<<=` can be called with a too-large shift count:
> 
> // The following two routines assume that the root Pipeline_Use entity
> // consists of exactly 1 element for each functional unit
> // start is relative to the current cycle; used for latency-based info
> uint Pipeline_Use::full_latency(uint delay, const Pipeline_Use &pred) const {
>   for (uint i = 0; i < pred._count; i++) {
>     const Pipeline_Use_Element *predUse = pred.element(i);
>     if (predUse->_multiple) {
>       uint min_delay = 7;
>       // Multiple possible functional units, choose first unused one
>       for (uint j = predUse->_lb; j <= predUse->_ub; j++) {
>         const Pipeline_Use_Element *currUse = element(j);
>         uint curr_delay = delay;
>         if (predUse->_used & currUse->_used) {
>           Pipeline_Use_Cycle_Mask x = predUse->_mask;
>           Pipeline_Use_Cycle_Mask y = currUse->_mask;
> 
>           for ( y <<= curr_delay; x.overlaps(y); curr_delay++ )
>             y <<= 1;
>         }
>         if (min_delay > curr_delay)
>           min_delay = curr_delay;
>       }
>       if (delay < min_delay)
>         delay = min_delay;
>     }
>     else {
>       for (uint j = predUse->_lb; j <= predUse->_ub; j++) {
>         const Pipeline_Use_Element *currUse = element(j);
>         if (predUse->_used & currUse->_used) {
>  ...

Boris Ulasevich has updated the pull request incrementally with one additional commit since the last revision:

  use uint32_t for _mask
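
As a hedged aside on why a fixed-width mask type helps: with `uint32_t` the bit width is exactly 32, so a range check on the shift count can be written against a known constant. The sketch below also contrasts two ways of handling an oversized count. The helper names `shl_clamped` and `shl_saturating` are hypothetical and do not appear in the patch.

```cpp
#include <cstdint>
#include <limits>

// Width of the mask type in bits; exactly 32 for uint32_t.
constexpr unsigned kMaskBits = std::numeric_limits<std::uint32_t>::digits;

// Clamp the count to width - 1, in the style of the earlier call-site cap.
// A set low bit ends up in the top bit instead of being shifted out.
constexpr std::uint32_t shl_clamped(std::uint32_t mask, unsigned n) {
  return mask << (n < kMaskBits - 1 ? n : kMaskBits - 1);
}

// Treat any count at or beyond the width as shifting every bit out.
constexpr std::uint32_t shl_saturating(std::uint32_t mask, unsigned n) {
  return n < kMaskBits ? mask << n : 0u;
}

static_assert(kMaskBits == 32, "uint32_t has exactly 32 value bits");
static_assert(shl_clamped(1u, 100) == 0x80000000u, "clamping keeps one bit");
static_assert(shl_saturating(1u, 100) == 0u, "saturating clears the mask");
```

Both variants avoid the undefined shift, but they are not equivalent: clamping can leave a stale bit in the highest position, while saturating yields an empty mask.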

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/26890/files
  - new: https://git.openjdk.org/jdk/pull/26890/files/389a9dab..e3ac8703

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26890&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26890&range=01-02

  Stats: 11 lines in 1 file changed: 0 ins; 1 del; 10 mod
  Patch: https://git.openjdk.org/jdk/pull/26890.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26890/head:pull/26890

PR: https://git.openjdk.org/jdk/pull/26890