RFR: 8338197: [ubsan] ad_x86.hpp:6417:11: runtime error: shift exponent 100 is too large for 32-bit type 'unsigned int' [v3]

Thu Sep 25 21:15:31 UTC 2025

On Tue, 26 Aug 2025 10:51:43 GMT, Andrew Dinn <adinn at openjdk.org> wrote:

>> Yes, I think we should fix both, output_h.cpp and fixed_latency(100) on all platforms, then we can get rid of the workarounds and arm32-specific logic.
>
>> Yes, I think we should fix both, output_h.cpp and fixed_latency(100) on all platforms, then we can get rid of the workarounds and arm32-specific logic.
> 
> When I looked into this earlier I thought the obvious thing needed to fix this was to reassign all the latencies so they represented a realizable pipeline delay. A proper fix would sensibly require each latency to be less than the pipeline length declared in the CPU model -- which for most arches is much less than 32. However, I didn't suggest such a rationalization because I believed (perhaps wrongly) that the latencies were also used to pick a preferred choice when we have alternative instruction/operand rule matches. The selection process involves comparing the cumulative latencies for subgraph nodes against the latency of each node defined by a match rule for the subgraph and picking the lowest latency result. After looking at some of the rules I was not sure that it would be easy to reduce all current latencies so they lie in the range 0-31 and still guarantee the current selection order. It would be even harder when the range was correctly reduced to 0 - lengthof(pipeline).
> 
> I don't even think most rule authors understand that the latencies are used by the pipeline model and instead they simply use latency as a weight to enforce orderings. That's certainly how I understood it until I ran into this issue. If so then perhaps we would be better sticking with the de facto use and fixing the shift issue with a maximum shift bound. The mask tests which rely on this shift count may help with deriving scheduling delays for some instructions with small latencies but I don't believe it is very reliable even in cases where the accumulated shifts lie within the 32 bit range. If we are to change anything here then I think we need a review of the accuracy of pipeline models and their current or potential value before doing so.

Capping the max value was initially my preferred solution, but that was before @adinn, who has looked into this more than me, explained his reservations with that approach.  I don't think we know for sure that capping the value won't cause a change in behavior.  I think the safest solution is this:

> We could preserve the large latencies for now, and let them trigger the _maxcycleused > 32 code for more platforms.

In other words, we already have code that can handle larger bit masks and shifts, but it is not being enabled when needed because we aren't computing _maxcycleused correctly (it doesn't take fixed_latency into account).

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26890#issuecomment-3335962602