RFR: 8338197: [ubsan] ad_x86.hpp:6417:11: runtime error: shift exponent 100 is too large for 32-bit type 'unsigned int' [v3]
Boris Ulasevich
bulasevich at openjdk.org
Mon Sep 29 12:58:37 UTC 2025
On Tue, 26 Aug 2025 10:51:43 GMT, Andrew Dinn <adinn at openjdk.org> wrote:
>> Yes, I think we should fix both, output_h.cpp and fixed_latency(100) on all platforms, then we can get rid of the workarounds and arm32-specific logic.
>
>> Yes, I think we should fix both, output_h.cpp and fixed_latency(100) on all platforms, then we can get rid of the workarounds and arm32-specific logic.
>
> When I looked into this earlier I thought the obvious thing needed to fix this was to reassign all the latencies so they represented a realizable pipeline delay. A proper fix would sensibly require each latency to be less than the pipeline length declared in the CPU model -- which for most arches is much less than 32. However, I didn't suggest such a rationalization because I believed (perhaps wrongly) that the latencies were also used to pick a preferred choice when we have alternative instruction/operand rule matches. The selection process involves comparing the cumulative latencies for subgraph nodes against the latency of each node defined by a match rule for the subgraph and picking the lowest latency result. After looking at some of the rules I was not sure that it would be easy to reduce all current latencies so they lie in the range 0-31 and still guarantee the current selection order. It would be even harder when the range was correctly reduced to 0 - lengthof(pipeline).
>
> I don't even think most rule authors understand that the latencies are used by the pipeline model and instead they simply use latency as a weight to enforce orderings. That's certainly how I understood it until I ran into this issue. If so then perhaps we would be better sticking with the de facto use and fixing the shift issue with a maximum shift bound. The mask tests which rely on this shift count may help with deriving scheduling delays for some instructions with small latencies but I don't believe it is very reliable even in cases where the accumulated shifts lie within the 32 bit range. If we are to change anything here then I think we need a review of the accuracy of pipeline models and their current or potential value before doing so.
> I'm OK with this as-is, but how about @adinn 's concern:
> > If we are to change anything here then I think we need a review of the accuracy of pipeline models and their current or potential value before doing so
Ok, thanks!
That’s right. I tried setting more precise latencies for aarch.ad instructions per the architecture guide to see if we’d see a speedup. I did observe instruction reordering as a result of the change, but on a modern out-of-order CPU I saw no measurable performance change.
@adinn, do you think I can push this change? Is this worth deeper investigation, now or later?
-------------
PR Comment: https://git.openjdk.org/jdk/pull/26890#issuecomment-3346790211
More information about the hotspot-compiler-dev
mailing list