A question about bytecodes + unsigned load performance ./. add performace
John Rose
John.Rose at Sun.COM
Tue Jan 20 00:13:27 PST 2009
On Jan 19, 2009, at 9:37 AM, Christian Thalinger wrote:
> Maybe someone can explain to me why the generated code is so
> different?
You have observed a common traffic jam of loads at the top of an
unrolled loop. That happens because, when generating a CFG from the
sea of nodes, we allow load operations to float upward in the graph
(toward earlier points). That means they all get scheduled at the
top of the loop body. This is valid, and it usually makes sense on
machines with lots of registers, but it is kind of dumb when the
number of loaded values is larger than the number of available temp
registers. But we can't measure register pressure at the point the
CFG is built. It's the typical NP-hard chicken-and-egg problem, of
needing a CFG to build a register allocation, but needing a register
allocation to make a good CFG assignment. Years ago we decided to
led the loads float up, and assume that modern hardware would supply
the temp. registers and/or stack caches. It would be more graceful
to throttle the load placement a little more (e.g., to try to reduce
load-to-last-use distance). But this seems to be a medium-large
project, and there have always been more pressing matters to deal with.
I can't say off-hand why the x+0x80 version schedules the loads
better. The +0x80 part probably gets subsumed into the index ranges
which are split in complicated ways (as I mentioned earlier). I have
no idea why the signed loads (feeding the +0x80 ops) would trickle
downward in the graph compared with the unsigned loads bouncing up to
the loop top. This might be a case where PrintOptoAssembly would
supply a missing clue; the ideal graph visualizer would be even
better. Maybe there are "invisible" (non-coding) nodes which
participate in the CFG and accidentally prevent the signed loads from
rising. Or maybe the heuristics are just rolling opposite ways on
similar but slightly different inputs; this also is typical of NP-
hard algorithms.
-- John
More information about the hotspot-compiler-dev
mailing list