A question about bytecodes + unsigned load performance ./. add performance
John Rose
John.Rose at Sun.COM
Fri Jan 9 17:05:13 PST 2009
On Jan 9, 2009, at 3:44 PM, Ulf Zibis wrote:
> On 10.01.2009 00:17, John Rose wrote:
>>
>> Note that compilers tend to optimize expressions like
>> myByteArray[i] & 0xFF into unsigned loads, and packaging this into
>> an intrinsic method would add predictability of compilation (if
>> anybody cares), and the case is not frequent enough to warrant
>> shaving a few bytes off the instruction format.
>>
> ... but myByte + 0x80 is faster than myByte & 0xFF. For me this is
> an unintelligible mystery.
> For the source, see here (lines 141..144):
> http://hg.openjdk.java.net/jdk7/tl/jdk/file/b89ba9a6d9a6/make/tools/src/build/tools/charsetmapping/GenerateSBCS.java
>
> How can adding be faster than unsigned load of a byte?
Probably because (1) the bias of 128 can be folded into the address
mode of the instruction which implements cc[byte + 128], and (2) the
0xFF mask would need to be applied immediately when the byte is
loaded from memory, in order for the optimizer to fold (AndI 255
(LoadB p)) to (LoadUB p). If the AndI gets separated from the LoadB
(perhaps via intervening Phi functions) then the optimizer cannot do
the peepholing. But if the AndI/LoadB expression is placed in a
getUnsigned intrinsic method, then the compiler will be able to see
both operations within the same small "window".
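To make the "window" point concrete, here is a minimal sketch of such a
helper. The class name UnsignedLoad and the sum method are hypothetical,
chosen only for illustration; getUnsigned is not an existing JDK method
here, and nothing in the sketch forces the JIT to treat it as an
intrinsic. It only shows the load and the mask sitting next to each other
in one small method.

```java
// Hypothetical helper class; names are illustrative, not part of the JDK.
final class UnsignedLoad {
    // The byte load and the 0xFF mask are adjacent in one expression, so
    // after inlining the optimizer can see (AndI 255 (LoadB p)) as a unit.
    static int getUnsigned(byte[] a, int i) {
        return a[i] & 0xFF;
    }

    // Example caller: sums the bytes of an array as unsigned values.
    static int sum(byte[] a) {
        int s = 0;
        for (int i = 0; i < a.length; i++) {
            s += getUnsigned(a, i);
        }
        return s;
    }

    public static void main(String[] args) {
        byte[] data = { 0, 127, (byte) 0x80, (byte) 0xFF };
        System.out.println(getUnsigned(data, 2)); // prints 128
        System.out.println(sum(data));            // prints 510
    }
}
```

Whether a given compiler actually emits an unsigned load for this shape
is an assumption to check against the generated code, not something the
sketch guarantees.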
(Regarding (1) you might ask why the +128 does not slow down the
range check. That gets into details of range check elimination, but
the short answer is that the bias of 128 gets folded into the RCE
optimizations also. A little more detail: RCE works by iteration
space splitting, into pre/main/post loops usually, and the
calculations which govern those are add/subtract/compare/min/max
nests, into which the +128 merges nicely without adding much overhead.)
Thanks for pointing out the +128 trick. That's a good one! I wish
the JIT could do it automagically, but I don't see how, since it
probably requires swapping the halves of a table structure, and our
optimizer is not nearly that heroic.
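For readers who have not seen the trick, the following is a minimal
sketch of what "swapping the halves" means, assuming a 256-entry char
table. The class and method names (BiasedTable, bias, lookupMasked,
lookupBiased) are hypothetical and are not taken from GenerateSBCS.java.
The sketch shows only that the two lookups return the same values; it
makes no claim about their relative speed.

```java
// Hypothetical illustration of the +128 bias; names are not from the JDK.
final class BiasedTable {
    // Takes a 256-entry table indexed by (b & 0xFF) and returns one
    // indexed by (b + 128). The two 128-entry halves trade places.
    static char[] bias(char[] unsignedIndexed) {
        char[] biased = new char[256];
        for (int j = 0; j < 256; j++) {
            biased[j] = unsignedIndexed[(j + 128) & 0xFF];
        }
        return biased;
    }

    // Conventional lookup: mask the signed byte to 0..255.
    static char lookupMasked(char[] table, byte b) {
        return table[b & 0xFF];
    }

    // Biased lookup: shift the signed byte from -128..127 to 0..255.
    static char lookupBiased(char[] biasedTable, byte b) {
        return biasedTable[b + 128];
    }
}
```

The cost moves from every lookup to the one-time construction of the
table, which is why the person who builds the table has to do it by hand.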
-- John