Improving the performance of OpenJDK
Andrew Haley
aph at redhat.com
Wed Feb 18 04:37:19 PST 2009
Gary Benson wrote:
> Andrew Haley wrote:
>> Right. The whole idea of the way it's done ATM is bonkers: do a
>> byte-at-a-time unaligned load into machine order, then reverse the
>> bytes. Maybe the hope was that the compiler would see all this
>> cruft and silently convert it into an efficient form, but, er, no.
>> :-(
>
> How does it work for longer types? For a 64-bit value, for instance,
> is it better to always do 8 individual loads, or might it be better
> to try and optimize like they have done?
It depends on the frequency of execution. If the alignment is no
better than random with a uniform distribution, then half the time
you'll be looking at an address that is not aligned for any type
larger than a byte. If so, there's no point checking for special cases.
It is, however, worth avoiding 64-bit operations on 32-bit platforms,
so this is probably the best way to do a 64-bit big-endian load, at
least on gcc:
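unsigned long long foo6 (const unsigned char *p)
{
  /* High word: bytes 0..3, most significant byte first.  */
  unsigned long u1;
  u1 = ((unsigned long)p[0] << 24
        | (unsigned long)p[1] << 16
        | (unsigned long)p[2] << 8
        | p[3]);
  /* Low word: bytes 4..7, most significant byte first.  */
  unsigned long u2;
  u2 = ((unsigned long)p[4] << 24
        | (unsigned long)p[5] << 16
        | (unsigned long)p[6] << 8
        | p[7]);
  /* Only the final combine needs a 64-bit type.  */
  return ((unsigned long long)u1 << 32 | u2);
}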
The code generated here may be significantly better on 32-bit platforms
than the equivalent that uses unsigned long long, and not significantly
worse on 64-bit platforms.
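For comparison, the unsigned long long equivalent I have in mind looks
roughly like the sketch below. The names foo6_wide and check_same_result
are made up for illustration, and foo6 is repeated only so that the snippet
compiles on its own. The check merely confirms that both versions return
the same value when reading at an odd offset into a buffer; it says nothing
about the quality of the generated code, which you'd have to compare
yourself (e.g. with gcc -O2 -S) on the target in question.

```c
#include <assert.h>

/* Same approach as foo6 above: two word-sized halves, then one combine. */
static unsigned long long foo6(const unsigned char *p)
{
  unsigned long u1 = ((unsigned long)p[0] << 24
                      | (unsigned long)p[1] << 16
                      | (unsigned long)p[2] << 8
                      | p[3]);
  unsigned long u2 = ((unsigned long)p[4] << 24
                      | (unsigned long)p[5] << 16
                      | (unsigned long)p[6] << 8
                      | p[7]);
  return ((unsigned long long)u1 << 32 | u2);
}

/* Hypothetical name: the plain variant, where every shift and OR is
   done in a 64-bit type.  */
static unsigned long long foo6_wide(const unsigned char *p)
{
  return ((unsigned long long)p[0] << 56
          | (unsigned long long)p[1] << 48
          | (unsigned long long)p[2] << 40
          | (unsigned long long)p[3] << 32
          | (unsigned long long)p[4] << 24
          | (unsigned long long)p[5] << 16
          | (unsigned long long)p[6] << 8
          | p[7]);
}

/* Hypothetical helper: both versions must agree on the same input.  */
static void check_same_result(void)
{
  static const unsigned char buf[9] =
    { 0xff, 0x01, 0x23, 0x45, 0x67, 0x89, 0xab, 0xcd, 0xef };
  /* Read starting one byte into the buffer, skipping the 0xff.  */
  assert(foo6(buf + 1) == 0x0123456789abcdefULL);
  assert(foo6_wide(buf + 1) == 0x0123456789abcdefULL);
}
```

Both functions only ever do byte loads, so neither cares about the
alignment of p; the difference is purely in how wide the shifts and ORs are.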
Andrew.