A question about bytecodes
Stephen Dawkins
elfarto+hs at elfarto.com
Sun Jan 11 03:03:43 PST 2009
John Rose wrote:
> On Jan 10, 2009, at 1:25 PM, Stephen Dawkins wrote:
>
>> ...because we're stuck with a limited set of bytecodes that can't
>> adequately express what the programmer was intending? It seems silly
>> that we're making do with this limited bytecode language and
>> losing so much information the programmer could provide for us,
>> allowing the compiler/optimiser to do a much better job, without having
>> to guess that:
>>
>> float a[] = new float[4], b[] = new float[4];
>> float c = a[0]*b[0] + a[1]*b[1] + a[2]*b[2] + a[3]*b[3];
>>
>> could be compiled down to:
>>
>> DPPS xmm1, xmm2
>
<snip>
> Given tuple types in Java, it would look much better.
>
> struct VectorF4 { // signature {Ljava/math/VectorF4;FFFF}
> float a0, a1, a2, a3;
> }
>
> // declaration in java.math.VectorMath
> VectorF4 dpps(VectorF4 a, VectorF4 b);
>
> // example Java
> VectorF4 c = VectorMath.dpps(a, b);
>
> // example bytecodes
> fload 10; fload 11; fload 12; fload 13
> fload 20; fload 21; fload 22; fload 23
> invokestatic java/math/VectorMath, dpps,
> ({Ljava/math/VectorF4;FFFF}{Ljava/math/VectorF4;FFFF}){Ljava/math/VectorF4;FFFF}
>
> fstore 33; fstore 32; fstore 31; fstore 30
>
> // compiler graph after intrinsic application would be something like
> VdppsF4(ConvF4(10,11,12,13), ConvF4(20,21,22,23))
>
> -- John
>
I'm not sure that would be good enough. One issue with SSE is ensuring
that your 128-bit fields are aligned to 16 bytes so you can move all 4
floats in one go. For example:
public class SpaceShip {
    private Vector4f position;
    private Vector4f velocity;
}
How does the VM know to align position and velocity to 16 bytes? It
might seem contrived, but unaligned loads (MOVUPS) are not as fast as
aligned ones (MOVAPS), and since the whole point of getting SIMD
instructions into Java is performance, we might as well aim for the
fastest possible code.
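To make the alignment concern concrete, here is a minimal sketch of the
arithmetic a field-layout pass would need. The class AlignmentSketch, its
helpers, and the 12-byte object header are all hypothetical and chosen
only for illustration; this is not how HotSpot actually lays out fields.

```java
public final class AlignmentSketch {

    /** Rounds offset up to the next multiple of alignment (must be a power of two). */
    static long alignUp(long offset, long alignment) {
        return (offset + alignment - 1) & -alignment;
    }

    /** True if address is a multiple of alignment (must be a power of two). */
    static boolean isAligned(long address, long alignment) {
        return (address & (alignment - 1)) == 0;
    }

    public static void main(String[] args) {
        long headerSize = 12;   // assumed object header size, for illustration only
        long vectorSize = 16;   // four 32-bit floats

        // Place each 128-bit field at the next 16-byte boundary.
        long positionOffset = alignUp(headerSize, 16);
        long velocityOffset = alignUp(positionOffset + vectorSize, 16);

        System.out.println("position at offset " + positionOffset); // 16
        System.out.println("velocity at offset " + velocityOffset); // 32
    }
}
```

Aligned offsets are only half of it: the object's base address would
also have to be a multiple of 16 for an aligned load to be legal, and an
8-byte object alignment does not guarantee that.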
Also, it seems that with these tuples you lose the information about
where each field comes from. Your example has to recreate the vector
from 4 individual fields, when it could have just copied it wholesale
and been done with it.
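The two shapes can be contrasted in plain Java. This is only a sketch:
the names TupleCopySketch, dotScalarized and dotPacked are made up, and
a float[] plus an offset stands in for a contiguous 128-bit value.

```java
public final class TupleCopySketch {

    // Scalarized form: the caller passes eight separate floats, much like
    // the fload sequence in the quoted bytecode example. Nothing here says
    // that a0..a3 originally sat next to each other in memory.
    static float dotScalarized(float a0, float a1, float a2, float a3,
                               float b0, float b1, float b2, float b3) {
        return a0 * b0 + a1 * b1 + a2 * b2 + a3 * b3;
    }

    // Packed form: each operand is one contiguous run of four floats,
    // identified by (array, offset), so the source location is preserved.
    static float dotPacked(float[] a, int aOff, float[] b, int bOff) {
        float sum = 0f;
        for (int i = 0; i < 4; i++) {
            sum += a[aOff + i] * b[bOff + i];
        }
        return sum;
    }

    public static void main(String[] args) {
        float[] a = {1f, 2f, 3f, 4f};
        float[] b = {5f, 6f, 7f, 8f};
        System.out.println(dotScalarized(a[0], a[1], a[2], a[3],
                                         b[0], b[1], b[2], b[3])); // 70.0
        System.out.println(dotPacked(a, 0, b, 0));                 // 70.0
    }
}
```

Both give the same answer; the point is that only the packed form tells
the compiler the four values are already adjacent and could be loaded in
one move.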
Regards
Stephen