Request for reviews (M): 6987135: Performance regression on Intel platform with 32-bits edition between 6u13 and 6u14.
Tom Rodriguez
tom.rodriguez at oracle.com
Mon Oct 25 13:15:37 PDT 2010
On Oct 22, 2010, at 6:25 PM, Vladimir Kozlov wrote:
>
> http://cr.openjdk.java.net/~kvn/6987135/webrev.00
>
> Fixed 6987135: Performance regression on Intel platform with 32-bits edition between 6u13 and 6u14.
>
> Changes for 6603011 added the conversion of long
> division by constant to the code with multiply.
> But some modern cpus improved DIV instruction
> performance. Use it for long division by constant
> when it is faster than code with multiply.
The formats in x86_32.ad don't match the code. In modL_eReg_imm32, why can't the value be 0 or -1? Why don't you use an immL definition that ensures that? If imm is MININT then the pcon calculation will go wrong.
I believe you could do the register declarations like this:
+ instruct modL_eReg_imm32( eADXRegL dst, eRegL src, immL32 imm, eRegI tmp, eFlagsReg cr ) %{
+ match(Set dst (ModL src imm));
+ effect(TEMP dst, TEMP tmp, KILL cr );
to leave the src and tmp unbound which would give the RA a little more freedom. Actually wouldn't connecting src and dst directly result in fewer moves in the normal case? You might need a new temp but it seems like there are quite a few moves of src into dst for the idivl.
+ instruct modL_eReg_imm32( eADXRegL dst, immL32 imm, eSIRegI tmp, eFlagsReg cr ) %{
+ match(Set dst (ModL dst imm));
+ effect( KILL cr );
tom
>
> Tested on US3, T1, T2, Sparc64, AMD and Intel latest cpus.
>
More information about the hotspot-compiler-dev
mailing list