Request for reviews (M): 6987135: Performance regression on Intel platform with 32-bits edition between 6u13 and 6u14.

Tom Rodriguez tom.rodriguez at oracle.com
Mon Oct 25 13:15:37 PDT 2010


On Oct 22, 2010, at 6:25 PM, Vladimir Kozlov wrote:

> 
> http://cr.openjdk.java.net/~kvn/6987135/webrev.00
> 
> Fixed 6987135: Performance regression on Intel platform with 32-bits edition between 6u13 and 6u14.
> 
> Changes for 6603011 added the conversion of long
> division by constant to the code with multiply.
> But some modern cpus improved DIV instruction
> performance. Use it for long division by constant
> when it is faster than code with multiply.
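
For readers unfamiliar with the multiply form mentioned above, here is a minimal Java sketch of the general idea, shown for unsigned 32-bit division by 10. It is an illustration only, not the 64-bit sequence HotSpot emits; the class and method names are hypothetical.

```java
public class DivByConstant {
    // Unsigned 32-bit division by 10 done with one multiply and one shift.
    // 0xCCCCCCCD is ceil(2^35 / 10). The 64-bit product can have its top
    // bit set, so the unsigned shift (>>>) is required.
    static int divideBy10(int x) {
        long ux = x & 0xFFFFFFFFL;               // treat x as unsigned
        return (int) ((ux * 0xCCCCCCCDL) >>> 35);
    }

    public static void main(String[] args) {
        int[] samples = {0, 9, 10, 99, 100, Integer.MAX_VALUE, -1};
        for (int x : samples) {
            // Compare against the library's unsigned divide.
            boolean same = divideBy10(x) == Integer.divideUnsigned(x, 10);
            System.out.println(Integer.toUnsignedString(x) + " -> "
                    + divideBy10(x) + " matches=" + same);
        }
    }
}
```

Whether this beats a hardware divide depends on the CPU, which is the trade-off the change is about.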

The formats in x86_32.ad don't match the code.  In modL_eReg_imm32, why can't the value be 0 or -1?  Why don't you use an immL definition that ensures that?  If imm is MININT then the pcon calculation will go wrong.
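
To make those edge cases concrete, here is a small Java sketch of the semantics the generated code has to preserve. It is not taken from the webrev: absConst is a hypothetical stand-in for the pcon calculation, assuming pcon is the absolute value of the constant.

```java
public class ModEdgeCases {
    // Hypothetical stand-in for the pcon calculation (assumed to be |con|).
    // For Integer.MIN_VALUE the negation overflows and returns MIN_VALUE again.
    static int absConst(int con) {
        return (con > 0) ? con : -con;
    }

    // Plain long remainder; the divisor is a parameter so javac does not fold it.
    static long rem(long x, long d) {
        return x % d;
    }

    // True if taking the remainder by this divisor throws ArithmeticException.
    static boolean remThrows(long x, long d) {
        try {
            rem(x, d);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // MININT: the "absolute value" is still negative.
        System.out.println(absConst(Integer.MIN_VALUE) < 0);
        // Divisor -1: Java defines the result as 0, even for Long.MIN_VALUE.
        System.out.println(rem(Long.MIN_VALUE, -1L));
        // Divisor 0: Java requires an ArithmeticException.
        System.out.println(remThrows(42L, 0L));
    }
}
```

A raw idiv with a divisor of -1 can trap on x86 where Java requires a result of 0, so the instruct needs either an operand type that excludes these values or an explicit check.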

I believe you could do the register declarations like this:

+ instruct modL_eReg_imm32( eADXRegL dst, eRegL src, immL32 imm, eRegI tmp, eFlagsReg cr ) %{
+   match(Set dst (ModL src imm));
+   effect(TEMP dst, TEMP tmp, KILL cr );

to leave src and tmp unbound, which would give the register allocator a little more freedom.  Actually, wouldn't connecting src and dst directly result in fewer moves in the normal case?  You might need a new temp, but there seem to be quite a few moves of src into dst for the idivl.

+ instruct modL_eReg_imm32( eADXRegL dst, immL32 imm, eSIRegI tmp, eFlagsReg cr ) %{
+   match(Set dst (ModL dst imm));
+   effect( KILL cr );

tom

> 
> Tested on US3, T1, T2, Sparc64, AMD and Intel latest cpus.
> 


