Why doesn't HotSpot use div machine code?

Chuck Rasbold rasbold at google.com
Wed Dec 23 15:28:28 PST 2009


Replacing divide/modulus by a constant with multiplies and shifts is a very
common optimization in many compilers.  Even without the excellent AMD
Optimization guide citation, I would guess that two multiplies are usually
faster than one divide, and that the extra arithmetic/logic instructions are
mostly noise.
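
For readers who want to see the trick in source form, here is a minimal Java
sketch of the multiply-and-shift sequence for the divisor 94 (0x5e). The
constant 0xae4c415d and the shift by 6 are taken from the disassembly quoted
below; the class and method names (MagicDivide, div94, rem94) are made up for
illustration, and the sign correction is something I added so the sketch also
handles negative ints. The posted disassembly shows no such step, presumably
because a char is never negative. This is not HotSpot's actual code.

```java
// Hypothetical illustration of division by the constant 94 using one
// multiply, and remainder using a second multiply, as in the quoted output.
class MagicDivide {
    // Equals ceil(2^38 / 94) when read as unsigned; negative when read as int.
    static final int MAGIC = 0xae4c415d;

    static int div94(int n) {
        // High 32 bits of the signed 64-bit product (what imul leaves in edx).
        int hi = (int) (((long) MAGIC * n) >> 32);
        // MAGIC is negative as a signed int, so add n back, then shift by 6.
        int q = (hi + n) >> 6;
        // Assumed extra step: add 1 for negative n so the result rounds toward zero.
        q -= (n >> 31);
        return q;
    }

    static int rem94(int n) {
        // The second multiply: remainder = n - quotient * 94.
        return n - div94(n) * 94;
    }

    public static void main(String[] args) {
        for (int n = 0; n <= 0xFFFF; n++) {
            if (div94(n) != n / 94 || rem94(n) != n % 94) {
                throw new AssertionError("mismatch at " + n);
            }
        }
        System.out.println("div94/rem94 agree with / and % for every char value");
    }
}
```

The main method only checks the char range, which is the case relevant to this
thread.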

Osvaldo is right: the downside of the transformation is increased register
pressure, and HotSpot will eagerly perform this transformation without
regard to that.

However, when the divisor is non-constant, on x86 processors, HotSpot is
capable of combining the / and % operations with like operands into a
single div operation.
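
As a rough illustration of the source shape that is eligible for this, the
sketch below computes a quotient and a remainder from the same two non-constant
operands. The names (DivMod, divMod) are hypothetical, and whether a given JVM
build actually emits a single div for it is something you would have to confirm
from the disassembly.

```java
// Hypothetical example: / and % on the same operands with a non-constant divisor.
class DivMod {
    static int[] divMod(int a, int b) {
        int q = a / b;  // quotient
        int r = a % b;  // remainder of the same operands; candidate for sharing one div
        return new int[] { q, r };
    }

    public static void main(String[] args) {
        // Take the divisor from the command line so it is not a compile-time constant.
        int divisor = args.length > 0 ? Integer.parseInt(args[0]) : 94;
        int[] qr = divMod(1234, divisor);
        System.out.println(qr[0] + " remainder " + qr[1]);
    }
}
```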

-- Chuck

On Wed, Dec 23, 2009 at 2:51 PM, Osvaldo Doederlein <opinali at gmail.com> wrote:

> Hi,
>
> Perhaps because of this:
>
> http://support.amd.com/us/Processor_TechDocs/40546-PUB-Optguide_3-11_5-21-09.pdf
>
> imul's latencies are tiny (3 cycles for both forms used in the code), but
> div/idiv's are enormous (check Table 7). These numbers are for a specific
> CPU family but I don't expect this to be very different in other CPUs. The
> code produced by HotSpot will probably win, even with the extra shifts, movs
> etc.
>
> OTOH, I wonder whether HotSpot would be capable of producing your desired
> code if it were faster - it consumes fewer registers, and that's also very
> important, particularly on x86.
>
> A+
> Osvaldo
>
> 2009/12/23 Ulf Zibis <Ulf.Zibis at gmx.de>
>
> In my code I have a method similar to the following:
>> (divide a char value by an 8-bit constant and combine its lower 8-bit
>> quotient and remainder into a new char value)
>>
>>    static final byte BYTE_RANGE = 0x5e;
>>    static char db(char db) {
>>       return (char)((((db / (BYTE_RANGE&0xff) & 0xff) << 8) | (db % (BYTE_RANGE&0xff) & 0xff)) // force DIV word/byte
>>               + ...;
>>   }
>>
>> This could be compiled to:
>>
>> mov    %cx,%ax    ; copy char db to ax register
>> div    $0x5e
>> xchg   %al,%ah
>>
>> ... but the disassembly output shows:
>> (some sophisticated trick using 2 imul instructions)
>>
>>  0x00ba4f67: mov    $0xae4c415d,%eax
>>  0x00ba4f6c: imul   %ecx
>>  0x00ba4f6e: add    %ecx,%edx          ;*idiv
>>                                       ; - sun.nio.cs.ext.EUC_TW_C_d_b_c1_f3_shortMap4$Encoder::db@3 (line 515)
>>  0x00ba4f70: mov    %edx,%ebp
>>  0x00ba4f72: sar    $0x6,%ebp
>>  0x00ba4f75: shr    $0x6,%edx
>>  0x00ba4f78: imul   $0x5e,%ebp,%ebp
>>  0x00ba4f7b: sub    %ebp,%ecx
>>  0x00ba4f7d: and    $0xff,%edx
>>  0x00ba4f83: and    $0xff,%ecx
>>  0x00ba4f89: shl    $0x8,%edx
>>  0x00ba4f8c: or     %ecx,%edx
>>  ...
>>
>> Complete output here (line 2330):
>>
>> https://java-nio-charset-enhanced.dev.java.net/source/browse/java-nio-charset-enhanced/branches/j7_EUC_TW/log/C_d_b_c1_f3_shortMap4_PA_2.xml?rev=888&view=markup
>>
>> Why doesn't HotSpot use div machine code?
>> I guess this would be faster here.
>>
>> -Ulf
>>
>>
>>
>

