Why doesn't HotSpot use div machine code?

Ulf Zibis Ulf.Zibis at gmx.de
Wed Dec 23 16:45:59 PST 2009


Osvaldo, many thanks for the interesting document.

... and not to forget that in my case I only need 8-bit unsigned results 
from a uword/ubyte division.
In this case, the cost of the div instruction may be about the same as 
that of the replacement sequence (2 multiplies + add + shifts).

-Ulf
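
As a rough check on the 8-bit case described above, the sketch below works out which char inputs produce a quotient that fits in 8 bits when divided by 0x5e. It assumes that the x86 byte form of div (AX divided by an 8-bit operand, quotient in AL, remainder in AH) raises a divide error when the quotient exceeds 255, whereas the Java expression simply masks the quotient with 0xff. The class and method names are hypothetical and not part of the original code.

```java
final class ByteDivRangeSketch {
    static final int DIVISOR = 0x5e; // 94, the constant from the original method

    // True if the unsigned quotient c / DIVISOR fits into an 8-bit register.
    static boolean quotientFitsInByte(char c) {
        return c / DIVISOR <= 0xff;
    }

    // Largest input whose quotient is still <= 255.
    static int largestSafeInput() {
        return DIVISOR * 256 - 1;
    }

    public static void main(String[] args) {
        int safe = 0;
        for (int c = 0; c <= 0xffff; c++) {
            if (quotientFitsInByte((char) c)) {
                safe++;
            }
        }
        System.out.println("inputs with an 8-bit quotient: " + safe);
        System.out.println("largest such input: " + largestSafeInput());
    }
}
```

If the assumption holds, a byte-form div would only be a drop-in replacement when the compiler can prove the input stays at or below the value returned by largestSafeInput().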


On 23.12.2009 23:51, Osvaldo Doederlein wrote:
> Hi,
>
> Perhaps because of this:
> http://support.amd.com/us/Processor_TechDocs/40546-PUB-Optguide_3-11_5-21-09.pdf
>
> imul's latencies are tiny (3 cycles for both forms used in the code), 
> but div/idiv's are enormous (check Table 7). These numbers are for a 
> specific CPU family but I don't expect this to be very different in 
> other CPUs. The code produced by HotSpot will probably win, even with 
> the extra shifts, movs etc.
>
> OTOH, I wonder if HotSpot would be capable of producing your desired 
> code if it were faster - it uses fewer registers, and that's also 
> very important, particularly on x86.
>
> A+
> Osvaldo
>
> 2009/12/23 Ulf Zibis <Ulf.Zibis at gmx.de>
>
>     In my code I have a method similar to the following:
>     (divide a char value by an 8-bit constant and combine the lower 8 bits
>     of its quotient and remainder into a new char value)
>
>        static final byte BYTE_RANGE = 0x5e;
>        static char db(char db) {
>           return (char)((((db / (BYTE_RANGE&0xff) & 0xff) << 8)
>                   | (db % (BYTE_RANGE&0xff) & 0xff)) // force DIV word/byte
>                   + ...;
>       }
>
>     This could be compiled to:
>
>     mov    %cx,%ax    ; copy char db to ax register
>     div    $0x5e
>     xchg   %al,%ah
>
>     ... but the disassembly output shows:
>     (some sophisticated trick using 2 imul instructions)
>
>      0x00ba4f67: mov    $0xae4c415d,%eax
>      0x00ba4f6c: imul   %ecx
>      0x00ba4f6e: add    %ecx,%edx          ;*idiv
>                                           ; - sun.nio.cs.ext.EUC_TW_C_d_b_c1_f3_shortMap4$Encoder::db@3 (line 515)
>      0x00ba4f70: mov    %edx,%ebp
>      0x00ba4f72: sar    $0x6,%ebp
>      0x00ba4f75: shr    $0x6,%edx
>      0x00ba4f78: imul   $0x5e,%ebp,%ebp
>      0x00ba4f7b: sub    %ebp,%ecx
>      0x00ba4f7d: and    $0xff,%edx
>      0x00ba4f83: and    $0xff,%ecx
>      0x00ba4f89: shl    $0x8,%edx
>      0x00ba4f8c: or     %ecx,%edx
>      ...
>
>     Complete output here (line 2330):
>     https://java-nio-charset-enhanced.dev.java.net/source/browse/java-nio-charset-enhanced/branches/j7_EUC_TW/log/C_d_b_c1_f3_shortMap4_PA_2.xml?rev=888&view=markup
>
>     Why doesn't HotSpot use div machine code?
>     I guess this would be faster here.
>
>     -Ulf
>
>
>
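
The multiply-based sequence in the disassembly above can be mirrored step by step in Java. This is a minimal sketch, not HotSpot's actual implementation: it assumes the constant 0xae4c415d is the rounded-up value of 2^38 / 94 (32 bits from taking the high half of the product plus the shift by 6), and the class and method names are hypothetical. The main method checks the result against plain / and % for every char value.

```java
final class MagicDivSketch {
    static final int DIVISOR = 0x5e;     // 94, the constant from the original method
    static final int MAGIC = 0xae4c415d; // value loaded into %eax (negative as a signed int)
    static final int SHIFT = 6;          // shift count used by sar/shr

    // Assumed derivation: ceil(2^(32 + SHIFT) / DIVISOR), as an unsigned 32-bit value.
    static long deriveMagic() {
        long pow = 1L << (32 + SHIFT);
        return (pow + DIVISOR - 1) / DIVISOR;
    }

    // Quotient x / DIVISOR for non-negative x, following the disassembly.
    static int quotient(int x) {
        long product = (long) MAGIC * x;  // imul %ecx: signed product in edx:eax
        int high = (int) (product >> 32); // upper half, i.e. %edx
        high += x;                        // add %ecx,%edx (corrects for the negative constant)
        return high >>> SHIFT;            // shr $0x6,%edx
    }

    // Packs the low 8 bits of quotient and remainder into one char.
    static char db(char c) {
        int q = quotient(c);
        int r = c - q * DIVISOR;          // imul $0x5e,... then sub
        return (char) (((q & 0xff) << 8) | (r & 0xff));
    }

    public static void main(String[] args) {
        if (deriveMagic() != (MAGIC & 0xffffffffL)) {
            throw new IllegalStateException("derived constant differs from MAGIC");
        }
        for (int c = 0; c <= 0xffff; c++) {
            char expected = (char) ((((c / DIVISOR) & 0xff) << 8) | ((c % DIVISOR) & 0xff));
            if (db((char) c) != expected) {
                throw new IllegalStateException("mismatch at " + c);
            }
        }
        System.out.println("multiply-based result matches / and % for all char values");
    }
}
```

The sketch only covers non-negative inputs, which is all a char can hold; a general signed division by a constant needs an extra sign correction that is not shown here.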


More information about the hotspot-compiler-dev mailing list