RFR: 8332856: C2: Add new transform for bool eq/ne (cmp (and (urshift X const1) const2) 0) [v2]

Andrew Haley aph at openjdk.org
Sun Jun 30 16:46:20 UTC 2024


On Tue, 28 May 2024 20:11:35 GMT, Tobias Hotz <duke at openjdk.org> wrote:

>> This PR adds a new ideal optimization for the following pattern:
>> 
>> public boolean testFunc(int a) {
>>     int mask = 0b101;
>>     int shift = 12;
>>     return ((a >> shift) & mask) == 0;
>> }
>> 
>> Where the mask and shift are constant values and a is a variable. For this optimization to work, the right shift has to be idealized to an unsigned right shift earlier in the pipeline, which happens here: https://github.com/openjdk/jdk/blob/b92bd671835c37cff58e2cdcecd0fe4277557d7f/src/hotspot/share/opto/mulnode.cpp#L731
>> If the shift is already an unsigned bit shift, it works as well.
>> On AMD64 CPUs, this means that the whole expression can be reduced to a simple `test` instruction.
>
> Tobias Hotz has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - LF endings...
>  - Add a benchmark to measure effect of new ideal transformation
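The two steps in the quoted description can be checked with plain Java arithmetic. The sketch below is not HotSpot code. The class and helper names (`ShiftMaskFold`, `signedShiftIsUnsigned`, `original`, `folded`) are hypothetical. The sign-bit condition in step 1 is an assumed paraphrase of when `>>` may be replaced by `>>>` under a mask, not a copy of the check in `mulnode.cpp`. Step 2 is the identity the new transform relies on: `((a >>> s) & mask) == 0` holds exactly when `(a & (mask << s)) == 0`.

```java
import java.util.Random;

/**
 * Sketch with hypothetical names: checks the arithmetic behind
 *   step 1: (a >> s) & mask  ==  (a >>> s) & mask   (when mask avoids the top s bits)
 *   step 2: ((a >>> s) & mask) == 0  <=>  (a & (mask << s)) == 0
 */
public class ShiftMaskFold {
    // Step 1 (assumed condition): >> and >>> differ only in the top s bits,
    // so they agree under a mask that has no bits there.
    static boolean signedShiftIsUnsigned(int shift, int mask) {
        int s = shift & 31;                      // Java uses only the low 5 bits of the count
        if (s == 0) return true;                 // no shift: >> and >>> agree
        int signBits = ~((1 << (32 - s)) - 1);   // mask selecting the top s bits
        return (mask & signBits) == 0;
    }

    // Original form: shift the value right, then mask.
    static boolean original(int a, int shift, int mask) {
        return ((a >>> shift) & mask) == 0;
    }

    // Folded form: shift the constant mask left instead, leaving a single AND.
    static boolean folded(int a, int shift, int mask) {
        return (a & (mask << shift)) == 0;
    }

    public static void main(String[] args) {
        // Constants from the PR's example: mask = 0b101, shift = 12.
        System.out.println(signedShiftIsUnsigned(12, 0b101));   // true
        System.out.println(Integer.toHexString(0b101 << 12));   // 5000

        Random rnd = new Random(42);
        for (int i = 0; i < 1_000_000; i++) {
            int a = rnd.nextInt();
            int shift = rnd.nextInt(32);
            int mask = rnd.nextInt();
            if (original(a, shift, mask) != folded(a, shift, mask)) {
                throw new AssertionError("step 2 mismatch for a=" + a);
            }
            if (signedShiftIsUnsigned(shift, mask)
                    && ((a >> shift) & mask) != ((a >>> shift) & mask)) {
                throw new AssertionError("step 1 mismatch for a=" + a);
            }
        }
        System.out.println("ok");
    }
}
```

Bits of `mask` that would land above bit 31 after `mask << s` correspond to positions that `a >>> s` always fills with zeros, so dropping them does not change the result for the unsigned shift.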

There is little to no speed difference on a fast AArch64 system (Apple M3), because the benchmark is dominated by the latency of loading the data from memory, which is far longer than the time spent on the computation itself. However, the code is smaller, so this is a win: each `ubfx` + `cmp #0` pair in the old code becomes a single `tst` against `#0xff00` in the new code.


Before (without the transform):

   add          x17, x15, w7, sxtw #2
   ldr          w10, [x17, #0x10]
   ldr          w12, [x17, #0x2c]
   ldp          w1, w11, [x17, #0x14]
   ldp          w14, w13, [x17, #0x1c]
   ldp          w16, w17, [x17, #0x24]
   ubfx         w2, w17, #8, #8
   ubfx         w17, w1, #8, #8
   ubfx         w3, w10, #8, #8
   ubfx         w1, w11, #8, #8
   ubfx         w10, w12, #8, #8
   ubfx         w11, w14, #8, #8
   ubfx         w12, w16, #8, #8
   ubfx         w14, w13, #8, #8
   cmp          w3, #0
   cset         w13, eq
   cmp          w12, #0
   cset         w12, eq
   cmp          w14, #0
   cset         w14, eq
   cmp          w11, #0
   cset         w11, eq
   cmp          w1, #0
   cset         w16, eq
   cmp          w17, #0
   cset         w13, eq
   cmp          w10, #0
   cset         w17, eq
   cmp          w2, #0
   add          w7, w7, #8
   cset         w10, eq
   cmp          w7, w22
   b.lt         #0x10be2ea80

After (with the transform):

   add          x3, x7, w13, sxtw #2
   ldr          w15, [x3, #0x10]
   ldp          w1, w21, [x3, #0x24]
   ldr          w2, [x3, #0x20]
   ldr          w16, [x3, #0x2c]
   tst          w15, #0xff00
   ldr          w14, [x3, #0x14]
   cset         w15, eq
   ldp          w0, w17, [x3, #0x18]
   tst          w21, #0xff00
   cset         w3, eq
   tst          w1, #0xff00
   cset         w15, eq
   tst          w2, #0xff00
   cset         w2, eq
   tst          w17, #0xff00
   cset         w17, eq
   tst          w0, #0xff00
   cset         w1, eq
   tst          w14, #0xff00
   cset         w14, eq
   tst          w16, #0xff00
   add          w13, w13, #8
   cset         w0, eq
   cmp          w13, w12
   b.lt         #0x1100dd420
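In the listing above, `ubfx wT, wX, #8, #8` extracts bits 8 to 15 of `wX`, which is `(x >>> 8) & 0xff`. The disassembly therefore suggests that the benchmark tests an 8-bit field at shift 8. This is an inference from the instructions, not from the benchmark source. After the transform, the mask folds into `0xff << 8 = 0xff00`, which the new code uses directly as the `tst` immediate. The Java sketch below models both instruction sequences to show that they produce the same boolean. The class and method names (`UbfxVsTst`, `ubfx`, `before`, `after`) are hypothetical.

```java
import java.util.Random;

/**
 * Sketch with hypothetical names: models the two AArch64 sequences in Java.
 *   before: ubfx wT, wX, #8, #8 ; cmp wT, #0 ; cset wR, eq
 *   after:  tst  wX, #0xff00    ; cset wR, eq
 */
public class UbfxVsTst {
    // ubfx wd, wn, #lsb, #width: extract `width` bits starting at bit `lsb`,
    // zero-extended (this model assumes 0 < width < 32).
    static int ubfx(int wn, int lsb, int width) {
        return (wn >>> lsb) & ((1 << width) - 1);
    }

    // Old sequence: extract the field, then compare it with zero.
    static boolean before(int x) {
        return ubfx(x, 8, 8) == 0;
    }

    // New sequence: tst sets the Z flag when (x & imm) == 0,
    // and cset ..., eq turns that flag into 1 or 0.
    static boolean after(int x) {
        return (x & 0xff00) == 0;
    }

    public static void main(String[] args) {
        // The folded immediate is the field mask shifted by the field offset.
        System.out.println(Integer.toHexString(0xff << 8)); // ff00

        Random rnd = new Random(1);
        for (int i = 0; i < 1_000_000; i++) {
            int x = rnd.nextInt();
            if (before(x) != after(x)) {
                throw new AssertionError("mismatch for x=0x" + Integer.toHexString(x));
            }
        }
        System.out.println("ok");
    }
}
```

Because the field offset and width are constants, the compiler can precompute the shifted mask once, so a single `tst` replaces the extract-and-compare pair.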

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19310#issuecomment-2198617427


More information about the hotspot-compiler-dev mailing list