RFR(M): 8231561: [lworld] C2 generates inefficient code for acmp

Tobias Hartmann tobias.hartmann at oracle.com
Tue Oct 8 13:55:10 UTC 2019


Hi,

please review the following patch:
https://bugs.openjdk.java.net/browse/JDK-8231561
http://cr.openjdk.java.net/~thartmann/8231561/webrev.00/

This patch includes fixes for the following performance issues that Sergey found:
(1) Clearing of array property bits includes useless mov instructions.
(2) When loading the klass of the acmp operands, we already know that one operand is an inline type
and therefore don't need to clear the storage properties of the klass pointer. In fact, there are
many other places in the code where the LoadKlassNode implementation does not need to clear the
storage property bits because we know that either the object can't be an array or we don't load the
klass ptr from the object header but from some other location.
(3) Implicit null checking does not work for the 'and' instruction that is used for the
is_always_locked check because the corresponding MachNode references a constant load of the mask,
which prevents hoisting. As a result, in the acmp implementation, a later load of the klass is
converted to an implicit null check and hoisted above the is_always_locked check (i.e. it is always
executed although it is only needed if the first operand is an inline type).
(4) When the acmp operands are loaded from a field, the mark word load for the is_always_locked
check should use a complex memory operand so that the narrow oop decoding is folded into the load.

In addition, I've noticed that the inline type guard for the System.identityHashCode intrinsic is
redundant because inline types always have the always_locked_pattern set and therefore the
subsequent guard will always trigger for them.

Below is the current, relevant code for acmp and how it changes with the above fixes.

  0x00007f0424aeaec8:   mov    0x8(%rcx),%r11d              ; implicit exception
  0x00007f0424aeaecc:   mov    $0x405,%r10d
  0x00007f0424aeaed2:   and    (%rcx),%r10
  0x00007f0424aeaed5:   cmp    $0x405,%r10                  ; is_always_locked check
  0x00007f0424aeaedc:   jne    0x00007f0424aeaf09

  0x00007f0424aeaede:   mov    0x8(%rdx),%r10d              ; implicit exception
  0x00007f0424aeaee2:   mov    %r10d,%r8d
  0x00007f0424aeaee5:   mov    %r11d,%r10d
  0x00007f0424aeaee8:   and    $0x1fffffff,%r8d             ; property bit clearing
  0x00007f0424aeaeef:   and    $0x1fffffff,%r10d            ; property bit clearing
  0x00007f0424aeaef6:   cmp    %r8d,%r10d                   ; klass check
  0x00007f0424aeaef9:   jne    0x00007f0424aeaf09
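
For readers who prefer not to trace the disassembly, the two guards above can be modelled roughly as
follows. This is only an illustrative Java sketch, not HotSpot code: the class and method names are
hypothetical, the constants 0x405 and 0x1fffffff are copied from the listing, and the interpretation
of "both guards pass" as "the slow comparison path is needed" is an assumption.

```java
// Illustrative model only; all names are hypothetical, not HotSpot identifiers.
final class AcmpFastPathSketch {
    // Pattern tested against the mark word (value taken from the disassembly).
    static final long ALWAYS_LOCKED_PATTERN = 0x405L;
    // Mask that clears the storage property bits of a narrow klass (from the disassembly).
    static final int PROPERTY_BITS_MASK = 0x1fffffff;

    // Models: and (%rcx),%r10 ; cmp $0x405,%r10
    static boolean isAlwaysLocked(long markWord) {
        return (markWord & ALWAYS_LOCKED_PATTERN) == ALWAYS_LOCKED_PATTERN;
    }

    // Models: and $0x1fffffff,%r8d
    static int clearPropertyBits(int narrowKlass) {
        return narrowKlass & PROPERTY_BITS_MASK;
    }

    // Returns true if neither guard branches away, false otherwise.
    static boolean bothGuardsPass(long markA, int narrowKlassA, int narrowKlassB) {
        if (!isAlwaysLocked(markA)) {
            return false; // first jne taken
        }
        // The baseline code clears the property bits on both sides before comparing.
        return clearPropertyBits(narrowKlassA) == clearPropertyBits(narrowKlassB);
    }
}
```

In this model, clearPropertyBits returns its argument unchanged whenever the upper three bits are
already zero, which is consistent with the observation in (2) that the clearing is not always needed.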

With the new match rule for clearing array property bits (1):
  0x00007f92f44b1ec8:   mov    0x8(%rcx),%r11d              ; implicit exception
  0x00007f92f44b1ecc:   and    $0x1fffffff,%r11d            ; property bit clearing
  0x00007f92f44b1ed3:   mov    $0x405,%r10d
  0x00007f92f44b1ed9:   and    (%rcx),%r10
  0x00007f92f44b1edc:   cmp    $0x405,%r10                  ; is_always_locked check
  0x00007f92f44b1ee3:   jne    0x00007f92f44b1f05

  0x00007f92f44b1ee5:   mov    0x8(%rdx),%r8d               ; implicit exception
  0x00007f92f44b1ee9:   and    $0x1fffffff,%r8d             ; property bit clearing
  0x00007f92f44b1ef0:   cmp    %r8d,%r11d                   ; klass check
  0x00007f92f44b1ef3:   jne    0x00007f92f44b1f05

Without unnecessary clearing of array property bits (2):
  0x00007fe810aeabc8:   mov    0x8(%rcx),%r11d              ; implicit exception
  0x00007fe810aeabcc:   mov    $0x405,%r10d
  0x00007fe810aeabd2:   and    (%rcx),%r10
  0x00007fe810aeabd5:   cmp    $0x405,%r10                  ; is_always_locked check
  0x00007fe810aeabdc:   jne    0x00007fe810aeabf5

  0x00007fe810aeabde:   mov    0x8(%rdx),%r10d              ; implicit exception
  0x00007fe810aeabe2:   cmp    %r10d,%r11d                  ; klass check
  0x00007fe810aeabe5:   jne    0x00007fe810aeabf5

With implicit null check fix (3):
  0x00007f51bc08bbc8:   mov    $0x405,%r10d
  0x00007f51bc08bbce:   and    (%rcx),%r10                  ; implicit exception
  0x00007f51bc08bbd1:   cmp    $0x405,%r10                  ; is_always_locked check
  0x00007f51bc08bbd8:   jne    0x00007f51bc08bbf5

  0x00007f51bc08bbda:   mov    0x8(%rdx),%r11d              ; implicit exception
  0x00007f51bc08bbde:   mov    0x8(%rcx),%r10d
  0x00007f51bc08bbe2:   cmp    %r11d,%r10d                  ; klass check
  0x00007f51bc08bbe5:   jne    0x00007f51bc08bbf5


The same code when loading the operands from a field (narrow oop):
  0x00007f1c8d177548:   mov    0x8(%r12,%rbp,8),%r10d       ; implicit exception
  0x00007f1c8d17754d:   lea    (%r12,%rbp,8),%rsi
  0x00007f1c8d177551:   mov    $0x405,%r11d
  0x00007f1c8d177557:   and    (%rsi),%r11
  0x00007f1c8d17755a:   cmp    $0x405,%r11                  ; is_always_locked check
  0x00007f1c8d177561:   jne    0x00007f1c8d177581

  0x00007f1c8d177563:   mov    0x8(%r12,%r8,8),%r11d        ; implicit exception
  0x00007f1c8d177568:   cmp    %r11d,%r10d                  ; klass check
  0x00007f1c8d17756b:   jne    0x00007f1c8d177581

With complex memory load in the is_always_locked check (4):
  0x00007f2b9d1774c8:   mov    $0x405,%r10d
  0x00007f2b9d1774ce:   and    (%r12,%rbp,8),%r10           ; implicit exception
  0x00007f2b9d1774d2:   cmp    $0x405,%r10                  ; is_always_locked check
  0x00007f2b9d1774d9:   jne    0x00007f2b9d177501

  0x00007f2b9d1774db:   mov    0x8(%r12,%r11,8),%r10d       ; implicit exception
  0x00007f2b9d1774e0:   mov    0x8(%r12,%rbp,8),%r8d
  0x00007f2b9d1774e5:   cmp    %r10d,%r8d                   ; klass check
  0x00007f2b9d1774e8:   jne    0x00007f2b9d177501
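
The operand (%r12,%rbp,8) in the listing above computes base + index * 8, so the narrow oop decoding
happens as part of the memory access instead of in a separate lea. A minimal Java sketch of that
address arithmetic, assuming %r12 holds the heap base and the shift is 3 (class and method names are
hypothetical and only serve the illustration):

```java
// Illustrative model only; names are hypothetical, not HotSpot identifiers.
final class NarrowOopSketch {
    // Scale factor 8 in (%r12,%rbp,8) corresponds to a left shift by 3.
    static final int SHIFT = 3;

    // Address of the object header (mark word) for a narrow oop: base + (narrow << 3).
    static long decode(long heapBase, int narrowOop) {
        // Narrow oops are treated as unsigned 32-bit values.
        return heapBase + (Integer.toUnsignedLong(narrowOop) << SHIFT);
    }

    // Address of the klass field at offset 0x8, as in 0x8(%r12,%rbp,8).
    static long klassFieldAddress(long heapBase, int narrowOop) {
        return decode(heapBase, narrowOop) + 0x8;
    }
}
```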


Thanks,
Tobias


