RFR(M): 8231561: [lworld] C2 generates inefficient code for acmp
Tobias Hartmann
tobias.hartmann at oracle.com
Tue Oct 8 13:55:10 UTC 2019
Hi,
please review the following patch:
https://bugs.openjdk.java.net/browse/JDK-8231561
http://cr.openjdk.java.net/~thartmann/8231561/webrev.00/
This patch includes fixes for the following performance issues that Sergey found:
(1) Clearing of array property bits includes useless mov instructions.
(2) When loading the klass of the acmp operands, we already know that one operand is an inline type
and therefore don't need to clear the storage properties of the klass pointer. In fact, there are
many other places in the code where the LoadKlassNode implementation does not need to clear the
storage property bits because we know that either the object can't be an array or we don't load the
klass ptr from the object header but from some other location.
(3) Implicit null checking does not work for the 'and' instruction that is used for the
is_always_locked check because the corresponding MachNode references a constant load of the mask
that prevents hoisting. As a result, in the acmp implementation, a later load of the klass is
converted to an implicit null check and hoisted to before that check (i.e. it's always executed
although it's only needed if the first operand is an inline type).
(4) When loading the acmp operands from a field, the mark word load for the is_always_locked check
should use a complex memory operand so that the narrow oop decoding is folded into the load instead
of requiring a separate lea.
In addition, I've noticed that the inline type guard for the System.identityHashCode intrinsic is
useless because inline types always have the always_locked_pattern set and therefore a subsequent
guard will always trigger.
Below is the current, relevant code for acmp and how it changes with the above fixes.
0x00007f0424aeaec8: mov 0x8(%rcx),%r11d ; implicit exception
0x00007f0424aeaecc: mov $0x405,%r10d
0x00007f0424aeaed2: and (%rcx),%r10
0x00007f0424aeaed5: cmp $0x405,%r10 ; is_always_locked check
0x00007f0424aeaedc: jne 0x00007f0424aeaf09
0x00007f0424aeaede: mov 0x8(%rdx),%r10d ; implicit exception
0x00007f0424aeaee2: mov %r10d,%r8d
0x00007f0424aeaee5: mov %r11d,%r10d
0x00007f0424aeaee8: and $0x1fffffff,%r8d ; property bit clearing
0x00007f0424aeaeef: and $0x1fffffff,%r10d ; property bit clearing
0x00007f0424aeaef6: cmp %r8d,%r10d ; klass check
0x00007f0424aeaef9: jne 0x00007f0424aeaf09
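For readers less used to the disassembly, the two checks in the listing above can be modeled roughly as follows. This is only a sketch: the class, the Header type and all method names are made up for illustration and are not HotSpot code. The constants 0x405 and 0x1fffffff are taken from the listing, and the header layout (mark word at offset 0, narrow klass at offset 8) is assumed from the load offsets. Null handling and the pointer comparison that precedes these checks are left out.

```java
// Hypothetical model of the two checks in the listing; none of these names are HotSpot APIs.
final class AcmpSketch {
    // Mask and expected value from "and (%rcx),%r10" / "cmp $0x405,%r10".
    static final long ALWAYS_LOCKED_PATTERN = 0x405L;
    // Mask from "and $0x1fffffff,..." that clears the array property bits of a narrow klass.
    static final int PROPERTY_BITS_MASK = 0x1fffffff;

    // Assumed header layout: mark word at offset 0, narrow klass at offset 8.
    static final class Header {
        final long mark;
        final int narrowKlass;

        Header(long mark, int narrowKlass) {
            this.mark = mark;
            this.narrowKlass = narrowKlass;
        }
    }

    // is_always_locked check: all bits of the pattern must be set in the mark word.
    static boolean isAlwaysLocked(long mark) {
        return (mark & ALWAYS_LOCKED_PATTERN) == ALWAYS_LOCKED_PATTERN;
    }

    // Property bit clearing as done in the listing before the klass check.
    static int clearPropertyBits(int narrowKlass) {
        return narrowKlass & PROPERTY_BITS_MASK;
    }

    // True if both checks pass, i.e. neither jne in the listing is taken.
    static boolean passesBothChecks(Header first, Header second) {
        if (!isAlwaysLocked(first.mark)) {
            return false; // first jne: first operand is not an inline type
        }
        // second jne: klass check on the masked narrow klass values
        return clearPropertyBits(first.narrowKlass) == clearPropertyBits(second.narrowKlass);
    }
}
```

In this model the klass of the first operand is only needed after the is_always_locked check has passed, which is the ordering that fix (3) restores in the generated code.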
With the new match rule for clearing array property bits (1):
0x00007f92f44b1ec8: mov 0x8(%rcx),%r11d ; implicit exception
0x00007f92f44b1ecc: and $0x1fffffff,%r11d ; property bit clearing
0x00007f92f44b1ed3: mov $0x405,%r10d
0x00007f92f44b1ed9: and (%rcx),%r10
0x00007f92f44b1edc: cmp $0x405,%r10 ; is_always_locked check
0x00007f92f44b1ee3: jne 0x00007f92f44b1f05
0x00007f92f44b1ee5: mov 0x8(%rdx),%r8d ; implicit exception
0x00007f92f44b1ee9: and $0x1fffffff,%r8d ; property bit clearing
0x00007f92f44b1ef0: cmp %r8d,%r11d ; klass check
0x00007f92f44b1ef3: jne 0x00007f92f44b1f05
Without unnecessary clearing of array property bits (2):
0x00007fe810aeabc8: mov 0x8(%rcx),%r11d ; implicit exception
0x00007fe810aeabcc: mov $0x405,%r10d
0x00007fe810aeabd2: and (%rcx),%r10
0x00007fe810aeabd5: cmp $0x405,%r10 ; is_always_locked check
0x00007fe810aeabdc: jne 0x00007fe810aeabf5
0x00007fe810aeabde: mov 0x8(%rdx),%r10d ; implicit exception
0x00007fe810aeabe2: cmp %r10d,%r11d ; klass check
0x00007fe810aeabe5: jne 0x00007fe810aeabf5
With implicit null check fix (3):
0x00007f51bc08bbc8: mov $0x405,%r10d
0x00007f51bc08bbce: and (%rcx),%r10 ; implicit exception
0x00007f51bc08bbd1: cmp $0x405,%r10 ; is_always_locked check
0x00007f51bc08bbd8: jne 0x00007f51bc08bbf5
0x00007f51bc08bbda: mov 0x8(%rdx),%r11d ; implicit exception
0x00007f51bc08bbde: mov 0x8(%rcx),%r10d
0x00007f51bc08bbe2: cmp %r11d,%r10d ; klass check
0x00007f51bc08bbe5: jne 0x00007f51bc08bbf5
The same code when loading the operands from a field (narrow oop):
0x00007f1c8d177548: mov 0x8(%r12,%rbp,8),%r10d ; implicit exception
0x00007f1c8d17754d: lea (%r12,%rbp,8),%rsi
0x00007f1c8d177551: mov $0x405,%r11d
0x00007f1c8d177557: and (%rsi),%r11
0x00007f1c8d17755a: cmp $0x405,%r11 ; is_always_locked check
0x00007f1c8d177561: jne 0x00007f1c8d177581
0x00007f1c8d177563: mov 0x8(%r12,%r8,8),%r11d ; implicit exception
0x00007f1c8d177568: cmp %r11d,%r10d ; klass check
0x00007f1c8d17756b: jne 0x00007f1c8d177581
With complex memory load in the is_always_locked check (4):
0x00007f2b9d1774c8: mov $0x405,%r10d
0x00007f2b9d1774ce: and (%r12,%rbp,8),%r10 ; implicit exception
0x00007f2b9d1774d2: cmp $0x405,%r10 ; is_always_locked check
0x00007f2b9d1774d9: jne 0x00007f2b9d177501
0x00007f2b9d1774db: mov 0x8(%r12,%r11,8),%r10d ; implicit exception
0x00007f2b9d1774e0: mov 0x8(%r12,%rbp,8),%r8d
0x00007f2b9d1774e5: cmp %r10d,%r8d ; klass check
0x00007f2b9d1774e8: jne 0x00007f2b9d177501
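The address arithmetic behind the memory operands in the last two listings can be sketched as below. This assumes that %r12 holds the compressed oops heap base and that narrow oops are shifted by 3, which is what the (%r12,%rbp,8) addressing mode suggests. The class and method names are hypothetical and only serve as illustration.

```java
// Hypothetical sketch of narrow oop address computation; not HotSpot code.
final class NarrowOopSketch {
    static final int OOP_SHIFT = 3;     // assumed from the scale factor 8 in the listing
    static final int KLASS_OFFSET = 8;  // assumed from the 0x8 displacement in the listing

    // What "lea (%r12,%rbp,8),%rsi" computes: heap base + (narrow oop << 3).
    static long decode(long heapBase, int narrowOop) {
        return heapBase + (Integer.toUnsignedLong(narrowOop) << OOP_SHIFT);
    }

    // Address read by "and (%r12,%rbp,8),%r10": the mark word at offset 0.
    static long markWordAddress(long heapBase, int narrowOop) {
        return decode(heapBase, narrowOop);
    }

    // Address read by "mov 0x8(%r12,%rbp,8),%r8d": the narrow klass field.
    static long klassAddress(long heapBase, int narrowOop) {
        return decode(heapBase, narrowOop) + KLASS_OFFSET;
    }
}
```

Since the decoded address is just base plus scaled index, it fits into a single x86 memory operand, so the separate lea from the previous listing is no longer needed once the mark word load uses that form.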
Thanks,
Tobias