RFR: JDK-8213473: Prevent transformation of LoadB->AndI->CmpI pattern to facilitate testb instruction matching

Roman Kennke rkennke at redhat.com
Thu Nov 8 18:46:52 UTC 2018


Hi Vladimir,

> I was actually thinking about testing on platforms which do not execute
> such instructions folding.
> Like Aarch64 or SPARC, to see if your changes (not converting to LoadUB)
> affects them.

Right.
I cannot test Sparc.
I could test AArch64 but would first need to come up with a
micro-benchmark. I doubt this would show up in general benchmarks like
specjvm.

It would be good to know what good does the orginal transformation from
LoadB/AndI -> LoadUB/AndI do. Is there any performance benefit there
that we might loose when not doing the transformation?

We could, of course, be conservative and guard the code with #ifdef to
only make it apply on x86. Would be ugly though.

Roman

> Vladimir
> 
> On 11/8/18 4:52 AM, Roman Kennke wrote:
>> Hi Vladimir,
>>
>> I haven't (yet) come up with an actual micro-benchmark, and haven't yet
>> run specjvm or such, but I've written a program that triggers the
>> matching and shows the assembly.
>>
>> This:
>> https://paste.fedoraproject.org/paste/n7vad~kq2-qB65x6PdCuGw
>>
>> Executed like this (usual jmh):
>> java -jar target/benchmarks.jar -prof perfasm
>>
>> Yields the following assembly (without patch):
>>             0x00007f5ff0587edc: movzbl 0xc(%rsi),%r10d
>> // unrelated
>>    0,06%    0x00007f5ff0587ee1: mov    $0x2,%r8d
>> // unrelated
>>             0x00007f5ff0587ee7: mov    $0x1,%eax
>>             0x00007f5ff0587eec: test   $0x2a,%r10d
>>
>> and with patch:
>>
>>    0,08%     0x00007f27605888e7: testb  $0x2a,0xc(%rsi)
>>
>> I don't know if there's any chance that this performs worse? Maybe the
>> two instructions schedule better because CPU can do other work while
>> waiting for memory? It surely saves a register, which might come useful
>> under register pressure. And it compiles to more compact code.
>>
>> Do you want me to take it further and come up with an actual working
>> microbenchmark? Or do you have benchmarks that you want to run?
>>
>> Thanks,
>> Roman
>>
>>> Roman, your change seems fine.
>>>
>>> I thought for SPARC it may regress but the only complex rule with
>>> AndI+LoadUB I found is for ConvI2L:
>>>
>>> match(Set dst (ConvI2L (AndI (LoadUB mem) mask)));
>>>
>>> I did not find anything on other platforms.
>>>
>>> Anyway it would be nice to run some performance testing.
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 11/7/18 3:46 AM, Roman Kennke wrote:
>>>> In JDK-8203628, Aleksey added matchers to generate testb instruction
>>>> (on
>>>> x86) for LoadB->AndI->CmpI sequence. However, in AndINode::Ideal() we
>>>> transform any LoadB->AndI to LoadUB->AndI which fails to match this
>>>> pattern. Which means we never actually match for testb. The original
>>>> matchers for LoadUB->AndI->CmpI needed to be removed because of
>>>> JDK-8204479.
>>>>
>>>> The proposed enhancement would prevent the transformation if we have a
>>>> straight LoadB->AndI->CmpI sequence. While the matcher is x86-only, I
>>>> don't think it actually hurts to prevent this particular sequence. It
>>>> should generate better code (e.g. single testb instructions) for things
>>>> like:
>>>>
>>>> byte field;
>>>> ..
>>>>
>>>> if ((field & 3) == 0) ..
>>>>
>>>> or even:
>>>>
>>>> if (field == x) .. because we always mask byte loads
>>>>
>>>> or even (not sure) same for boolean fields.
>>>>
>>>> Bug:
>>>> https://bugs.openjdk.java.net/browse/JDK-8213473
>>>> Webrev:
>>>> http://cr.openjdk.java.net/~rkennke/JDK-8213473/webrev.00/
>>>>
>>>> Testing: passes hotspot/jtreg:tier1 here
>>>>
>>>> Thoughts? Reviews?
>>>>
>>>> Roman
>>>>
>>>>
>>>>
>>>>
>>>>
>>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: OpenPGP digital signature
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20181108/e7f5890a/signature.asc>


More information about the hotspot-compiler-dev mailing list