RFR: 8290917: x86: Memory-operand arithmetic instructions have too low costs [v2]

Wed Sep 14 16:35:53 UTC 2022

On Wed, 14 Sep 2022 02:48:57 GMT, Quan Anh Mai <duke at openjdk.org> wrote:

>> I don't think we should replace the inc/dec by add.
>> 
>> On my desktop, I see the following:
>> Before:
>> Benchmark               Mode  Cnt    Score   Error  Units
>> BasicRules.add_mem_con  avgt    3  132.268 ± 0.599  ns/op
>> BasicRules.inc_mem      avgt    3  169.980 ± 0.617  ns/op
>> 
>> After:
>> Benchmark               Mode  Cnt    Score   Error  Units
>> BasicRules.add_mem_con  avgt    3  117.426 ± 0.128  ns/op
>> BasicRules.inc_mem      avgt    3  182.907 ± 0.277  ns/op
>> 
>> The inc_mem JMH score is worse after the patch (182.907 vs. 169.980 ns/op).
>> 
>> There is already a UseIncDec option, which is set appropriately to select whether to generate inc/dec or add/sub instructions.
>
> @sviswa7 Thanks a lot for your review, I have reverted that change. I don't understand why, though; it does not seem that the bottleneck is in the predecoder.

@merykitty Thanks for reverting those changes. Could you please also add JMH tests for the following:
 1) AndL with 255
 2) AndL with 65535
 3) DivL by 10
For 1) and 2), we are changing the instruction from the q version to the l version, so we want to make sure the performance is at least on par.
For 3), it would be good to check that the compiler now also optimizes division by 10 for the long data type.
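
To illustrate the three cases, here is a rough plain-Java sketch of the kernels such tests might exercise. The class and method names (AndDivKernels, andL255, and so on) are hypothetical and not taken from the PR, and the JMH annotations and harness are left out so the snippet stands alone. The two reference helpers are assumptions for illustration: and32 shows why a 32-bit AND is enough when the mask fits in the low 32 bits, and divL10ByMultiply shows one well-known multiply-high form of signed division by 10, which is not necessarily the exact code the compiler emits.

```java
// Hypothetical sketch; names are illustrative and not from the PR.
public class AndDivKernels {
    // Case 1: AndL with 255.
    static long andL255(long x) {
        return x & 255L;
    }

    // Case 2: AndL with 65535.
    static long andL65535(long x) {
        return x & 65535L;
    }

    // Case 3: DivL by 10.
    static long divL10(long x) {
        return x / 10L;
    }

    // Reference for cases 1 and 2: AND the low 32 bits with a non-negative
    // int mask, then zero-extend. This matches the 64-bit AND because the
    // mask has no bits set above bit 31.
    static long and32(long x, int mask) {
        return Integer.toUnsignedLong((int) x & mask);
    }

    // Reference for case 3: signed division by 10 using the high half of a
    // 64x64 multiply by ceil(2^66 / 10), then a shift and a sign correction.
    // Math.multiplyHigh requires Java 9 or later.
    static long divL10ByMultiply(long x) {
        long q = Math.multiplyHigh(x, 0x6666666666666667L) >> 2;
        return q + (x >>> 63); // add 1 for negative x to truncate toward zero
    }
}
```

In an actual JMH test, each of the first three methods would be a benchmark method reading its input from state fields, so that the operand is not a compile-time constant.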

-------------

PR: https://git.openjdk.org/jdk/pull/9791