[aarch64-port-dev ] RFR: JDK-8203041 : AArch64: fix overflow in immediate cmp/cmn instruction

Mon May 21 07:52:47 UTC 2018

Hi Andrew,

Thanks for your review.

In my patch I decided not to replace cmp with subs at callsite (in generate_large_array_equals()) because imm cmp has accidentally been misused for serval times such as 
https://bugs.openjdk.java.net/browse/JDK-8184900
https://bugs.openjdk.java.net/browse/JDK-8161190

I added an assert to ensure that for large imm all callsites do not use rscratch1. This behavior is only applied for assert failure scenraios.
In current implementation, when imm is not operand_valid_for_add_sub_immediate(), cmp imm always failed.
This patch was able to reduce assert failure possibilities for cmp imm misusage.

But your suggestion makes sense. Rscratch1 may imports more confusion. Let's replace cmp with subs at callsite when this assertion failure happens.
The only concern is that this depends on engineers be aware of immediate bits limitation for imm cmp instruction.

Later I will submit a new patch.

Best Regards,
Joshua

-----Original Message-----
From: Andrew Haley <aph at redhat.com> 
Sent: Friday, May 18, 2018 8:21 PM
To: Dmitrij Pochepko <dmitrij.pochepko at bell-sw.com>; Joshua Zhu <Joshua.Zhu at arm.com>; aarch64-port-dev at openjdk.java.net
Cc: nd <nd at arm.com>
Subject: Re: [aarch64-port-dev ] RFR: JDK-8203041 : AArch64: fix overflow in immediate cmp/cmn instruction

On 05/18/2018 12:15 PM, Dmitrij Pochepko wrote:
> 
> On 18.05.2018 11:41, Andrew Haley wrote:
>> On 05/18/2018 04:03 AM, Joshua Zhu wrote:
>>>
>>> JVM crashed with "-XX:SoftwarePrefetchHintDistance=32760" option.
>> It didn't crash: it was an assertion failure.
>>
>>> This causes overflow in immediate operand for jtreg TestOptionsWithRanges.
>>>
>>> See JBS for the backtrace.
>>>
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8203041
>>>
>>> Webrev: http://cr.openjdk.java.net/~zyao/8203041/
>> Thank you.
>>
>> This patch is problematic in several ways.  The type of 
>> prefetchLoopThreshold is int, so it's 32 bits.  Casting it (via the 
>> call to cmp()) to unsigned int is probably wrong too.
>
> not quite that. prefetchLoopThreshold is calculated as 
> (SoftwarePrefetchHintDistance + 32), where 
> SoftwarePrefetchHintDistance is in [0, 32768] range by design here, 
> so, it's always within 32 bits range.  First cmp argument is cnt1, 
> which is amount of array elements left to check, which is also always 
> positive 32bit integer. Basically, this cmp can be changed to cmpw, 
> but it basically won't change anything for CPU.

It is always clearer to use the natural types.  We aren't consistent about this in our intrinsic code, I know.

>> rscratch1 isn't available for use in a compare instruction because we 
>> sometimes say cmp(rscratch1, imm).  I suspect it'd be better to use a 
>> subsw in generate_large_array_equals() if you can find a spare 
>> register for the destination.

> according to spec, cmp is an alias for subs when destination register 
> is ZR, so it won't change anything

Mmm, but you need to see the patch, which uses rscratch1 as a temp.

>> There are several other coding problems in 
>> generate_large_array_equals(), in particular the use of code like
>>
>>    Register tmp1 = rscratch1
>>
>> This creates a hidden alias for rscratch1, but rscratch1 is used by 
>> assembler macros.  This is very dangerous.  I'll have a look some 
>> more.
>
> This was done for unification purposes, to have all temporary 
> registers named as "tmp*". "rscratch1" name is not used in this 
> intrinsic code to avoid confusion.

That is very, er, confusing.  :-)

> Also, assert for different registers is present to prevent few 
> possible kinds of problems in future. Surely, this renaming can be 
> removed if you think it'll make code look better.

It will make code safer.  If the name meant anything it might be understandable, but tmp1 and rscratch1 are meaningless names so the alias gains nothing.  And the real danger is that rscratch1 is used by macros, so it's not really safe to use it as a temp without great care.

> Overall, I think it should be fixed by moving calculated immediate 
> into register and then perform cmp/subs between 2 registers. For 
> example, we can use tmp3. Then instead of:
> 
> __ cmp(cnt1, prefetchLoopThreshold);      (in 2 places)
> 
> code will look like:
> 
> __ mov(tmp3, prefetchLoopThreshold); // separate move to handle 
> situation when prefetchLoopThreshold is not in 12 bit range __ 
> cmp(cnt1, tmp3);

I think it makes more sense to use subs, which already handles this case.  We don't really need any more conditional code based on whether an immediate fits in an add instruction.

--
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671