RFR: 8179444: AArch64: Put zero_words on a diet

White, Derek Derek.White at cavium.com
Mon May 8 15:56:00 UTC 2017


Hi Andrew,

Looks OK then.

If I find a major performance issue on misaligned stp I'll propose a fix separately. In some HW I've seen a up to a 2x performance penalty on unaligned destinations in some memcpy-like code.

 - Derek 

-----Original Message-----
From: Andrew Haley [mailto:aph at redhat.com] 
Sent: Sunday, May 07, 2017 4:41 AM
To: White, Derek <Derek.White at cavium.com>; Roland Westrelin <rwestrel at redhat.com>; hotspot-dev Source Developers <hotspot-dev at openjdk.java.net>
Subject: Re: RFR: 8179444: AArch64: Put zero_words on a diet

Hi,

On 05/05/17 23:54, White, Derek wrote:

> src/cpu/aarch64/vm/macroAssembler_aarch64.cpp:
>  - zero_words(reg, reg):
>    - Comment mentions that it's used by the ClearArray pattern, but 
> it's also used by generate_fill().

OK, but there's no contradiction there.  The comment just explains why zero_words must be small.

>    - The old zero_words(), via block_zero() and fill_words(), would 
> align base to 16-bytes before doing stps, the new code doesn't. It may 
> be worth conditionally pre-aligning if AvoidUnalignedAcesses is true. 
> And could conditionally remove pre-aligning in zero_blocks.

But that'd bloat zero_words, surely.  And I haven't see the new code running any slower on any hardware,

> Line 4999: This is pre-existing, but it's confusing - could you rename 
> "ShortArraySize" to "SmallArraySize"?

OK.

> - zero_words(reg, int):
>  - I don't see a great way to handle unaligned accesses without 
> blowing up the code size. So no real request here.
>    - It looks like worst case is 15 unaligned stp instructions. 
>    - With some benchmarking someday I might argue for calling this 
> constant count version for smaller copies.

I don't understand this.

>    - Unless ClearArray only zero's from the first array element on - 
> then we could guess if base is 16-byte unaligned by looking at the 
> array header size.
> 
> -  zero_dcache_blocks():

>   - Suggest a new comment that mentions that it's only called from 
> zero_blocks()? Or if it's meant to be more general then add comments 
> about the requirements for base alignment, and that cnt has to be >= 
> 2*zva_length.

OK.

>   - zero_blocks DOES align base to 16-bytes, so we don't need to check 
> here? Or make it a runtime assert?

I did wonder about that, but left it in as a safety net because its cost is immeasurably small.

Thanks for such a detailed review.

Andrew.


More information about the hotspot-dev mailing list