RFR: 8179444: AArch64: Put zero_words on a diet
White, Derek
Derek.White at cavium.com
Mon May 8 15:56:00 UTC 2017
Hi Andrew,
Looks OK then.
If I find a major performance issue on misaligned stp I'll propose a fix separately. In some HW I've seen a up to a 2x performance penalty on unaligned destinations in some memcpy-like code.
- Derek
-----Original Message-----
From: Andrew Haley [mailto:aph at redhat.com]
Sent: Sunday, May 07, 2017 4:41 AM
To: White, Derek <Derek.White at cavium.com>; Roland Westrelin <rwestrel at redhat.com>; hotspot-dev Source Developers <hotspot-dev at openjdk.java.net>
Subject: Re: RFR: 8179444: AArch64: Put zero_words on a diet
Hi,
On 05/05/17 23:54, White, Derek wrote:
> src/cpu/aarch64/vm/macroAssembler_aarch64.cpp:
> - zero_words(reg, reg):
> - Comment mentions that it's used by the ClearArray pattern, but
> it's also used by generate_fill().
OK, but there's no contradiction there. The comment just explains why zero_words must be small.
> - The old zero_words(), via block_zero() and fill_words(), would
> align base to 16-bytes before doing stps, the new code doesn't. It may
> be worth conditionally pre-aligning if AvoidUnalignedAcesses is true.
> And could conditionally remove pre-aligning in zero_blocks.
But that'd bloat zero_words, surely. And I haven't see the new code running any slower on any hardware,
> Line 4999: This is pre-existing, but it's confusing - could you rename
> "ShortArraySize" to "SmallArraySize"?
OK.
> - zero_words(reg, int):
> - I don't see a great way to handle unaligned accesses without
> blowing up the code size. So no real request here.
> - It looks like worst case is 15 unaligned stp instructions.
> - With some benchmarking someday I might argue for calling this
> constant count version for smaller copies.
I don't understand this.
> - Unless ClearArray only zero's from the first array element on -
> then we could guess if base is 16-byte unaligned by looking at the
> array header size.
>
> - zero_dcache_blocks():
> - Suggest a new comment that mentions that it's only called from
> zero_blocks()? Or if it's meant to be more general then add comments
> about the requirements for base alignment, and that cnt has to be >=
> 2*zva_length.
OK.
> - zero_blocks DOES align base to 16-bytes, so we don't need to check
> here? Or make it a runtime assert?
I did wonder about that, but left it in as a safety net because its cost is immeasurably small.
Thanks for such a detailed review.
Andrew.
More information about the hotspot-dev
mailing list