[aarch64-port-dev ] aarch64: RFR: Block zeroing by 'DC ZVA'
Edward Nevill
edward.nevill at gmail.com
Mon Apr 18 08:10:51 UTC 2016
On Fri, 2016-04-15 at 20:45 +0800, Long Chen wrote:
> Hi
>
> Please review this patch making use of DC ZVA to do block zeroing.
>
> http://people.linaro.org/~long.chen/block_zeroing/block_zeroing.patch
>
> I’m sorry that I can’t produce a test case matching the ‘clear_array’ pattern showing obvious improvement. However, generating ‘DC ZVA’ should be the right thing to do as it usually has better cache behaviors. Besides, gcc and linux’s memset have been using ‘DC ZVA’.
>
Hi Long,
Thanks for this. I have benchmarked this on three different partners' HW using the following JMH test case:
http://people.linaro.org/~edward.nevill/jmh/test/src/main/java/org/sample/JMHTest_00_StringConcatTest.java
On two partners' HW I see a significant improvement. On one partner's HW I see almost identical performance.
Here are the results I get, with the original normalised to 100 sec to avoid disclosing any absolute performance figures.
Partner A, Original = 100 sec, revised = 100.7 sec
Partner B, Original = 100 sec, revised = 97.6 sec
Partner C, Original = 100 sec, revised = 91.2 sec
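For anyone wanting to read the figures above as percentages, here is a minimal sketch of the arithmetic. The helper names normalise() and percent_change() are made up for illustration and are not part of the patch or of JMH; the only inputs checked are the normalised figures quoted above.

```cpp
// Hypothetical helpers, for illustration only.

// Scale a raw revised timing so that the original run reads as 100.
constexpr double normalise(double original_raw, double revised_raw) {
  return revised_raw / original_raw * 100.0;
}

// Percentage change against the normalised baseline of 100.
// Negative means the revised build ran faster.
constexpr double percent_change(double revised_normalised) {
  return revised_normalised - 100.0;
}

// Partner C: revised = 91.2, i.e. roughly 8.8% less time.
static_assert(percent_change(91.2) < -8.79 && percent_change(91.2) > -8.81,
              "Partner C is about 8.8% faster");
```

Read this way, Partner A is about 0.7% slower, Partner B about 2.4% faster and Partner C about 8.8% faster.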
One small improvement might be to avoid using a tmp register, which has to be allocated here
-instruct clearArray_imm_reg(immL cnt, iRegP base, Universe dummy, rFlagsReg cr)
+instruct clearArray_imm_reg(immL cnt, iRegP base, iRegLNoSp tmp, Universe dummy, rFlagsReg cr)
- __ zero_words($base$$Register, (u_int64_t)$cnt$$constant);
+ __ zero_words($base$$Register, (u_int64_t)$cnt$$constant, $tmp$$Register);
by using 'lr' as the tmp register here
+ } else if (UseBlockZeroing && cnt >= (u_int64_t)(BlockZeroingLowLimit >> LogBytesPerWord)) {
+ mov(tmp, cnt);
+ zero_words(base, tmp, true);
AFAIK, 'lr' is always available as a tmp register in C2-generated code.
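As a side note on the condition in the fragment above: cnt is a count of words, while BlockZeroingLowLimit is a byte count, hence the shift by LogBytesPerWord. A minimal standalone sketch of that check follows. The value 256 for BlockZeroingLowLimit is an assumption chosen for illustration, and uses_block_zeroing() is a hypothetical helper, not a function in the patch.

```cpp
#include <cstdint>

// Assumed values, for illustration only.
constexpr int LogBytesPerWord = 3;              // 8-byte words on a 64-bit VM
constexpr int64_t BlockZeroingLowLimit = 256;   // assumed byte threshold

// Hypothetical helper mirroring the condition in the patch fragment:
// true when a constant word count is large enough to take the
// block-zeroing path rather than plain stores.
constexpr bool uses_block_zeroing(uint64_t cnt_words, bool use_block_zeroing) {
  return use_block_zeroing &&
         cnt_words >= static_cast<uint64_t>(BlockZeroingLowLimit >> LogBytesPerWord);
}

// With the assumed limit of 256 bytes, the cut-over is at 32 words.
static_assert(uses_block_zeroing(32, true), "32 words reaches the limit");
static_assert(!uses_block_zeroing(31, true), "31 words stays below the limit");
```

With a different limit the cut-over moves accordingly; the shift only converts the byte limit into words so that it can be compared with cnt.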
All the best,
Ed.