[aarch64-port-dev ] RFR: 8229351: AArch64: Make the stub threshold of string_compare intrinsic tunable

Patrick Zhang OS patrick at os.amperecomputing.com
Tue Nov 5 01:39:29 UTC 2019


I have reformatted the description below. Please help review, thanks.

Regards
Patrick

-----Original Message-----
From: aarch64-port-dev <aarch64-port-dev-bounces at openjdk.java.net> On Behalf Of Patrick Zhang OS
Sent: Tuesday, October 29, 2019 5:59 PM
To: aarch64-port-dev at openjdk.java.net
Subject: [aarch64-port-dev ] RFR: 8229351: AArch64: Make the stub threshold of string_compare intrinsic tunable

Hi,

Could you please review this patch? Thanks.

JBS: https://bugs.openjdk.java.net/browse/JDK-8229351 
Webrev: https://cr.openjdk.java.net/~qpzhang/8229351/webrev.02
(The webrev starts at .02 because earlier versions went through internal review and updates.)

Changes:
1. Replace the hard-coded STUB_THRESHOLD of 72 with CompareLongStringLimitLatin and CompareLongStringLimitUTF, to give more flexible control over the stub thresholds of the string_compare intrinsics, especially across different microarchitectures.
2. In MacroAssembler::string_compare, LL and UU shared the same threshold. UU may need only half of LL's threshold (counted in characters), because each character takes two bytes in UU, whereas in compact LL strings each character takes one byte. In addition, LU/UL may need a separate threshold, because their stub function differs from the same-encoding one and its performance may differ as well.
3. In generate_compare_long_string_same_encoding, the hard-coded 72 guaranteed that at least 64 bytes were always available for the prefetch code path. Once a smaller stub threshold is set, a new condition is needed to check whether this still holds or whether the code has to take the NO_PREFETCH branch. This change ensures correctness.
4. In generate_compare_long_string_different_encoding, some temporary variables for handling the last 4 characters are no longer valid. Cleaned up strU and strL, and the related pointer initialization to the next U (cnt1) and L (tmp2).
5. In compare_string_16_x_LU, the reference to r10 (tmp1) is not needed, because tmpU or tmpL points to the same register.
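
To illustrate the arithmetic behind items 2 and 3, here is a minimal Java sketch. It is not code from the webrev: the class and method names are hypothetical, the "length >= threshold" stub test is an assumption, and the prefetch check is reduced to the single "at least 64 bytes" condition described in item 3.

```java
public class StubThresholdSketch {
    // Simplified from item 3: the prefetch path needs at least 64 bytes.
    static final int PREFETCH_MIN_BYTES = 64;

    // Bytes occupied by `chars` characters: 1 byte each for Latin-1 (LL),
    // 2 bytes each for UTF-16 (UU).
    static int byteLength(int chars, boolean latin1) {
        return latin1 ? chars : chars * 2;
    }

    // Assumption: the stub is taken once the length reaches the threshold.
    static boolean usesStub(int chars, int thresholdChars) {
        return chars >= thresholdChars;
    }

    // Assumption: prefetching is safe only when enough bytes are available.
    static boolean canPrefetch(int chars, boolean latin1) {
        return byteLength(chars, latin1) >= PREFETCH_MIN_BYTES;
    }

    public static void main(String[] args) {
        // 72 is the original hard-coded value; 36 and 24 are illustrative only.
        int[] thresholds = {72, 36, 24};
        for (int t : thresholds) {
            System.out.println("threshold=" + t
                + " LL bytes=" + byteLength(t, true)
                + " prefetch=" + canPrefetch(t, true)
                + " | UU bytes=" + byteLength(t, false)
                + " prefetch=" + canPrefetch(t, false));
        }
    }
}
```

With a threshold of 72 characters both encodings reach 64 bytes. With 36 characters only UU does (72 bytes), which is why UU could use roughly half of LL's threshold, and why a lower LL threshold needs the extra check before entering the prefetch path.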

Tests:
1. For functional checks, I ran:
jdk jtreg tier1 tests, with default VM flags;
hotspot jtreg tests (runtime/compiler/gc parts), with "-Xcomp -XX:-TieredCompilation";
jck10/api/java.lang (1609 cases) and other selected modules, with default VM flags and with "-Xcomp -XX:-TieredCompilation" respectively; no new failures were found;
some specific test cases, run carefully as a double check, e.g. TestStringCompareToDifferentLength.java [1] and TestStringCompareToSameLength.java [1] introduced by [2], and StrCmpTest.java [3] introduced by [4].
2. For performance checks, I ran
string-density-bench/CompareToBench.java [5] and StringCompareBench.java [6] respectively,
and SPECjbb2015.jar. No obvious performance change was found (since this patch does NOT change the default threshold).
FYI: on an Ampere eMAG system, microbenchmarks [5][6] show a consistent 1.5x performance gain for LU/UL comparison of shorter strings (<72 chars, with smaller stub thresholds), and a slight improvement (5~10%) for LL/UU cases.
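
For readers who want a quick sanity check of the four encoding combinations on strings shorter than 72 characters, a minimal sketch follows. It is not one of the benchmarks [5][6], and the class and method names are hypothetical. It assumes compact strings are enabled (the JDK default), so a string containing only Latin-1 characters is stored as Latin-1 and a string containing U+0100 is stored as UTF-16. Whether the intrinsic or the stub is actually exercised depends on the JIT and on flags such as "-Xcomp".

```java
public class ShortStringCompareSketch {
    // n characters, all within Latin-1, differing only in the last one.
    static String latin1(int n) {
        return "a".repeat(n - 1) + "b";
    }

    // n characters; the final U+0100 forces UTF-16 storage.
    static String utf16(int n) {
        return "a".repeat(n - 1) + "\u0100";
    }

    // Returns compareTo results in the order LL, UU, LU, UL.
    static int[] compareAll(int n) {
        String l1 = latin1(n), l2 = latin1(n);
        String u1 = utf16(n), u2 = utf16(n);
        return new int[] {
            l1.compareTo(l2),   // LL: equal strings
            u1.compareTo(u2),   // UU: equal strings
            l1.compareTo(u1),   // LU: mismatch at the last character
            u1.compareTo(l1)    // UL: same mismatch, opposite sign
        };
    }

    public static void main(String[] args) {
        // Lengths below the original threshold of 72 characters.
        for (int n : new int[] {8, 32, 71}) {
            System.out.println(n + " chars: "
                + java.util.Arrays.toString(compareAll(n)));
        }
    }
}
```

Placing the mismatch in the last character makes every comparison walk the full length, which is the case the stub thresholds are meant to tune.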

Refs:
[1] https://hg.openjdk.java.net/jdk/jdk/file/3df2bf731a87/test/hotspot/jtreg/compiler/intrinsics/string
[2] https://bugs.openjdk.java.net/browse/JDK-8218966 AArch64: String.compareTo() can read memory after string 
[3] https://cr.openjdk.java.net/~dpochepk/8202326/StrCmpTest.java 
[4] https://bugs.openjdk.java.net/browse/JDK-8202326 AARCH64: optimize string compare intrinsic
[5] https://cr.openjdk.java.net/~shade/density/string-density-bench.jar
[6] https://cr.openjdk.java.net/~dpochepk/8202326/StringCompareBench.java

Regards
Patrick


