From ci_notify at linaro.org Fri Nov 1 04:18:42 2019
From: ci_notify at linaro.org (ci_notify at linaro.org)
Date: Fri, 1 Nov 2019 04:18:42 +0000 (UTC)
Subject: [aarch64-port-dev ] JTREG, JCStress, SPECjbb2015 and Hadoop/Terasort results for OpenJDK 11u on AArch64
Message-ID: <486311059.11170.1572581922901.JavaMail.javamailuser@localhost>

This is a summary of the JTREG test results
===========================================

The build and test results are cycled every 15 days.

For detailed information on the test output please refer to:

http://openjdk.linaro.org/jdk11u/openjdk-jtreg-nightly-tests/summary/2019/304/summary.html

-------------------------------------------------------------------------------
release/hotspot
-------------------------------------------------------------------------------
Build 0: aarch64/2019/jun/15 pass: 5,737; fail: 5; not run: 11,623
Build 1: aarch64/2019/jun/27 pass: 5,737; fail: 5
Build 2: aarch64/2019/jul/02 pass: 5,737; fail: 5
Build 3: aarch64/2019/aug/03 pass: 5,746; fail: 4
Build 4: aarch64/2019/aug/10 pass: 5,747; fail: 4
Build 5: aarch64/2019/aug/15 pass: 5,753; fail: 4
Build 6: aarch64/2019/aug/22 pass: 5,755; fail: 4
Build 7: aarch64/2019/sep/04 pass: 5,764; fail: 2
Build 8: aarch64/2019/sep/05 pass: 5,764; fail: 2
Build 9: aarch64/2019/sep/10 pass: 5,764; fail: 2
Build 10: aarch64/2019/sep/17 pass: 5,763; fail: 3
Build 11: aarch64/2019/sep/21 pass: 5,764; fail: 2
Build 12: aarch64/2019/oct/04 pass: 5,764; fail: 2
Build 13: aarch64/2019/oct/17 pass: 5,764; fail: 2
Build 14: aarch64/2019/oct/31 pass: 5,784; fail: 1

-------------------------------------------------------------------------------
release/jdk
-------------------------------------------------------------------------------
Build 0: aarch64/2019/jun/15 pass: 8,409; fail: 506; error: 20
Build 1: aarch64/2019/jun/27 pass: 8,401; fail: 512; error: 22
Build 2: aarch64/2019/jul/02 pass: 8,407; fail: 498; error: 31
Build 3: aarch64/2019/aug/03 pass: 8,429; fail: 509; error: 18
Build 4: aarch64/2019/aug/10 pass: 8,450; fail: 485; error: 16
Build 5: aarch64/2019/aug/15 pass: 8,443; fail: 496; error: 13
Build 6: aarch64/2019/aug/22 pass: 8,446; fail: 494; error: 15
Build 7: aarch64/2019/sep/04 pass: 8,483; fail: 465; error: 10
Build 8: aarch64/2019/sep/05 pass: 8,465; fail: 479; error: 14
Build 9: aarch64/2019/sep/10 pass: 8,444; fail: 500; error: 14
Build 10: aarch64/2019/sep/17 pass: 8,462; fail: 482; error: 12
Build 11: aarch64/2019/sep/21 pass: 8,467; fail: 478; error: 13
Build 12: aarch64/2019/oct/04 pass: 8,444; fail: 498; error: 16
Build 13: aarch64/2019/oct/17 pass: 8,452; fail: 493; error: 16
Build 14: aarch64/2019/oct/31 pass: 8,468; fail: 490; error: 14

5 fatal errors were detected; please follow the link above for more detail.

-------------------------------------------------------------------------------
release/langtools
-------------------------------------------------------------------------------
Build 0: aarch64/2019/jun/15 pass: 3,908
Build 1: aarch64/2019/jun/27 pass: 3,908
Build 2: aarch64/2019/jul/02 pass: 3,908
Build 3: aarch64/2019/aug/03 pass: 3,908
Build 4: aarch64/2019/aug/10 pass: 3,909
Build 5: aarch64/2019/aug/15 pass: 3,909
Build 6: aarch64/2019/aug/22 pass: 3,909
Build 7: aarch64/2019/sep/04 pass: 3,910
Build 8: aarch64/2019/sep/05 pass: 3,910
Build 9: aarch64/2019/sep/10 pass: 3,910
Build 10: aarch64/2019/sep/17 pass: 3,910
Build 11: aarch64/2019/sep/21 pass: 3,910
Build 12: aarch64/2019/oct/04 pass: 3,910
Build 13: aarch64/2019/oct/17 pass: 3,910
Build 14: aarch64/2019/oct/31 pass: 3,910

Previous results can be found here:

http://openjdk.linaro.org/jdk11u/openjdk-jtreg-nightly-tests/index.html

SPECjbb2015 composite regression test completed
===============================================

This test measures the relative performance of the server compiler running the SPECjbb2015 composite tests and compares the performance against the baseline performance of the server compiler taken on 2016-11-21.
In accordance with [1], the SPECjbb2015 tests are run on a system which is not production ready and does not meet all the requirements for publishing compliant results. The numbers below shall be treated as non-compliant (nc) and are for experimental purposes only.

Relative performance: Server max-jOPS (nc): 7.46x
Relative performance: Server critical-jOPS (nc): 7.91x

Details of the test setup and historical results may be found here:

http://openjdk.linaro.org/jdk11u/SPECjbb2015-results/

[1] http://www.spec.org/fairuse.html#Academic

Regression test Hadoop-Terasort completed
=========================================

This test measures the performance of the server and client compilers running Hadoop sorting a 1GB file using Terasort and compares the performance against the baseline performance of the Zero interpreter and against the baseline performance of the server compiler on 2014-04-01.

Relative performance: Zero: 1.0, Server: 210.67

Server 210.67 / Server 2014-04-01 (71.00): 2.97x

Details of the test setup and historical results may be found here:

http://openjdk.linaro.org/jdk11u/hadoop-terasort-benchmark-results/

This is a summary of the jcstress test results
==============================================

The build and test results are cycled every 15 days.
2019-06-16 pass rate: 10488/10488, results: http://openjdk.linaro.org/jdk11u/jcstress-nightly-runs/2019/166/results/
2019-06-28 pass rate: 10488/10488, results: http://openjdk.linaro.org/jdk11u/jcstress-nightly-runs/2019/178/results/
2019-07-03 pass rate: 10488/10488, results: http://openjdk.linaro.org/jdk11u/jcstress-nightly-runs/2019/183/results/
2019-08-04 pass rate: 10488/10488, results: http://openjdk.linaro.org/jdk11u/jcstress-nightly-runs/2019/215/results/
2019-08-11 pass rate: 10488/10488, results: http://openjdk.linaro.org/jdk11u/jcstress-nightly-runs/2019/222/results/
2019-08-16 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdk11u/jcstress-nightly-runs/2019/227/results/
2019-08-23 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdk11u/jcstress-nightly-runs/2019/234/results/
2019-09-05 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdk11u/jcstress-nightly-runs/2019/247/results/
2019-09-07 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdk11u/jcstress-nightly-runs/2019/248/results/
2019-09-11 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdk11u/jcstress-nightly-runs/2019/253/results/
2019-09-18 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdk11u/jcstress-nightly-runs/2019/260/results/
2019-09-22 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdk11u/jcstress-nightly-runs/2019/264/results/
2019-10-05 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdk11u/jcstress-nightly-runs/2019/277/results/
2019-10-18 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdk11u/jcstress-nightly-runs/2019/290/results/
2019-11-01 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdk11u/jcstress-nightly-runs/2019/304/results/

For detailed information on the test output please refer to:

http://openjdk.linaro.org/jdk11u/jcstress-nightly-runs/

From aph at redhat.com Fri Nov 1 10:15:37 2019
From: aph at redhat.com (Andrew Haley)
Date: Fri, 1 Nov 2019 10:15:37 +0000
Subject: [aarch64-port-dev ] RFR 8233339: Shenandoah: Centralize load barrier decisions into ShenandoahBarrierSet
In-Reply-To: <6ef89df6-84db-0ffe-d1fc-7ffde7e622bf@redhat.com>
References: <6ef89df6-84db-0ffe-d1fc-7ffde7e622bf@redhat.com>
Message-ID: <4ed90469-8689-b49d-69f1-98f644e9edd0@redhat.com>

On 10/31/19 6:48 PM, Zhengyu Gu wrote:
> Right now, the decisions on, if a load barrier needs load reference
> barrier, if so, what kind? and if the reference needs to be kept alive,
> are scattered inside interpreter/c1/2 load barrier code, which is hard
> to make them consistent.
>
> I would like to centralize the decision making into
> ShenandoahBarrierSet, so they can be consistent and easy to maintain.

You should say, at the start of every routine you touch, which
registers are inputs, which are outputs, and (important) which may
alias with rscratch1 and rscratch2. Please also mark clobbers of
rscratch1 and 2.

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From zgu at redhat.com Fri Nov 1 14:15:56 2019
From: zgu at redhat.com (Zhengyu Gu)
Date: Fri, 1 Nov 2019 10:15:56 -0400
Subject: [aarch64-port-dev ] RFR 8233339: Shenandoah: Centralize load barrier decisions into ShenandoahBarrierSet
In-Reply-To: <4ed90469-8689-b49d-69f1-98f644e9edd0@redhat.com>
References: <6ef89df6-84db-0ffe-d1fc-7ffde7e622bf@redhat.com> <4ed90469-8689-b49d-69f1-98f644e9edd0@redhat.com>
Message-ID:

>>
>> I would like to centralize the decision making into
>> ShenandoahBarrierSet, so they can be consistent and easy to maintain.
>
> You should say, at the start of every routine you touch, which
> registers are inputs, which are outputs, and (important) which may
> alias with rscratch1 and rscratch2. Please also mark clobbers of
> rscratch1 and 2.
>

Okay, updated:

Webrev: http://cr.openjdk.java.net/~zgu/JDK-8233339/webrev.01/index.html

Thanks,

-Zhengyu

From shade at redhat.com Fri Nov 1 15:43:20 2019
From: shade at redhat.com (Aleksey Shipilev)
Date: Fri, 1 Nov 2019 16:43:20 +0100
Subject: [aarch64-port-dev ] RFR 8233339: Shenandoah: Centralize load barrier decisions into ShenandoahBarrierSet
In-Reply-To:
References: <6ef89df6-84db-0ffe-d1fc-7ffde7e622bf@redhat.com> <4ed90469-8689-b49d-69f1-98f644e9edd0@redhat.com>
Message-ID: <0fb9cd70-0a89-8c14-7469-55205c4c3808@redhat.com>

On 11/1/19 3:15 PM, Zhengyu Gu wrote:
> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8233339/webrev.01/index.html

To be honest, it does not look like much of an improvement at first glance. Maybe we should
massage the code a bit to make it more readable? Roman also needs to take a look.

*) shenandoahBarrierSetAssembler_x86.cpp, I believe it would be more straightforward to save
branching on local variable "need_load_reference_barrier" by spelling out the "disabled" path
directly (in fact, I think you are almost there in shenandoahBarrierSetC1.cpp!):

  if (!ShenandoahBarrierSet::need_load_reference_barrier(decorators, type)) {
    BarrierSetAssembler::load_at(masm, decorators, type, dst, src, tmp1, tmp_thread);
    return;
  }

  ... code that assumes need_load_reference_barrier = true follows ...

  Register result_dst = dst;
  bool use_tmp1_for_dst = false;

*) shenandoahBarrierSetC1.cpp: local variable "need_load_reference_barrier" is not needed, there is
only a single use

*) shenandoahBarrierSetC2.cpp: this block should go all the way up:

 557   if (!ShenandoahBarrierSet::need_load_reference_barrier(decorators, type)) {
 558     return load;
 559   }

*) shenandoahBarrierSet.cpp: this is just "return is_reference_type(type)". Saves some inversions.
  78   if (!is_reference_type(type)) return false;
  79   return true;

*) shenandoahBarrierSet.cpp: should be "Should be subset of LRB":

  83   assert(need_load_reference_barrier(decorators, type), "Why ask?");

*) shenandoahBarrierSet.cpp: seems like this assert is subsumed by the previous one?

  84   assert(is_reference_type(type), "Why we here?");

--
Thanks,
-Aleksey

From zgu at redhat.com Fri Nov 1 17:37:49 2019
From: zgu at redhat.com (Zhengyu Gu)
Date: Fri, 1 Nov 2019 13:37:49 -0400
Subject: [aarch64-port-dev ] RFR 8233339: Shenandoah: Centralize load barrier decisions into ShenandoahBarrierSet
In-Reply-To: <0fb9cd70-0a89-8c14-7469-55205c4c3808@redhat.com>
References: <6ef89df6-84db-0ffe-d1fc-7ffde7e622bf@redhat.com> <4ed90469-8689-b49d-69f1-98f644e9edd0@redhat.com> <0fb9cd70-0a89-8c14-7469-55205c4c3808@redhat.com>
Message-ID:

Hi Aleksey,

On 11/1/19 11:43 AM, Aleksey Shipilev wrote:
> On 11/1/19 3:15 PM, Zhengyu Gu wrote:
>> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8233339/webrev.01/index.html
>
> To be honest, it does not look like much of an improvement at first glance. Maybe we should
> massage the code a bit to make it more readable? Roman also needs to take a look.

Right, it is not. But I believe that should be done in a separate CR, as it may cause backport headaches, right?

Filed: https://bugs.openjdk.java.net/browse/JDK-8233401

As a matter of fact, I would like to hold off this code review till the refactor is done.

Thanks,

-Zhengyu

>
> *) shenandoahBarrierSetAssembler_x86.cpp, I believe it would be more straightforward to save
> branching on local variable "need_load_reference_barrier" by spelling out the "disabled" path
> directly (in fact, I think you are almost there in shenandoahBarrierSetC1.cpp!):
>
>   if (!ShenandoahBarrierSet::need_load_reference_barrier(decorators, type)) {
>     BarrierSetAssembler::load_at(masm, decorators, type, dst, src, tmp1, tmp_thread);
>     return;
>   }
>
>   ... code that assumes need_load_reference_barrier = true follows ...
>
>   Register result_dst = dst;
>   bool use_tmp1_for_dst = false;
>
> *) shenandoahBarrierSetC1.cpp: local variable "need_load_reference_barrier" is not needed, there is
> only a single use
>
> *) shenandoahBarrierSetC2.cpp: this block should go all the way up:
>
>  557   if (!ShenandoahBarrierSet::need_load_reference_barrier(decorators, type)) {
>  558     return load;
>  559   }
>
> *) shenandoahBarrierSet.cpp: this is just "return is_reference_type(type)". Saves some inversions.
>
>   78   if (!is_reference_type(type)) return false;
>   79   return true;
>
> *) shenandoahBarrierSet.cpp: should be "Should be subset of LRB":
>
>   83   assert(need_load_reference_barrier(decorators, type), "Why ask?");
>
> *) shenandoahBarrierSet.cpp: seems like this assert is subsumed by the previous one?
>
>   84   assert(is_reference_type(type), "Why we here?");
>
>

From ci_notify at linaro.org Fri Nov 1 21:07:36 2019
From: ci_notify at linaro.org (ci_notify at linaro.org)
Date: Fri, 1 Nov 2019 21:07:36 +0000 (UTC)
Subject: [aarch64-port-dev ] Linaro OpenJDK AArch64 jdk/jdk build 2277 Failure
Message-ID: <1322569899.11330.1572642456920.JavaMail.javamailuser@localhost>

OpenJDK AArch64 jdk/jdk build status is Failure

Build details - https://ci.linaro.org/job/jdkX-ci-build/2277/

Changes -
  kbarrett: 4ec9fc2b2f0deeb8eabbb816269f5a7f6484be3e
    - src/hotspot/share/memory/operator_new.cpp
  --"8233359: Add global sized operator delete definitions

Summary: Added new definitions.
Reviewed-by: dholmes
"

  bpb: 76638c631869b6da4a7b116dac9a17438cc819c7
    - src/java.base/share/classes/java/nio/file/FileStore.java
    - src/java.base/unix/classes/sun/nio/fs/UnixFileStore.java
    - src/java.base/windows/classes/sun/nio/fs/WindowsFileStore.java
  --"8162520: (fs) FileStore should support file stores with > Long.MAX_VALUE capacity

Reviewed-by: alanb, darcy, rriggs
"

Build output -
Building target 'images' in configuration '/home/buildslave/workspace/jdkX-ci-build/build'
Compiling 8 files for BUILD_TOOLS_LANGTOOLS
Compiling 1 files for BUILD_JFR_TOOLS
Creating hotspot/variant-server/tools/adlc/adlc from 13 file(s)
Compiling 2 files for BUILD_JVMTI_TOOLS
Compiling 10 properties into resource bundles for jdk.javadoc
Parsing 2 properties into enum-like class for jdk.compiler
Compiling 19 properties into resource bundles for jdk.compiler
Compiling 12 properties into resource bundles for jdk.jdeps
Compiling 7 properties into resource bundles for jdk.jshell
Compiling 117 files for BUILD_java.compiler.interim
Creating support/modules_libs/java.base/server/libjvm.so from 1006 file(s)
Creating hotspot/variant-server/libjvm/gtest/libjvm.so from 126 file(s)
Creating hotspot/variant-server/libjvm/gtest/gtestLauncher from 1 file(s)
Compiling 401 files for BUILD_jdk.compiler.interim
/home/buildslave/workspace/jdkX-ci-build/jdkX/src/hotspot/share/memory/operator_new.cpp:92:6: error: 'void operator delete(void*, size_t)' is a usual (non-placement) deallocation function in C++14 (or with -fsized-deallocation) [-Werror=c++14-compat]
 void operator delete(void* p, size_t size) throw() {
      ^
/home/buildslave/workspace/jdkX-ci-build/jdkX/src/hotspot/share/memory/operator_new.cpp:96:6: error: 'void operator delete [](void*, size_t)' is a usual (non-placement) deallocation function in C++14 (or with -fsized-deallocation) [-Werror=c++14-compat]
 void operator delete [](void* p, size_t size) throw() {
      ^
cc1plus: error: unrecognized command line option '-Wno-cast-function-type' [-Werror]
cc1plus: error: unrecognized command line option '-Wno-misleading-indentation' [-Werror]
cc1plus: error: unrecognized command line option '-Wno-implicit-fallthrough' [-Werror]
cc1plus: error: unrecognized command line option '-Wno-int-in-bool-context' [-Werror]
cc1plus: all warnings being treated as errors
lib/CompileJvm.gmk:176: recipe for target '/home/buildslave/workspace/jdkX-ci-build/build/hotspot/variant-server/libjvm/objs/operator_new.o' failed
make[3]: *** [/home/buildslave/workspace/jdkX-ci-build/build/hotspot/variant-server/libjvm/objs/operator_new.o] Error 1
make[3]: *** Waiting for unfinished jobs....
make/Main.gmk:268: recipe for target 'hotspot-server-libs' failed
make[2]: *** [hotspot-server-libs] Error 1
make[2]: *** Waiting for unfinished jobs....
Compiling 218 files for BUILD_jdk.javadoc.interim

ERROR: Build failed for target 'images' in configuration '/home/buildslave/workspace/jdkX-ci-build/build' (exit code 2)

=== Output from failing command(s) repeated here ===
* For target hotspot_variant-server_libjvm_objs_operator_new.o:
/home/buildslave/workspace/jdkX-ci-build/jdkX/src/hotspot/share/memory/operator_new.cpp:92:6: error: 'void operator delete(void*, size_t)' is a usual (non-placement) deallocation function in C++14 (or with -fsized-deallocation) [-Werror=c++14-compat]
 void operator delete(void* p, size_t size) throw() {
      ^
/home/buildslave/workspace/jdkX-ci-build/jdkX/src/hotspot/share/memory/operator_new.cpp:96:6: error: 'void operator delete [](void*, size_t)' is a usual (non-placement) deallocation function in C++14 (or with -fsized-deallocation) [-Werror=c++14-compat]
 void operator delete [](void* p, size_t size) throw() {
      ^
cc1plus: error: unrecognized command line option '-Wno-cast-function-type' [-Werror]
cc1plus: error: unrecognized command line option '-Wno-misleading-indentation' [-Werror]
cc1plus: error: unrecognized command line option '-Wno-implicit-fallthrough' [-Werror]
cc1plus: error: unrecognized command line option '-Wno-int-in-bool-context' [-Werror]
cc1plus: all warnings being treated as errors

* All command lines available in /home/buildslave/workspace/jdkX-ci-build/build/make-support/failure-logs.
=== End of repeated output ===

=== Make failed targets repeated here ===
lib/CompileJvm.gmk:176: recipe for target '/home/buildslave/workspace/jdkX-ci-build/build/hotspot/variant-server/libjvm/objs/operator_new.o' failed
make/Main.gmk:268: recipe for target 'hotspot-server-libs' failed
=== End of repeated output ===

Hint: Try searching the build log for the name of the first failed target.
Hint: See doc/building.html#troubleshooting for assistance.

/home/buildslave/workspace/jdkX-ci-build/jdkX/make/Init.gmk:307: recipe for target 'main' failed
make[1]: *** [main] Error 1
/home/buildslave/workspace/jdkX-ci-build/jdkX/make/Init.gmk:186: recipe for target 'images' failed
make: *** [images] Error 2

From ci_notify at linaro.org Sat Nov 2 02:42:18 2019
From: ci_notify at linaro.org (ci_notify at linaro.org)
Date: Sat, 2 Nov 2019 02:42:18 +0000 (UTC)
Subject: [aarch64-port-dev ] JTREG, JCStress, SPECjbb2015 and Hadoop/Terasort results for OpenJDK JDK on AArch64
Message-ID: <2110647221.11377.1572662539493.JavaMail.javamailuser@localhost>

This is a summary of the JTREG test results
===========================================

The build and test results are cycled every 15 days.
For detailed information on the test output please refer to:

http://openjdk.linaro.org/jdkX/openjdk-jtreg-nightly-tests/summary/2019/305/summary.html

-------------------------------------------------------------------------------
client-release/hotspot
-------------------------------------------------------------------------------
Build 0: aarch64/2018/oct/15 pass: 5,780; fail: 19; not run: 90

-------------------------------------------------------------------------------
client-release/jdk
-------------------------------------------------------------------------------
Build 0: aarch64/2018/oct/15 pass: 8,495; fail: 670; error: 23

-------------------------------------------------------------------------------
client-release/langtools
-------------------------------------------------------------------------------
Build 0: aarch64/2018/oct/15 pass: 3,970; fail: 5

-------------------------------------------------------------------------------
release/hotspot
-------------------------------------------------------------------------------
Build 0: aarch64/2019/sep/16 pass: 5,726
Build 1: aarch64/2019/sep/18 pass: 5,727
Build 2: aarch64/2019/sep/20 pass: 5,728
Build 3: aarch64/2019/sep/23 pass: 5,727
Build 4: aarch64/2019/oct/07 pass: 5,750
Build 5: aarch64/2019/oct/09 pass: 5,747; fail: 1
Build 6: aarch64/2019/oct/11 pass: 5,751; fail: 1
Build 7: aarch64/2019/oct/14 pass: 5,753
Build 8: aarch64/2019/oct/16 pass: 5,753; fail: 1
Build 9: aarch64/2019/oct/18 pass: 5,760
Build 10: aarch64/2019/oct/21 pass: 5,716; fail: 43; error: 1
Build 11: aarch64/2019/oct/23 pass: 5,760; fail: 1
Build 12: aarch64/2019/oct/28 pass: 5,766
Build 13: aarch64/2019/oct/30 pass: 5,768
Build 14: aarch64/2019/nov/01 pass: 5,768; fail: 1

-------------------------------------------------------------------------------
release/jdk
-------------------------------------------------------------------------------
Build 0: aarch64/2019/sep/16 pass: 8,687; fail: 501; error: 21
Build 1: aarch64/2019/sep/18 pass: 8,675; fail: 517; error: 18
Build 2: aarch64/2019/sep/20 pass: 8,685; fail: 503; error: 22
Build 3: aarch64/2019/sep/23 pass: 8,696; fail: 497; error: 19
Build 4: aarch64/2019/oct/07 pass: 8,683; fail: 517; error: 18
Build 5: aarch64/2019/oct/09 pass: 8,692; fail: 507; error: 21
Build 6: aarch64/2019/oct/11 pass: 8,693; fail: 511; error: 18
Build 7: aarch64/2019/oct/14 pass: 8,706; fail: 497; error: 20
Build 8: aarch64/2019/oct/16 pass: 8,702; fail: 509; error: 17
Build 9: aarch64/2019/oct/18 pass: 8,694; fail: 522; error: 17
Build 10: aarch64/2019/oct/21 pass: 8,705; fail: 512; error: 18
Build 11: aarch64/2019/oct/23 pass: 8,712; fail: 505; error: 18
Build 12: aarch64/2019/oct/28 pass: 8,711; fail: 509; error: 18
Build 13: aarch64/2019/oct/30 pass: 8,723; fail: 504; error: 19
Build 14: aarch64/2019/nov/01 pass: 8,774; fail: 506; error: 18

2 fatal errors were detected; please follow the link above for more detail.

-------------------------------------------------------------------------------
release/langtools
-------------------------------------------------------------------------------
Build 0: aarch64/2019/sep/16 pass: 3,978
Build 1: aarch64/2019/sep/18 pass: 3,978
Build 2: aarch64/2019/sep/20 pass: 3,979
Build 3: aarch64/2019/sep/23 pass: 3,979
Build 4: aarch64/2019/oct/07 pass: 3,979
Build 5: aarch64/2019/oct/09 pass: 3,979
Build 6: aarch64/2019/oct/11 pass: 3,979
Build 7: aarch64/2019/oct/14 pass: 3,979
Build 8: aarch64/2019/oct/16 pass: 3,979
Build 9: aarch64/2019/oct/18 pass: 3,979
Build 10: aarch64/2019/oct/21 pass: 3,979
Build 11: aarch64/2019/oct/23 pass: 3,980
Build 12: aarch64/2019/oct/28 pass: 3,980
Build 13: aarch64/2019/oct/30 pass: 3,980
Build 14: aarch64/2019/nov/01 pass: 3,980

-------------------------------------------------------------------------------
server-release/hotspot
-------------------------------------------------------------------------------
Build 0: aarch64/2018/oct/15 pass: 5,787; fail: 18; not run: 90
-------------------------------------------------------------------------------
server-release/jdk
-------------------------------------------------------------------------------
Build 0: aarch64/2018/oct/15 pass: 8,476; fail: 686; error: 27

-------------------------------------------------------------------------------
server-release/langtools
-------------------------------------------------------------------------------
Build 0: aarch64/2018/oct/15 pass: 3,970; fail: 5

Previous results can be found here:

http://openjdk.linaro.org/jdkX/openjdk-jtreg-nightly-tests/index.html

SPECjbb2015 composite regression test completed
===============================================

This test measures the relative performance of the server compiler running the SPECjbb2015 composite tests and compares the performance against the baseline performance of the server compiler taken on 2016-11-21.

In accordance with [1], the SPECjbb2015 tests are run on a system which is not production ready and does not meet all the requirements for publishing compliant results. The numbers below shall be treated as non-compliant (nc) and are for experimental purposes only.

Relative performance: Server max-jOPS (nc): 8.09x
Relative performance: Server critical-jOPS (nc): 10.05x

Details of the test setup and historical results may be found here:

http://openjdk.linaro.org/jdkX/SPECjbb2015-results/

[1] http://www.spec.org/fairuse.html#Academic

Regression test Hadoop-Terasort completed
=========================================

This test measures the performance of the server and client compilers running Hadoop sorting a 1GB file using Terasort and compares the performance against the baseline performance of the Zero interpreter and against the baseline performance of the server compiler on 2014-04-01.
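The Terasort figures that follow appear to be plain quotients. The first line normalizes the Zero interpreter to 1.0. The second divides the current Server score by the 2014-04-01 Server baseline (210.67 / 71.00, about 2.967, shown as 2.97x). Below is a minimal C++ sketch of that arithmetic. It assumes simple division and rounding to two decimal places. The helper names `relative_performance`, `round2`, `print_ratio` and `check_reported_ratios` are hypothetical and are not part of the Linaro tooling.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>

// Hypothetical helper: a run's score divided by a baseline run's score.
// Assumption: the report's "Nx" figures are plain quotients of two scores.
double relative_performance(double score, double baseline) {
  return score / baseline;
}

// Hypothetical helper: round to two decimal places, as in "2.97x".
double round2(double x) {
  return std::round(x * 100.0) / 100.0;
}

// Prints a line in the same shape as the report, e.g.
// "Server 210.67 / Server 2014-04-01 (71.00): 2.97x".
void print_ratio(double score, double baseline) {
  std::printf("Server %.2f / Server 2014-04-01 (%.2f): %.2fx\n",
              score, baseline, relative_performance(score, baseline));
}

// Cross-checks the sketch against the figures quoted in these reports.
void check_reported_ratios() {
  assert(round2(relative_performance(210.67, 71.00)) == 2.97);  // JDK and 11u reports
  assert(round2(relative_performance(178.67, 71.00)) == 2.52);  // 8u report
}
```

Calling `print_ratio(178.67, 71.00)` produces the 2.52x figure that the 8u report shows for the same baseline. This supports, but does not prove, the plain-division reading.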
Relative performance: Zero: 1.0, Server: 210.67

Server 210.67 / Server 2014-04-01 (71.00): 2.97x

Details of the test setup and historical results may be found here:

http://openjdk.linaro.org/jdkX/hadoop-terasort-benchmark-results/

This is a summary of the jcstress test results
==============================================

The build and test results are cycled every 15 days.

2019-09-17 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/259/results/
2019-09-19 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/261/results/
2019-09-21 pass rate: 10487/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/263/results/
2019-09-29 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/266/results/
2019-10-08 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/280/results/
2019-10-10 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/282/results/
2019-10-12 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/284/results/
2019-10-15 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/287/results/
2019-10-17 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/289/results/
2019-10-19 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/291/results/
2019-10-22 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/294/results/
2019-10-23 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/296/results/
2019-10-29 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/301/results/
2019-10-31 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/303/results/
2019-11-02 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/305/results/

For detailed information on the test output please refer to:

http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/

From ci_notify at linaro.org Sat Nov 2 13:36:40 2019
From: ci_notify at linaro.org (ci_notify at linaro.org)
Date: Sat, 2 Nov 2019 13:36:40 +0000 (UTC)
Subject: [aarch64-port-dev ] JTREG, JCStress, SPECjbb2015 and Hadoop/Terasort results for OpenJDK 8u on AArch64
Message-ID: <284063212.11475.1572701801018.JavaMail.javamailuser@localhost>

This is a summary of the JTREG test results
===========================================

The build and test results are cycled every 15 days.

For detailed information on the test output please refer to:

http://openjdk.linaro.org/jdk8u/openjdk-jtreg-nightly-tests/summary/2019/306/summary.html

-------------------------------------------------------------------------------
release/hotspot
-------------------------------------------------------------------------------
Build 0: aarch64/2019/jul/19 pass: 814; fail: 20; error: 4
Build 1: aarch64/2019/jul/25 pass: 802; fail: 25; error: 11
Build 2: aarch64/2019/jul/30 pass: 787; fail: 40; error: 11
Build 3: aarch64/2019/aug/01 pass: 800; fail: 26; error: 12
Build 4: aarch64/2019/aug/04 pass: 808; fail: 30; error: 2
Build 5: aarch64/2019/aug/06 pass: 799; fail: 29; error: 12
Build 6: aarch64/2019/aug/08 pass: 830; fail: 9; error: 1
Build 7: aarch64/2019/aug/11 pass: 825; fail: 14; error: 1
Build 8: aarch64/2019/aug/13 pass: 830; fail: 9; error: 1
Build 9: aarch64/2019/aug/15 pass: 837; fail: 9; error: 1
Build 10: aarch64/2019/aug/17 pass: 837; fail: 9; error: 1
Build 11: aarch64/2019/aug/22 pass: 837; fail: 9; error: 1
Build 12: aarch64/2019/sep/10 pass: 838; fail: 13; error: 1
Build 13: aarch64/2019/sep/21 pass: 838; fail: 13; error: 1
Build 14: aarch64/2019/nov/02 pass: 843; fail: 9; error: 1

-------------------------------------------------------------------------------
release/jdk
-------------------------------------------------------------------------------
Build 0: aarch64/2019/jul/19 pass: 5,940; fail: 278; error: 22
Build 1: aarch64/2019/jul/25 pass: 5,938; fail: 276; error: 26
Build 2: aarch64/2019/jul/30 pass: 5,942; fail: 273; error: 25
Build 3: aarch64/2019/aug/01 pass: 5,945; fail: 271; error: 24
Build 4: aarch64/2019/aug/04 pass: 5,949; fail: 270; error: 24
Build 5: aarch64/2019/aug/06 pass: 5,945; fail: 275; error: 23
Build 6: aarch64/2019/aug/08 pass: 5,953; fail: 267; error: 23
Build 7: aarch64/2019/aug/11 pass: 5,947; fail: 272; error: 25
Build 8: aarch64/2019/aug/13 pass: 5,962; fail: 258; error: 24
Build 9: aarch64/2019/aug/15 pass: 5,955; fail: 266; error: 23
Build 10: aarch64/2019/aug/17 pass: 5,951; fail: 269; error: 24
Build 11: aarch64/2019/aug/22 pass: 5,945; fail: 279; error: 20
Build 12: aarch64/2019/sep/10 pass: 5,951; fail: 273; error: 23
Build 13: aarch64/2019/sep/21 pass: 5,964; fail: 261; error: 22
Build 14: aarch64/2019/nov/02 pass: 5,956; fail: 278; error: 18

1 fatal errors were detected; please follow the link above for more detail.
-------------------------------------------------------------------------------
release/langtools
-------------------------------------------------------------------------------
Build 0: aarch64/2019/jul/19 pass: 3,116; fail: 2
Build 1: aarch64/2019/jul/25 pass: 3,116; fail: 2
Build 2: aarch64/2019/jul/30 pass: 3,116; fail: 2
Build 3: aarch64/2019/aug/01 pass: 3,116; fail: 2
Build 4: aarch64/2019/aug/04 pass: 3,116; fail: 2
Build 5: aarch64/2019/aug/06 pass: 3,116; fail: 2
Build 6: aarch64/2019/aug/08 pass: 3,116; fail: 2
Build 7: aarch64/2019/aug/11 pass: 3,116; fail: 2
Build 8: aarch64/2019/aug/13 pass: 3,116; fail: 2
Build 9: aarch64/2019/aug/15 pass: 3,116; fail: 2
Build 10: aarch64/2019/aug/17 pass: 3,116; fail: 2
Build 11: aarch64/2019/aug/22 pass: 3,116; fail: 2
Build 12: aarch64/2019/sep/10 pass: 3,116; fail: 2
Build 13: aarch64/2019/sep/21 pass: 3,116; fail: 2
Build 14: aarch64/2019/nov/02 pass: 3,116; fail: 2

Previous results can be found here:

http://openjdk.linaro.org/jdk8u/openjdk-jtreg-nightly-tests/index.html

SPECjbb2015 composite regression test completed
===============================================

This test measures the relative performance of the server compiler running the SPECjbb2015 composite tests and compares the performance against the baseline performance of the server compiler taken on 2016-11-21.

In accordance with [1], the SPECjbb2015 tests are run on a system which is not production ready and does not meet all the requirements for publishing compliant results. The numbers below shall be treated as non-compliant (nc) and are for experimental purposes only.

Relative performance: Server max-jOPS (nc): 6.89x
Relative performance: Server critical-jOPS (nc): 8.59x

Details of the test setup and historical results may be found here:

http://openjdk.linaro.org/jdk8u/SPECjbb2015-results/

[1] http://www.spec.org/fairuse.html#Academic

Regression test Hadoop-Terasort completed
=========================================

This test measures the performance of the server and client compilers running Hadoop sorting a 1GB file using Terasort and compares the performance against the baseline performance of the Zero interpreter and against the baseline performance of the server compiler on 2014-04-01.

Relative performance: Zero: 1.0, Server: 178.67

Server 178.67 / Server 2014-04-01 (71.00): 2.52x

Details of the test setup and historical results may be found here:

http://openjdk.linaro.org/jdk8u/hadoop-terasort-benchmark-results/

This is a summary of the jcstress test results
==============================================

The build and test results are cycled every 15 days.

2019-07-20 pass rate: 8229/8229, results: http://openjdk.linaro.org/jdk8u/jcstress-nightly-runs/2019/200/results/
2019-07-26 pass rate: 8229/8229, results: http://openjdk.linaro.org/jdk8u/jcstress-nightly-runs/2019/206/results/
2019-07-31 pass rate: 8229/8229, results: http://openjdk.linaro.org/jdk8u/jcstress-nightly-runs/2019/211/results/
2019-08-02 pass rate: 8229/8229, results: http://openjdk.linaro.org/jdk8u/jcstress-nightly-runs/2019/213/results/
2019-08-05 pass rate: 8229/8229, results: http://openjdk.linaro.org/jdk8u/jcstress-nightly-runs/2019/216/results/
2019-08-07 pass rate: 8229/8229, results: http://openjdk.linaro.org/jdk8u/jcstress-nightly-runs/2019/218/results/
2019-08-09 pass rate: 8229/8229, results: http://openjdk.linaro.org/jdk8u/jcstress-nightly-runs/2019/220/results/
2019-08-12 pass rate: 8230/8230, results: http://openjdk.linaro.org/jdk8u/jcstress-nightly-runs/2019/223/results/
2019-08-13 pass rate: 8230/8230, results: http://openjdk.linaro.org/jdk8u/jcstress-nightly-runs/2019/225/results/
2019-08-16 pass rate: 8230/8230, results: http://openjdk.linaro.org/jdk8u/jcstress-nightly-runs/2019/227/results/
2019-08-17 pass rate: 8230/8230, results: http://openjdk.linaro.org/jdk8u/jcstress-nightly-runs/2019/229/results/
2019-08-23 pass rate: 8230/8230, results: http://openjdk.linaro.org/jdk8u/jcstress-nightly-runs/2019/234/results/
2019-09-11 pass rate: 8230/8230, results: http://openjdk.linaro.org/jdk8u/jcstress-nightly-runs/2019/253/results/
2019-09-22 pass rate: 8230/8230, results: http://openjdk.linaro.org/jdk8u/jcstress-nightly-runs/2019/264/results/
2019-11-02 pass rate: 8230/8230, results: http://openjdk.linaro.org/jdk8u/jcstress-nightly-runs/2019/306/results/

For detailed information on the test output please refer to:

http://openjdk.linaro.org/jdk8u/jcstress-nightly-runs/

From zgu at redhat.com Sat Nov 2 15:07:31 2019
From: zgu at redhat.com (Zhengyu Gu)
Date: Sat, 2 Nov 2019 11:07:31 -0400
Subject: [aarch64-port-dev ] RFR 8233401: Shenandoah: Refactor/cleanup Shenandoah load barrier code
Message-ID: <45287c04-370c-cb0b-1603-c93fe15da3d9@redhat.com>

Please review this refactor of Shenandoah load barrier. The goal is to make the barrier structurally similar cross interpreter, C1 and C2, improve readability and maintainability.

Bug: https://bugs.openjdk.java.net/browse/JDK-8233401
Webrev: http://cr.openjdk.java.net/~zgu/JDK-8233401/webrev.00/index.html

Test:
  hotspot_gc_shenandoah (fastdebug and release)
  x86_64 and x86_32 on Linux
  AArch64 on Linux

Thanks,

-Zhengyu

From shade at redhat.com Mon Nov 4 09:10:29 2019
From: shade at redhat.com (Aleksey Shipilev)
Date: Mon, 4 Nov 2019 10:10:29 +0100
Subject: [aarch64-port-dev ] RFR 8233401: Shenandoah: Refactor/cleanup Shenandoah load barrier code
In-Reply-To: <45287c04-370c-cb0b-1603-c93fe15da3d9@redhat.com>
References: <45287c04-370c-cb0b-1603-c93fe15da3d9@redhat.com>
Message-ID:

On 11/2/19 4:07 PM, Zhengyu Gu wrote:
> Please review this refactor of Shenandoah load barrier. The goal is to make the barrier structurally
> similar cross interpreter, C1 and C2, improve readability and maintainability.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8233401
> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8233401/webrev.00/index.html

This is cute patch.

*) Typo "non-reference load":

 207   // 1: none-reference load, no additional barrier is needed

*) The comment style is inconsistent with other places:

 537 Node* ShenandoahBarrierSetC2::load_at_resolved(C2Access& access, const Type* val_type) const {
 538   // 1: load reference
 539   Node* load = BarrierSetC2::load_at_resolved(access, val_type);
 540   // For none-reference load, no additional barrier is needed

*) In constructions like this, it seems more consistent to introduce the local variable for matching
the decorator?

 387   // Native barrier is for concurrent root processing
 388   if (((decorators & IN_NATIVE) != 0) &&
 389       ShenandoahConcurrentRoots::can_do_concurrent_roots()) {

Otherwise looks good.
Roman needs to take a look as well.

--
Thanks,
-Aleksey

From aph at redhat.com Mon Nov 4 09:44:34 2019
From: aph at redhat.com (Andrew Haley)
Date: Mon, 4 Nov 2019 09:44:34 +0000
Subject: [aarch64-port-dev ] RFR 8233401: Shenandoah: Refactor/cleanup Shenandoah load barrier code
In-Reply-To: <45287c04-370c-cb0b-1603-c93fe15da3d9@redhat.com>
References: <45287c04-370c-cb0b-1603-c93fe15da3d9@redhat.com>
Message-ID:

On 11/2/19 3:07 PM, Zhengyu Gu wrote:
> Please review this refactor of Shenandoah load barrier. The goal is to
> make the barrier structurally similar cross interpreter, C1 and C2,
> improve readability and maintainability.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8233401
> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8233401/webrev.00/index.html
>
> Test:
>   hotspot_gc_shenandoah (fastdebug and release)
>   x86_64 and x86_32 on Linux
>   AArch64 on Linux

Thanks, this is an improvement. However, it's still weird.

//
// Arguments:
//
// Inputs:
//   src:        oop location to load from, might be clobbered
//   tmp1:       unused
//   tmp_thread: unused
//
// Output:
//   dst:        oop loaded from src location
//
// Kill:
//   rscratch1 (scratch reg)
//
// Alias:
//   dst: rscratch1 (might use rscratch1 as temporary output register to avoid clobbering src)
//
void ShenandoahBarrierSetAssembler::load_at(MacroAssembler* masm, DecoratorSet decorators, BasicType type,
                                            Register dst, Address src, Register tmp1, Register tmp_thread) {

tmp1 and tmp_thread are unused? It'd be a good idea, then, to say if they are
safe to use or not. Or maybe even better do this if you want to keep the same
arg list:

void ShenandoahBarrierSetAssembler::load_at(MacroAssembler* masm, DecoratorSet decorators, BasicType type,
                                            Register dst, Address src, Register, Register) {

I guess it really isn't safe to use "tmp1" as a tmp, regardless of its name.

If so, better pass it as noreg/

--
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From felix.yang at huawei.com Mon Nov 4 11:42:01 2019
From: felix.yang at huawei.com (Yangfei (Felix))
Date: Mon, 4 Nov 2019 11:42:01 +0000
Subject: [aarch64-port-dev ] RFR (trivial) : fix aarch64-8u type profile bug
In-Reply-To: <8ad0aa09-bfb1-c891-e17a-be7d14b3a2ae@redhat.com>
References: <222f9c0b-7320-8d22-cd44-c4f3af7c1311@redhat.com> <880f5072-91ba-66bd-94be-429556e7c132@redhat.com> <8ad0aa09-bfb1-c891-e17a-be7d14b3a2ae@redhat.com>
Message-ID:

> -----Original Message-----
> From: Andrew Haley [mailto:aph at redhat.com]
> Sent: Thursday, October 17, 2019 9:06 PM
> To: Yangfei (Felix) ;
> aarch64-port-dev at openjdk.java.net
> Cc: hotspot-runtime-dev at openjdk.java.net
> Subject: Re: [aarch64-port-dev ] RFR (trivial) : fix aarch64-8u type profile bug
>
> On 9/26/19 2:59 AM, Yangfei (Felix) wrote:
> > CCing to hotspot-runtime-dev list.
> >
> > This has passed hotspot jtreg test on aarch64-linux. Is it OK to go?
>
> I'll have a look.

Hi,

I opened a new bug for this: https://bugs.openjdk.java.net/browse/JDK-8233466
Webrev: http://cr.openjdk.java.net/~fyang/8233466/webrev.00/
Passed tier1-3 testing.
Is it OK to go?

Thanks,
Felix

From zgu at redhat.com Mon Nov 4 14:08:42 2019
From: zgu at redhat.com (Zhengyu Gu)
Date: Mon, 4 Nov 2019 09:08:42 -0500
Subject: [aarch64-port-dev ] RFR 8233401: Shenandoah: Refactor/cleanup Shenandoah load barrier code
In-Reply-To:
References: <45287c04-370c-cb0b-1603-c93fe15da3d9@redhat.com>
Message-ID:

Hi Andrew,

Thanks for the review.

> void ShenandoahBarrierSetAssembler::load_at(MacroAssembler* masm, DecoratorSet decorators, BasicType type,
>                                             Register dst, Address src, Register tmp1, Register tmp_thread) {
>
> tmp1 and tmp_thread are unused? It'd be a good idea, then, to say if they are
> safe to use or not.
> Or maybe even better do this if you want to keep the same
> arg list:
>
> void ShenandoahBarrierSetAssembler::load_at(MacroAssembler* masm, DecoratorSet decorators, BasicType type,
>                                             Register dst, Address src, Register, Register) {
>

This is an overrode method. What you get for tmp1 and tmp_thread, is really platform dependent.

On AArch64, you usually get noreg for tmp1 and tmp_thread. I can not tell if you can safely use tmp1 if it is valid.

I don't use tmp1 here, since I don't think it is worth the trouble, as we have spare scratch registers. I do use tmp1 in x86 through.

What do you suggest the comment should be?

Thanks,

-Zhengyu

> I guess it really isn't safe to use "tmp1" as a tmp, regardless of its name.
>
> If so, better pass it as noreg/
>

From aph at redhat.com Mon Nov 4 14:32:36 2019
From: aph at redhat.com (Andrew Haley)
Date: Mon, 4 Nov 2019 14:32:36 +0000
Subject: [aarch64-port-dev ] RFR 8233401: Shenandoah: Refactor/cleanup Shenandoah load barrier code
In-Reply-To:
References: <45287c04-370c-cb0b-1603-c93fe15da3d9@redhat.com>
Message-ID: <6ff66df6-cba8-e2a3-30ba-0ba5656e15fb@redhat.com>

On 11/4/19 2:08 PM, Zhengyu Gu wrote:
>
>> void ShenandoahBarrierSetAssembler::load_at(MacroAssembler* masm, DecoratorSet decorators, BasicType type,
>>                                             Register dst, Address src, Register tmp1, Register tmp_thread) {
>>
>> tmp1 and tmp_thread are unused? It'd be a good idea, then, to say if they are
>> safe to use or not. Or maybe even better do this if you want to keep the same
>> arg list:
>>
>> void ShenandoahBarrierSetAssembler::load_at(MacroAssembler* masm, DecoratorSet decorators, BasicType type,
>>                                             Register dst, Address src, Register, Register) {
>>
>
> This is an overrode method. What you get for tmp1 and tmp_thread, is
> really platform dependent.
>
> On AArch64, you usually get noreg for tmp1 and tmp_thread. I can not
> tell if you can safely use tmp1 if it is valid.
>
> I don't use tmp1 here, since I don't think it is worth the trouble, as
> we have spare scratch registers. I do use tmp1 in x86 through.

OK, so please just do this for now:

>> void ShenandoahBarrierSetAssembler::load_at(MacroAssembler* masm, DecoratorSet decorators, BasicType type,
>>                                             Register dst, Address src, Register, Register) {

I'm working on a redesign of the way that scratch registers are used in
AArch64, and this code is likely to have to be changed. Accurate information
about register usage is likely to be crucial for that.

--
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From thomas.stuefe at gmail.com Mon Nov 4 15:21:32 2019
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Mon, 4 Nov 2019 16:21:32 +0100
Subject: [aarch64-port-dev ] Fwd: RFR(xs): 8233019: java.lang.Class.isPrimitive() (C1) returns wrong result if Klass* is aligned to 32bit
In-Reply-To:
References:
Message-ID:

Hi,

could some aarch64 people please take a quick look at this small patch?

The aarch64 part is really tiny, but I have no possibility to test this.
Last webrev:
http://cr.openjdk.java.net/~stuefe/webrevs/8233019--c1-intrinsic-for-java.lang.class.isprimitive()-does-32bit-compare/webrev.01/webrev/
Issue: https://bugs.openjdk.java.net/browse/JDK-8233019

Thank you,
Thomas

---------- Forwarded message ---------
From: Thomas Stüfe
Date: Wed, Oct 30, 2019 at 11:47 AM
Subject: RFR(xs): 8233019: java.lang.Class.isPrimitive() (C1) returns wrong result if Klass* is aligned to 32bit
To: hotspot compiler
Cc: Doerr, Martin , Schmidt, Lutz <lutz.schmidt at sap.com>

Hi all,

second attempt at a fix (please find first review thread here: https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-October/035608.html )

Issue: https://bugs.openjdk.java.net/browse/JDK-8233019
webrev: http://cr.openjdk.java.net/~stuefe/webrevs/8233019--c1-intrinsic-for-java.lang.class.isprimitive()-does-32bit-compare/webrev.00/webrev/

In short, C1 intrinsic for jlC::isPrimitive does a compare with the Klass* pointer for the class to find out if its NULL and hence a primitive type. That compare is done using 32bit cmp and so it gives wrong results when the Klass* pointer is aligned to 32bit.

In the generator I changed the comparison constant type from intConst(0) to metadataConst(0) and implemented the missing code paths for all CPUs. Since on most architectures we do not seem to have a comparison with a 64bit immediate (at least I could not find one) I kept the change simple and only implemented comparison with NULL for now.

I tested the fix in our nightlies (jtreg tier1, jck and others) as well as manually testing it. I did not test on aarch64 and arm though and would be thankful if someone knowledgeable to these platforms could take a look.

Thanks to Martin and Lutz for eyeballing the ppc and s390 parts.
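The failure described above can be modelled with plain integers. The following C++ sketch is illustrative only and is not the HotSpot implementation: the function names and the example address are hypothetical, and a `std::uint64_t` stands in for the Klass* that the C1 intrinsic loads from the java.lang.Class mirror. Comparing only the low 32 bits against zero (the AArch64 `cmp w0, #0x0` form) treats any Klass* whose address is a multiple of 2^32 as NULL, i.e. as a primitive; a full 64-bit compare (`cmp x0, #0x0`) does not.

```cpp
#include <cstdint>

// Hypothetical model of the C1 isPrimitive() check; not HotSpot code.
// A std::uint64_t stands in for the Klass* loaded from the java.lang.Class
// mirror, which is NULL (0) only for primitive types.

// Before the fix: only the low 32 bits are tested (cf. "cmp w0, #0x0").
constexpr bool is_primitive_cmp32(std::uint64_t klass) {
  return static_cast<std::uint32_t>(klass) == 0u;
}

// After the fix: the whole 64-bit value is tested (cf. "cmp x0, #0x0").
constexpr bool is_primitive_cmp64(std::uint64_t klass) {
  return klass == 0u;
}

// A non-NULL Klass* whose address is a multiple of 2^32 (low 32 bits zero).
constexpr std::uint64_t kAlignedKlass = 0x0000000800000000ULL;

// Both checks agree on NULL, i.e. on a genuine primitive mirror.
static_assert(is_primitive_cmp32(0) && is_primitive_cmp64(0), "");
// The 32-bit check wrongly reports the aligned Klass* as primitive...
static_assert(is_primitive_cmp32(kAlignedKlass), "");
// ...while the 64-bit check classifies it correctly.
static_assert(!is_primitive_cmp64(kAlignedKlass), "");
```

The checks are written as `static_assert`s, so the sketch verifies itself at compile time. In this model, switching the comparison constant from intConst(0) to metadataConst(0) corresponds to moving from the first function to the second.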
Thanks,
Thomas

From rkennke at redhat.com Mon Nov 4 15:35:52 2019
From: rkennke at redhat.com (Roman Kennke)
Date: Mon, 4 Nov 2019 16:35:52 +0100
Subject: [aarch64-port-dev ] RFR 8233401: Shenandoah: Refactor/cleanup Shenandoah load barrier code
In-Reply-To:
References: <45287c04-370c-cb0b-1603-c93fe15da3d9@redhat.com>
Message-ID:

>> Please review this refactor of Shenandoah load barrier. The goal is to make the barrier structurally
>> similar cross interpreter, C1 and C2, improve readability and maintainability.
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8233401
>> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8233401/webrev.00/index.html
>
> This is cute patch.
>
> *) Typo "non-reference load":
>
>  207   // 1: none-reference load, no additional barrier is needed
>
> *) The comment style is inconsistent with other places:
>
>  537 Node* ShenandoahBarrierSetC2::load_at_resolved(C2Access& access, const Type* val_type) const {
>  538   // 1: load reference
>  539   Node* load = BarrierSetC2::load_at_resolved(access, val_type);
>  540   // For none-reference load, no additional barrier is needed
>
> *) In constructions like this, it seems more consistent to introduce the local variable for matching
> the decorator?
>
>  387   // Native barrier is for concurrent root processing
>  388   if (((decorators & IN_NATIVE) != 0) &&
>  389       ShenandoahConcurrentRoots::can_do_concurrent_roots()) {
>
> Otherwise looks good. Roman needs to take a look as well.

Yes, otherwise looks good.

Thanks,
Roman

From rkennke at redhat.com Mon Nov 4 15:54:10 2019
From: rkennke at redhat.com (Roman Kennke)
Date: Mon, 4 Nov 2019 16:54:10 +0100
Subject: [aarch64-port-dev ] [8u] 8231366: Shenandoah: Shenandoah String Dedup thread is not properly initialized
In-Reply-To:
References:
Message-ID:

Looks good to me.

Thanks,
Roman

> This bug seems to exist since day one of 8u backport.
> The ConcurrentGCThread API is different in 8u and we leave
> ShenandoahDedupThread not properly initialized before it enters work loop.
>
> In Shenandoah String Deduplication tests, the bug results assertion
> failure that shows Thread::current() == NULL.
>
> The bug only manifests on Windows, is due to discrepancy of java_start()
> implementation on different OSs. e.g. it sets *thread* on Linux.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8231366
> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8231366/webrev.00/
>
> Test:
>   hotspot_gc_shenandoah on Windows and Linux.
>
> Thanks,
>
> -Zhengyu
>

From aph at redhat.com Mon Nov 4 17:04:22 2019
From: aph at redhat.com (Andrew Haley)
Date: Mon, 4 Nov 2019 17:04:22 +0000
Subject: [aarch64-port-dev ] Fwd: RFR(xs): 8233019: java.lang.Class.isPrimitive() (C1) returns wrong result if Klass* is aligned to 32bit
In-Reply-To:
References:
Message-ID: <8850def2-0d1f-6aa1-17ac-1b9dd4d50a34@redhat.com>

On 11/4/19 3:21 PM, Thomas Stüfe wrote:
> Hi,
>
> could some aarch64 people please take a quick look at this small patch?
>
> The aarch64 part is really tiny, but I have no possibility to test this.
>
> Last webrev:
> http://cr.openjdk.java.net/~stuefe/webrevs/8233019--c1-intrinsic-for-java.lang.class.isprimitive()-does-32bit-compare/webrev.01/webrev/
> Issue: https://bugs.openjdk.java.net/browse/JDK-8233019

Looking.

--
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From aph at redhat.com Mon Nov 4 17:22:03 2019
From: aph at redhat.com (Andrew Haley)
Date: Mon, 4 Nov 2019 17:22:03 +0000
Subject: [aarch64-port-dev ] Fwd: RFR(xs): 8233019: java.lang.Class.isPrimitive() (C1) returns wrong result if Klass* is aligned to 32bit
In-Reply-To:
References:
Message-ID:

On 11/4/19 3:21 PM, Thomas Stüfe wrote:
> could some aarch64 people please take a quick look at this small patch?
>
> The aarch64 part is really tiny, but I have no possibility to test this.
>
> Last webrev:
> http://cr.openjdk.java.net/~stuefe/webrevs/8233019--c1-intrinsic-for-java.lang.class.isprimitive()-does-32bit-compare/webrev.01/webrev/
> Issue: https://bugs.openjdk.java.net/browse/JDK-8233019

Seems fine.

Before:

  ;; block B0 [0, 4]

  0x0000ffffa1d93f54: ldr  x0, [x1, #80]    ; implicit exception: dispatches to 0x0000ffffa1d93f78
  0x0000ffffa1d93f58: cmp  w0, #0x0
  0x0000ffffa1d93f5c: cset x0, eq  // eq = none
                                            ;*invokevirtual isPrimitive {reexecute=0 rethrow=0 return_oop=0}
                                            ; - IsPrimitiveTest::isPrimitive at 1 (line 4)

After:

  ;; block B0 [0, 4]

  0x0000ffff71dc75d4: ldr  x0, [x1, #80]    ; implicit exception: dispatches to 0x0000ffff71dc75f8
  0x0000ffff71dc75d8: cmp  x0, #0x0
  0x0000ffff71dc75dc: cset x0, eq  // eq = none
                                            ;*invokevirtual isPrimitive {reexecute=0 rethrow=0 return_oop=0}
                                            ; - IsPrimitiveTest::isPrimitive at 1 (line 4)

i.e. the first test is "cmp w0, #0x0", the second is "cmp x0, #0x0".
The first is a 32-bit comparison, the second 64-bit.

--
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From zgu at redhat.com Mon Nov 4 17:32:11 2019
From: zgu at redhat.com (Zhengyu Gu)
Date: Mon, 4 Nov 2019 12:32:11 -0500
Subject: [aarch64-port-dev ] RFR 8233401: Shenandoah: Refactor/cleanup Shenandoah load barrier code
In-Reply-To: <6ff66df6-cba8-e2a3-30ba-0ba5656e15fb@redhat.com>
References: <45287c04-370c-cb0b-1603-c93fe15da3d9@redhat.com> <6ff66df6-cba8-e2a3-30ba-0ba5656e15fb@redhat.com>
Message-ID:

>> On AArch64, you usually get noreg for tmp1 and tmp_thread. I can not
>> tell if you can safely use tmp1 if it is valid.
>>
>> I don't use tmp1 here, since I don't think it is worth the trouble, as
>> we have spare scratch registers. I do use tmp1 in x86 through.
>
> OK, so please just do this for now:

Thanks!
-Zhengyu

>
>>> void ShenandoahBarrierSetAssembler::load_at(MacroAssembler* masm, DecoratorSet decorators, BasicType type,
>>>                                             Register dst, Address src, Register, Register) {
>
> I'm working on a redesign of the way that scratch registers are used in
> AArch64, and this code is likely to have to be changed. Accurate information
> about register usage is likely to be crucial for that.
>

From zgu at redhat.com Mon Nov 4 17:33:20 2019
From: zgu at redhat.com (Zhengyu Gu)
Date: Mon, 4 Nov 2019 12:33:20 -0500
Subject: [aarch64-port-dev ] RFR 8233401: Shenandoah: Refactor/cleanup Shenandoah load barrier code
In-Reply-To:
References: <45287c04-370c-cb0b-1603-c93fe15da3d9@redhat.com>
Message-ID: <87b115fb-5353-b21b-3cbc-f862bd932b3e@redhat.com>

Updated: http://cr.openjdk.java.net/~zgu/JDK-8233401/webrev.01/index.html

Okay now?

Thanks,

-Zhengyu

On 11/4/19 10:35 AM, Roman Kennke wrote:
>>> Please review this refactor of Shenandoah load barrier. The goal is to make the barrier structurally
>>> similar cross interpreter, C1 and C2, improve readability and maintainability.
>>>
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8233401
>>> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8233401/webrev.00/index.html
>>
>> This is cute patch.
>>
>> *) Typo "non-reference load":
>>
>>  207   // 1: none-reference load, no additional barrier is needed
>>
>> *) The comment style is inconsistent with other places:
>>
>>  537 Node* ShenandoahBarrierSetC2::load_at_resolved(C2Access& access, const Type* val_type) const {
>>  538   // 1: load reference
>>  539   Node* load = BarrierSetC2::load_at_resolved(access, val_type);
>>  540   // For none-reference load, no additional barrier is needed
>>
>> *) In constructions like this, it seems more consistent to introduce the local variable for matching
>> the decorator?
>>
>>  387   // Native barrier is for concurrent root processing
>>  388   if (((decorators & IN_NATIVE) != 0) &&
>>  389       ShenandoahConcurrentRoots::can_do_concurrent_roots()) {
>>
>> Otherwise looks good. Roman needs to take a look as well.
>
> Yes, otherwise looks good.
>
> Thanks,
> Roman
>
>

From aph at redhat.com Mon Nov 4 17:38:14 2019
From: aph at redhat.com (Andrew Haley)
Date: Mon, 4 Nov 2019 17:38:14 +0000
Subject: [aarch64-port-dev ] RFR 8233401: Shenandoah: Refactor/cleanup Shenandoah load barrier code
In-Reply-To: <87b115fb-5353-b21b-3cbc-f862bd932b3e@redhat.com>
References: <45287c04-370c-cb0b-1603-c93fe15da3d9@redhat.com> <87b115fb-5353-b21b-3cbc-f862bd932b3e@redhat.com>
Message-ID: <3d70db1c-c927-48f8-23ab-8937838e0302@redhat.com>

On 11/4/19 5:33 PM, Zhengyu Gu wrote:
> Updated: http://cr.openjdk.java.net/~zgu/JDK-8233401/webrev.01/index.html
>
> Okay now?

AArch64 still says

  void ShenandoahBarrierSetAssembler::load_at(MacroAssembler* masm, DecoratorSet decorators, BasicType type,
                                              Register dst, Address src, Register tmp1, Register tmp_thread) {

instead of

  void ShenandoahBarrierSetAssembler::load_at(MacroAssembler* masm, DecoratorSet decorators, BasicType type,
                                              Register dst, Address src, Register, Register) {

--
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From zgu at redhat.com Mon Nov 4 18:18:38 2019
From: zgu at redhat.com (Zhengyu Gu)
Date: Mon, 4 Nov 2019 13:18:38 -0500
Subject: [aarch64-port-dev ] RFR 8233401: Shenandoah: Refactor/cleanup Shenandoah load barrier code
In-Reply-To: <3d70db1c-c927-48f8-23ab-8937838e0302@redhat.com>
References: <45287c04-370c-cb0b-1603-c93fe15da3d9@redhat.com> <87b115fb-5353-b21b-3cbc-f862bd932b3e@redhat.com> <3d70db1c-c927-48f8-23ab-8937838e0302@redhat.com>
Message-ID: <0d347d16-f870-798f-0165-1ee4dfae511b@redhat.com>

On 11/4/19 12:38 PM, Andrew Haley wrote:
> On 11/4/19 5:33 PM, Zhengyu Gu wrote:
>> Updated: http://cr.openjdk.java.net/~zgu/JDK-8233401/webrev.01/index.html
>>
>> Okay now?
> AArch64 still says
>
>   void ShenandoahBarrierSetAssembler::load_at(MacroAssembler* masm, DecoratorSet decorators, BasicType type,
>                                               Register dst, Address src, Register tmp1, Register tmp_thread) {
>
> instead of
>
>   void ShenandoahBarrierSetAssembler::load_at(MacroAssembler* masm, DecoratorSet decorators, BasicType type,
>                                               Register dst, Address src, Register, Register) {

They are still needed for calling super class's load_at(). Even though, they are not used there neither.

  // 1: non-reference load, no additional barrier is needed
  if (!is_reference_type(type) ) {
    BarrierSetAssembler::load_at(masm, decorators, type, dst, src, tmp1, tmp_thread);
    return;
  }

-Zhengyu

>

From zgu at redhat.com Mon Nov 4 18:23:12 2019
From: zgu at redhat.com (Zhengyu Gu)
Date: Mon, 4 Nov 2019 13:23:12 -0500
Subject: [aarch64-port-dev ] RFR 8233401: Shenandoah: Refactor/cleanup Shenandoah load barrier code
In-Reply-To: <0d347d16-f870-798f-0165-1ee4dfae511b@redhat.com>
References: <45287c04-370c-cb0b-1603-c93fe15da3d9@redhat.com> <87b115fb-5353-b21b-3cbc-f862bd932b3e@redhat.com> <3d70db1c-c927-48f8-23ab-8937838e0302@redhat.com> <0d347d16-f870-798f-0165-1ee4dfae511b@redhat.com>
Message-ID: <859e48d6-9af5-b4af-32ac-4b07ce92e94d@redhat.com>

On 11/4/19 1:18 PM, Zhengyu Gu wrote:
>
>
> On 11/4/19 12:38 PM, Andrew Haley wrote:
>> On 11/4/19 5:33 PM, Zhengyu Gu wrote:
>>> Updated:
>>> http://cr.openjdk.java.net/~zgu/JDK-8233401/webrev.01/index.html
>>>
>>> Okay now?
>> AArch64 still says
>>
>>   void ShenandoahBarrierSetAssembler::load_at(MacroAssembler* masm, DecoratorSet decorators, BasicType type,
>>                                               Register dst, Address src, Register tmp1, Register tmp_thread) {
>>
>> instead of
>>
>>   void ShenandoahBarrierSetAssembler::load_at(MacroAssembler* masm, DecoratorSet decorators, BasicType type,
>>                                               Register dst, Address src, Register, Register) {
>
> They are still needed for calling super class's load_at(). Even though,
> they are not used there neither.

Or I should say, they are not used there right now, but may be used in future ...

-Zhengyu

>
>   // 1: non-reference load, no additional barrier is needed
>   if (!is_reference_type(type) ) {
>     BarrierSetAssembler::load_at(masm, decorators, type, dst, src, tmp1, tmp_thread);
>     return;
>   }
>
>
> -Zhengyu
>
>>

From thomas.stuefe at gmail.com Mon Nov 4 18:24:22 2019
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Mon, 4 Nov 2019 19:24:22 +0100
Subject: [aarch64-port-dev ] Fwd: RFR(xs): 8233019: java.lang.Class.isPrimitive() (C1) returns wrong result if Klass* is aligned to 32bit
In-Reply-To:
References:
Message-ID:

Thanks Andrew!

On Mon, Nov 4, 2019, 18:22 Andrew Haley wrote:

> On 11/4/19 3:21 PM, Thomas Stüfe wrote:
> > could some aarch64 people please take a quick look at this small patch?
> >
> > The aarch64 part is really tiny, but I have no possibility to test this.
> >
> > Last webrev:
> > http://cr.openjdk.java.net/~stuefe/webrevs/8233019--c1-intrinsic-for-java.lang.class.isprimitive()-does-32bit-compare/webrev.01/webrev/
> > Issue: https://bugs.openjdk.java.net/browse/JDK-8233019
>
> Seems fine.
>
> Before:
>
>   ;; block B0 [0, 4]
>
>   0x0000ffffa1d93f54: ldr  x0, [x1, #80]    ; implicit exception: dispatches to 0x0000ffffa1d93f78
>   0x0000ffffa1d93f58: cmp  w0, #0x0
>   0x0000ffffa1d93f5c: cset x0, eq  // eq = none
>                                             ;*invokevirtual isPrimitive {reexecute=0 rethrow=0 return_oop=0}
>                                             ; - IsPrimitiveTest::isPrimitive at 1 (line 4)
>
> After:
>
>   ;; block B0 [0, 4]
>
>   0x0000ffff71dc75d4: ldr  x0, [x1, #80]    ; implicit exception: dispatches to 0x0000ffff71dc75f8
>   0x0000ffff71dc75d8: cmp  x0, #0x0
>   0x0000ffff71dc75dc: cset x0, eq  // eq = none
>                                             ;*invokevirtual isPrimitive {reexecute=0 rethrow=0 return_oop=0}
>                                             ; - IsPrimitiveTest::isPrimitive at 1 (line 4)
>
> i.e. the first test is "cmp w0, #0x0", the second is "cmp x0, #0x0".
> The first is a 32-bit comparison, the second 64-bit.
>
> --
> Andrew Haley  (he/him)
> Java Platform Lead Engineer
> Red Hat UK Ltd.
> https://keybase.io/andrewhaley
> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
>
>

From patrick at os.amperecomputing.com Tue Nov 5 01:39:29 2019
From: patrick at os.amperecomputing.com (Patrick Zhang OS)
Date: Tue, 5 Nov 2019 01:39:29 +0000
Subject: [aarch64-port-dev ] RFR: 8229351: AArch64: Make the stub threshold of string_compare intrinsic tunable
In-Reply-To:
References:
Message-ID:

Reformat the description below. Please help review, thanks.

Regards
Patrick

-----Original Message-----
From: aarch64-port-dev On Behalf Of Patrick Zhang OS
Sent: Tuesday, October 29, 2019 5:59 PM
To: aarch64-port-dev at openjdk.java.net
Subject: [aarch64-port-dev ] RFR: 8229351: AArch64: Make the stub threshold of string_compare intrinsic tunable

Hi,

Could you please review this patch, thanks.

JBS: https://bugs.openjdk.java.net/browse/JDK-8229351
Webrev: https://cr.openjdk.java.net/~qpzhang/8229351/webrev.02 (this starts from .02 since there had been some internal review and updates)

Changes:
1. Split the STUB_THRESHOLD from the hard-coded 72 to be CompareLongStringLimitLatin and CompareLongStringLimitUTF as a more flexible control over the stub thresholds for string_compare intrinsics, especially for various uArchs.
2. MacroAssembler::string_compare LL and UU shared the same threshold, actually UU may only require the half (length of chars) of that of LL's, because one character has two-bytes for UU, while for compacted LL strings, one character means one byte. In addition, LU/UL may need a separated threshold, as the stub function is different from the same encoding one, and the performance may vary as well.
3. In generate_compare_long_string_same_encoding, the hard-coded 72 was originally able to ensure that there can be always 64 bytes at least for the prefetch code path. However once a smaller stub threshold is set, a new condition is needed to tell if this would be still valid, or has to go to the NO_PREFETCH branch. This change can ensure the correctness.
4. In generate_compare_long_string_different_encoding, some temp vars for handling the last 4 characters are not valid any longer, cleaned up strU and strL, and related pointers initialization to the next U (cnt1) and L (tmp2).
5. In compare_string_16_x_LU, the reference to r10 (tmp1) is not needed, as tmpU or tmpL point to the same register.

Tests:
1. For function check, I have run
   jdk jtreg tier1 tests, with default vm flags
   hotspot jtreg tests: runtime/compiler/gc parts, with "-Xcomp -XX:-TieredCompilation"
   jck10/api/java.lang 1609 cases and other selected modules, no new failures found, with default vm flags and "-Xcomp -XX:-TieredCompilation" respectively;
   some specific test cases had been carefully executed to double check, i.e., TestStringCompareToDifferentLength.java [1] and TestStringCompareToDifferentLength.java [1] introduced by [2], StrCmpTest.java [3] introduced by [4].
2. For performance check, I have run string-density-bench/CompareToBench.java [5] and StringCompareBench.java [6] respectively, and SPECjbb2015.jar, no obvious performance change has been found (since the default threshold is NOT changed within this patch).
   FYI. with Ampere eMAG system, microbenchmarks [5][6] can have 1.5x consistent perf gain with LU/UL comparison for shorter strings (<72 chars, smaller stub thresholds), and slight improvement (5~10%) with LL/UU cases.
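To make the threshold arithmetic in items 2 and 3 concrete, here is a minimal C++ sketch. It is a hypothetical model, not the AArch64 MacroAssembler code: the helper names are invented for illustration, and it assumes only what the description states, namely that the old STUB_THRESHOLD of 72 characters applies to both LL and UU, that a UTF-16 character occupies two bytes, and that the prefetching loop needs at least 64 bytes.

```cpp
#include <cstddef>

// Hypothetical model of the thresholds discussed in JDK-8229351; this is not
// the AArch64 MacroAssembler code, and all names here are illustrative.

// Old hard-coded stub threshold, in characters, shared by LL and UU.
constexpr std::size_t kOldStubThreshold = 72;

// Minimum number of bytes the prefetching compare loop needs (per the RFR).
constexpr std::size_t kPrefetchMinBytes = 64;

// Bytes covered by `chars` characters: 1 per Latin1 char, 2 per UTF-16 char.
constexpr std::size_t bytes_for(std::size_t chars, bool utf16) {
  return utf16 ? 2 * chars : chars;
}

// A UU limit (in characters) covering the same bytes as a given LL limit.
constexpr std::size_t utf_limit_matching_latin(std::size_t latin_limit_chars) {
  return latin_limit_chars / 2;
}

// True if a string of `chars` characters still provides enough bytes for the
// prefetching loop; otherwise a NO_PREFETCH-style path would be needed.
constexpr bool prefetch_path_ok(std::size_t chars, bool utf16) {
  return bytes_for(chars, utf16) >= kPrefetchMinBytes;
}

// With the old threshold the prefetch precondition held for both encodings.
static_assert(prefetch_path_ok(kOldStubThreshold, false), "LL at 72 chars");
static_assert(prefetch_path_ok(kOldStubThreshold, true), "UU at 72 chars");
// Halving the UU limit keeps the byte count equal to the LL case.
static_assert(bytes_for(utf_limit_matching_latin(kOldStubThreshold), true) ==
                  bytes_for(kOldStubThreshold, false), "same byte count");
// A smaller Latin1 limit, e.g. 32 characters, breaks the precondition.
static_assert(!prefetch_path_ok(32, false), "LL at 32 chars");
```

Under these assumptions, a UU limit of 36 characters covers the same 72 bytes as the old LL limit, and a Latin1 limit below 64 characters no longer guarantees the prefetch precondition, which matches the description's point that a new condition is needed before taking the prefetch path.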
Refs:
[1] https://hg.openjdk.java.net/jdk/jdk/file/3df2bf731a87/test/hotspot/jtreg/compiler/intrinsics/string
[2] https://bugs.openjdk.java.net/browse/JDK-8218966 AArch64: String.compareTo() can read memory after string
[3] https://cr.openjdk.java.net/~dpochepk/8202326/StrCmpTest.java
[4] https://bugs.openjdk.java.net/browse/JDK-8202326 AARCH64: optimize string compare intrinsic
[5] https://cr.openjdk.java.net/~shade/density/string-density-bench.jar
[6] https://cr.openjdk.java.net/~dpochepk/8202326/StringCompareBench.java

Regards
Patrick

From ci_notify at linaro.org Tue Nov 5 03:04:02 2019
From: ci_notify at linaro.org (ci_notify at linaro.org)
Date: Tue, 5 Nov 2019 03:04:02 +0000 (UTC)
Subject: [aarch64-port-dev ] JTREG, JCStress, SPECjbb2015 and Hadoop/Terasort results for OpenJDK JDK on AArch64
Message-ID: <368223805.11982.1572923043042.JavaMail.javamailuser@localhost>

This is a summary of the JTREG test results
===========================================

The build and test results are cycled every 15 days.

For detailed information on the test output please refer to:

http://openjdk.linaro.org/jdkX/openjdk-jtreg-nightly-tests/summary/2019/308/summary.html

-------------------------------------------------------------------------------
client-release/hotspot
-------------------------------------------------------------------------------
Build 0: aarch64/2018/oct/15 pass: 5,780; fail: 19; not run: 90

-------------------------------------------------------------------------------
client-release/jdk
-------------------------------------------------------------------------------
Build 0: aarch64/2018/oct/15 pass: 8,495; fail: 670; error: 23

-------------------------------------------------------------------------------
client-release/langtools
-------------------------------------------------------------------------------
Build 0: aarch64/2018/oct/15 pass: 3,970; fail: 5

-------------------------------------------------------------------------------
release/hotspot
-------------------------------------------------------------------------------
Build 0: aarch64/2019/sep/18 pass: 5,727
Build 1: aarch64/2019/sep/20 pass: 5,728
Build 2: aarch64/2019/sep/23 pass: 5,727
Build 3: aarch64/2019/oct/07 pass: 5,750
Build 4: aarch64/2019/oct/09 pass: 5,747; fail: 1
Build 5: aarch64/2019/oct/11 pass: 5,751; fail: 1
Build 6: aarch64/2019/oct/14 pass: 5,753
Build 7: aarch64/2019/oct/16 pass: 5,753; fail: 1
Build 8: aarch64/2019/oct/18 pass: 5,760
Build 9: aarch64/2019/oct/21 pass: 5,716; fail: 43; error: 1
Build 10: aarch64/2019/oct/23 pass: 5,760; fail: 1
Build 11: aarch64/2019/oct/28 pass: 5,766
Build 12: aarch64/2019/oct/30 pass: 5,768
Build 13: aarch64/2019/nov/01 pass: 5,768; fail: 1
Build 14: aarch64/2019/nov/04 pass: 5,769

-------------------------------------------------------------------------------
release/jdk
-------------------------------------------------------------------------------
Build 0: aarch64/2019/sep/18 pass: 8,675; fail: 517; error: 18
Build 1: aarch64/2019/sep/20 pass: 8,685; fail: 503; error: 22
Build 2: aarch64/2019/sep/23 pass: 8,696; fail: 497; error: 19
Build 3: aarch64/2019/oct/07 pass: 8,683; fail: 517; error: 18
Build 4: aarch64/2019/oct/09 pass: 8,692; fail: 507; error: 21
Build 5: aarch64/2019/oct/11 pass: 8,693; fail: 511; error: 18
Build 6: aarch64/2019/oct/14 pass: 8,706; fail: 497; error: 20
Build 7: aarch64/2019/oct/16 pass: 8,702; fail: 509; error: 17
Build 8: aarch64/2019/oct/18 pass: 8,694; fail: 522; error: 17
Build 9: aarch64/2019/oct/21 pass: 8,705; fail: 512; error: 18
Build 10: aarch64/2019/oct/23 pass: 8,712; fail: 505; error: 18
Build 11: aarch64/2019/oct/28 pass: 8,711; fail: 509; error: 18
Build 12: aarch64/2019/oct/30 pass: 8,723; fail: 504; error: 19
Build 13: aarch64/2019/nov/01 pass: 8,774; fail: 506; error: 18
Build 14: aarch64/2019/nov/04 pass: 8,777; fail: 509; error: 17

2 fatal errors were detected; please follow the link above for more detail.

-------------------------------------------------------------------------------
release/langtools
-------------------------------------------------------------------------------
Build 0: aarch64/2019/sep/18 pass: 3,978
Build 1: aarch64/2019/sep/20 pass: 3,979
Build 2: aarch64/2019/sep/23 pass: 3,979
Build 3: aarch64/2019/oct/07 pass: 3,979
Build 4: aarch64/2019/oct/09 pass: 3,979
Build 5: aarch64/2019/oct/11 pass: 3,979
Build 6: aarch64/2019/oct/14 pass: 3,979
Build 7: aarch64/2019/oct/16 pass: 3,979
Build 8: aarch64/2019/oct/18 pass: 3,979
Build 9: aarch64/2019/oct/21 pass: 3,979
Build 10: aarch64/2019/oct/23 pass: 3,980
Build 11: aarch64/2019/oct/28 pass: 3,980
Build 12: aarch64/2019/oct/30 pass: 3,980
Build 13: aarch64/2019/nov/01 pass: 3,980
Build 14: aarch64/2019/nov/04 pass: 3,980

-------------------------------------------------------------------------------
server-release/hotspot
-------------------------------------------------------------------------------
Build 0: aarch64/2018/oct/15 pass: 5,787; fail: 18; not run: 90

-------------------------------------------------------------------------------
server-release/jdk
-------------------------------------------------------------------------------
Build 0: aarch64/2018/oct/15 pass: 8,476; fail: 686; error: 27

-------------------------------------------------------------------------------
server-release/langtools
-------------------------------------------------------------------------------
Build 0: aarch64/2018/oct/15 pass: 3,970; fail: 5

Previous results can be found here:

http://openjdk.linaro.org/jdkX/openjdk-jtreg-nightly-tests/index.html

SPECjbb2015 composite regression test completed
===============================================

This test measures the relative performance of the server compiler running the SPECjbb2015 composite tests and compares the performance against the baseline performance of the server compiler taken on 2016-11-21.

In accordance with [1], the SPECjbb2015 tests are run on a system which is not production ready and does not meet all the requirements for publishing compliant results. The numbers below shall be treated as non-compliant (nc) and are for experimental purposes only.

Relative performance: Server max-jOPS (nc): 8.34x
Relative performance: Server critical-jOPS (nc): 10.57x

Details of the test setup and historical results may be found here:

http://openjdk.linaro.org/jdkX/SPECjbb2015-results/

[1] http://www.spec.org/fairuse.html#Academic

Regression test Hadoop-Terasort completed
=========================================

This test measures the performance of the server and client compilers running Hadoop sorting a 1GB file using Terasort and compares the performance against the baseline performance of the Zero interpreter and against the baseline performance of the server compiler on 2014-04-01.
Relative performance: Zero: 1.0, Server: 201.64

Server 201.64 / Server 2014-04-01 (71.00): 2.84x

Details of the test setup and historical results may be found here:

http://openjdk.linaro.org/jdkX/hadoop-terasort-benchmark-results/

This is a summary of the jcstress test results
==============================================

The build and test results are cycled every 15 days.

2019-09-19 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/261/results/
2019-09-21 pass rate: 10487/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/263/results/
2019-09-29 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/266/results/
2019-10-08 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/280/results/
2019-10-10 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/282/results/
2019-10-12 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/284/results/
2019-10-15 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/287/results/
2019-10-17 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/289/results/
2019-10-19 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/291/results/
2019-10-22 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/294/results/
2019-10-23 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/296/results/
2019-10-29 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/301/results/
2019-10-31 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/303/results/
2019-11-02 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/305/results/
2019-11-05 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/308/results/

For detailed information on the test output please refer to:

http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/

From felix.yang at huawei.com Tue Nov 5 06:20:40 2019
From: felix.yang at huawei.com (Yangfei (Felix))
Date: Tue, 5 Nov 2019 06:20:40 +0000
Subject: [aarch64-port-dev ] RFR: aarch64: minor improvements of atomic operations
Message-ID:

Hi,

Please review these small improvements to aarch64 atomic operations.
This eliminates the use of full memory barriers.
Passed tier1-3 testing.

Patch:

diff -r 2700c409ff10 src/hotspot/os_cpu/linux_aarch64/atomic_linux_aarch64.hpp
--- a/src/hotspot/os_cpu/linux_aarch64/atomic_linux_aarch64.hpp Sun Nov 03 18:02:29 2019 -0500
+++ b/src/hotspot/os_cpu/linux_aarch64/atomic_linux_aarch64.hpp Wed Nov 06 14:13:00 2019 +0800
@@ -40,8 +40,7 @@
 {
   template<typename I, typename D>
   D add_and_fetch(I add_value, D volatile* dest, atomic_memory_order order) const {
-    D res = __atomic_add_fetch(dest, add_value, __ATOMIC_RELEASE);
-    FULL_MEM_BARRIER;
+    D res = __atomic_add_fetch(dest, add_value, __ATOMIC_ACQ_REL);
     return res;
   }
 };
@@ -52,8 +51,7 @@
                                                      T volatile* dest,
                                                      atomic_memory_order order) const {
   STATIC_ASSERT(byte_size == sizeof(T));
-  T res = __sync_lock_test_and_set(dest, exchange_value);
-  FULL_MEM_BARRIER;
+  T res = __atomic_exchange_n(dest, exchange_value, __ATOMIC_ACQ_REL);
   return res;
 }

From aph at redhat.com Tue Nov 5 08:52:20 2019
From: aph at redhat.com (Andrew Haley)
Date: Tue, 5 Nov 2019 08:52:20 +0000
Subject: [aarch64-port-dev ] RFR 8233401: Shenandoah: Refactor/cleanup Shenandoah load barrier code
In-Reply-To: <859e48d6-9af5-b4af-32ac-4b07ce92e94d@redhat.com>
References: <45287c04-370c-cb0b-1603-c93fe15da3d9@redhat.com>
 <87b115fb-5353-b21b-3cbc-f862bd932b3e@redhat.com>
 <3d70db1c-c927-48f8-23ab-8937838e0302@redhat.com>
 <0d347d16-f870-798f-0165-1ee4dfae511b@redhat.com>
 <859e48d6-9af5-b4af-32ac-4b07ce92e94d@redhat.com>
Message-ID: <6c110878-a477-df8a-e566-84b113806044@redhat.com>

On 11/4/19 6:23 PM, Zhengyu Gu wrote:
>> They are still needed for calling super class's load_at(). Even though,
>> they are not used there neither.

Aha! Sorry, I missed that.

> Or I should say, they are not used there right now, but may be used in
> future ...

So add them in the future, surely. All you're doing by passing unused
args is confusing the reader. It definitely succeeded with me...

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From adinn at redhat.com Tue Nov 5 12:24:02 2019
From: adinn at redhat.com (Andrew Dinn)
Date: Tue, 5 Nov 2019 12:24:02 +0000
Subject: [aarch64-port-dev ] RFD: scratch registers cleanup [Long post]
In-Reply-To: <3e4aaf79-59a9-b346-e6b0-69839acf723e@redhat.com>
References: <3e4aaf79-59a9-b346-e6b0-69839acf723e@redhat.com>
Message-ID: <47b66917-cd4c-d3bf-5e48-f7054fe45eec@redhat.com>

Hi Andrew,

This is a good start and it could be used as is. However, I think it needs some polishing to improve the way it works. That could involve upgrading the model for how scratch registers are declared, allocated, released and re-allocated. Alternatively, it could just involve improving the way the current definitions are employed.

I'll start with general comments which address that broader point before going on to mention specific issues to do with the current patch. These latter comments will also help to motivate the general point but I think it is better to make them in context. See after the sig for these comments.

regards,

Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill

----- 8< -------- 8< -------- 8< -------- 8< -------- 8< ---

I think this is a very good start to remedying a nasty problem.
However, I'm not yet convinced the model for how you manage and use scratch registers is the best way to do this. I'll explain why below and suggest a variation that might or might not work. If it doesn't then correct me and that will at least help understand why you ended up with the current design. If it does work or, at least, suggests how to move towards something better then let's try another round.

What troubles me most of all is that the mechanism you have provided can be quite opaque about ownership/liveness of registers. I think that happens because at root it does not require that declaration + allocation, release and subsequent re-allocation are co-located in the same program scope (or, at least, in the same method).

Let's work through some examples to clarify that view. As it stands it is possible to allocate some scratch registers in a caller method and then release those registers in a called method (possibly more than one call down the stack) -- indeed, the latter operation occurs in this code from the patch:

c1_LIRAssembler_aarch64.cpp:

 960 void LIR_Assembler::mem2reg(LIR_Opr src, LIR_Opr dest, BasicType type, LIR_PatchCode patch_code, CodeEmitInfo* info, bool wide, bool /* unaligned */) {
 961   LIR_Address* addr = src->as_address_ptr();
       . . .
 976   int null_check_here = code_offset();
 977   FreeScratchRegs dummy(__ as());
       . . .

Note that we are not in the scope of a visible ScratchRegister declaration here. So, the use of a FreeScratchRegs declaration implies that we may be in the scope of a declaration up the call chain. If that may be the case then how can the callee safely cancel that allocation? How can it be sure that the caller is not relying on a scratch register retaining its value across the call?
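To make that question concrete, here is a toy C++ model of the hazard. It is a sketch under stated assumptions, not the patch's code: `ScratchPool`, the per-register `generation` counter and `contents_trusted()` are hypothetical, and the blanket `FreeScratchRegs` is assumed to restore the enclosing allocation when its scope ends. The caller holds "r8" and calls a callee that, like `mem2reg`, does a naked blanket release and then takes r8 for itself; nothing at the call site reveals that the caller's value is gone.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Toy bookkeeping for two scratch registers: index 0 stands for r8, 1 for r9.
// Hypothetical model, not HotSpot's implementation.
struct ScratchPool {
  std::array<bool, 2> held{};            // currently allocated?
  std::array<unsigned, 2> generation{};  // bumped when an owner's value is invalidated
};

class ScratchRegister {
 public:
  ScratchRegister(ScratchPool& pool, int idx)
      : pool_(pool), idx_(idx), gen_(pool.generation[idx]) {
    assert(!pool_.held[idx_] && "scratch register already allocated");
    pool_.held[idx_] = true;
  }
  ~ScratchRegister() { pool_.held[idx_] = false; }
  // False once some inner scope has released this register behind our back.
  bool contents_trusted() const { return gen_ == pool_.generation[idx_]; }

 private:
  ScratchPool& pool_;
  int idx_;
  unsigned gen_;
};

// Blanket release: frees every scratch register, whoever owns it, and
// (by assumption) restores the previous allocation at end of scope.
class FreeScratchRegs {
 public:
  explicit FreeScratchRegs(ScratchPool& pool) : pool_(pool), saved_(pool.held) {
    for (std::size_t i = 0; i < saved_.size(); ++i) {
      if (saved_[i]) ++pool_.generation[i];  // an enclosing owner loses its value
      pool_.held[i] = false;
    }
  }
  ~FreeScratchRegs() { pool_.held = saved_; }

 private:
  ScratchPool& pool_;
  std::array<bool, 2> saved_;
};

// Like mem2reg: no visible ScratchRegister, yet it releases and reuses r8.
void callee(ScratchPool& pool) {
  FreeScratchRegs dummy(pool);
  ScratchRegister tmp(pool, 0);  // legal here, but clobbers the caller's r8
}

// Returns whether the caller's r8 value can still be trusted after the call.
bool caller_keeps_r8(ScratchPool& pool) {
  ScratchRegister rscratch1(pool, 0);
  callee(pool);  // nothing at this call site says r8 is about to be lost
  return rscratch1.contents_trusted();
}
```

In this model `caller_keeps_r8()` returns false even though the caller's `ScratchRegister` still looks valid in its own scope. A real assembler cannot observe register contents like this; the counter only stands in for what a reader would otherwise have to work out by hand.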
Likewise from the POV of the caller which might have declared a ScratchRegister there is nothing in the code (except perhaps for comments) to indicate that said scratch register might be overwritten under the call to this method (or perhaps even a call to an indirect caller). To those who do not know the details of the callee a ScratchRegister declared in the caller will appear to remain valid across the call even though it may be overwritten.

Another example occurs later in the same file:

1464 void LIR_Assembler::emit_opTypeCheck(LIR_OpTypeCheck* op) {
1465   const bool should_profile = op->should_profile();
1466
1467   LIR_Code code = op->code();
1468   if (code == lir_store_check) {
       . . .
1495     if (should_profile) {
1496       Label not_null;
1497       __ cbnz(value, not_null);
1498       // Object is null; update MDO and exit
1499       ScratchRegister rscratch1(__ as(), r8);
1500       ScratchRegister rscratch2(__ as(), r9);
       . . .
       . . .
1554   } else if (code == lir_checkcast) {
1555     FreeScratchRegs dummy(__ as());
       . . .
1564   } else if (code == lir_instanceof) {
1565     FreeScratchRegs dummy(__ as());
       . . .

Once again the release operations occur in a scope where there is no prior declaration in the same scope. So, this use of release declarations in the else-if clauses implies that a caller may have allocated a scratch register yet they presume it is safe to release it. Meanwhile, to make matters worse, the ScratchRegister declarations in the initial if clause imply that a caller which exercises this path will /not/ have declared a scratch register. That further implies that the validity of these combined assumptions about what declarations might be in scope depends on the callee's conditional logic. Now that's starting to get a tad too opaque and I fear we are heading down a garden path.

Of course, this may just be down to the fact that the inclusion of those release operations was superfluous; that they could be removed from these examples.
However, I don't like the fact that one can construct such examples. I would much prefer it if this usage was avoided where possible by requiring the release to occur in the scope where the scratch register was declared, i.e. the caller must arrange to free scratch registers if a callee might need them. That would make it much easier to detect circumstances where the caller was trying to have its cake and eat it.

This model of usage is indeed what happens much of the time, e.g. also from the same file:

1280 void LIR_Assembler::type_profile_helper(Register mdo,
1281                                         ciMethodData *md, ciProfileData *data,
1282                                         Register recv, Label* update_done) {
1283   ScratchRegister rscratch1(__ as(), r8);
1284   ScratchRegister rscratch2(__ as(), r9);
       . . .
1293   {
1294     FreeScratchRegs dummy(__ as());
       . . .
1298   }
       . . .

With this usage it is very clear that any scratch value established before line 1294 cannot be relied on in code after line 1298.

I suspect that there are circumstances where a caller can or even must free scratch registers down the call tree from an allocation, so perhaps we cannot afford always to dispense with this current usage for FreeScratchRegs altogether. However, I'd prefer it if an explicit scope-local release was used wherever possible to avoid this 2nd-guessing of callers' intentions. In which case, it would be much better if we could use the grammar of language definitions to support that ...

... which also provides the opportunity to remedy another shortcoming. The current release operation does not specify which scratch register(s) is (are) to be released. The only option is to free all registers. However, if a release operation must always be located in the scope of a ScratchRegister then it is perfectly possible to do so by passing the relevant ScratchRegister as a parameter.
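A minimal sketch of such a named, scope-local release, written as a toy C++ RAII guard. Everything here is hypothetical (`ScratchPool`, integer register indices, `FreeScratchRegister`, and a stand-in `call_user_of_rscratch1`), and this guard re-takes its register when the enclosing block ends, i.e. it is the block-nesting variant rather than a separate re-allocation statement. Deleting the copy operations is one way to stop a `ScratchRegister` from being handed to a callee by value, so only code that can name the register can release it.

```cpp
#include <array>
#include <cassert>

// Toy pool of two scratch registers: index 0 stands for r8, 1 for r9 (assumption).
struct ScratchPool {
  std::array<bool, 2> held{};
};

class ScratchRegister {
 public:
  ScratchRegister(ScratchPool& pool, int idx) : pool_(pool), idx_(idx) {
    assert(!pool_.held[idx_] && "scratch register already allocated");
    pool_.held[idx_] = true;
  }
  ~ScratchRegister() { pool_.held[idx_] = false; }
  // Non-copyable, so a callee cannot receive one by value and release it.
  ScratchRegister(const ScratchRegister&) = delete;
  ScratchRegister& operator=(const ScratchRegister&) = delete;

 private:
  friend class FreeScratchRegister;
  ScratchPool& pool_;
  int idx_;
};

// Hypothetical named release: frees exactly one register for the enclosing block,
// then re-allocates it when the block ends (its old contents must be treated as lost).
class FreeScratchRegister {
 public:
  explicit FreeScratchRegister(ScratchRegister& reg) : reg_(reg) {
    assert(reg_.pool_.held[reg_.idx_]);
    reg_.pool_.held[reg_.idx_] = false;
  }
  ~FreeScratchRegister() {
    assert(!reg_.pool_.held[reg_.idx_] && "callee did not give the register back");
    reg_.pool_.held[reg_.idx_] = true;
  }
  FreeScratchRegister(const FreeScratchRegister&) = delete;
  FreeScratchRegister& operator=(const FreeScratchRegister&) = delete;

 private:
  ScratchRegister& reg_;
};

// Hypothetical stand-in for call_user_of_rscratch1(): takes r8 for itself.
bool call_user_of_rscratch1(ScratchPool& pool) {
  ScratchRegister tmp(pool, 0);
  return pool.held[0];
}

// Caller in the proposed style: only r8 is handed on; r9 stays allocated throughout.
bool caller(ScratchPool& pool) {
  ScratchRegister rscratch1(pool, 0);
  ScratchRegister rscratch2(pool, 1);
  bool ok;
  {
    FreeScratchRegister release(rscratch1);
    ok = call_user_of_rscratch1(pool) && pool.held[1];
  }
  return ok && pool.held[0] && pool.held[1];
}
```

Because the guard needs a reference to the owning `ScratchRegister`, a release can only be written where that declaration is visible, which is the property argued for above.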
So, perhaps we could use macros like:

  FreeScratchRegister dummy(rscratch1);
  FreeScratchRegisters dummy(rscratch1, rscratch2);

That makes it harder for scratch registers to be freed down the stack in a callee because the callee will not have access to rscratch1 and rscratch2 unless they are passed in by reference or pointer. I'm assuming code won't pass a ScratchRegister as a call argument, merely the encapsulated Register (we might perhaps be able to knobble that by ensuring that copying/dereferencing reset the register to noreg). If this revised model works then the current FreeScratchRegs really needs renaming to something less appealing like DeleteCallerScratchBindings that won't tempt anyone to use it.

Also, it would perhaps be useful to provide a reverse operation for re-allocating a previously declared and freed scratch register, e.g.

  public void Foo::bar(...)
    ScratchRegister rscratch1(...);
    ScratchRegister rscratch2(...);
    . . .
    FreeScratchRegister dummy(rscratch1);
    call_user_of_rscratch1(...);
    UseScratchRegister dummy2(rscratch1);
    ...

This explicitly acknowledges that the call may want to use rscratch1 and explicitly signals in the caller code that rscratch1 cannot be expected to remain constant across the call. Of course, one can always achieve the same effect with block nesting but I think this is neater.

There is a bit more to this to do with conditional allocation of a scratch register that I will discuss in the detailed comments (for c1_LIRAssembler) but that needs to be done in context so ...

... on to the specific comments:

aarch64.ad:

Firstly, just a humdrum error. You missed a conversion for a specific use of rscratch1:

2851   enc_class aarch64_enc_stlrb(iRegI src, memory mem) %{
2852     MOV_VOLATILE(as_Register($src$$reg), $mem$$base, $mem$$index, $mem$$scale, $mem$$disp,
2853                  rscratch1, stlrb);
2854   %}

This is still passing rscratch1 in the MOV_VOLATILE macro call where subsequent encoding classes use r8_scratch. It doesn't actually matter (i.e. it doesn't fail to compile) because MOV_VOLATILE ignores the SCRATCH argument. I know you are trying to minimize changes but dropping that argument in all uses would be a whole lot better . . . yes, maybe in the next patch.

This story continues . . .

2930   enc_class aarch64_enc_fldars(vRegF dst, memory mem) %{
2931     MOV_VOLATILE(r8_scratch, $mem$$base, $mem$$index, $mem$$scale, $mem$$disp,
2932                  r8_scratch, ldarw);
2933     __ fmovs(as_FloatRegister($dst$$reg), r8_scratch);
2934   %}

I'm not sure why you invented the name r8_scratch as an alias for r8. Does that really help? I couldn't discern why you sometimes used one and why sometimes the other. If there is a rationale it probably ought to be documented somewhere (at least at the point where r8_scratch and r9_scratch are declared).

Anyway, what I don't follow is why these uses just employ a bare register name. Why are they exempt from the need to declare a scratch register for r8 before using it? Why not:

  enc_class aarch64_enc_fldars(vRegF dst, memory mem) %{
    ScratchRegister rscratch1(__ as(), r8);
    MOV_VOLATILE(rscratch1, $mem$$base, $mem$$index, $mem$$scale, $mem$$disp,
                 rscratch1, ldarw);
    __ fmovs(as_FloatRegister($dst$$reg), rscratch1);
    . . .

or, if need be

  enc_class aarch64_enc_fldars(vRegF dst, memory mem) %{
    {
      ScratchRegister rscratch1(__ as(), r8);
      MOV_VOLATILE(rscratch1, $mem$$base, $mem$$index, $mem$$scale, $mem$$disp,
                   rscratch1, ldarw);
      __ fmovs(as_FloatRegister($dst$$reg), rscratch1);
    }
    . . .

I realize this is extra 'protocol' for cases where register use is managed by other means and where the invoked assembler methods are not going to risk reuse of the register. However, the bare use of r8 (or even under the name r8_scratch) is actually quite confusing. Using the same protocol for acquiring a scratch register has the important virtue of consistency.
It also allows us to find a way later to avoid having to explicitly name r8 and r9 in the declaration and, equally, having to explicitly name them in these encodings, e.g. we might just pass an index in the constructor:

  ScratchRegister rscratch1(__ as(), 1);
  ScratchRegister rscratch2(__ as(), 2);

This is not the only place where your usage is confusing. You do (although not consistently) use declarations in some of the later encodings. For example, further down the file:

3526   enc_class aarch64_enc_fast_lock(iRegP object, iRegP box, iRegP tmp, iRegP tmp2) %{
       . . .
3570     // markWord of object (disp_hdr) with the stack pointer.
3571     __ mov(r8_scratch, sp);
3572     __ sub(disp_hdr, disp_hdr, r8_scratch);
       . . .

has no declaration ... but is immediately followed by

3604   enc_class aarch64_enc_fast_unlock(iRegP object, iRegP box, iRegP tmp, iRegP tmp2) %{
       . . .
3644     ScratchRegister rscratch1(__ as(), r8);
       . . .

Declarations are also used in some of the instruction definitions:

12103 instruct rolI_rReg(iRegINoSp dst, iRegI src, iRegI shift, rFlagsReg cr)
      . . .
12109   ins_encode %{
12110     ScratchRegister rscratch1(__ as(), r8);
12111     __ subw(rscratch1, zr, as_Register($shift$$reg));
      . . .

Is there a clear rationale for whether or not to declare a ScratchRegister that I am missing? I'd be happier with them being used everywhere, avoiding explicit mention of Register names at points of use.

Also, one other thing I don't understand but I think is just an error:

3008   enc_class aarch64_enc_stlxr(iRegLNoSp src, memory mem) %{
3009     MacroAssembler _masm(&cbuf);
3010     Register src_reg = as_Register($src$$reg);
3011     Register base = as_Register($mem$$base);
3012     ScratchRegister rscratch2(__ as(), r9);
       . . .
3017     if (disp != 0) {
3018       __ lea(r9_scratch, Address(base, disp));
3019       __ stlxr(r8_scratch, src_reg, r9_scratch);
       . . .

Why are you declaring r9 as temp rscratch2, not declaring r8 as rscratch1 and then using both as r9_scratch and r8_scratch?
Note also that in the previous encoding (aarch64_enc_ldaxr) you use r9_scratch and r8_scratch and don't declare any ScratchRegister.

Oh and

3117   enc_class aarch64_enc_prefetchw(memory mem) %{
       . . .
3130     ScratchRegister r8_scratch(__ as(), r8);
3131     __ lea(r8_scratch, Address(base, disp));
3132     __ prfm(Address(r8_scratch, index_reg, Address::lsl(scale)), PSTL1KEEP);

Why name the scratch register r8_scratch in this decl (shadowing an existing name) rather than calling it rscratch1?

c1_LIRAssembler_aarch64.cpp:

I found the comment here to be misleading:

1581 void LIR_Assembler::casw(Register addr, Register newval, Register cmpval) {
1582   // r8 is used to pass an argument here, not as scratch. See
1583   // LIRGenerator::atomic_cmpxchg.
1584   __ cmpxchg(addr, cmpval, newval, Assembler::word, /* acquire*/ true, /* release*/ true, /* weak*/ false, r8);
1585   __ cset(r8, Assembler::NE);
1586   __ membar(__ AnyAny);
1587 }

Firstly, to be more strict/precise, r8 is used to return a result from cmpxchg in order to then pass it on to cset (it read to me like you were saying that it passes a value into cmpxchg).

More importantly, what the comment does not clarify is that the reason you cannot allocate r8 as a scratch register here at the point of call with a ScratchRegister decl is because cmpxchg itself conditionally reserves r8 as a scratch register in the case where a client passes noreg. So using a ScratchRegister here would break cmpxchg, and not using a ScratchRegister in cmpxchg would disallow other callers from passing in noreg or require it to provide two paths to handle the two cases, noreg or a scratch reg, allocating a ScratchRegister in only one of those paths. This is all really a bit clumsy and certainly unclear.

This sort of case where a called method conditionally allocates a scratch register is an important one to be able to handle.
I think a way to deal with this might be to allow methods like cmpxchg to declare a local scratch register which /uses/ a supplied register if provided and only /allocates/ a specific scratch register if noreg is supplied. That would allow the caller to allocate a scratch register and pass it in or not allocate the register and pass in noreg. Of course it also allows a caller to pass in a non-scratch register.

So, given the above definition and your current definition for cmpxchg, viz:

2497 void MacroAssembler::cmpxchg(Register addr, Register expected,
2498                              Register new_val,
2499                              enum operand_size size,
2500                              bool acquire, bool release,
2501                              bool weak,
2502                              Register result) {
2503   ScratchRegister rscratch1(as(), r8);
2504   if (result == noreg) result = rscratch1;

we could replace them with

  void LIR_Assembler::casw(Register addr, Register newval, Register cmpval) {
    // allocate scratch register to return result
    // and pass it on to cset
    ScratchRegister rscratch1(__ as(), r8);
    __ cmpxchg(addr, cmpval, newval, Assembler::word, /* acquire*/ true, /* release*/ true, /* weak*/ false, rscratch1);
    __ cset(rscratch1, Assembler::NE);
    __ membar(__ AnyAny);
  }

and

  void MacroAssembler::cmpxchg(Register addr, Register expected,
                               Register new_val,
                               enum operand_size size,
                               bool acquire, bool release,
                               bool weak,
                               Register result) {
    ScratchRegister rscratch1(as(), /*use without alloc if reg*/ result, /*alloc if noreg*/ r8);

This doesn't really work quite right because the caller has to inhibit allocation in the called method either by passing 1) an explicit non-scratch register or 2) an allocated scratch register, i.e. it is not quite uniform. However, in cases where noreg is passed the implication is clear that rscratch1 (possibly also rscratch2 if two noreg args may be passed) will be allocated by the callee as an alternative.

n.b. the comment in the original version of LIR_Assembler::casw refers to LIRGenerator::atomic_cmpxchg. Do you not actually mean MacroAssembler::cmpxchg?
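The "use if supplied, allocate if noreg" constructor can be sketched in a few lines of C++. This is a toy model, not HotSpot code: `Register` is a plain `int`, `noreg` is -1, `ScratchPool` and the three-argument `ScratchRegister` constructor are hypothetical, and `cmpxchg`/`casw` only mimic the register handling of the methods quoted above, emitting nothing.

```cpp
#include <array>
#include <cassert>

using Register = int;           // toy: a register is just its number
constexpr Register noreg = -1;  // toy stand-in for HotSpot's noreg
constexpr Register r8 = 8;

struct ScratchPool {
  std::array<bool, 32> held{};  // which registers are currently allocated as scratch
};

class ScratchRegister {
 public:
  // Hypothetical constructor from the proposal: use `supplied` without allocating it
  // if the caller passed a register, otherwise allocate `fallback` from the pool.
  ScratchRegister(ScratchPool& pool, Register supplied, Register fallback)
      : pool_(pool),
        reg_(supplied != noreg ? supplied : fallback),
        owns_(supplied == noreg) {
    if (owns_) {
      assert(!pool_.held[reg_] && "fallback scratch register already allocated");
      pool_.held[reg_] = true;
    }
  }
  // Plain allocation, as in the current patch.
  ScratchRegister(ScratchPool& pool, Register reg) : ScratchRegister(pool, noreg, reg) {}
  ~ScratchRegister() {
    if (owns_) pool_.held[reg_] = false;
  }
  ScratchRegister(const ScratchRegister&) = delete;
  ScratchRegister& operator=(const ScratchRegister&) = delete;

  operator Register() const { return reg_; }
  bool owns() const { return owns_; }

 private:
  ScratchPool& pool_;
  Register reg_;
  bool owns_;
};

// What the toy cmpxchg reports: where the result went and whether it allocated r8.
struct CmpxchgTrace {
  Register result;
  bool allocated;
};

CmpxchgTrace cmpxchg(ScratchPool& pool, Register result) {
  ScratchRegister rscratch1(pool, /*use without alloc if reg*/ result, /*alloc if noreg*/ r8);
  // ... the real method would emit its compare-and-swap sequence using rscratch1 ...
  return {rscratch1, rscratch1.owns()};
}

// Caller shaped like the proposed casw(): it allocates r8 itself and passes it in.
CmpxchgTrace casw(ScratchPool& pool) {
  ScratchRegister rscratch1(pool, r8);
  return cmpxchg(pool, rscratch1);  // cmpxchg uses r8 but does not allocate it again
}
```

A call such as `cmpxchg(pool, 3)` also skips allocation, because any non-`noreg` argument is used as-is; that is the non-uniformity noted above, since the callee cannot tell an allocated scratch register from an arbitrary one.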
c1_Runtime_aarch64.cpp

I don't really like this:

51 int StubAssembler::call_RT(Register oop_result1, Register metadata_result, address entry, int args_size) {
52   // setup registers
     . . .
58   FreeScratchRegs dummy(as());
59   ScratchRegister rscratch1(as(), r8);
60   ScratchRegister rscratch2(as(), r9);

Why are the regs freed here rather than in the caller? I realise this case is special because the callee can know that scratch regs are now invalid (since we are about to plant a blr all bets are off). However, this is a disparity with other use cases where the caller needs to make the decision to free temps. Intentions and expectations would be clearer if the caller were required to explicitly release scratch vars before a call to call_RT (and then maybe reallocate them again afterwards).

interp_masm_aarch64.cpp:

You have these top level declarations:

47 static const Register rscratch1 = r8;
48 static const Register rscratch2 = r9;

So, we don't (now or in future) use ScratchRegister here? Why not? Or is this just an expedient hack? Once again, I'd prefer scratch uses to be uniform across all the code.

interp_masm_aarch64.hpp:

293     set_last_Java_frame(esp, rfp, (address) pc(), r8);

So, here you are using r8 rather than r8_scratch. Was there a rationale for that?

interpreterRT_aarch64.cpp:

41 static const Register rscratch1 = r8;
42 static const Register rscratch2 = r9;

Same comment as above for interp_masm_aarch64.cpp.

macroAssembler_aarch64.cpp:

Several more occurrences of callee freeing that make sense given that a call is being planted but which I think would be clearer if done in the relevant callers:

 679 void MacroAssembler::call_VM_base(Register oop_result,
 680                                   Register java_thread,
 681                                   Register last_java_sp,
 682                                   address  entry_point,
 683                                   int      number_of_arguments,
 684                                   bool     check_exceptions) {
 685   FreeScratchRegs regs(as());
       . . .
 804 address MacroAssembler::emit_trampoline_stub(int insts_call_instruction_offset,
 805                                              address dest) {
       . . .
 821   FreeScratchRegs dummy(as());
       . . .
1456 void MacroAssembler::call_VM_leaf_base(address entry_point,
1457                                        int number_of_arguments,
1458                                        Label *retaddr) {
1459   Label E, L;
1460
1461   FreeScratchRegs dummy(as()); // VM calls clobber all registers but
1462                                // we preserve rscratch1.
1463   ScratchRegister rscratch1(as(), r8);
       . . .

Also, I don't follow why you sometimes use r8 and other times r9_scratch in this file.

methodHandles_aarch64.cpp

Once again free in a callee that I'd prefer to see done in the callers:

 97 void MethodHandles::jump_from_method_handle(MacroAssembler* _masm, Register method, Register temp,
 98                                             bool for_compiler_entry) {
 99   FreeScratchRegs dummy(__ as());
100   ScratchRegister rscratch1(__ as(), r8);

128 void MethodHandles::jump_to_lambda_form(MacroAssembler* _masm,
129                                         Register recv, Register method_temp,
130                                         Register temp2,
131                                         bool for_compiler_entry) {
132   FreeScratchRegs(__ as());

stubGenerator_aarch64.cpp

Two cases where it is clear a more selective release would be useful:

2104     {
2105       FreeScratchRegs dummy(__ as());
2106       ScratchRegister rscratch2(__ as(), r9);
         . . .
2188     {
2189       FreeScratchRegs dummy(__ as());
2190       ScratchRegister rscratch2(__ as(), r9);

templateTable_aarch64.cpp

46 static const Register rscratch1 = r8;
47 static const Register rscratch2 = r9;

Same comment as above for interp_masm_aarch64.cpp.
From zgu at redhat.com Tue Nov 5 12:33:19 2019
From: zgu at redhat.com (Zhengyu Gu)
Date: Tue, 5 Nov 2019 07:33:19 -0500
Subject: [aarch64-port-dev ] RFR 8233401: Shenandoah: Refactor/cleanup Shenandoah load barrier code
In-Reply-To: <6c110878-a477-df8a-e566-84b113806044@redhat.com>
References: <45287c04-370c-cb0b-1603-c93fe15da3d9@redhat.com>
 <87b115fb-5353-b21b-3cbc-f862bd932b3e@redhat.com>
 <3d70db1c-c927-48f8-23ab-8937838e0302@redhat.com>
 <0d347d16-f870-798f-0165-1ee4dfae511b@redhat.com>
 <859e48d6-9af5-b4af-32ac-4b07ce92e94d@redhat.com>
 <6c110878-a477-df8a-e566-84b113806044@redhat.com>
Message-ID: <84394d85-1b99-8139-3baf-7fbedba702c0@redhat.com>

On 11/5/19 3:52 AM, Andrew Haley wrote:
> On 11/4/19 6:23 PM, Zhengyu Gu wrote:
>>> They are still needed for calling super class's load_at(). Even though,
>>> they are not used there neither.
>
> Aha! Sorry, I missed that.
>
>> Or I should say, they are not used there right now, but may be used in
>> future ...
>
> So add them in the future, surely. All you're doing by passing unused
> args is confusing the reader. It definitely succeeded with me...
>

Sorry, I should just remove 'unused' comments. Okay with you?

Thanks,

-Zhengyu

From aph at redhat.com Tue Nov 5 14:58:33 2019
From: aph at redhat.com (Andrew Haley)
Date: Tue, 5 Nov 2019 14:58:33 +0000
Subject: [aarch64-port-dev ] RFD: scratch registers cleanup [Long post]
In-Reply-To: <47b66917-cd4c-d3bf-5e48-f7054fe45eec@redhat.com>
References: <3e4aaf79-59a9-b346-e6b0-69839acf723e@redhat.com>
 <47b66917-cd4c-d3bf-5e48-f7054fe45eec@redhat.com>
Message-ID: <957a57a3-2cdb-f199-0f74-9a9795cab3d7@redhat.com>

On 11/5/19 12:24 PM, Andrew Dinn wrote:
> I think this is a very good start to remedying a nasty problem. However,
> I'm not yet convinced the model for how you manage and use scratch
> registers is the best way to do this. I'll explain why below and suggest
> a variation that might or might not work. If it doesn't then correct me
> and that will at least help understand why you ended up with the current
> design. If it does work or, at least, suggests how to move towards
> something better then let's try another round.
>
> What troubles me most of all is that the mechanism you have provided can
> be quite opaque about ownership/liveness of registers. I think that
> happens because at root it does not require that declaration +
> allocation, release and subsequent re-allocation are co-located in the
> same program scope (or, at least, in the same method).
>
> Let's work through some examples to clarify that view. As it stands it
> is possible to allocate some scratch registers in a caller method and
> then release those registers in a called method (possibly more than one
> call down the stack) -- indeed, the latter operation occurs in this code
> from the patch:

Absolutely so, yes. Note that at this point I am not trying to reorganize code but to make it clear(er) what is going on, and when a programmer does something random to force that programmer to declare that randomness. Abuses are possible, I agree.

> c1_LIRAssembler_aarch64.cpp:
>
> 960 void LIR_Assembler::mem2reg(LIR_Opr src, LIR_Opr dest, BasicType
> type, LIR_PatchCode patch_code, CodeEmitInfo* info, bool wide, bool /*
> unaligned */) {
> 961   LIR_Address* addr = src->as_address_ptr();
>       . . .
> 976   int null_check_here = code_offset();
> 977   FreeScratchRegs dummy(__ as());
>       . . .
>
> Note that we are not in the scope of a visible ScratchRegister
> declaration here. So, the use of a FreeScratchRegs declaration implies
> that we may be in the scope of a declaration up the call chain. If that
> may be the case then how can the callee safely cancel that allocation?
> How can it be sure that the caller is not relying on a scratch register
> retaining its value across the call?

It can't. "Naked" FreeScratchRegs are an abomination to be avoided wherever possible.
The only really justified use of them is when we're making a callout to runtime code, at which point the programmer is expected to know that the native ABI will clobber the scratch regs. I don't want to clutter every single call to the runtime with FreeScratchRegs. I don't think it would help anyone. > Likewise from the POV of the caller which might have declared a > ScratchRegister there is nothing in the code (except perhaps for > comments) to indicate that said scratch register might be overwritten > under the call to this method (or perhaps even a call to an indirect > caller). To those who do not know the details of the callee a > ScratchRegister declared in the caller will appear to remain valid > across the call even though it may be overwritten. > > Another example occurs later in the same file > > 1464 void LIR_Assembler::emit_opTypeCheck(LIR_OpTypeCheck* op) { > 1465 const bool should_profile = op->should_profile(); > 1466 > 1467 LIR_Code code = op->code(); > 1468 if (code == lir_store_check) { > . . . > 1495 if (should_profile) { > 1496 Label not_null; > 1497 __ cbnz(value, not_null); > 1498 // Object is null; update MDO and exit > 1499 ScratchRegister rscratch1(__ as(), r8); > 1500 ScratchRegister rscratch2(__ as(), r9); . . . > . . . > 1554 } else if (code == lir_checkcast) { > 1555 FreeScratchRegs dummy(__ as()); > . . . > 1564 } else if (code == lir_instanceof) { > 1565 FreeScratchRegs dummy(__ as()); > . . . > > Once again the release operations occur in a scope where there is no > prior declaration in the same scope. So, this use of release > declarations in the else-if clauses implies that a caller may have > allocated a scratch register yet they presume it is safe to release it. That one looks like a mistake. It's perfectly possible to FreeScratchRegs where none are allocated. I don't believe that any registers will be allocated before the start of emit_opTypeCheck(). 
> Meanwhile, to make matters worse, the ScratchRegister declarations in > the initial if clause imply that a caller which exercises this path will > /not/ have declared a scratch register. That further implies that the > validity of these combined assumptions about what declarations might be > in scope depends on the callee's conditional logic. Now that's starting > to get a tad too opaque and I fear we are heading down a garden path. Indeed. > Of course, this may just be down to the fact that the inclusion of those > release operations was superfluous; that they could be removed from > these examples. However, I don't like the fact that one can construct > such examples. I would much prefer it if this usage was avoided where > possible by requiring the release to occur in the scope where the > scratch register was declared. i.e. the caller must arrange to free > scratch registers if a callee might need them. That would make it much > easier to detect circumstances where the caller was trying to have its > cake and eat it. Sure, I'm happy with that in all cases except in the special case of the runtime calls. > This model of usage is indeed what happens much of the time e.g. also > from the same file: > > 1280 void LIR_Assembler::type_profile_helper(Register mdo, > 1281 ciMethodData *md, > ciProfileData *data, > 1282 Register recv, Label* > update_done) { > 1283 ScratchRegister rscratch1(__ as(), r8); > 1284 ScratchRegister rscratch2(__ as(), r9); > . . . > 1293 { > 1294 FreeScratchRegs dummy(__ as()); > . . . > 1298 } > . . . > > With this usage it is very clear that any scratch value established > before line 1294 cannot be relied on in code after line 1298. That's what it's supposed to look like!
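The nested-scope shape both reviewers endorse (reserve scratch registers at the top of a function, then open an inner block with FreeScratchRegs around code that may clobber them) can be modelled with a few lines of RAII. This is a hypothetical, heavily simplified sketch: `Tracker`, the integer register numbers and the bit-mask representation are assumptions for illustration, not the patch's real classes. It only shows how leaving the inner block restores the outer reservations.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: registers are plain integers (8 stands for r8, 9 for r9)
// and the tracker is a bit mask. This is not HotSpot's implementation.
struct Tracker {
  uint32_t allocated = 0;  // bit n set => rn is currently held as scratch
  bool holds(int reg) const { return (allocated >> reg) & 1u; }
};

// Claims a register for the lifetime of the enclosing C++ scope.
class ScratchRegister {
  Tracker& _t;
  int _reg;
 public:
  ScratchRegister(Tracker& t, int reg) : _t(t), _reg(reg) {
    assert(!_t.holds(reg) && "scratch register already in use");
    _t.allocated |= 1u << reg;
  }
  ~ScratchRegister() { _t.allocated &= ~(1u << _reg); }
  ScratchRegister(const ScratchRegister&) = delete;
  ScratchRegister& operator=(const ScratchRegister&) = delete;
  int reg() const { return _reg; }
};

// Releases every scratch register for the enclosing scope and restores the
// previous reservations when that scope ends.
class FreeScratchRegs {
  Tracker& _t;
  uint32_t _saved;
 public:
  explicit FreeScratchRegs(Tracker& t) : _t(t), _saved(t.allocated) {
    _t.allocated = 0;
  }
  ~FreeScratchRegs() { _t.allocated = _saved; }
  FreeScratchRegs(const FreeScratchRegs&) = delete;
  FreeScratchRegs& operator=(const FreeScratchRegs&) = delete;
};

// Same shape as the type_profile_helper excerpt: r8/r9 reserved up front,
// released inside a nested block, reserved again after it.
bool nested_free_restores_outer() {
  Tracker t;
  ScratchRegister rscratch1(t, 8);
  ScratchRegister rscratch2(t, 9);
  bool released_inside = false;
  {
    FreeScratchRegs dummy(t);
    released_inside = !t.holds(8) && !t.holds(9);
    ScratchRegister callee_tmp(t, 8);  // code in here may claim r8 again
    (void)callee_tmp;
  }
  // Both registers are reserved again, but their old contents are gone.
  return released_inside && t.holds(8) && t.holds(9);
}
```

Because release and re-reservation follow the braces, a reader can see from the block structure alone which values cannot survive; a FreeScratchRegs with no ScratchRegister in any enclosing scope of the same function is the "naked" case the thread wants rejected in review.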
I don't really believe that it makes sense for us to prevent "Naked" FreeScratchRegs automagically, but it would be good to reject such things in code review. > I suspect that there are circumstances where a caller can or even > must free scratch registers down the call tree from an allocation so > perhaps we cannot afford always to dispense with this current usage > for FreeScratchRegs altogether. However, I'd prefer it if an > explicit scope-local release was used wherever possible to avoid > this 2nd-guessing of callers' intentions. In which case, it would be > much better if we could use the grammar of language definitions to > support that . . . Now that's a good idea. I'm not sure how you'd actually enforce it in C++, but you could have something like CalloutFreeScratchRegs for callouts. Actually, maybe I *can* think of a way to do it with macro magic. > . . . which also provides the opportunity to remedy another shortcoming. > The current release operation does not specify which scratch register(s) > is (are) to be released. The only option is to free all registers. However, > if a release operation must always be located in the scope of a > ScratchRegister then it is perfectly possible to do so by passing the > relevant ScratchRegister as a parameter. So, perhaps we could use macros > like: > > FreeScratchRegister dummy(rscratch1); > > FreeScratchRegisters dummy(rscratch1, rscratch2); Sure, it is, but IMO it's an over-elaboration. When calling down to a sub-macro it's easier to follow what's going on if you just say "this macro clobbers scratch", but I won't fight you if you're really keen to do that. > That makes it harder for scratch registers to be freed down the stack in > a callee because the callee will not have access to rscratch1 and > rscratch2 unless they are passed in by reference or pointer.
I'm assuming > code won't pass a ScratchRegister as a call argument, merely the > encapsulated Register (we might perhaps be able to knobble that by > ensuring that copying/dereferencing reset the register to noreg). Yeah. Passing scratch registers by other names is one of the worst abuses I've had to deal with, and after some version of this patch goes in such things may be fixed. > If this revised model works then the current FreeScratchRegs really > needs renaming to something less appealing like > DeleteCallerScratchBindings that won't tempt anyone to use it. That's a nice idea. > Also, it would perhaps be useful to provide a reverse operation for > re-allocating a previously declared and freed scratch register e.g. > > void Foo::bar(...) > ScratchRegister rscratch1(...); > ScratchRegister rscratch2(...); > . . . > > . . . > FreeScratchRegister dummy(rscratch1); > call_user_of_rscratch1(...); > UseScratchRegister dummy2(rscratch1); > ... No: too complicated, excessive API service. Using nested scopes for allocation and release corresponds well with the usage of C++, and that's a good thing. > This explicitly acknowledges that the call may want to use rscratch1 > and explicitly signals in the caller code that rscratch1 cannot be > expected to remain constant across the call. Of course, one can always > achieve the same effect with block nesting but I think this is neater. I don't. That's what nesting is for! > There is a bit more to this to do with conditional allocation of a > scratch register that I will discuss in the detailed comments (for > c1_LIRAssembler) but that needs to be done in context so ... > > ... on to the specific comments: > > aarch64.ad: > > Firstly, just a humdrum error.
You missed a conversion for a specific > use of rscratch1 > > 2851 enc_class aarch64_enc_stlrb(iRegI src, memory mem) %{ > 2852 MOV_VOLATILE(as_Register($src$$reg), $mem$$base, $mem$$index, > $mem$$scale, $mem$$disp, > 2853 rscratch1, stlrb); > 2854 %} > > This is still passing rscratch1 in the MOV_VOLATILE macro call where > subsequent encoding classes use r8_scratch. It doesn't actually matter > (i.e. it doesn't fail to compile) because MOV_VOLATILE ignores the > SCRATCH argument. I know you are trying to minimize changes but dropping > that argument in all uses would be a whole lot better . . . yes, maybe > in the next patch. Yes. > This story continues . . . > > 2930 enc_class aarch64_enc_fldars(vRegF dst, memory mem) %{ > 2931 MOV_VOLATILE(r8_scratch, $mem$$base, $mem$$index, $mem$$scale, > $mem$$disp, > 2932 r8_scratch, ldarw); > 2933 __ fmovs(as_FloatRegister($dst$$reg), r8_scratch); > 2934 %} > > I'm not sure why you invented the name r8_scratch as an alias for r8. > Does that really help? I couldn't discern why you sometimes used one and > why sometimes the other. If there is a rationale it probably ought to be > documented somewhere (at least at the point where r8_scratch and > r9_scratch are declared). I'm not sure, which is perhaps why it wasn't used consistently. Just a reminder, I guess. > Anyway, what I don't follow is why these uses just employ a bare > register name. Why are they exempt from the need to declare a scratch > register for r8 before using it? > Why not: > > enc_class aarch64_enc_fldars(vRegF dst, memory mem) %{ > ScratchRegister rscratch1(__ as(), r8); > MOV_VOLATILE(rscratch1, $mem$$base, $mem$$index, $mem$$scale, > $mem$$disp, > rscratch1, ldarw); > __ fmovs(as_FloatRegister($dst$$reg), rscratch1); > . . .
> > or, if need be > > enc_class aarch64_enc_fldars(vRegF dst, memory mem) %{ > { > ScratchRegister rscratch1(__ as(), r8); > MOV_VOLATILE(rscratch1, $mem$$base, $mem$$index, $mem$$scale, > $mem$$disp, > rscratch1, ldarw); > __ fmovs(as_FloatRegister($dst$$reg), rscratch1); > } > . . . I guess I could live with that. I don't want to introduce extra noise where it can be avoided. > This is not the only place where your usage is confusing. You do > (although not consistently) use declarations in some of the later > encodings. For example, further down the file: > > 3526 enc_class aarch64_enc_fast_lock(iRegP object, iRegP box, iRegP > tmp, iRegP tmp2) %{ > . . . > 3570 // markWord of object (disp_hdr) with the stack pointer. > 3571 __ mov(r8_scratch, sp); > 3572 __ sub(disp_hdr, disp_hdr, r8_scratch); > . . . > > has no declaration ... but is immediately followed by > > 3604 enc_class aarch64_enc_fast_unlock(iRegP object, iRegP box, iRegP > tmp, iRegP tmp2) %{ > . . . > 3644 ScratchRegister rscratch1(__ as(), r8); > . . . Fair enough. It's PoC, WIP. :-) > Declarations are also used in some of the instruction definitions: > > 12103 instruct rolI_rReg(iRegINoSp dst, iRegI src, iRegI shift, > rFlagsReg cr) > . . . > 12109 ins_encode %{ > 12110 ScratchRegister rscratch1(__ as(), r8); > 12111 __ subw(rscratch1, zr, as_Register($shift$$reg)); > . . . > > Is there a clear rationale for whether or not to declare a > ScratchRegister that I am missing? I'd be happier with them being used > everywhere, avoiding explicit mention of Register names at points of use. No, I might have just changed my mind partway through. In principle, for the sake of the sanity of the maintenance programmer, it might be simplest to insist that scratch register declarations are always used in AD files. It would mean a change which moves the register tracker from Assembler to CodeBuffer, because C2 creates Assemblers on the fly, but such a change is not hard to do. 
> Also, one other thing I don't understand but I think is just an error: > > 3008 enc_class aarch64_enc_stlxr(iRegLNoSp src, memory mem) %{ > 3009 MacroAssembler _masm(&cbuf); > 3010 Register src_reg = as_Register($src$$reg); > 3011 Register base = as_Register($mem$$base); > 3012 ScratchRegister rscratch2(__ as(), r9); > . . . > 3017 if (disp != 0) { > 3018 __ lea(r9_scratch, Address(base, disp)); > 3019 __ stlxr(r8_scratch, src_reg, r9_scratch); > . . . > > Why are you declaring r9 as temp rscratch2, not declaring r8 as > rscratch1 and then using both as r9_scratch and r8_scratch? Note also > that in the previous encoding (aarch64_enc_ldaxr) you use r9_scratch and > r8_scratch and don't declare any ScratchRegister. I'm sure there was a reason, but... > Why name the scratch register r8_scratch in this decl (shadowing an > existing name) rather than calling it rscratch1? That too. > c1_LIRAssembler_aarch64.cpp: > > I found the comment here to be misleading > > 1581 void LIR_Assembler::casw(Register addr, Register newval, Register > cmpval) { > 1582 // r8 is used to pass an argument here, not as scratch. See > 1583 // LIRGenerator::atomic_cmpxchg. > 1584 __ cmpxchg(addr, cmpval, newval, Assembler::word, /* acquire*/ > true, /* release*/ true, /* weak*/ false, r8); > 1585 __ cset(r8, Assembler::NE); > 1586 __ membar(__ AnyAny); > 1587 } > > Firstly, to be more strict/precise, r8 is used to return a result from > cmpxchg in order to then pass it on to cset (it read to me like you were > saying that it passes a value into cmpxchg). Passing scratch registers to macros as arguments is really awkward. It's just about the most confusing and error-prone thing you can do. In a later patch I'd like to get rid of all of it. Probably the greatest contribution of this work is that it detects and forces us to do something about all such usages. 
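The hazard described here, a caller passing a register into a macro that also uses the same register internally as scratch, can at least be detected mechanically; HotSpot's existing assert_different_registers serves that kind of purpose. The stand-alone sketch below is hypothetical (plain integer registers, a simulated register file, an invented `emit_add_plus_100` macro) and deliberately omits the check inside the macro so that the clobber can be observed.

```cpp
#include <array>
#include <cassert>
#include <initializer_list>

constexpr int r8 = 8;
using RegFile = std::array<long, 32>;  // simulated general registers

// True if none of the operands is the register the callee uses as scratch.
bool operands_avoid(int scratch, std::initializer_list<int> operands) {
  for (int r : operands) {
    if (r == scratch) return false;
  }
  return true;
}

// Invented "macro": regs[dst] = regs[a] + regs[b] + 100, using r8 to hold
// the constant, as several macros in the patch use r8 internally.
void emit_add_plus_100(RegFile& regs, int dst, int a, int b) {
  // A checked version would start with
  //   assert(operands_avoid(r8, {dst, a, b}));
  // it is left out here so the clobber below can be observed.
  regs[r8] = 100;                            // internal scratch use of r8
  regs[dst] = regs[a] + regs[b] + regs[r8];
}
```

If a caller passes r8 as operand `a`, the macro reads 100 instead of the caller's value, so the result is wrong without any error; an entry assertion turns that into an immediate failure.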
> More importantly, what the comment does not clarify is that the reason > you cannot allocate r8 as a scratch register here at the point of call > with a ScratchRegister decl is because cmpxchg itself conditionally > reserves r8 as a scratch register in the case where a client passes > noreg. So using a ScratchRegister here would break cmpxchg and not using > a ScratchRegister in cmpxchg would disallow other callers from passing > in noreg or require it to provide two paths to handle the two cases, > noreg or a scratch reg, allocating a ScratchRegister in only one of > those paths. This is all really a bit clumsy and certainly unclear. Right. The problem here is that the way the registers are used is confusing and clumsy; the declarations (and comments) reflect that clumsiness. > This sort of case where a called method conditionally allocates a > scratch register is an important one to be able to handle. I think a > way to deal with this might be to allow methods like cmpxchg to > declare a local scratch register which /uses/ a supplied register if > provided and only /allocates/ a specific scratch register if noreg > is supplied. That would allow the caller to allocate a scratch > register and pass it in or not allocate the register and pass in > noreg. Of course it also allows a caller to pass in a non-scratch > register. Better still, I think, for the caller to allocate the scratch register and pass it down under the name rscratch1; ownership remains with the caller. 
> So, given the above definition and your current definition for cmpxchg, viz: > > 2497 void MacroAssembler::cmpxchg(Register addr, Register expected, > 2498 Register new_val, > 2499 enum operand_size size, > 2500 bool acquire, bool release, > 2501 bool weak, > 2502 Register result) { > 2503 ScratchRegister rscratch1(as(), r8); > 2504 if (result == noreg) result = rscratch1; > > > we could replace them with > > void LIR_Assembler::casw(Register addr, Register newval, Register cmpval) { > // allocate scratch register to return result > // and pass it on to cset > ScratchRegister rscratch1( __ as(), r8); > __ cmpxchg(addr, cmpval, newval, Assembler::word, /* acquire*/ true, > /* release*/ true, /* weak*/ false, rscratch1); > __ cset(rscratch1, Assembler::NE); > __ membar(__ AnyAny); > } > > and > > void MacroAssembler::cmpxchg(Register addr, Register expected, > Register new_val, > enum operand_size size, > bool acquire, bool release, > bool weak, > Register result) { > ScratchRegister rscratch1(as(), > /*use without alloc if reg*/ result, > /*alloc if noreg*/ r8); > > > This doesn't really work quite right because the caller has to inhibit > allocation in the called method either by passing 1) an explicit > non-scratch register or 2) an allocated scratch register i.e. it is not > quite uniform. However, in cases where noreg is passed the implication > is clear that rscratch1 (possibly also rscratch2 if two noreg args > may be passed) will be allocated by the callee as an alternative. Hmm, maybe. I suggest that we put up with the ugliness of nakedly using r8 for now (accompanied by suitable comments) and then fix it up later. > n.b. the comment in the original version of LIR_Assembler::casw refers > to LIRGenerator::atomic_cmpxchg. Do you not actually mean > MacroAssembler::cmpxchg? I can't remember.
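Andrew Dinn's proposal is a scratch holder that uses a caller-supplied register when one is passed and claims r8 itself only when the caller passes noreg. Here is a hypothetical sketch of that idea. `ScratchOrSupplied`, the integer register numbers and the `cmpxchg_result` stand-in are invented for illustration and are not the patch's API.

```cpp
#include <cassert>
#include <cstdint>

constexpr int noreg = -1;

// Hypothetical bit-mask tracker (bit n set => rn held as scratch).
struct Tracker {
  uint32_t allocated = 0;
  bool holds(int r) const { return r >= 0 && ((allocated >> r) & 1u); }
};

// Uses `supplied` as-is when the caller passes a register; otherwise claims
// `fallback` from the tracker and releases it again at end of scope.
class ScratchOrSupplied {
  Tracker& _t;
  int _reg;
  bool _owned;
 public:
  ScratchOrSupplied(Tracker& t, int supplied, int fallback)
      : _t(t),
        _reg(supplied != noreg ? supplied : fallback),
        _owned(supplied == noreg) {
    if (_owned) {
      assert(!_t.holds(_reg) && "fallback scratch register already in use");
      _t.allocated |= 1u << _reg;
    }
  }
  ~ScratchOrSupplied() {
    if (_owned) _t.allocated &= ~(1u << _reg);
  }
  ScratchOrSupplied(const ScratchOrSupplied&) = delete;
  ScratchOrSupplied& operator=(const ScratchOrSupplied&) = delete;
  int reg() const { return _reg; }
  bool owned() const { return _owned; }
};

// Stand-in for cmpxchg's `result` handling: reports which register receives
// the result and whether the callee had to claim it itself.
struct ResultInfo {
  int reg;
  bool callee_allocated;
};

ResultInfo cmpxchg_result(Tracker& t, int result) {
  ScratchOrSupplied rscratch1(t, result, /* fallback: r8 */ 8);
  return {rscratch1.reg(), rscratch1.owned()};
}
```

The sketch also shows the non-uniformity Andrew Dinn points out. A caller that passes an ordinary register and a caller that passes its own reserved r8 look identical to the callee. Only the noreg path touches the tracker.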
> c1_Runtime_aarch64.cpp > > I don't really like this: > > 51 int StubAssembler::call_RT(Register oop_result1, Register > metadata_result, address entry, int args_size) { > 52 // setup registers > . . . > 58 FreeScratchRegs dummy(as()); > 59 ScratchRegister rscratch1(as(), r8); > 60 ScratchRegister rscratch2(as(), r9); > > Why are the regs freed here rather than in the caller? I realise this > case is special because the callee can know that scratch regs are now > invalid (since we are about to plant a blr all bets are off). However, > this is a disparity with other use cases where the caller needs to make > the decision to free temps. Intentions and expectations would be clearer > if the caller were required to explicitly release scratch vars before a > call to call_RT (and then maybe reallocate them again afterwards). No, I don't want to do that, as I said above. It's a callout, which nukes more than just the scratch registers. > interp_masm_aarch64.cpp: > > You have these top level declarations: > > 47 static const Register rscratch1 = r8; > 48 static const Register rscratch2 = r9; > > So, we don't (now or in future) use ScratchRegister here? Why not? Or is > this just an expedient hack? Once again, I'd prefer scratch uses to be > uniform across all the code. It's very expedient, and in particular the interpreter has its own convention for register usage. I don't believe that rewriting all of the templates etc. to use scratch register declarations would reduce errors; it might even introduce them. I could live with the interpreter being changed later, but there's no need for it now. > interp_masm_aarch64.hpp: > > 293 set_last_Java_frame(esp, rfp, (address) pc(), r8); > > So, here you are using r8 rather than r8_scratch. Was there a rationale > for that? It's because the declaration is in the wrong place. If we moved it, it could just use the same name as the rest of the interpreter.
> interpreterRT_aarch64.cpp: > > 41 static const Register rscratch1 = r8; > 42 static const Register rscratch2 = r9; > > Same comment as above for interp_masm_aarch64.cpp. > > > macroAssembler_aarch64.cpp: > > Several more occurrences of callee freeing that make sense given that a > call is being planted but which I think would be clearer if done in the > relevant callers: See above. > Also, I don't follow why you sometimes use r8 and other times > r9_scratch in this file. Some cleanups required. > methodHandles_aarch64.cpp > > Once again free in a callee that I'd prefer to see done in the callers: > > 97 void MethodHandles::jump_from_method_handle(MacroAssembler* _masm, > Register method, Register temp, > 98 bool for_compiler_entry) { > 99 FreeScratchRegs dummy(__ as()); > 100 ScratchRegister rscratch1(__ as(), r8); There's even less justification for putting FreeScratchRegs in the caller at this point: jump_from_method_handle is not going to return. > stubGenerator_aarch64.cpp > > Two cases where it is clear a more selective release would be useful > > 2104 { > 2105 FreeScratchRegs dummy(__ as()); > 2106 ScratchRegister rscratch2(__ as(), r9); > . . . > 2188 { > 2189 FreeScratchRegs dummy(__ as()); > 2190 ScratchRegister rscratch2(__ as(), r9); Maybe, but it would make for a more complicated API, which I'm not convinced would really carry its weight. I can go with a more complicated FreeScratchRegs which says what you've freed rather than what you've reserved if you're really keen. But it's not obvious to me that FreeScratchRegs dummy(r8, __ as()); is really better than FreeScratchRegs dummy(__ as()); ScratchRegister rscratch2(__ as(), r9); ... especially since the latter actually relies on the programmer reading upwards to see that r9 is allocated. > templateTable_aarch64.cpp > > 46 static const Register rscratch1 = r8; > 47 static const Register rscratch2 = r9; > > Same comment as above for interp_masm_aarch64.cpp. Same reply. :-) Thanks.
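For comparison with the two spellings debated for the stubGenerator cases, the following hypothetical sketch models Andrew Dinn's selective form. The release object can only be constructed from a live ScratchRegister, so it must appear where the holder is in scope, and it releases just that one register. `Tracker`, `ScratchRegister` and `FreeScratchRegister` here are simplified stand-ins, not the patch's API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit-mask tracker (bit n set => rn held as scratch).
struct Tracker {
  uint32_t allocated = 0;
  bool holds(int reg) const { return (allocated >> reg) & 1u; }
};

class ScratchRegister {
  Tracker& _t;
  int _reg;
 public:
  ScratchRegister(Tracker& t, int reg) : _t(t), _reg(reg) {
    assert(!_t.holds(reg) && "scratch register already in use");
    _t.allocated |= 1u << reg;
  }
  ~ScratchRegister() { _t.allocated &= ~(1u << _reg); }
  ScratchRegister(const ScratchRegister&) = delete;
  ScratchRegister& operator=(const ScratchRegister&) = delete;
  int reg() const { return _reg; }
  Tracker& tracker() const { return _t; }
};

// Selective release: built only from a live holder, frees just that register
// for the enclosing scope and re-reserves it on exit.
class FreeScratchRegister {
  Tracker& _t;
  int _reg;
 public:
  explicit FreeScratchRegister(const ScratchRegister& held)
      : _t(held.tracker()), _reg(held.reg()) {
    assert(_t.holds(_reg) && "releasing a register that is not held");
    _t.allocated &= ~(1u << _reg);
  }
  ~FreeScratchRegister() {
    assert(!_t.holds(_reg) && "inner code still holds the register");
    _t.allocated |= 1u << _reg;
  }
  FreeScratchRegister(const FreeScratchRegister&) = delete;
  FreeScratchRegister& operator=(const FreeScratchRegister&) = delete;
};

// The stubGenerator situation: r8 and r9 held, only r8 handed down.
bool selective_release_keeps_r9() {
  Tracker t;
  ScratchRegister rscratch1(t, 8);
  ScratchRegister rscratch2(t, 9);
  bool only_r8_released = false;
  {
    FreeScratchRegister dummy(rscratch1);
    only_r8_released = !t.holds(8) && t.holds(9);
    ScratchRegister callee_tmp(t, 8);  // inner code may now take r8, not r9
    (void)callee_tmp;
  }
  return only_r8_released && t.holds(8) && t.holds(9);
}
```

In this form r9's continued reservation is implicit at the point of release, whereas the free-all-then-re-reserve form restates it locally; which of the two reads better is the judgment call in the exchange above.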
-- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Tue Nov 5 15:08:37 2019 From: aph at redhat.com (Andrew Haley) Date: Tue, 5 Nov 2019 15:08:37 +0000 Subject: [aarch64-port-dev ] RFD: scratch registers cleanup [Long post] In-Reply-To: <957a57a3-2cdb-f199-0f74-9a9795cab3d7@redhat.com> References: <3e4aaf79-59a9-b346-e6b0-69839acf723e@redhat.com> <47b66917-cd4c-d3bf-5e48-f7054fe45eec@redhat.com> <957a57a3-2cdb-f199-0f74-9a9795cab3d7@redhat.com> Message-ID: <5ff0143f-881e-4e02-13a8-56bd71b0086d@redhat.com> On 11/5/19 2:58 PM, Andrew Haley wrote: > No: too complicated, excessive API service. s/service/surface/ -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Tue Nov 5 16:26:19 2019 From: aph at redhat.com (Andrew Haley) Date: Tue, 5 Nov 2019 16:26:19 +0000 Subject: [aarch64-port-dev ] RFR 8233401: Shenandoah: Refactor/cleanup Shenandoah load barrier code In-Reply-To: <84394d85-1b99-8139-3baf-7fbedba702c0@redhat.com> References: <45287c04-370c-cb0b-1603-c93fe15da3d9@redhat.com> <87b115fb-5353-b21b-3cbc-f862bd932b3e@redhat.com> <3d70db1c-c927-48f8-23ab-8937838e0302@redhat.com> <0d347d16-f870-798f-0165-1ee4dfae511b@redhat.com> <859e48d6-9af5-b4af-32ac-4b07ce92e94d@redhat.com> <6c110878-a477-df8a-e566-84b113806044@redhat.com> <84394d85-1b99-8139-3baf-7fbedba702c0@redhat.com> Message-ID: <2be52de0-6f12-d989-cf69-5807b2160cb0@redhat.com> On 11/5/19 12:33 PM, Zhengyu Gu wrote: > > > On 11/5/19 3:52 AM, Andrew Haley wrote: >> On 11/4/19 6:23 PM, Zhengyu Gu wrote: >>>> They are still needed for calling super class's load_at(). Even though, >>>> they are not used there neither. >> >> Aha! Sorry, I missed that. >> >>> Or I should say, they are not used there right now, but may be used in >>> future ... 
>> >> So add them in the future, surely. All you're doing by passing unused >> args is confusing the reader. It definitely succeeded with me... >> > Sorry, I should just remove 'unused' comments. Okay with you? OK. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Wed Nov 6 09:45:01 2019 From: aph at redhat.com (Andrew Haley) Date: Wed, 6 Nov 2019 09:45:01 +0000 Subject: [aarch64-port-dev ] RFR: AArch64: JDK-8232046: AArch64 build failure after JDK-8225681 In-Reply-To: <48557f4d-bd8c-ff22-7c3e-fe8ec3f532dd@redhat.com> References: <41cfc3b9-eb1e-b445-3136-9de93eb66cb2@redhat.com> <87fd08bf-82c0-0bb8-e322-311c878b43b4@redhat.com> <48557f4d-bd8c-ff22-7c3e-fe8ec3f532dd@redhat.com> Message-ID: <97bf1ce3-0cb4-b229-d5b0-c48f0ee84647@redhat.com> On 10/11/19 1:51 PM, Andrew Dinn wrote: > Hi Erik, > > On 11/10/2019 13:04, erik.osterlund at oracle.com wrote: >> Looks good to me. I feel like something is weird about the 0 is >> logically -1 mapping (shouldn't it have populated the generic jump with >> -1 in the first place instead?), but that weirdness should not hold back >> this fix. Ship it. > > Perhaps. Although -1 is not used anywhere else in the AArch64 code -- > all other sites use a self-reference (jump target address == address of > jump) from the get-go as well as after a reset. They then lie > consistently about that to keep the generic code happy. I am not sure > there is any good reason to use that in place of -1 but I always default > to the assumption that Andrew Haley had a reason for breaking with > protocol. > > So, I'd really have preferred to have used a self-reference as the > initial value in this case too. Indeed, I tried that but it failed to > relocate when the nmethod was installed.
When debugging that failure I > spotted a cryptic breadcrumb comment left by Andrew Haley about relocs > not doing the right thing when the generated buffer was copied. So, I > decided to leave well alone at that point. > > This may only be an artefact of Andrew Haley not understanding relocs > fully when he first wrote the code. When he is back I'll talk to him and > see if we can correct this to use a self-reference or even switching > all jumps to use -1 as an empty marker. Even if we can only manage > consistent lying about the -1 that would be an improvement. Getting back to this... From what I remember, the AArch64 code is based on what x86 did at the time. It might well be that such kludges are no longer necessary. There's no reason not to make this code consistent with all of the other usages.
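The convention Andrew Dinn describes, where an unbound jump targets its own address and accessors report that state to shared code as -1, can be sketched as follows. `JumpSite` and its members are hypothetical; HotSpot's native-instruction and relocation machinery is considerably more involved, and this models only the "lie consistently" translation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a patchable jump site; not HotSpot's NativeJump.
// "Unset" is encoded as a jump to itself, and the accessor translates that
// encoding into the -1 sentinel that shared code expects.
struct JumpSite {
  static constexpr uintptr_t kUnset = static_cast<uintptr_t>(-1);

  uintptr_t address;  // where the instruction lives
  uintptr_t target;   // where it currently branches to

  explicit JumpSite(uintptr_t addr) : address(addr), target(addr) {}

  // Shared code may pass -1 to mean "unset"; store it as a self-reference.
  void set_destination(uintptr_t dest) {
    target = (dest == kUnset) ? address : dest;
  }
  // Report a self-reference back to shared code as -1.
  uintptr_t destination() const {
    return target == address ? kUnset : target;
  }
  void reset() { target = address; }
};
```

One consequence of this encoding is that a jump which genuinely branches to itself cannot be distinguished from an unset one; that is worth keeping in mind if the -1 and self-reference conventions are unified.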
-- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From zgu at redhat.com Wed Nov 6 12:18:28 2019 From: zgu at redhat.com (Zhengyu Gu) Date: Wed, 6 Nov 2019 07:18:28 -0500 Subject: [aarch64-port-dev ] RFR 8233401: Shenandoah: Refactor/cleanup Shenandoah load barrier code In-Reply-To: <2be52de0-6f12-d989-cf69-5807b2160cb0@redhat.com> References: <45287c04-370c-cb0b-1603-c93fe15da3d9@redhat.com> <87b115fb-5353-b21b-3cbc-f862bd932b3e@redhat.com> <3d70db1c-c927-48f8-23ab-8937838e0302@redhat.com> <0d347d16-f870-798f-0165-1ee4dfae511b@redhat.com> <859e48d6-9af5-b4af-32ac-4b07ce92e94d@redhat.com> <6c110878-a477-df8a-e566-84b113806044@redhat.com> <84394d85-1b99-8139-3baf-7fbedba702c0@redhat.com> <2be52de0-6f12-d989-cf69-5807b2160cb0@redhat.com> Message-ID: <93330192-7143-ca82-9872-fe627a97772e@redhat.com> On 11/5/19 11:26 AM, Andrew Haley wrote: > On 11/5/19 12:33 PM, Zhengyu Gu wrote: >> >> >> On 11/5/19 3:52 AM, Andrew Haley wrote: >>> On 11/4/19 6:23 PM, Zhengyu Gu wrote: >>>>> They are still needed for calling super class's load_at(). Even though, >>>>> they are not used there neither. >>> >>> Aha! Sorry, I missed that. >>> >>>> Or I should say, they are not used there right now, but may be used in >>>> future ... >>> >>> So add them in the future, surely. All you're doing by passing unused >>> args is confusing the reader. It definitely succeeded with me... >>> >> Sorry, I should just remove 'unused' comments. Okay with you? > > OK. 
Updated: http://cr.openjdk.java.net/~zgu/JDK-8233401/webrev.02/index.html Thanks, -Zhengyu > From shade at redhat.com Wed Nov 6 12:18:58 2019 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 6 Nov 2019 13:18:58 +0100 Subject: [aarch64-port-dev ] RFR (XS) 8233695: AArch64 build failures after -Wno-extra removal Message-ID: Bug: https://bugs.openjdk.java.net/browse/JDK-8233695 Fix: https://cr.openjdk.java.net/~shade/8233695/webrev.01/ This is the simplest fix I can think of in orderAccess: cast away constness. Also, trivially unused parameter in templateInterpreterGenerator_aarch64.cpp. The issues is exposed in 14, but it is actually there in all releases down to 8-aarch64. Testing: aarch64 build, tier1 (running) -- Thanks, -Aleksey From aph at redhat.com Wed Nov 6 12:33:29 2019 From: aph at redhat.com (Andrew Haley) Date: Wed, 6 Nov 2019 12:33:29 +0000 Subject: [aarch64-port-dev ] RFR (XS) 8233695: AArch64 build failures after -Wno-extra removal In-Reply-To: References: Message-ID: <46aa9d40-7353-c5b6-6bd9-4152eee0a3b8@redhat.com> On 11/6/19 12:18 PM, Aleksey Shipilev wrote: > Bug: > https://bugs.openjdk.java.net/browse/JDK-8233695 > > Fix: > https://cr.openjdk.java.net/~shade/8233695/webrev.01/ > > This is the simplest fix I can think of in orderAccess: cast away constness. Also, trivially unused > parameter in templateInterpreterGenerator_aarch64.cpp. The issues is exposed in 14, but it is > actually there in all releases down to 8-aarch64. It's better to use const_cast here. Otherwise OK, thanks. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Wed Nov 6 12:35:54 2019 From: aph at redhat.com (Andrew Haley) Date: Wed, 6 Nov 2019 12:35:54 +0000 Subject: [aarch64-port-dev ] RFR 8233401: Shenandoah: Refactor/cleanup Shenandoah load barrier code In-Reply-To: <93330192-7143-ca82-9872-fe627a97772e@redhat.com> References: <45287c04-370c-cb0b-1603-c93fe15da3d9@redhat.com> <87b115fb-5353-b21b-3cbc-f862bd932b3e@redhat.com> <3d70db1c-c927-48f8-23ab-8937838e0302@redhat.com> <0d347d16-f870-798f-0165-1ee4dfae511b@redhat.com> <859e48d6-9af5-b4af-32ac-4b07ce92e94d@redhat.com> <6c110878-a477-df8a-e566-84b113806044@redhat.com> <84394d85-1b99-8139-3baf-7fbedba702c0@redhat.com> <2be52de0-6f12-d989-cf69-5807b2160cb0@redhat.com> <93330192-7143-ca82-9872-fe627a97772e@redhat.com> Message-ID: <7e9ace3d-8d15-e87a-f01c-90fc4b6faa6a@redhat.com> On 11/6/19 12:18 PM, Zhengyu Gu wrote: > Updated: http://cr.openjdk.java.net/~zgu/JDK-8233401/webrev.02/index.html OK. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From shade at redhat.com Wed Nov 6 12:46:55 2019 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 6 Nov 2019 13:46:55 +0100 Subject: [aarch64-port-dev ] RFR (XS) 8233695: AArch64 build failures after -Wno-extra removal In-Reply-To: <46aa9d40-7353-c5b6-6bd9-4152eee0a3b8@redhat.com> References: <46aa9d40-7353-c5b6-6bd9-4152eee0a3b8@redhat.com> Message-ID: <9f35ecb2-83e6-bf04-5f91-a671fd65070f@redhat.com> On 11/6/19 1:33 PM, Andrew Haley wrote: > On 11/6/19 12:18 PM, Aleksey Shipilev wrote: >> Bug: >> https://bugs.openjdk.java.net/browse/JDK-8233695 >> >> Fix: >> https://cr.openjdk.java.net/~shade/8233695/webrev.01/ >> >> This is the simplest fix I can think of in orderAccess: cast away constness. Also, trivially unused >> parameter in templateInterpreterGenerator_aarch64.cpp. 
The issues is exposed in 14, but it is >> actually there in all releases down to 8-aarch64. > > It's better to use const_cast here. Otherwise OK, thanks. Right. Like this? https://cr.openjdk.java.net/~shade/8233695/webrev.02/ -- Thanks, -Aleksey From aph at redhat.com Wed Nov 6 13:14:15 2019 From: aph at redhat.com (Andrew Haley) Date: Wed, 6 Nov 2019 13:14:15 +0000 Subject: [aarch64-port-dev ] RFR (XS) 8233695: AArch64 build failures after -Wno-extra removal In-Reply-To: <9f35ecb2-83e6-bf04-5f91-a671fd65070f@redhat.com> References: <46aa9d40-7353-c5b6-6bd9-4152eee0a3b8@redhat.com> <9f35ecb2-83e6-bf04-5f91-a671fd65070f@redhat.com> Message-ID: <360b522d-ff3d-16af-b833-27b518e39fc3@redhat.com> On 11/6/19 12:46 PM, Aleksey Shipilev wrote: > Right. Like this? > https://cr.openjdk.java.net/~shade/8233695/webrev.02/ Exactly. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Wed Nov 6 13:43:31 2019 From: aph at redhat.com (Andrew Haley) Date: Wed, 6 Nov 2019 13:43:31 +0000 Subject: [aarch64-port-dev ] RFD: scratch registers cleanup [Long post] In-Reply-To: <3e4aaf79-59a9-b346-e6b0-69839acf723e@redhat.com> References: <3e4aaf79-59a9-b346-e6b0-69839acf723e@redhat.com> Message-ID: <7264d4c2-1520-c49f-2e73-f1527fb90da3@redhat.com> One other thing: this exercise has shown that in many cases we trash scratch registers in places where it really doesn't matter, and we'd be much better off rewriting them not to do so. This makes push_call_clobbered_registers() something that can safely be used everywhere. But I'm holding off any of this because I want the first patch to be, if at all possible, neutral with regard to code generated. 
diff -r 33f9271b3167 src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp
--- a/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp Mon Nov 04 13:13:34 2019 -0500
+++ b/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp Wed Nov 06 08:36:08 2019 -0500
@@ -2624,15 +2624,17 @@
   int step = 4 * wordSize;
   push(RegSet::range(r0, r18) - RegSet::of(rscratch1, rscratch2), sp);
   sub(sp, sp, step);
-  mov(rscratch1, -step);
+  mov(r0, -step);
   // Push v0-v7, v16-v31.
   for (int i = 31; i>= 4; i -= 4) {
     if (i <= v7->encoding() || i >= v16->encoding())
       st1(as_FloatRegister(i-3), as_FloatRegister(i-2), as_FloatRegister(i-1),
-          as_FloatRegister(i), T1D, Address(post(sp, rscratch1)));
+          as_FloatRegister(i), T1D, Address(post(sp, r0)));
   }
   st1(as_FloatRegister(0), as_FloatRegister(1), as_FloatRegister(2),
       as_FloatRegister(3), T1D, Address(sp));
+  // Reload r0 from where it was saved before pushing v0-v7, v16-v31.
+  ldr(r0, Address(sp, (8 + 16) * wordSize));
 }

-- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd.
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From shade at redhat.com Wed Nov 6 13:45:27 2019 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 6 Nov 2019 14:45:27 +0100 Subject: [aarch64-port-dev ] RFR 8233401: Shenandoah: Refactor/cleanup Shenandoah load barrier code In-Reply-To: <93330192-7143-ca82-9872-fe627a97772e@redhat.com> References: <45287c04-370c-cb0b-1603-c93fe15da3d9@redhat.com> <87b115fb-5353-b21b-3cbc-f862bd932b3e@redhat.com> <3d70db1c-c927-48f8-23ab-8937838e0302@redhat.com> <0d347d16-f870-798f-0165-1ee4dfae511b@redhat.com> <859e48d6-9af5-b4af-32ac-4b07ce92e94d@redhat.com> <6c110878-a477-df8a-e566-84b113806044@redhat.com> <84394d85-1b99-8139-3baf-7fbedba702c0@redhat.com> <2be52de0-6f12-d989-cf69-5807b2160cb0@redhat.com> <93330192-7143-ca82-9872-fe627a97772e@redhat.com> Message-ID: <7f8dd01f-30f7-f8c9-544a-c06f2a49eea0@redhat.com> On 11/6/19 1:18 PM, Zhengyu Gu wrote: > Updated: http://cr.openjdk.java.net/~zgu/JDK-8233401/webrev.02/index.html Minor nits: *) shenandoahBarrierSetAssembler_aarch64.cpp: excess space between parentheses: 368 if (!is_reference_type(type) ) { *) shenandoahBarrierSetC1.cpp: so, native oop loads used to call to ShenandoahRuntime::load_reference_barrier_native before this refactoring? That would mean it is enabled even when "passive" is enabled (which implies -ShenandoahLRB)? Current change looks fine, but we need to recognize this is the behavioral change. Please link the issue where that regression was introduced. Otherwise looks fine to me, let Roman ack it too. 
-- Thanks, -Aleksey From zgu at redhat.com Wed Nov 6 14:15:55 2019 From: zgu at redhat.com (Zhengyu Gu) Date: Wed, 6 Nov 2019 09:15:55 -0500 Subject: [aarch64-port-dev ] RFR 8233401: Shenandoah: Refactor/cleanup Shenandoah load barrier code In-Reply-To: <7f8dd01f-30f7-f8c9-544a-c06f2a49eea0@redhat.com> References: <45287c04-370c-cb0b-1603-c93fe15da3d9@redhat.com> <87b115fb-5353-b21b-3cbc-f862bd932b3e@redhat.com> <3d70db1c-c927-48f8-23ab-8937838e0302@redhat.com> <0d347d16-f870-798f-0165-1ee4dfae511b@redhat.com> <859e48d6-9af5-b4af-32ac-4b07ce92e94d@redhat.com> <6c110878-a477-df8a-e566-84b113806044@redhat.com> <84394d85-1b99-8139-3baf-7fbedba702c0@redhat.com> <2be52de0-6f12-d989-cf69-5807b2160cb0@redhat.com> <93330192-7143-ca82-9872-fe627a97772e@redhat.com> <7f8dd01f-30f7-f8c9-544a-c06f2a49eea0@redhat.com> Message-ID: <0251231d-047e-0117-25b0-8ecfc9b30b7f@redhat.com> Thanks for the review, Aleksey. On 11/6/19 8:45 AM, Aleksey Shipilev wrote: > On 11/6/19 1:18 PM, Zhengyu Gu wrote: >> Updated: http://cr.openjdk.java.net/~zgu/JDK-8233401/webrev.02/index.html > > Minor nits: > > *) shenandoahBarrierSetAssembler_aarch64.cpp: excess space between parentheses: > > 368 if (!is_reference_type(type) ) { Will fix before push. > > *) shenandoahBarrierSetC1.cpp: so, native oop loads used to call to > ShenandoahRuntime::load_reference_barrier_native before this refactoring? That would mean it is > enabled even when "passive" is enabled (which implies -ShenandoahLRB)? Current change looks fine, > but we need to recognize this is the behavioral change. Please link the issue where that regression > was introduced. Correct, we don't need load_reference_barrier_native barrier if weak roots are processed at STW pauses. Added comments about this behavioral change in CR and linked to JDK-8227635. -Zhengyu > > Otherwise looks fine to me, let Roman ack it too. 
> From rkennke at redhat.com Wed Nov 6 14:39:58 2019 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 6 Nov 2019 15:39:58 +0100 Subject: [aarch64-port-dev ] RFR 8233401: Shenandoah: Refactor/cleanup Shenandoah load barrier code In-Reply-To: <7e9ace3d-8d15-e87a-f01c-90fc4b6faa6a@redhat.com> References: <45287c04-370c-cb0b-1603-c93fe15da3d9@redhat.com> <87b115fb-5353-b21b-3cbc-f862bd932b3e@redhat.com> <3d70db1c-c927-48f8-23ab-8937838e0302@redhat.com> <0d347d16-f870-798f-0165-1ee4dfae511b@redhat.com> <859e48d6-9af5-b4af-32ac-4b07ce92e94d@redhat.com> <6c110878-a477-df8a-e566-84b113806044@redhat.com> <84394d85-1b99-8139-3baf-7fbedba702c0@redhat.com> <2be52de0-6f12-d989-cf69-5807b2160cb0@redhat.com> <93330192-7143-ca82-9872-fe627a97772e@redhat.com> <7e9ace3d-8d15-e87a-f01c-90fc4b6faa6a@redhat.com> Message-ID: >> Updated: http://cr.openjdk.java.net/~zgu/JDK-8233401/webrev.02/index.html > > OK. Ok too. Roman From adinn at redhat.com Wed Nov 6 14:58:37 2019 From: adinn at redhat.com (Andrew Dinn) Date: Wed, 6 Nov 2019 14:58:37 +0000 Subject: [aarch64-port-dev ] RFD: scratch registers cleanup [Long post] In-Reply-To: <7264d4c2-1520-c49f-2e73-f1527fb90da3@redhat.com> References: <3e4aaf79-59a9-b346-e6b0-69839acf723e@redhat.com> <7264d4c2-1520-c49f-2e73-f1527fb90da3@redhat.com> Message-ID: <13a7fdcf-5360-f6d0-02fb-b163f1196de1@redhat.com> On 06/11/2019 13:43, Andrew Haley wrote: > One other thing: this exercise has shown that in many cases we trash > scratch registers in places where it really doesn't matter, and we'd > be much better off rewriting them not to do so. Agreed. > This makes push_call_clobbered_registers() something that can safely > be used everywhere. But I'm holding off any of this because I want the > first patch to be, if at all possible, neutral with regard to code > generated. > . . . Yes, that patch is an improvement. 
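The reload offset in the push_call_clobbered_registers() patch from this RFD can be checked by replaying its stack-pointer arithmetic. The sketch below is a plain C++ model, not assembler code. It assumes wordSize is 8, that each st1 of four D registers (T1D) writes 32 bytes, and, as the patch's reload offset implies, that the saved r0 sits at the value sp had just after the general-register push. `fp_save_area_bytes` is a hypothetical helper.

```cpp
#include <cassert>

// Model of the sp arithmetic in the patched push_call_clobbered_registers().
// Assumptions: wordSize == 8, and each st1 of four D registers (T1D) stores
// 4 * 8 = 32 bytes. Offsets are relative to sp0, the value of sp right after
// the general-register push(), where the saved r0 is assumed to live.
constexpr int kWordSize = 8;

// Hypothetical helper: returns how far below sp0 the final sp ends up,
// i.e. the size of the FP save area that the r0 reload has to skip.
int fp_save_area_bytes() {
  const int step = 4 * kWordSize;  // int step = 4 * wordSize;
  const int v7 = 7, v16 = 16;      // register encodings of v7 and v16
  int sp = 0;                      // sp relative to sp0
  sp -= step;                      // sub(sp, sp, step);
  for (int i = 31; i >= 4; i -= 4) {
    if (i <= v7 || i >= v16) {
      // st1(v[i-3]..v[i], T1D, Address(post(sp, r0))) with r0 == -step:
      // store 32 bytes at [sp, sp + 32), then sp += -step.
      sp += -step;
    }
  }
  // Final st1 of v0-v3 at Address(sp) stores at [sp, sp + 32) without
  // moving sp, so sp now points at the lowest saved FP register.
  return -sp;
}
```

Counting stores: the loop writes v28-v31, v24-v27, v20-v23, v16-v19 and v4-v7, and the final st1 writes v0-v3, six 32-byte groups in all. The result, 192 bytes or 24 words, matches the `(8 + 16) * wordSize` offset in the `ldr` that restores r0 after it has served as the post-index register in place of rscratch1.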
regards, Andrew Dinn ----------- From aleksei.voitylov at bell-sw.com Wed Nov 6 16:53:02 2019 From: aleksei.voitylov at bell-sw.com (Aleksei Voitylov) Date: Wed, 6 Nov 2019 19:53:02 +0300 Subject: [aarch64-port-dev ] RFR: 8229351: AArch64: Make the stub threshold of string_compare intrinsic tunable In-Reply-To: References: Message-ID: <9248d1b3-7341-68b9-7dd1-02dfd48567a6@bell-sw.com> Hi Patrick, I like the fact that this patch does not add much to the complexity of the code. Here are some experiments that you could find useful.

Cortex A73                                 Size      base (ns/op)   patched (ns/op)   Diff
StringCompareBench.StringCompareLL         256       14422257,98    15302300,24       -6,10%
StringCompareBench.StringCompareLL         512       27998036,21    28317818,08       -1,14%

ThunderX2                                  Size      base (ns/op)   patched (ns/op)   Diff
StringCompareBench.StringCompareLL         128       4265122,232    13099099,67       -207,12%
StringCompareBench.StringCompareLL         256       3539452,533    3599407,432       -1,69%
StringCompareBench.StringCompareUU         128       6899938,75     7174601,241       -3,98%
StringCompareBench.StringCompareUU         256       7654538,841    7826599,466       -2,25%
StringCompareBench.cachedStringCompareLL   128       19,673         21,242            -7,98%
StringCompareBench.cachedStringCompareLL   256       34,179         36,452            -6,65%
StringCompareBench.cachedStringCompareLL   512       59,574         64,088            -7,58%
StringCompareBench.cachedStringCompareLL   1024      110,37         118,477           -7,35%
StringCompareBench.cachedStringCompareLL   1000000   114028,907     115388,681        -1,19%
StringCompareBench.cachedStringCompareUU   128       33,752         36,922            -9,39%
StringCompareBench.cachedStringCompareUU   256       60,939         64,096            -5,18%
StringCompareBench.cachedStringCompareUU   512       115,328        118,48            -2,73%
StringCompareBench.cachedStringCompareUU   1024      239,332        242,97            -1,52%
StringCompareBench.cachedStringCompareUU   1000000   226491,096     233638,328        -3,16%

It might be the case that the newly added branch is the culprit:

+      __ subs(rscratch2, cnt2, largeLoopExitCondition);
+      __ br(__ LT, NO_PREFETCH);

Maybe you could skip it when CompareLongStringLimitLatin and CompareLongStringLimitUTF are large enough (then stub code is only called with string length large enough to skip branch above). Then (the properly commented) code would look like:

if ((stub_threshold-wordSize/(isLL ? 1 : 2)) < largeLoopExitCondition) {
     __ subs(rscratch2, cnt2, largeLoopExitCondition);
     __ br(__ LT, NO_PREFETCH);
}

and in this case we shouldn't see any performance penalties. -Aleksei On 29/10/2019 12:58, Patrick Zhang OS wrote: > Hi, > > Could you please review this patch, thanks. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8229351 > Webrev: http://cr.openjdk.java.net/~qpzhang/8229351/webrev.02 > (this starts from .02 since there had been some internal review and updates) > > Changes: > > 1. Split the STUB_THRESHOLD from the hard-coded 72 to be CompareLongStringLimitLatin and CompareLongStringLimitUTF as a more flexible control over the stub thresholds for string_compare intrinsics, especially for various uArchs. > > 2. MacroAssembler::string_compare LL and UU shared the same threshold, actually UU may only require the half (length of chars) of that of LL's, because one character has two-bytes for UU, while for compacted LL strings, one character means one byte. In addition, LU/UL may need a separated threshold, as the stub function is different from the same encoding one, and the performance may vary as well. > > 3. In generate_compare_long_string_same_encoding, the hard-coded 72 was originally able to ensure that there can be always 64 bytes at least for the prefetch code path. However once a smaller stub threshold is set, a new condition is needed to tell if this would be still valid, or has to go to the NO_PREFETCH branch. This change can ensure the correctness. > > 4.
In generate_compare_long_string_different_encoding, some temp vars for handling the last 4 characters are not valid any longer, cleaned up strU and strL, and related pointers initialization to the next U (cnt1) and L (tmp2). > > 5. In compare_string_16_x_LU, the reference to r10 (tmp1) is not needed, as tmpU or tmpL point to the same register. > > Tests: > > 1. For function check, I have run > > jdk jtreg tier1 tests, with default vm flags > > hotspot jtreg tests: runtime/compiler/gc parts, with "-Xcomp -XX:-TieredCompilation" > > jck10/api/java.lang 1609 cases and other selected modules, no new failures found, with default vm flags and "-Xcomp -XX:-TieredCompilation" respectively; > > some specific test cases had been carefully executed to double check, i.e., TestStringCompareToDifferentLength.java [1] and TestStringCompareToDifferentLength.java [1] introduced by [2], StrCmpTest.java [3] introduced by [4]. > > 2. For performance check, I have run > > string-density-bench/CompareToBench.java [5] and StringCompareBench.java [6] respectively, > > and SPECjbb2015.jar, no obvious performance change has been found (since the default threshold is NOT changed within this patch). > > FYI. with Ampere eMAG system, microbenchmarks [5][6] can have 1.5x consistent perf gain with LU/UL comparison for shorter strings (<72 chars, smaller stub thresholds), and slight improvement (5~10%) with LL/UU cases.
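The threshold arithmetic in items 2 and 3 of this RFR, together with the compile-time guard Aleksei proposes in his reply, can be modeled in a few lines of C++. This is a sketch under stated assumptions, not the HotSpot code: wordSize is taken as 8, thresholds and cnt2 are counted in characters, and largeLoopExitCondition is modeled as the 64 bytes mentioned in item 3, expressed in characters (64 for LL, 32 for UU). All function names are hypothetical.

```cpp
#include <cassert>

// Assumptions for this sketch (not taken from HotSpot headers):
//  - wordSize is 8 bytes;
//  - stub thresholds and cnt2 are counted in characters;
//  - the prefetching large loop needs 64 bytes, the figure from item 3.
constexpr int kWordSize = 8;
constexpr int kPrefetchBytes = 64;

// 1 byte per character for compact Latin-1 (LL), 2 for UTF-16 (UU).
constexpr int bytes_per_char(bool isLL) { return isLL ? 1 : 2; }

// Character count covering a given number of bytes. The same byte budget is
// half as many characters for UU as for LL (item 2 of the RFR).
constexpr int chars_for_bytes(int bytes, bool isLL) {
  return bytes / bytes_per_char(isLL);
}

// Hypothetical model of Aleksei's code-generation-time test: the runtime
// "cnt2 < largeLoopExitCondition -> NO_PREFETCH" branch is only needed when a
// string that just reaches the stub threshold, minus the one word of
// characters subtracted in his condition, can be shorter than the prefetch
// loop's minimum.
constexpr bool needs_prefetch_guard(int stub_threshold, bool isLL) {
  return (stub_threshold - kWordSize / bytes_per_char(isLL)) <
         chars_for_bytes(kPrefetchBytes, isLL);
}
```

With the default threshold of 72, `needs_prefetch_guard(72, true)` is false: 72 - 8 leaves exactly the 64 characters (64 bytes) that item 3 says the hard-coded value guaranteed, so the extra `subs`/`br` could be skipped when the stub is generated. A tuned-down Latin-1 threshold such as 24 returns true, and the guard would have to stay.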
> > Refs: > [1] http://hg.openjdk.java.net/jdk/jdk/file/3df2bf731a87/test/hotspot/jtreg/compiler/intrinsics/string > [2] https://bugs.openjdk.java.net/browse/JDK-8218966 AArch64: String.compareTo() can read memory after string > [3] http://cr.openjdk.java.net/~dpochepk/8202326/StrCmpTest.java, contributed by Dmitrij Pochepko > [4] https://bugs.openjdk.java.net/browse/JDK-8202326 AARCH64: optimize string compare intrinsic > [5] http://cr.openjdk.java.net/~shade/density/string-density-bench.jar, contributed by Aleksey Shipilev > [6] http://cr.openjdk.java.net/~dpochepk/8202326/StringCompareBench.java, contributed by Dmitrij Pochepko > > Regards > Patrick > From felix.yang at huawei.com Thu Nov 7 01:17:05 2019 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Thu, 7 Nov 2019 01:17:05 +0000 Subject: [aarch64-port-dev ] RFR(XS): 8233466: aarch64: remove unnecessary load of mdo when profiling return and parameters type Message-ID: Hi, Please review the following patch: Bug: https://bugs.openjdk.java.net/browse/JDK-8233466 Webrev: http://cr.openjdk.java.net/~fyang/8233466/webrev.00/ When profiling return and parameters type from the interpreter on aarch64 platform, 'mdp' is loaded by test_method_data_pointer which is called by profile_return_type & profile_parameters_type. It's not necessary to load mdo before calling __ profile_return_type or __ profile_parameters_type. Passed tier1-3 testing. Thanks, Felix From ci_notify at linaro.org Thu Nov 7 01:27:13 2019 From: ci_notify at linaro.org (ci_notify at linaro.org) Date: Thu, 7 Nov 2019 01:27:13 +0000 (UTC) Subject: [aarch64-port-dev ] JTREG, JCStress, SPECjbb2015 and Hadoop/Terasort results for OpenJDK JDK on AArch64 Message-ID: <1809406039.12368.1573090034194.JavaMail.javamailuser@localhost> This is a summary of the JTREG test results =========================================== The build and test results are cycled every 15 days. 
For detailed information on the test output please refer to: http://openjdk.linaro.org/jdkX/openjdk-jtreg-nightly-tests/summary/2019/310/summary.html ------------------------------------------------------------------------------- client-release/hotspot ------------------------------------------------------------------------------- Build 0: aarch64/2018/oct/15 pass: 5,780; fail: 19; not run: 90 ------------------------------------------------------------------------------- client-release/jdk ------------------------------------------------------------------------------- Build 0: aarch64/2018/oct/15 pass: 8,495; fail: 670; error: 23 ------------------------------------------------------------------------------- client-release/langtools ------------------------------------------------------------------------------- Build 0: aarch64/2018/oct/15 pass: 3,970; fail: 5 ------------------------------------------------------------------------------- release/hotspot ------------------------------------------------------------------------------- Build 0: aarch64/2019/sep/20 pass: 5,728 Build 1: aarch64/2019/sep/23 pass: 5,727 Build 2: aarch64/2019/oct/07 pass: 5,750 Build 3: aarch64/2019/oct/09 pass: 5,747; fail: 1 Build 4: aarch64/2019/oct/11 pass: 5,751; fail: 1 Build 5: aarch64/2019/oct/14 pass: 5,753 Build 6: aarch64/2019/oct/16 pass: 5,753; fail: 1 Build 7: aarch64/2019/oct/18 pass: 5,760 Build 8: aarch64/2019/oct/21 pass: 5,716; fail: 43; error: 1 Build 9: aarch64/2019/oct/23 pass: 5,760; fail: 1 Build 10: aarch64/2019/oct/28 pass: 5,766 Build 11: aarch64/2019/oct/30 pass: 5,768 Build 12: aarch64/2019/nov/01 pass: 5,768; fail: 1 Build 13: aarch64/2019/nov/04 pass: 5,769 Build 14: aarch64/2019/nov/06 pass: 5,766; fail: 2 1 fatal errors were detected; please follow the link above for more detail. 
------------------------------------------------------------------------------- release/jdk ------------------------------------------------------------------------------- Build 0: aarch64/2019/sep/20 pass: 8,685; fail: 503; error: 22 Build 1: aarch64/2019/sep/23 pass: 8,696; fail: 497; error: 19 Build 2: aarch64/2019/oct/07 pass: 8,683; fail: 517; error: 18 Build 3: aarch64/2019/oct/09 pass: 8,692; fail: 507; error: 21 Build 4: aarch64/2019/oct/11 pass: 8,693; fail: 511; error: 18 Build 5: aarch64/2019/oct/14 pass: 8,706; fail: 497; error: 20 Build 6: aarch64/2019/oct/16 pass: 8,702; fail: 509; error: 17 Build 7: aarch64/2019/oct/18 pass: 8,694; fail: 522; error: 17 Build 8: aarch64/2019/oct/21 pass: 8,705; fail: 512; error: 18 Build 9: aarch64/2019/oct/23 pass: 8,712; fail: 505; error: 18 Build 10: aarch64/2019/oct/28 pass: 8,711; fail: 509; error: 18 Build 11: aarch64/2019/oct/30 pass: 8,723; fail: 504; error: 19 Build 12: aarch64/2019/nov/01 pass: 8,774; fail: 506; error: 18 Build 13: aarch64/2019/nov/04 pass: 8,777; fail: 509; error: 17 Build 14: aarch64/2019/nov/06 pass: 8,775; fail: 507; error: 19 7 fatal errors were detected; please follow the link above for more detail. 
------------------------------------------------------------------------------- release/langtools ------------------------------------------------------------------------------- Build 0: aarch64/2019/sep/20 pass: 3,979 Build 1: aarch64/2019/sep/23 pass: 3,979 Build 2: aarch64/2019/oct/07 pass: 3,979 Build 3: aarch64/2019/oct/09 pass: 3,979 Build 4: aarch64/2019/oct/11 pass: 3,979 Build 5: aarch64/2019/oct/14 pass: 3,979 Build 6: aarch64/2019/oct/16 pass: 3,979 Build 7: aarch64/2019/oct/18 pass: 3,979 Build 8: aarch64/2019/oct/21 pass: 3,979 Build 9: aarch64/2019/oct/23 pass: 3,980 Build 10: aarch64/2019/oct/28 pass: 3,980 Build 11: aarch64/2019/oct/30 pass: 3,980 Build 12: aarch64/2019/nov/01 pass: 3,980 Build 13: aarch64/2019/nov/04 pass: 3,980 Build 14: aarch64/2019/nov/06 pass: 3,980 ------------------------------------------------------------------------------- server-release/hotspot ------------------------------------------------------------------------------- Build 0: aarch64/2018/oct/15 pass: 5,787; fail: 18; not run: 90 ------------------------------------------------------------------------------- server-release/jdk ------------------------------------------------------------------------------- Build 0: aarch64/2018/oct/15 pass: 8,476; fail: 686; error: 27 ------------------------------------------------------------------------------- server-release/langtools ------------------------------------------------------------------------------- Build 0: aarch64/2018/oct/15 pass: 3,970; fail: 5 Previous results can be found here: http://openjdk.linaro.org/jdkX/openjdk-jtreg-nightly-tests/index.html SPECjbb2015 composite regression test completed =============================================== This test measures the relative performance of the server compiler running the SPECjbb2015 composite tests and compares the performance against the baseline performance of the server compiler taken on 2016-11-21. 
In accordance with [1], the SPECjbb2015 tests are run on a system which is not production ready and does not meet all the requirements for publishing compliant results. The numbers below shall be treated as non-compliant (nc) and are for experimental purposes only. Relative performance: Server max-jOPS (nc): 7.84x Relative performance: Server critical-jOPS (nc): 9.75x Details of the test setup and historical results may be found here: http://openjdk.linaro.org/jdkX/SPECjbb2015-results/ [1] http://www.spec.org/fairuse.html#Academic Regression test Hadoop-Terasort completed ========================================= This test measures the performance of the server and client compilers running Hadoop sorting a 1GB file using Terasort and compares the performance against the baseline performance of the Zero interpreter and against the baseline performance of the server compiler on 2014-04-01. Relative performance: Zero: 1.0, Server: 207.57 Server 207.57 / Server 2014-04-01 (71.00): 2.92x Details of the test setup and historical results may be found here: http://openjdk.linaro.org/jdkX/hadoop-terasort-benchmark-results/ This is a summary of the jcstress test results ============================================== The build and test results are cycled every 15 days. 
2019-09-21 pass rate: 10487/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/263/results/ 2019-09-29 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/266/results/ 2019-10-08 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/280/results/ 2019-10-10 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/282/results/ 2019-10-12 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/284/results/ 2019-10-15 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/287/results/ 2019-10-17 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/289/results/ 2019-10-19 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/291/results/ 2019-10-22 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/294/results/ 2019-10-23 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/296/results/ 2019-10-29 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/301/results/ 2019-10-31 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/303/results/ 2019-11-02 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/305/results/ 2019-11-05 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/308/results/ 2019-11-07 pass rate: 10490/10490, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/310/results/ For detailed information on the test output please refer to: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/ From felix.yang at huawei.com Thu Nov 7 01:27:11 2019 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Thu, 7 Nov 2019 01:27:11 +0000 Subject: 
[aarch64-port-dev ] RFR (trivial) : fix aarch64-8u type profile bug In-Reply-To: References: <222f9c0b-7320-8d22-cd44-c4f3af7c1311@redhat.com> <880f5072-91ba-66bd-94be-429556e7c132@redhat.com> Message-ID: > -----Original Message----- > From: Andrew Haley [mailto:aph at redhat.com] > Sent: Wednesday, November 6, 2019 5:46 PM > To: Yangfei (Felix) ; > aarch64-port-dev at openjdk.java.net > Subject: Re: [aarch64-port-dev ] RFR (trivial) : fix aarch64-8u type profile bug > > On 9/24/19 9:22 AM, Yangfei (Felix) wrote: > >>> This also reminds me of another two aarch64-specific profiling issues: > >>> https://bugs.openjdk.java.net/browse/JDK-8188221 > >>> https://bugs.openjdk.java.net/browse/JDK-8189439 > >>> > >>> I think they also should be incorporated in aarch64 8u. What do you > think? > >> I've always been reluctant to backport performance-only patches to > >> 8u, but I admit that version will be around for a long time, so OK. > >> > > Looks like the upstream patches can be simplified: 'mdp' is loaded by > test_method_data_pointer which is called by profile_return_type & > profile_parameters_type. > > Sorry for not replying before now. > > Maybe they can, but this is a backport. Hi, OK. I have sent a separate mail to fix this for jdk14: https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2019-November/036877.html Please review that one. I want to fix this for jdk14 before I do the aarch64 8u backport. Thanks, Felix From patrick at os.amperecomputing.com Thu Nov 7 10:55:40 2019 From: patrick at os.amperecomputing.com (Patrick Zhang OS) Date: Thu, 7 Nov 2019 10:55:40 +0000 Subject: [aarch64-port-dev ] RFR: 8229351: AArch64: Make the stub threshold of string_compare intrinsic tunable In-Reply-To: <9248d1b3-7341-68b9-7dd1-02dfd48567a6@bell-sw.com> References: <9248d1b3-7341-68b9-7dd1-02dfd48567a6@bell-sw.com> Message-ID: Hi Aleksei, Thanks for testing it and the data. 
I only had the source of StringCompareBench.java [6]; my numbers (the diffs) are within 2%, so the -207.12% looks quite odd. I initially did not add a condition guarding the br because generate_compare_long_string_different_encoding has a similar unconditional br. By the way, the original logic allowed prefetching memory beyond the array border for the first 64 bytes. I think securing the prefetch is the right thing to do, though it could certainly stop some cases from entering the large loop with prefetching. Further comments are welcome, thanks. http://cr.openjdk.java.net/~qpzhang/8229351/webrev.03/ Regards Patrick From: Aleksei Voitylov Sent: Thursday, November 7, 2019 12:53 AM To: Patrick Zhang OS Cc: aarch64-port-dev at openjdk.java.net Subject: Re: [aarch64-port-dev ] RFR: 8229351: AArch64: Make the stub threshold of string_compare intrinsic tunable Hi Patrick, I like the fact that this patch does not add much to the complexity of the code. Here are some experiments that you could find useful.
Cortex A73                                 Size      base (ns/op)   patched (ns/op)   Diff
StringCompareBench.StringCompareLL         256       14422257,98    15302300,24       -6,10%
StringCompareBench.StringCompareLL         512       27998036,21    28317818,08       -1,14%

ThunderX2                                  Size      base (ns/op)   patched (ns/op)   Diff
StringCompareBench.StringCompareLL         128       4265122,232    13099099,67       -207,12%
StringCompareBench.StringCompareLL         256       3539452,533    3599407,432       -1,69%
StringCompareBench.StringCompareUU         128       6899938,75     7174601,241       -3,98%
StringCompareBench.StringCompareUU         256       7654538,841    7826599,466       -2,25%
StringCompareBench.cachedStringCompareLL   128       19,673         21,242            -7,98%
StringCompareBench.cachedStringCompareLL   256       34,179         36,452            -6,65%
StringCompareBench.cachedStringCompareLL   512       59,574         64,088            -7,58%
StringCompareBench.cachedStringCompareLL   1024      110,37         118,477           -7,35%
StringCompareBench.cachedStringCompareLL   1000000   114028,907     115388,681        -1,19%
StringCompareBench.cachedStringCompareUU   128       33,752         36,922            -9,39%
StringCompareBench.cachedStringCompareUU   256       60,939         64,096            -5,18%
StringCompareBench.cachedStringCompareUU   512       115,328        118,48            -2,73%
StringCompareBench.cachedStringCompareUU   1024      239,332        242,97            -1,52%
StringCompareBench.cachedStringCompareUU   1000000   226491,096     233638,328        -3,16%

It might be the case that the newly added branch is the culprit:

+      __ subs(rscratch2, cnt2, largeLoopExitCondition);
+      __ br(__ LT, NO_PREFETCH);

Maybe you could skip it when CompareLongStringLimitLatin and CompareLongStringLimitUTF are large enough (then stub code is only called with string length large enough to skip branch above). Then (the properly commented) code would look like:

if ((stub_threshold-wordSize/(isLL ? 1 : 2)) < largeLoopExitCondition) {
     __ subs(rscratch2, cnt2, largeLoopExitCondition);
     __ br(__ LT, NO_PREFETCH);
}

and in this case we shouldn't see any performance penalties. -Aleksei On 29/10/2019 12:58, Patrick Zhang OS wrote: Hi, Could you please review this patch, thanks.
JBS: https://bugs.openjdk.java.net/browse/JDK-8229351 Webrev: http://cr.openjdk.java.net/~qpzhang/8229351/webrev.02 (this starts from .02 since there had been some internal review and updates) Changes: 1. Split the STUB_THRESHOLD from the hard-coded 72 to be CompareLongStringLimitLatin and CompareLongStringLimitUTF as a more flexible control over the stub thresholds for string_compare intrinsics, especially for various uArchs. 2. MacroAssembler::string_compare LL and UU shared the same threshold, actually UU may only require the half (length of chars) of that of LL's, because one character has two-bytes for UU, while for compacted LL strings, one character means one byte. In addition, LU/UL may need a separated threshold, as the stub function is different from the same encoding one, and the performance may vary as well. 3. In generate_compare_long_string_same_encoding, the hard-coded 72 was originally able to ensure that there can be always 64 bytes at least for the prefetch code path. However once a smaller stub threshold is set, a new condition is needed to tell if this would be still valid, or has to go to the NO_PREFETCH branch. This change can ensure the correctness. 4. In generate_compare_long_string_different_encoding, some temp vars for handling the last 4 characters are not valid any longer, cleaned up strU and strL, and related pointers initialization to the next U (cnt1) and L (tmp2). 5. In compare_string_16_x_LU, the reference to r10 (tmp1) is not needed, as tmpU or tmpL point to the same register. Tests: 1. 
For function check, I have run jdk jtreg tier1 tests, with default vm flags hotspot jtreg tests: runtime/compiler/gc parts, with "-Xcomp -XX:-TieredCompilation" jck10/api/java.lang 1609 cases and other selected modules, no new failures found, with default vm flags and "-Xcomp -XX:-TieredCompilation" respectively; some specific test cases had been carefully executed to double check, i.e., TestStringCompareToDifferentLength.java [1] and TestStringCompareToDifferentLength.java [1] introduced by [2], StrCmpTest.java [3] introduced by [4]. 2. For performance check, I have run string-density-bench/CompareToBench.java [5] and StringCompareBench.java [6] respectively, and SPECjbb2015.jar, no obvious performance change has been found (since the default threshold is NOT changed within this patch). FYI. with Ampere eMAG system, microbenchmarks [5][6] can have 1.5x consistent perf gain with LU/UL comparison for shorter strings (<72 chars, smaller stub thresholds), and slight improvement (5~10%) with LL/UU cases.
Refs: [1] http://hg.openjdk.java.net/jdk/jdk/file/3df2bf731a87/test/hotspot/jtreg/compiler/intrinsics/string [2] https://bugs.openjdk.java.net/browse/JDK-8218966 AArch64: String.compareTo() can read memory after string [3] http://cr.openjdk.java.net/~dpochepk/8202326/StrCmpTest.java, contributed by Dmitrij Pochepko [4] https://bugs.openjdk.java.net/browse/JDK-8202326 AARCH64: optimize string compare intrinsic [5] http://cr.openjdk.java.net/~shade/density/string-density-bench.jar, contributed by Aleksey Shipilev [6] http://cr.openjdk.java.net/~dpochepk/8202326/StringCompareBench.java, contributed by Dmitrij Pochepko Regards Patrick From zgu at redhat.com Thu Nov 7 14:55:13 2019 From: zgu at redhat.com (Zhengyu Gu) Date: Thu, 7 Nov 2019 09:55:13 -0500 Subject: [aarch64-port-dev ] RFR(XS) 8233337: Shenandoah: Cleanup AArch64 SBSA::load_reference_barrier_not_null() Message-ID: <5de422c4-ea76-81b0-8413-d3e81f60a09d@redhat.com> Please review this cleanup patch suggested by Andrew Haley. Please see [1] for details Bug: https://bugs.openjdk.java.net/browse/JDK-8233337 Webrev: http://cr.openjdk.java.net/~zgu/JDK-8233337/webrev.00/ Test: hotspot_gc_shenandoah (fastdebug and release) on AArch64 Linux Thanks, -Zhengyu [1] https://mail.openjdk.java.net/pipermail/shenandoah-dev/2019-October/010976.html From rkennke at redhat.com Thu Nov 7 15:37:08 2019 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 7 Nov 2019 16:37:08 +0100 Subject: [aarch64-port-dev ] RFR(XS) 8233337: Shenandoah: Cleanup AArch64 SBSA::load_reference_barrier_not_null() In-Reply-To: <5de422c4-ea76-81b0-8413-d3e81f60a09d@redhat.com> References: <5de422c4-ea76-81b0-8413-d3e81f60a09d@redhat.com> Message-ID: Looks good, thanks! Roman > Please review this cleanup patch suggested by Andrew Haley. Please see > [1] for details > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8233337 > Webrev: http://cr.openjdk.java.net/~zgu/JDK-8233337/webrev.00/ > > Test: >   hotspot_gc_shenandoah (fastdebug and release) >
on AArch64 Linux > > Thanks, > > -Zhengyu > > > > > [1] > https://mail.openjdk.java.net/pipermail/shenandoah-dev/2019-October/010976.html > > From zgu at redhat.com Thu Nov 7 19:01:42 2019 From: zgu at redhat.com (Zhengyu Gu) Date: Thu, 7 Nov 2019 14:01:42 -0500 Subject: [aarch64-port-dev ] RFR 8233339: Shenandoah: Centralize load barrier decisions into ShenandoahBarrierSet In-Reply-To: References: <6ef89df6-84db-0ffe-d1fc-7ffde7e622bf@redhat.com> <4ed90469-8689-b49d-69f1-98f644e9edd0@redhat.com> <0fb9cd70-0a89-8c14-7469-55205c4c3808@redhat.com> Message-ID: <9f4e51fd-dd2c-1f74-e695-51923c75a52a@redhat.com> > > Filed: https://bugs.openjdk.java.net/browse/JDK-8233401 Rebased on top of JDK-8233401 Webrev: http://cr.openjdk.java.net/~zgu/JDK-8233339/webrev.02/index.html Thanks, -Zhengyu > > Matter of fact, I would like to hold off this code review, till reactor > is done. > > Thanks, > > -Zhengyu > >> >> *) shenandoahBarrierSetAssembler_x86.cpp, I believe it would be more >> straightforward to save >> branching on local variable "need_load_reference_barrier" by spelling >> out the "disabled" path >> directly (in fact, I think you are almost there in >> shenandoahBarrierSetC1.cpp!): >> >>   if (!ShenandoahBarrierSet::need_load_reference_barrier(decorators, >> type)) { >>     BarrierSetAssembler::load_at(masm, decorators, type, dst, src, >> tmp1, tmp_thread); >>     return; >>   } >> >>   ... code that assumes need_load_reference_barrier = true follows ... >> >>   Register result_dst = dst; >>   bool use_tmp1_for_dst = false; >> >> *) shenandoahBarrierSetC1.cpp: local variable >> "need_load_reference_barrier" is not needed, there is >> only a single use >> >> *) shenandoahBarrierSetC2.cpp: this block should go all the way up: >> >>  557   if >> (!ShenandoahBarrierSet::need_load_reference_barrier(decorators, type)) { >>  558     return load; >>  559   } >> >> *) shenandoahBarrierSet.cpp: this is just "return >> is_reference_type(type)". Saves some inversions.
>> >> 78 if (!is_reference_type(type)) return false; >> 79 return true; >> >> *) shenandoahBarrierSet.cpp: should be "Should be subset of LRB": >> >> 83 assert(need_load_reference_barrier(decorators, type), "Why >> ask?"); >> >> *) shenandoahBarrierSet.cpp: seems like this assert is subsumed by the >> previous one? >> >> 84 assert(is_reference_type(type), "Why we here?"); >> >> From rkennke at redhat.com Thu Nov 7 19:41:00 2019 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 7 Nov 2019 20:41:00 +0100 Subject: [aarch64-port-dev ] RFR 8233339: Shenandoah: Centralize load barrier decisions into ShenandoahBarrierSet In-Reply-To: <9f4e51fd-dd2c-1f74-e695-51923c75a52a@redhat.com> References: <6ef89df6-84db-0ffe-d1fc-7ffde7e622bf@redhat.com> <4ed90469-8689-b49d-69f1-98f644e9edd0@redhat.com> <0fb9cd70-0a89-8c14-7469-55205c4c3808@redhat.com> <9f4e51fd-dd2c-1f74-e695-51923c75a52a@redhat.com> Message-ID: <4b24e6ac-2109-4a7d-83aa-c2427343e22b@redhat.com> That looks good to me. Thanks, Roman >> >> Filed: https://bugs.openjdk.java.net/browse/JDK-8233401 > > Rebased on top of JDK-8233401 > > Webrev: http://cr.openjdk.java.net/~zgu/JDK-8233339/webrev.02/index.html > > Thanks, > > -Zhengyu > > >> >> Matter of fact, I would like to hold off this code review, till >> reactor is done. >> >> Thanks, >> >> -Zhengyu >> >>> >>> *) shenandoahBarrierSetAssembler_x86.cpp, I believe it would be more >>> straightforward to save >>> branching on local variable "need_load_reference_barrier" by spelling >>> out the "disabled" path >>> directly (in fact, I think you are almost there in >>> shenandoahBarrierSetC1.cpp!): >>> >>> if (!ShenandoahBarrierSet::need_load_reference_barrier(decorators, >>> type)) { >>> BarrierSetAssembler::load_at(masm, decorators, type, dst, src, >>> tmp1, tmp_thread); >>> return; >>> } >>> >>> ... code that assumes need_load_reference_barrier = true follows ... >>> >>> Register result_dst = dst; >>>
bool use_tmp1_for_dst = false; >>> >>> *) shenandoahBarrierSetC1.cpp: local variable >>> "need_load_reference_barrier" is not needed, there is >>> only a single use >>> >>> *) shenandoahBarrierSetC2.cpp: this block should go all the way up: >>> >>> 557 if >>> (!ShenandoahBarrierSet::need_load_reference_barrier(decorators, type)) { >>> 558 return load; >>> 559 } >>> >>> *) shenandoahBarrierSet.cpp: this is just "return >>> is_reference_type(type)". Saves some inversions. >>> >>> 78 if (!is_reference_type(type)) return false; >>> 79 return true; >>> >>> *) shenandoahBarrierSet.cpp: should be "Should be subset of LRB": >>> >>> 83 assert(need_load_reference_barrier(decorators, type), "Why >>> ask?"); >>> >>> *) shenandoahBarrierSet.cpp: seems like this assert is subsumed by >>> the previous one? >>> >>> 84 assert(is_reference_type(type), "Why we here?"); >>> >>> From zgu at redhat.com Thu Nov 7 19:42:27 2019 From: zgu at redhat.com (Zhengyu Gu) Date: Thu, 7 Nov 2019 14:42:27 -0500 Subject: [aarch64-port-dev ] RFR 8233339: Shenandoah: Centralize load barrier decisions into ShenandoahBarrierSet In-Reply-To: <4b24e6ac-2109-4a7d-83aa-c2427343e22b@redhat.com> References: <6ef89df6-84db-0ffe-d1fc-7ffde7e622bf@redhat.com> <4ed90469-8689-b49d-69f1-98f644e9edd0@redhat.com> <0fb9cd70-0a89-8c14-7469-55205c4c3808@redhat.com> <9f4e51fd-dd2c-1f74-e695-51923c75a52a@redhat.com> <4b24e6ac-2109-4a7d-83aa-c2427343e22b@redhat.com> Message-ID: <31415213-3464-619a-0741-ca14f7b9cbcf@redhat.com> Thanks for the review, Roman -Zhengyu On 11/7/19 2:41 PM, Roman Kennke wrote: > That looks good to me. > > Thanks, > Roman > >>> >>> Filed: https://bugs.openjdk.java.net/browse/JDK-8233401 >> >> Rebased on top of JDK-8233401 >> >> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8233339/webrev.02/index.html >> >> Thanks, >> >> -Zhengyu >> >> >>> >>> Matter of fact, I would like to hold off this code review, till >>> reactor is done.
>>> >>> Thanks, >>> >>> -Zhengyu >>> >>>> >>>> *) shenandoahBarrierSetAssembler_x86.cpp, I believe it would be more >>>> straightforward to save >>>> branching on local variable "need_load_reference_barrier" by spelling >>>> out the "disabled" path >>>> directly (in fact, I think you are almost there in >>>> shenandoahBarrierSetC1.cpp!): >>>> >>>> if (!ShenandoahBarrierSet::need_load_reference_barrier(decorators, >>>> type)) { >>>> BarrierSetAssembler::load_at(masm, decorators, type, dst, src, >>>> tmp1, tmp_thread); >>>> return; >>>> } >>>> >>>> ... code that assumes need_load_reference_barrier = true follows ... >>>> >>>> Register result_dst = dst; >>>> bool use_tmp1_for_dst = false; >>>> >>>> *) shenandoahBarrierSetC1.cpp: local variable >>>> "need_load_reference_barrier" is not needed, there is >>>> only a single use >>>> >>>> *) shenandoahBarrierSetC2.cpp: this block should go all the way up: >>>> >>>> 557 if >>>> (!ShenandoahBarrierSet::need_load_reference_barrier(decorators, type)) { >>>> 558 return load; >>>> 559 } >>>> >>>> *) shenandoahBarrierSet.cpp: this is just "return >>>> is_reference_type(type)". Saves some inversions. >>>> >>>> 78 if (!is_reference_type(type)) return false; >>>> 79 return true; >>>> >>>> *) shenandoahBarrierSet.cpp: should be "Should be subset of LRB": >>>> >>>> 83 assert(need_load_reference_barrier(decorators, type), "Why >>>> ask?"); >>>> >>>> *) shenandoahBarrierSet.cpp: seems like this assert is subsumed by >>>> the previous one? >>>> >>>> 84
assert(is_reference_type(type), "Why we here?"); >>>> >>>> > From felix.yang at huawei.com Fri Nov 8 08:30:00 2019 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Fri, 8 Nov 2019 08:30:00 +0000 Subject: [aarch64-port-dev ] 8233839: aarch64: missing memory barrier in NewObjectArrayStub and NewTypeArrayStub Message-ID: Hi, I witnessed random fail of one jcstress test on my 128-core aarch64 server: "org.openjdk.jcstress.tests.defaultValues.arrays.small.plain.StringTest" Bug: https://bugs.openjdk.java.net/browse/JDK-8233839 I used the latest aarch64 jdk8u release build. Please refer to the bugzilla for details and the analysis. I checked the assembler code emitted by LIR_Assembler::emit_alloc_array: For the fast path, the StoreStore memory barrier is there. But it's not the case for the slow path. Patch adding the missing barrier for 14:

diff -r ad157fab6bf5 src/hotspot/cpu/aarch64/c1_Runtime1_aarch64.cpp
--- a/src/hotspot/cpu/aarch64/c1_Runtime1_aarch64.cpp Thu Nov 07 16:26:57 2019 -0800
+++ b/src/hotspot/cpu/aarch64/c1_Runtime1_aarch64.cpp Fri Nov 08 16:10:08 2019 +0800
@@ -840,6 +840,7 @@
         __ sub(arr_size, arr_size, t1); // body length
         __ add(t1, t1, obj); // body start
         __ initialize_body(t1, arr_size, 0, t2);
+        __ membar(Assembler::StoreStore);
         __ verify_oop(obj);
 
         __ ret(lr);

JDK builds OK and passed tier1 test. Thanks, Felix From adinn at redhat.com Fri Nov 8 09:04:08 2019 From: adinn at redhat.com (Andrew Dinn) Date: Fri, 8 Nov 2019 09:04:08 +0000 Subject: [aarch64-port-dev ] 8233839: aarch64: missing memory barrier in NewObjectArrayStub and NewTypeArrayStub In-Reply-To: References: Message-ID: <382f7448-c392-5d98-ecf5-ac22e86f4888@redhat.com> On 08/11/2019 08:30, Yangfei (Felix) wrote: > I witnessed random fail of one jcstress test on my 128-core aarch64 server: "org.openjdk.jcstress.tests.defaultValues.arrays.small.plain.StringTest" > Bug: https://bugs.openjdk.java.net/browse/JDK-8233839 > > I used the latest aarch64 jdk8u release build.
Please refer to the bugzilla for details and the analysis. > I checked the assembler code emitted by LIR_Assembler::emit_alloc_array: > For the fast path, the StoreStore memory barrier is there. But it's not the case for the slow path. > > Patch adding the missing barrier for 14:
>
> diff -r ad157fab6bf5 src/hotspot/cpu/aarch64/c1_Runtime1_aarch64.cpp
> --- a/src/hotspot/cpu/aarch64/c1_Runtime1_aarch64.cpp Thu Nov 07 16:26:57 2019 -0800
> +++ b/src/hotspot/cpu/aarch64/c1_Runtime1_aarch64.cpp Fri Nov 08 16:10:08 2019 +0800
> @@ -840,6 +840,7 @@
>          __ sub(arr_size, arr_size, t1); // body length
>          __ add(t1, t1, obj); // body start
>          __ initialize_body(t1, arr_size, 0, t2);
> +        __ membar(Assembler::StoreStore);
>          __ verify_oop(obj);
>
>          __ ret(lr);
>
> JDK builds OK and passed tier1 test. Very nice detective work finding that one! The jdk14 patch looks good. Also the same patch for jdk11 and the variant for jdk8 are good. regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From ci_notify at linaro.org Sun Nov 10 02:42:42 2019 From: ci_notify at linaro.org (ci_notify at linaro.org) Date: Sun, 10 Nov 2019 02:42:42 +0000 (UTC) Subject: [aarch64-port-dev ] JTREG, JCStress, SPECjbb2015 and Hadoop/Terasort results for OpenJDK 11u on AArch64 Message-ID: <961463389.104.1573353762559.JavaMail.javamailuser@localhost>

This is a summary of the JTREG test results
===========================================
The build and test results are cycled every 15 days.
For detailed information on the test output please refer to: http://openjdk.linaro.org/jdk11u/openjdk-jtreg-nightly-tests/summary/2019/313/summary.html
-------------------------------------------------------------------------------
release/hotspot
-------------------------------------------------------------------------------
Build 0: aarch64/2019/jun/27 pass: 5,737; fail: 5
Build 1: aarch64/2019/jul/02 pass: 5,737; fail: 5
Build 2: aarch64/2019/aug/03 pass: 5,746; fail: 4
Build 3: aarch64/2019/aug/10 pass: 5,747; fail: 4
Build 4: aarch64/2019/aug/15 pass: 5,753; fail: 4
Build 5: aarch64/2019/aug/22 pass: 5,755; fail: 4
Build 6: aarch64/2019/sep/04 pass: 5,764; fail: 2
Build 7: aarch64/2019/sep/05 pass: 5,764; fail: 2
Build 8: aarch64/2019/sep/10 pass: 5,764; fail: 2
Build 9: aarch64/2019/sep/17 pass: 5,763; fail: 3
Build 10: aarch64/2019/sep/21 pass: 5,764; fail: 2
Build 11: aarch64/2019/oct/04 pass: 5,764; fail: 2
Build 12: aarch64/2019/oct/17 pass: 5,764; fail: 2
Build 13: aarch64/2019/oct/31 pass: 5,784; fail: 1
Build 14: aarch64/2019/nov/09 pass: 5,773; fail: 3
-------------------------------------------------------------------------------
release/jdk
-------------------------------------------------------------------------------
Build 0: aarch64/2019/jun/27 pass: 8,401; fail: 512; error: 22
Build 1: aarch64/2019/jul/02 pass: 8,407; fail: 498; error: 31
Build 2: aarch64/2019/aug/03 pass: 8,429; fail: 509; error: 18
Build 3: aarch64/2019/aug/10 pass: 8,450; fail: 485; error: 16
Build 4: aarch64/2019/aug/15 pass: 8,443; fail: 496; error: 13
Build 5: aarch64/2019/aug/22 pass: 8,446; fail: 494; error: 15
Build 6: aarch64/2019/sep/04 pass: 8,483; fail: 465; error: 10
Build 7: aarch64/2019/sep/05 pass: 8,465; fail: 479; error: 14
Build 8: aarch64/2019/sep/10 pass: 8,444; fail: 500; error: 14
Build 9: aarch64/2019/sep/17 pass: 8,462; fail: 482; error: 12
Build 10: aarch64/2019/sep/21 pass: 8,467; fail: 478; error: 13
Build 11: aarch64/2019/oct/04 pass: 8,444; fail: 498; error: 16
Build 12: aarch64/2019/oct/17 pass: 8,452; fail: 493; error: 16
Build 13: aarch64/2019/oct/31 pass: 8,468; fail: 490; error: 14
Build 14: aarch64/2019/nov/09 pass: 8,487; fail: 470; error: 16

2 fatal errors were detected; please follow the link above for more detail.
-------------------------------------------------------------------------------
release/langtools
-------------------------------------------------------------------------------
Build 0: aarch64/2019/jun/27 pass: 3,908
Build 1: aarch64/2019/jul/02 pass: 3,908
Build 2: aarch64/2019/aug/03 pass: 3,908
Build 3: aarch64/2019/aug/10 pass: 3,909
Build 4: aarch64/2019/aug/15 pass: 3,909
Build 5: aarch64/2019/aug/22 pass: 3,909
Build 6: aarch64/2019/sep/04 pass: 3,910
Build 7: aarch64/2019/sep/05 pass: 3,910
Build 8: aarch64/2019/sep/10 pass: 3,910
Build 9: aarch64/2019/sep/17 pass: 3,910
Build 10: aarch64/2019/sep/21 pass: 3,910
Build 11: aarch64/2019/oct/04 pass: 3,910
Build 12: aarch64/2019/oct/17 pass: 3,910
Build 13: aarch64/2019/oct/31 pass: 3,910
Build 14: aarch64/2019/nov/09 pass: 3,910

Previous results can be found here: http://openjdk.linaro.org/jdk11u/openjdk-jtreg-nightly-tests/index.html

SPECjbb2015 composite regression test completed
===============================================
This test measures the relative performance of the server compiler running the SPECjbb2015 composite tests and compares the performance against the baseline performance of the server compiler taken on 2016-11-21. In accordance with [1], the SPECjbb2015 tests are run on a system which is not production ready and does not meet all the requirements for publishing compliant results. The numbers below shall be treated as non-compliant (nc) and are for experimental purposes only.
Relative performance: Server max-jOPS (nc): 7.71x
Relative performance: Server critical-jOPS (nc): 7.99x

Details of the test setup and historical results may be found here: http://openjdk.linaro.org/jdk11u/SPECjbb2015-results/

[1] http://www.spec.org/fairuse.html#Academic

Regression test Hadoop-Terasort completed
=========================================
This test measures the performance of the server and client compilers running Hadoop sorting a 1GB file using Terasort and compares the performance against the baseline performance of the Zero interpreter and against the baseline performance of the server compiler on 2014-04-01.

Relative performance: Zero: 1.0, Server: 204.57
Server 204.57 / Server 2014-04-01 (71.00): 2.88x

Details of the test setup and historical results may be found here: http://openjdk.linaro.org/jdk11u/hadoop-terasort-benchmark-results/

This is a summary of the jcstress test results
==============================================
The build and test results are cycled every 15 days.
2019-06-28 pass rate: 10488/10488, results: http://openjdk.linaro.org/jdk11u/jcstress-nightly-runs/2019/178/results/
2019-07-03 pass rate: 10488/10488, results: http://openjdk.linaro.org/jdk11u/jcstress-nightly-runs/2019/183/results/
2019-08-04 pass rate: 10488/10488, results: http://openjdk.linaro.org/jdk11u/jcstress-nightly-runs/2019/215/results/
2019-08-11 pass rate: 10488/10488, results: http://openjdk.linaro.org/jdk11u/jcstress-nightly-runs/2019/222/results/
2019-08-16 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdk11u/jcstress-nightly-runs/2019/227/results/
2019-08-23 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdk11u/jcstress-nightly-runs/2019/234/results/
2019-09-05 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdk11u/jcstress-nightly-runs/2019/247/results/
2019-09-07 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdk11u/jcstress-nightly-runs/2019/248/results/
2019-09-11 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdk11u/jcstress-nightly-runs/2019/253/results/
2019-09-18 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdk11u/jcstress-nightly-runs/2019/260/results/
2019-09-22 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdk11u/jcstress-nightly-runs/2019/264/results/
2019-10-05 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdk11u/jcstress-nightly-runs/2019/277/results/
2019-10-18 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdk11u/jcstress-nightly-runs/2019/290/results/
2019-11-01 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdk11u/jcstress-nightly-runs/2019/304/results/
2019-11-10 pass rate: 10490/10490, results: http://openjdk.linaro.org/jdk11u/jcstress-nightly-runs/2019/313/results/

For detailed information on the test output please refer to: http://openjdk.linaro.org/jdk11u/jcstress-nightly-runs/ From felix.yang at huawei.com Mon Nov 11 01:41:22 2019 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Mon, 11 Nov 2019
01:41:22 +0000 Subject: [aarch64-port-dev ] 8233839: aarch64: missing memory barrier in NewObjectArrayStub and NewTypeArrayStub In-Reply-To: <382f7448-c392-5d98-ecf5-ac22e86f4888@redhat.com> References: <382f7448-c392-5d98-ecf5-ac22e86f4888@redhat.com> Message-ID: > -----Original Message----- > From: Andrew Dinn [mailto:adinn at redhat.com] > Sent: Friday, November 8, 2019 5:04 PM > To: Yangfei (Felix) ; > hotspot-runtime-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net > Subject: Re: 8233839: aarch64: missing memory barrier in NewObjectArrayStub > and NewTypeArrayStub > > On 08/11/2019 08:30, Yangfei (Felix) wrote: > > I witnessed random fail of one jcstress test on my 128-core aarch64 server: > "org.openjdk.jcstress.tests.defaultValues.arrays.small.plain.StringTest" > > Bug: https://bugs.openjdk.java.net/browse/JDK-8233839 > > > > I used the latest aarch64 jdk8u release build. Please refer to the bugzilla for > details and the analysis. > > I checked the assembler code emitted by > LIR_Assembler::emit_alloc_array: > > For the fast path, the StoreStore memory barrier is there. But it's not the > case for the slow path. > > > > Patch adding the missing barrier for 14:
> >
> > diff -r ad157fab6bf5 src/hotspot/cpu/aarch64/c1_Runtime1_aarch64.cpp
> > --- a/src/hotspot/cpu/aarch64/c1_Runtime1_aarch64.cpp Thu Nov 07 16:26:57 2019 -0800
> > +++ b/src/hotspot/cpu/aarch64/c1_Runtime1_aarch64.cpp Fri Nov 08 16:10:08 2019 +0800
> > @@ -840,6 +840,7 @@
> >          __ sub(arr_size, arr_size, t1); // body length
> >          __ add(t1, t1, obj); // body start
> >          __ initialize_body(t1, arr_size, 0, t2);
> > +        __ membar(Assembler::StoreStore);
> >          __ verify_oop(obj);
> >
> >          __ ret(lr);
> >
> > JDK builds OK and passed tier1 test. > Very nice detective work finding that one! > > The jdk14 patch looks good. Also the same patch for jdk11 and the variant for > jdk8 are good. > Thanks for reviewing this.
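The ordering this fix restores can be sketched in portable C++. This is an illustration only, not HotSpot code: the Array type, g_published, publish_new_array and read_first_element are hypothetical names. The slow-path stub zeroes the array body and then returns the new object; without a barrier between those stores and the store that later makes the reference visible, another core can observe the reference before it observes the zeroed body, which is the kind of failure the jcstress defaultValues test looks for. In the sketch a release fence stands in for membar(Assembler::StoreStore). It is somewhat stronger, since it also orders earlier loads, and portable C++ additionally needs the matching acquire fence on the reading side.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a freshly allocated array object.
struct Array {
  std::size_t length;
  int body[8];
};

// Hypothetical slot through which a new array becomes visible to other threads.
std::atomic<Array*> g_published{nullptr};

// Writer side: initialize header and body, fence, then publish the reference.
void publish_new_array(Array* storage, std::size_t length) {
  assert(length <= sizeof(storage->body) / sizeof(storage->body[0]));
  storage->length = length;
  std::memset(storage->body, 0, sizeof(storage->body));  // cf. initialize_body()
  // Stand-in for membar(Assembler::StoreStore): the initializing stores above
  // cannot become visible after the publishing store below.
  std::atomic_thread_fence(std::memory_order_release);
  g_published.store(storage, std::memory_order_relaxed);
}

// Reader side: returns -1 if nothing is published yet, otherwise body[0].
// The acquire fence pairs with the writer's release fence, so a reader that
// sees the pointer also sees the zeroed body.
int read_first_element() {
  Array* a = g_published.load(std::memory_order_relaxed);
  if (a == nullptr) return -1;
  std::atomic_thread_fence(std::memory_order_acquire);
  return a->body[0];
}
```

Dropping the release fence leaves single-threaded behaviour unchanged, which is one reason a missing barrier like this tends to surface only under a concurrency stress test on a many-core machine rather than in a functional test run.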
The jdk14 patch has been pushed as: https://hg.openjdk.java.net/jdk/jdk/rev/90cf1d4e712f Will push to aarch64 jdk8u after the jdk11u-fix-request is approved. Felix From aph at redhat.com Mon Nov 11 10:07:10 2019 From: aph at redhat.com (Andrew Haley) Date: Mon, 11 Nov 2019 10:07:10 +0000 Subject: [aarch64-port-dev ] 8233839: aarch64: missing memory barrier in NewObjectArrayStub and NewTypeArrayStub In-Reply-To: <382f7448-c392-5d98-ecf5-ac22e86f4888@redhat.com> References: <382f7448-c392-5d98-ecf5-ac22e86f4888@redhat.com> Message-ID: <9270b589-736e-fcce-064b-dcc6b6570406@redhat.com> On 11/8/19 9:04 AM, Andrew Dinn wrote: > On 08/11/2019 08:30, Yangfei (Felix) wrote: >> I witnessed random fail of one jcstress test on my 128-core aarch64 server: "org.openjdk.jcstress.tests.defaultValues.arrays.small.plain.StringTest" >> Bug: https://bugs.openjdk.java.net/browse/JDK-8233839 >> JDK builds OK and passed tier1 test. > Very nice detective work finding that one! > > The jdk14 patch looks good. Also the same patch for jdk11 and the > variant for jdk8 are good. Looks like ARM32 does not have the same bug. PowerPC doesn't even attempt a fast path in this case. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From adinn at redhat.com Mon Nov 11 11:04:25 2019 From: adinn at redhat.com (Andrew Dinn) Date: Mon, 11 Nov 2019 11:04:25 +0000 Subject: [aarch64-port-dev ] RFR: aarch64: minor improvements of atomic operations In-Reply-To: References: Message-ID: <421c54f0-43c0-a704-03c2-0d13c5dbeade@redhat.com> Hi Felix, On 05/11/2019 06:20, Yangfei (Felix) wrote: > Please review this small improvements of aarch64 atomic operations. > This eliminates the use of full memory barriers. > Passed tier1-3 testing. The patch looks ok to me. 
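The two shapes under discussion in this RFR can be written directly with the GCC builtins that appear in the patch. This is a minimal standalone sketch under stated assumptions, not the HotSpot implementation: the function names and the fixed std::int64_t type are hypothetical, and __sync_synchronize() (GCC's full memory barrier builtin) stands in for FULL_MEM_BARRIER. Both forms return the same values; they differ only in which neighbouring memory accesses the compiler and the CPU may reorder across them. The ACQ_REL form is not a full two-way barrier, whereas the existing code is meant to provide one.

```cpp
#include <cassert>
#include <cstdint>

// Existing shape (hypothetical name): release-ordered RMW followed by a
// separate full two-way fence, mirroring __ATOMIC_RELEASE + FULL_MEM_BARRIER.
inline std::int64_t add_and_fetch_conservative(volatile std::int64_t* dest,
                                               std::int64_t add_value) {
  std::int64_t res = __atomic_add_fetch(dest, add_value, __ATOMIC_RELEASE);
  __sync_synchronize();  // full barrier, standing in for FULL_MEM_BARRIER
  return res;
}

// Proposed shape: a single ACQ_REL RMW with no trailing fence.
inline std::int64_t add_and_fetch_acq_rel(volatile std::int64_t* dest,
                                          std::int64_t add_value) {
  return __atomic_add_fetch(dest, add_value, __ATOMIC_ACQ_REL);
}

// Existing exchange shape: __sync_lock_test_and_set is only an acquire
// barrier, hence the explicit full fence after it.
inline std::int64_t xchg_conservative(volatile std::int64_t* dest,
                                      std::int64_t exchange_value) {
  std::int64_t res = __sync_lock_test_and_set(dest, exchange_value);
  __sync_synchronize();
  return res;
}

// Proposed exchange shape: a single ACQ_REL exchange returning the old value.
inline std::int64_t xchg_acq_rel(volatile std::int64_t* dest,
                                 std::int64_t exchange_value) {
  return __atomic_exchange_n(dest, exchange_value, __ATOMIC_ACQ_REL);
}
```

Single-threaded checks cannot tell the two forms apart; the difference lies entirely in how surrounding accesses may be reordered, so passing functional tests is weak evidence either way.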
regards, Andrew Dinn ----------- From aph at redhat.com Mon Nov 11 11:17:01 2019 From: aph at redhat.com (Andrew Haley) Date: Mon, 11 Nov 2019 11:17:01 +0000 Subject: [aarch64-port-dev ] RFR: aarch64: minor improvements of atomic operations In-Reply-To: References: Message-ID: <65e93675-a3cf-53ac-6894-bb4124c55f93@redhat.com> On 11/5/19 6:20 AM, Yangfei (Felix) wrote: > Please review this small improvements of aarch64 atomic operations. > This eliminates the use of full memory barriers. > Passed tier1-3 testing. No, rejected. Patch also must go to hotspot-dev. Are you sure this is safe? The HotSpot internal barriers are specified as being full two-way barriers, which these are not. Tier1 testing really isn't going to do it. Now, you might argue that none of the uses in HotSpot actually require anything stronger that acq/rel, but good luck proving that. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From felix.yang at huawei.com Mon Nov 11 12:01:24 2019 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Mon, 11 Nov 2019 12:01:24 +0000 Subject: [aarch64-port-dev ] RFR: aarch64: minor improvements of atomic operations In-Reply-To: <65e93675-a3cf-53ac-6894-bb4124c55f93@redhat.com> References: <65e93675-a3cf-53ac-6894-bb4124c55f93@redhat.com> Message-ID: > -----Original Message----- > From: Andrew Haley [mailto:aph at redhat.com] > Sent: Monday, November 11, 2019 7:17 PM > To: Yangfei (Felix) ; > aarch64-port-dev at openjdk.java.net > Subject: Re: [aarch64-port-dev ] RFR: aarch64: minor improvements of atomic > operations > > On 11/5/19 6:20 AM, Yangfei (Felix) wrote: > > Please review this small improvements of aarch64 atomic operations. > > This eliminates the use of full memory barriers. > > Passed tier1-3 testing. > > No, rejected. > > Patch also must go to hotspot-dev. CCing to hotspot-dev. > Are you sure this is safe? 
The HotSpot internal barriers are specified as being > full two-way barriers, which these are not. Tier1 testing really isn't going to do > it. Now, you might argue that none of the uses in HotSpot actually require > anything stronger that acq/rel, but good luck proving that. I was also curious about the reason why full memory barrier is used here. For add_and_fetch, I was thinking that there is no difference in functionality for the following two code snippet. It's interesting to know that this may make a difference. Can you elaborate more on that please?

1) without patch
.L2:
        ldxr x2, [x1]
        add x2, x2, x0
        stlxr w3, x2, [x1]
        cbnz w3, .L2
        dmb ish
        mov x0, x2
        ret
-----------------------------------------------
2) with patch
.L2:
        ldaxr x2, [x1]
        add x2, x2, x0
        stlxr w3, x2, [x1]
        cbnz w3, .L2
        mov x0, x2
        ret

From felix.yang at huawei.com Mon Nov 11 12:44:03 2019 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Mon, 11 Nov 2019 12:44:03 +0000 Subject: [aarch64-port-dev ] RFR: aarch64: minor improvements of atomic operations References: <65e93675-a3cf-53ac-6894-bb4124c55f93@redhat.com> Message-ID: > -----Original Message----- > From: Yangfei (Felix) > Sent: Monday, November 11, 2019 8:01 PM > To: 'Andrew Haley' ; aarch64-port-dev at openjdk.java.net > Cc: 'hotspot-dev at openjdk.java.net' > Subject: RE: [aarch64-port-dev ] RFR: aarch64: minor improvements of atomic > operations > > > -----Original Message----- > > From: Andrew Haley [mailto:aph at redhat.com] > > Sent: Monday, November 11, 2019 7:17 PM > > To: Yangfei (Felix) ; > > aarch64-port-dev at openjdk.java.net > > Subject: Re: [aarch64-port-dev ] RFR: aarch64: minor improvements of > > atomic operations > > > > On 11/5/19 6:20 AM, Yangfei (Felix) wrote: > > > Please review this small improvements of aarch64 atomic operations. > > > This eliminates the use of full memory barriers. > > > Passed tier1-3 testing. > > > > No, rejected. > > > > Patch also must go to hotspot-dev. > > CCing to hotspot-dev.
> > > Are you sure this is safe? The HotSpot internal barriers are specified > > as being full two-way barriers, which these are not. Tier1 testing > > really isn't going to do it. Now, you might argue that none of the > > uses in HotSpot actually require anything stronger that acq/rel, but good luck > proving that. > > I was also curious about the reason why full memory barrier is used here. > For add_and_fetch, I was thinking that there is no difference in functionality for > the following two code snippet. > It's interesting to know that this may make a difference. Can you elaborate > more on that please?
>
> 1) without patch
> .L2:
>         ldxr x2, [x1]
>         add x2, x2, x0
>         stlxr w3, x2, [x1]
>         cbnz w3, .L2
>         dmb ish
>         mov x0, x2
>         ret
> -----------------------------------------------
> 2) with patch
> .L2:
>         ldaxr x2, [x1]
>         add x2, x2, x0
>         stlxr w3, x2, [x1]
>         cbnz w3, .L2
>         mov x0, x2
>         ret

And looks like the aarch64 port from Oracle also did the same thing: http://hg.openjdk.java.net/jdk-updates/jdk11u-dev/file/f8b2e95a1d41/src/hotspot/os_cpu/linux_arm/atomic_linux_arm.hpp

template struct Atomic::PlatformAdd : Atomic::AddAndFetch >
{
  template D add_and_fetch(I add_value, D volatile* dest, atomic_memory_order order) const;
};

template<> template
inline D Atomic::PlatformAdd<4>::add_and_fetch(I add_value, D volatile* dest, atomic_memory_order order) const {
  STATIC_ASSERT(4 == sizeof(I));
  STATIC_ASSERT(4 == sizeof(D));
#ifdef AARCH64
  D val;
  int tmp;
  __asm__ volatile(
    "1:\n\t"
    " ldaxr %w[val], [%[dest]]\n\t"
    " add %w[val], %w[val], %w[add_val]\n\t"
    " stlxr %w[tmp], %w[val], [%[dest]]\n\t"
    " cbnz %w[tmp], 1b\n\t"
    : [val] "=&r" (val), [tmp] "=&r" (tmp)
    : [add_val] "r" (add_value), [dest] "r" (dest)
    : "memory");
  return val;
#else
  return add_using_helper(os::atomic_add_func, add_value, dest);
#endif
}

From aph at redhat.com Mon Nov 11 15:05:10 2019 From: aph at redhat.com (Andrew Haley) Date: Mon, 11 Nov 2019 15:05:10 +0000 Subject: [aarch64-port-dev ] RFR: aarch64: minor
improvements of atomic operations In-Reply-To: References: <65e93675-a3cf-53ac-6894-bb4124c55f93@redhat.com> Message-ID: <1f4c99ac-461c-7795-1a74-a494bdba3672@redhat.com> On 11/11/19 12:01 PM, Yangfei (Felix) wrote: > I was also curious about the reason why full memory barrier is used > here. For add_and_fetch, I was thinking that there is no difference > in functionality for the following two code snippet. It's > interesting to know that this may make a difference. Can you > elaborate more on that please? For add_and_fetch the default atomic_memory_order is memory_order_conservative. I'm not sure exactly what that means, but it is stronger than SEQ_CST; it's been described as a "full barrier". __ATOMIC_ACQ_REL for this operation translates approximately to

    load
    LoadLoad|LoadStore
    add
    StoreStore|LoadStore
    store

In other words, there is nothing to prevent subsequent stores being reordered with this store. Therefore your change does not meet the specification for memory_order_conservative. You could, if you wanted, only make this change for weaker memory orderings, but AFAIK they are not used. You could argue that AArch64 won't do such a reordering, but I'd reply that even if AArch64 can't do such a reordering, GCC sure can. And finally, is there any operation in HotSpot that actually requires such strong memory semantics? Probably not, but no-one has ever been brave enough to say so. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd.
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Mon Nov 11 15:06:23 2019 From: aph at redhat.com (Andrew Haley) Date: Mon, 11 Nov 2019 15:06:23 +0000 Subject: [aarch64-port-dev ] RFR: aarch64: minor improvements of atomic operations In-Reply-To: References: <65e93675-a3cf-53ac-6894-bb4124c55f93@redhat.com> Message-ID: <98122035-8872-c77d-5309-b68f07dcaddb@redhat.com> On 11/11/19 12:44 PM, Yangfei (Felix) wrote: > And looks like the aarch64 port from Oracle also did the same thing: > http://hg.openjdk.java.net/jdk-updates/jdk11u-dev/file/f8b2e95a1d41/src/hotspot/os_cpu/linux_arm/atomic_linux_arm.hpp That's not the same thing at all, it's fully SEQ_CST. Which is almost certainly enough, but still doesn't meet spec. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Mon Nov 11 16:36:38 2019 From: aph at redhat.com (Andrew Haley) Date: Mon, 11 Nov 2019 16:36:38 +0000 Subject: [aarch64-port-dev ] RFR: aarch64: minor improvements of atomic operations In-Reply-To: <1f4c99ac-461c-7795-1a74-a494bdba3672@redhat.com> References: <65e93675-a3cf-53ac-6894-bb4124c55f93@redhat.com> <1f4c99ac-461c-7795-1a74-a494bdba3672@redhat.com> Message-ID: <1cc3ab16-eaab-d031-3df0-c9133de24f88@redhat.com> On 11/11/19 3:05 PM, Andrew Haley wrote: > And finally, is there any operation in HotSpot that actually requires > such strong memory semantics? Probably not, but no-one has ever been > brave enough to say so. Here's a place where it really does matter.

void ShenandoahPacer::restart_with(size_t non_taxable_bytes, double tax_rate) {
  size_t initial = (size_t)(non_taxable_bytes * tax_rate) >> LogHeapWordSize;
  STATIC_ASSERT(sizeof(size_t) <= sizeof(intptr_t));
  Atomic::xchg((intptr_t)initial, &_budget);
  Atomic::store(tax_rate, &_tax_rate);
  Atomic::inc(&_epoch);

Note: the xchg is conservative, the store is plain.
The xchg value should be visible before the store. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From erik.osterlund at oracle.com Mon Nov 11 17:11:28 2019 From: erik.osterlund at oracle.com (Erik Österlund) Date: Mon, 11 Nov 2019 18:11:28 +0100 Subject: [aarch64-port-dev ] RFR: aarch64: minor improvements of atomic operations In-Reply-To: References: Message-ID: Hi Felix, Would you mind pasting a link to the proposed change? I can not determine its validity otherwise. Thanks, /Erik > On 11 Nov 2019, at 13:01, Yangfei (Felix) wrote: > > >> >> -----Original Message----- >> From: Andrew Haley [mailto:aph at redhat.com] >> Sent: Monday, November 11, 2019 7:17 PM >> To: Yangfei (Felix) ; >> aarch64-port-dev at openjdk.java.net >> Subject: Re: [aarch64-port-dev ] RFR: aarch64: minor improvements of atomic >> operations >> >>> On 11/5/19 6:20 AM, Yangfei (Felix) wrote: >>> Please review this small improvements of aarch64 atomic operations. >>> This eliminates the use of full memory barriers. >>> Passed tier1-3 testing. >> >> No, rejected. >> >> Patch also must go to hotspot-dev. > > CCing to hotspot-dev. > >> Are you sure this is safe? The HotSpot internal barriers are specified as being >> full two-way barriers, which these are not. Tier1 testing really isn't going to do >> it. Now, you might argue that none of the uses in HotSpot actually require >> anything stronger that acq/rel, but good luck proving that. > > I was also curious about the reason why full memory barrier is used here. > For add_and_fetch, I was thinking that there is no difference in functionality for the following two code snippet. > It's interesting to know that this may make a difference. Can you elaborate more on that please?
>
> 1) without patch
> .L2:
>         ldxr x2, [x1]
>         add x2, x2, x0
>         stlxr w3, x2, [x1]
>         cbnz w3, .L2
>         dmb ish
>         mov x0, x2
>         ret
> -----------------------------------------------
> 2) with patch
> .L2:
>         ldaxr x2, [x1]
>         add x2, x2, x0
>         stlxr w3, x2, [x1]
>         cbnz w3, .L2
>         mov x0, x2
>         ret

From aph at redhat.com Mon Nov 11 17:53:06 2019 From: aph at redhat.com (Andrew Haley) Date: Mon, 11 Nov 2019 17:53:06 +0000 Subject: [aarch64-port-dev ] RFR: aarch64: minor improvements of atomic operations In-Reply-To: References: Message-ID: <4455d529-0f43-e6ba-d3d8-2639f4d79802@redhat.com> On 11/11/19 5:11 PM, Erik Österlund wrote: > Hi Felix, > > Would you mind pasting a link to the proposed change? I can not determine its validity otherwise. Patch:

diff -r 2700c409ff10 src/hotspot/os_cpu/linux_aarch64/atomic_linux_aarch64.hpp
--- a/src/hotspot/os_cpu/linux_aarch64/atomic_linux_aarch64.hpp Sun Nov 03 18:02:29 2019 -0500
+++ b/src/hotspot/os_cpu/linux_aarch64/atomic_linux_aarch64.hpp Wed Nov 06 14:13:00 2019 +0800
@@ -40,8 +40,7 @@
 {
   template D add_and_fetch(I add_value, D volatile* dest, atomic_memory_order order) const {
-    D res = __atomic_add_fetch(dest, add_value, __ATOMIC_RELEASE);
-    FULL_MEM_BARRIER;
+    D res = __atomic_add_fetch(dest, add_value, __ATOMIC_ACQ_REL);
     return res;
   }
 };
@@ -52,8 +51,7 @@
   T volatile* dest, atomic_memory_order order) const {
   STATIC_ASSERT(byte_size == sizeof(T));
-  T res = __sync_lock_test_and_set(dest, exchange_value);
-  FULL_MEM_BARRIER;
+  T res = __atomic_exchange_n(dest, exchange_value, __ATOMIC_ACQ_REL);
   return res;
 }

-- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd.
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From ci_notify at linaro.org Tue Nov 12 02:20:29 2019 From: ci_notify at linaro.org (ci_notify at linaro.org) Date: Tue, 12 Nov 2019 02:20:29 +0000 (UTC) Subject: [aarch64-port-dev ] JTREG, JCStress, SPECjbb2015 and Hadoop/Terasort results for OpenJDK JDK on AArch64 Message-ID: <789911631.346.1573525230014.JavaMail.javamailuser@localhost> This is a summary of the JTREG test results =========================================== The build and test results are cycled every 15 days. For detailed information on the test output please refer to: http://openjdk.linaro.org/jdkX/openjdk-jtreg-nightly-tests/summary/2019/315/summary.html ------------------------------------------------------------------------------- client-release/hotspot ------------------------------------------------------------------------------- Build 0: aarch64/2018/oct/15 pass: 5,780; fail: 19; not run: 90 ------------------------------------------------------------------------------- client-release/jdk ------------------------------------------------------------------------------- Build 0: aarch64/2018/oct/15 pass: 8,495; fail: 670; error: 23 ------------------------------------------------------------------------------- client-release/langtools ------------------------------------------------------------------------------- Build 0: aarch64/2018/oct/15 pass: 3,970; fail: 5 ------------------------------------------------------------------------------- release/hotspot ------------------------------------------------------------------------------- Build 0: aarch64/2019/oct/07 pass: 5,750 Build 1: aarch64/2019/oct/09 pass: 5,747; fail: 1 Build 2: aarch64/2019/oct/11 pass: 5,751; fail: 1 Build 3: aarch64/2019/oct/14 pass: 5,753 Build 4: aarch64/2019/oct/16 pass: 5,753; fail: 1 Build 5: aarch64/2019/oct/18 pass: 5,760 Build 6: aarch64/2019/oct/21 pass: 5,716; fail: 43; error: 1 Build 7: aarch64/2019/oct/23 pass: 5,760; fail: 1 
Build 8: aarch64/2019/oct/28 pass: 5,766 Build 9: aarch64/2019/oct/30 pass: 5,768 Build 10: aarch64/2019/nov/01 pass: 5,768; fail: 1 Build 11: aarch64/2019/nov/04 pass: 5,769 Build 12: aarch64/2019/nov/06 pass: 5,766; fail: 2 Build 13: aarch64/2019/nov/08 pass: 5,761 Build 14: aarch64/2019/nov/11 pass: 5,762 ------------------------------------------------------------------------------- release/jdk ------------------------------------------------------------------------------- Build 0: aarch64/2019/oct/07 pass: 8,683; fail: 517; error: 18 Build 1: aarch64/2019/oct/09 pass: 8,692; fail: 507; error: 21 Build 2: aarch64/2019/oct/11 pass: 8,693; fail: 511; error: 18 Build 3: aarch64/2019/oct/14 pass: 8,706; fail: 497; error: 20 Build 4: aarch64/2019/oct/16 pass: 8,702; fail: 509; error: 17 Build 5: aarch64/2019/oct/18 pass: 8,694; fail: 522; error: 17 Build 6: aarch64/2019/oct/21 pass: 8,705; fail: 512; error: 18 Build 7: aarch64/2019/oct/23 pass: 8,712; fail: 505; error: 18 Build 8: aarch64/2019/oct/28 pass: 8,711; fail: 509; error: 18 Build 9: aarch64/2019/oct/30 pass: 8,723; fail: 504; error: 19 Build 10: aarch64/2019/nov/01 pass: 8,774; fail: 506; error: 18 Build 11: aarch64/2019/nov/04 pass: 8,777; fail: 509; error: 17 Build 12: aarch64/2019/nov/06 pass: 8,775; fail: 507; error: 19 Build 13: aarch64/2019/nov/08 pass: 8,774; fail: 510; error: 17 Build 14: aarch64/2019/nov/11 pass: 8,777; fail: 509; error: 15 2 fatal errors were detected; please follow the link above for more detail. 
------------------------------------------------------------------------------- release/langtools ------------------------------------------------------------------------------- Build 0: aarch64/2019/oct/07 pass: 3,979 Build 1: aarch64/2019/oct/09 pass: 3,979 Build 2: aarch64/2019/oct/11 pass: 3,979 Build 3: aarch64/2019/oct/14 pass: 3,979 Build 4: aarch64/2019/oct/16 pass: 3,979 Build 5: aarch64/2019/oct/18 pass: 3,979 Build 6: aarch64/2019/oct/21 pass: 3,979 Build 7: aarch64/2019/oct/23 pass: 3,980 Build 8: aarch64/2019/oct/28 pass: 3,980 Build 9: aarch64/2019/oct/30 pass: 3,980 Build 10: aarch64/2019/nov/01 pass: 3,980 Build 11: aarch64/2019/nov/04 pass: 3,980 Build 12: aarch64/2019/nov/06 pass: 3,980 Build 13: aarch64/2019/nov/08 pass: 3,980 Build 14: aarch64/2019/nov/11 pass: 3,980 ------------------------------------------------------------------------------- server-release/hotspot ------------------------------------------------------------------------------- Build 0: aarch64/2018/oct/15 pass: 5,787; fail: 18; not run: 90 ------------------------------------------------------------------------------- server-release/jdk ------------------------------------------------------------------------------- Build 0: aarch64/2018/oct/15 pass: 8,476; fail: 686; error: 27 ------------------------------------------------------------------------------- server-release/langtools ------------------------------------------------------------------------------- Build 0: aarch64/2018/oct/15 pass: 3,970; fail: 5 Previous results can be found here: http://openjdk.linaro.org/jdkX/openjdk-jtreg-nightly-tests/index.html SPECjbb2015 composite regression test completed =============================================== This test measures the relative performance of the server compiler running the SPECjbb2015 composite tests and compares the performance against the baseline performance of the server compiler taken on 2016-11-21. 
In accordance with [1], the SPECjbb2015 tests are run on a system which is not production ready and does not meet all the requirements for publishing compliant results. The numbers below shall be treated as non-compliant (nc) and are for experimental purposes only. Relative performance: Server max-jOPS (nc): 7.80x Relative performance: Server critical-jOPS (nc): 9.37x Details of the test setup and historical results may be found here: http://openjdk.linaro.org/jdkX/SPECjbb2015-results/ [1] http://www.spec.org/fairuse.html#Academic Regression test Hadoop-Terasort completed ========================================= This test measures the performance of the server and client compilers running Hadoop sorting a 1GB file using Terasort and compares the performance against the baseline performance of the Zero interpreter and against the baseline performance of the server compiler on 2014-04-01. Relative performance: Zero: 1.0, Server: 207.57 Server 207.57 / Server 2014-04-01 (71.00): 2.92x Details of the test setup and historical results may be found here: http://openjdk.linaro.org/jdkX/hadoop-terasort-benchmark-results/ This is a summary of the jcstress test results ============================================== The build and test results are cycled every 15 days. 
2019-09-29 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/266/results/ 2019-10-08 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/280/results/ 2019-10-10 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/282/results/ 2019-10-12 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/284/results/ 2019-10-15 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/287/results/ 2019-10-17 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/289/results/ 2019-10-19 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/291/results/ 2019-10-22 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/294/results/ 2019-10-23 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/296/results/ 2019-10-29 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/301/results/ 2019-10-31 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/303/results/ 2019-11-02 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/305/results/ 2019-11-05 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/308/results/ 2019-11-07 pass rate: 10490/10490, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/310/results/ 2019-11-12 pass rate: 10490/10490, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/315/results/ For detailed information on the test output please refer to: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/ From felix.yang at huawei.com Tue Nov 12 02:57:37 2019 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Tue, 12 Nov 2019 02:57:37 +0000 Subject: 
[aarch64-port-dev ] RFR: aarch64: minor improvements of atomic operations
In-Reply-To: <1cc3ab16-eaab-d031-3df0-c9133de24f88@redhat.com>
References: <65e93675-a3cf-53ac-6894-bb4124c55f93@redhat.com> <1f4c99ac-461c-7795-1a74-a494bdba3672@redhat.com> <1cc3ab16-eaab-d031-3df0-c9133de24f88@redhat.com>
Message-ID:

> On 11/11/19 3:05 PM, Andrew Haley wrote:
> > And finally, is there any operation in HotSpot that actually requires
> > such strong memory semantics? Probably not, but no-one has ever been
> > brave enough to say so.
>
> Here's a place where it really does matter.
>
> void ShenandoahPacer::restart_with(size_t non_taxable_bytes, double tax_rate) {
>   size_t initial = (size_t)(non_taxable_bytes * tax_rate) >> LogHeapWordSize;
>   STATIC_ASSERT(sizeof(size_t) <= sizeof(intptr_t));
>   Atomic::xchg((intptr_t)initial, &_budget);
>   Atomic::store(tax_rate, &_tax_rate);
>   Atomic::inc(&_epoch);
>
> Note: the xchg is conservative, the store is plain. The xchg value should be
> visible before the store.

Thanks for explaining this. I see your point now.
For memory_order_conservative order, it looks like ppc enforces an order which is stronger than aarch64. ppc issues two full memory barriers: one before the loop and one after the loop. But for aarch64, the preceding load/store can still float after the first ldxr instruction:

.L2:
  ldxr  x2, [x1]
  add   x2, x2, x0
  stlxr w3, x2, [x1]
  cbnz  w3, .L2
  dmb   ish

So my question is: for a "two-way memory barrier", do we need another full barrier before the loop?

Felix

From felix.yang at huawei.com Tue Nov 12 07:36:57 2019
From: felix.yang at huawei.com (Yangfei (Felix))
Date: Tue, 12 Nov 2019 07:36:57 +0000
Subject: [aarch64-port-dev ] Question about ISB usage in the aarch64 port
Message-ID:

Hi,

I am witnessing some SIGILL JVM crashes on my aarch64 platform.
I looked at the ISB usage, especially this change:
https://mail.openjdk.java.net/pipermail/aarch64-port-dev/2014-September/001376.html

One of the changes adds an ISB after the native call returns:

static void rt_call(MacroAssembler* masm, address dest, int gpargs, int fpargs, int type) {
  CodeBlob *cb = CodeCache::find_blob(dest);
  if (cb) {
    __ far_call(RuntimeAddress(dest));
  } else {
    assert((unsigned)gpargs < 256, "eek!");
    assert((unsigned)fpargs < 32, "eek!");
    __ lea(rscratch1, RuntimeAddress(dest));
    __ blr(rscratch1);
    __ maybe_isb();    < ========
  }
}

The rt_call function is used in generate_native_wrapper to make the JNI call. Since I didn't see this barrier in the ppc or arm ports, I would like to know more details here. Does anyone still remember?

Also, the ISB is planted only in the else block. I assume it is also necessary for the if block. Correct?

Thanks for your help,
Felix

From felix.yang at huawei.com Tue Nov 12 08:37:02 2019
From: felix.yang at huawei.com (Yangfei (Felix))
Date: Tue, 12 Nov 2019 08:37:02 +0000
Subject: [aarch64-port-dev ] RFR: aarch64: minor improvements of atomic operations
References: <65e93675-a3cf-53ac-6894-bb4124c55f93@redhat.com> <1f4c99ac-461c-7795-1a74-a494bdba3672@redhat.com> <1cc3ab16-eaab-d031-3df0-c9133de24f88@redhat.com>
Message-ID:

> -----Original Message-----
> From: Yangfei (Felix)
> Sent: Tuesday, November 12, 2019 10:58 AM
> To: 'Andrew Haley' ; aarch64-port-dev at openjdk.java.net
> Cc: hotspot-dev at openjdk.java.net
> Subject: RE: [aarch64-port-dev ] RFR: aarch64: minor improvements of atomic
> operations
>
> > On 11/11/19 3:05 PM, Andrew Haley wrote:
> > > And finally, is there any operation in HotSpot that actually
> > > requires such strong memory semantics? Probably not, but no-one has
> > > ever been brave enough to say so.
> >
> > Here's a place where it really does matter.
> >
> > void ShenandoahPacer::restart_with(size_t non_taxable_bytes, double tax_rate) {
> >   size_t initial = (size_t)(non_taxable_bytes * tax_rate) >> LogHeapWordSize;
> >   STATIC_ASSERT(sizeof(size_t) <= sizeof(intptr_t));
> >   Atomic::xchg((intptr_t)initial, &_budget);
> >   Atomic::store(tax_rate, &_tax_rate);
> >   Atomic::inc(&_epoch);
> >
> > Note: the xchg is conservative, the store is plain. The xchg value
> > should be visible before the store.
>
> Thanks for explaining this. I see your point now.
> For memory_order_conservative order, it looks like ppc enforces an order
> which is stronger than aarch64.
> ppc issues two full memory barriers: one before the loop and one after the
> loop.
> But for aarch64, the preceding load/store can still float after the first ldxr
> instruction:
>
> .L2:
>   ldxr  x2, [x1]
>   add   x2, x2, x0
>   stlxr w3, x2, [x1]
>   cbnz  w3, .L2
>   dmb   ish
>
> So my question is: for a "two-way memory barrier", do we need another full
> barrier before the loop?

This has been discussed somewhere before: https://patchwork.kernel.org/patch/3575821/
Let's keep the current status for safe.

Felix

From aph at redhat.com Tue Nov 12 09:25:18 2019
From: aph at redhat.com (Andrew Haley)
Date: Tue, 12 Nov 2019 09:25:18 +0000
Subject: [aarch64-port-dev ] RFR: aarch64: minor improvements of atomic operations
In-Reply-To:
References: <65e93675-a3cf-53ac-6894-bb4124c55f93@redhat.com> <1f4c99ac-461c-7795-1a74-a494bdba3672@redhat.com> <1cc3ab16-eaab-d031-3df0-c9133de24f88@redhat.com>
Message-ID: <8b527457-c371-45ae-bb54-0a048f9ee6f8@redhat.com>

On 11/12/19 8:37 AM, Yangfei (Felix) wrote:
> This has been discussed somewhere before: https://patchwork.kernel.org/patch/3575821/
> Let's keep the current status for safe.

Yes.

It's been interesting to see the progress of this patch. I don't think it's the first time that someone has been tempted to change this code to make it "more efficient".
I wonder if we could perhaps add a comment to that code so that it doesn't happen again. I'm not sure exactly what the patch should say beyond "do not touch". Perhaps something along the lines of "Do not touch this code unless you have at least Black Belt, 4th Dan in memory ordering." :-) More seriously, maybe simply "Note that memory_order_conservative requires a full barrier after atomic stores. See https://patchwork.kernel.org/patch/3575821/" -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From Joshua.Zhu at arm.com Tue Nov 12 09:31:35 2019 From: Joshua.Zhu at arm.com (Joshua Zhu (Arm Technology China)) Date: Tue, 12 Nov 2019 09:31:35 +0000 Subject: [aarch64-port-dev ] RFR: 8233948: AArch64: Incorrect mapping between OptoReg and VMReg for high 64 bits of Vector Register Message-ID: Hi, Please review the following patch: JBS: https://bugs.openjdk.java.net/browse/JDK-8233948 Webrev: http://cr.openjdk.java.net/~jzhu/8233948/webrev.00/ In register definition of aarch64.ad, each vector register is defined as 4 slots with its calling convention, ideal type, ... and its VMReg value. These VMReg values in reg_def are used by ADLC to generate mapping between OptoReg and VMReg: opto2vm[]. But VMReg is treated as 2 slots inconsistently for vector register [1]. This causes incorrect mapping between VMReg and OptoReg for high 64 bits of vector register. 
If we write the following code, which accesses the high 64 bits of a vector register in the same way as vector_calling_convention in the panama branch [2]:

  VMReg vmreg = v0->as_VMReg();
  VMRegPair p;
  p.set_pair(vmreg->next(3), vmreg);

and convert the VMRegPair into OptoReg [3]:

  RegMask rm;
  OptoReg::Name reg_fst = OptoReg::as_OptoReg(p.first());
  OptoReg::Name reg_snd = OptoReg::as_OptoReg(p.second());
  tty->print("fst=%d snd=%d\n", reg_fst, reg_snd);
  for (OptoReg::Name r = reg_fst; r <= reg_snd; r++) {
    rm.Insert(r);
  }

then, for V0's VMRegPair, the first VMReg's value is 64 and the second is 67. After conversion by as_OptoReg(), the first OptoReg becomes 124 and the second becomes 129. As a result, 6 bits of the RegMask are set, whereas only 4 bits (representing the 4 slots/halves) should be.

VMReg, opto2vm[] and vm2opto[] are dumped by [4] for reference:
http://cr.openjdk.java.net/~jzhu/8233948/RegDump_before_change.log

opto2vm[] has the following items:
  OptoReg: 126, VMReg: 66
  OptoReg: 127, VMReg: 67
  OptoReg: 128, VMReg: 66
  OptoReg: 129, VMReg: 67

OptoReg pairs [126, 127] and [128, 129] are both mapped to the same VMReg pair [66, 67]. vm2opto[] is then generated by traversing opto2vm[] [5]:
  VMReg: 66, OptoReg: 128
  VMReg: 67, OptoReg: 129

This produces the incorrect RegMask in the case above.

For floating-point registers, however, the bottom 64 bits of a NEON vector register overlap with the floating-point register. Their VMReg and the corresponding mapping are still consistent, so this issue is not exposed there. But I think we should still fix it to keep the code clean and avoid potential issues in future.
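The overlap described above can be reproduced with a small, hypothetical model of the two tables. It assumes, as the dump suggests, that each vector register owns four consecutive OptoReg slots (V0 at 124-127, V1 at 128-131), that the reg_def entries assign VMRegs as_VMReg(), next(), next(2), next(3), and that vm2opto[] keeps the last OptoReg seen for each VMReg. RegTables, build_tables and mask_bits_for_pair are illustrative names, not HotSpot functions.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical model of the ADLC-generated tables; not HotSpot code.
struct RegTables {
  std::vector<int> opto2vm;    // OptoReg -> VMReg, -1 if unused
  std::map<int, int> vm2opto;  // VMReg -> OptoReg, last writer wins
};

// Each of num_vregs vector registers owns 4 consecutive OptoReg slots from
// opto_base. Slot k of register i gets VMReg vm_base + vm_slots_per_reg * i + k,
// mimicking reg_def entries built from as_VMReg(), next(), next(2), next(3).
RegTables build_tables(int num_vregs, int opto_base, int vm_base,
                       int vm_slots_per_reg) {
  RegTables t;
  t.opto2vm.assign(opto_base + 4 * num_vregs, -1);
  for (int i = 0; i < num_vregs; i++) {
    for (int k = 0; k < 4; k++) {
      t.opto2vm[opto_base + 4 * i + k] = vm_base + vm_slots_per_reg * i + k;
    }
  }
  // Derive vm2opto by walking opto2vm in OptoReg order, so a later OptoReg
  // that maps to the same VMReg overwrites an earlier one.
  for (int opto = 0; opto < static_cast<int>(t.opto2vm.size()); opto++) {
    if (t.opto2vm[opto] >= 0) {
      t.vm2opto[t.opto2vm[opto]] = opto;
    }
  }
  return t;
}

// Number of RegMask bits the loop above would set for the VMRegPair
// (vm_first, vm_first + 3) of a single vector register.
int mask_bits_for_pair(const RegTables& t, int vm_first) {
  int reg_fst = t.vm2opto.at(vm_first);
  int reg_snd = t.vm2opto.at(vm_first + 3);
  return reg_snd - reg_fst + 1;
}
```

With two VMReg slots per vector register, build_tables(2, 124, 64, 2) maps OptoRegs 126/127 (V0's upper half) and 128/129 (V1's lower half) to VMRegs 66/67, so the mask for V0's pair spans 124..129, i.e. 6 bits. With four slots per register, build_tables(2, 124, 64, 4) gives 124..127, i.e. 4 bits.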
After fix, the dump is: http://cr.openjdk.java.net/~jzhu/8233948/RegDump_after_change.log [1] https://hg.openjdk.java.net/jdk/jdk/file/d595f1faace2/src/hotspot/cpu/aarch64/vmreg_aarch64.inline.hpp#l35 [2] https://hg.openjdk.java.net/panama/dev/file/43bc39c09590/src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp#l1140 [3] https://hg.openjdk.java.net/panama/dev/file/43bc39c09590/src/hotspot/share/opto/matcher.cpp#l1360 [4] http://cr.openjdk.java.net/~jzhu/8233948/dump.patch [5] https://hg.openjdk.java.net/jdk/jdk/file/d595f1faace2/src/hotspot/share/opto/c2compiler.cpp#l59 Best Regards, Joshua From adinn at redhat.com Tue Nov 12 09:42:09 2019 From: adinn at redhat.com (Andrew Dinn) Date: Tue, 12 Nov 2019 09:42:09 +0000 Subject: [aarch64-port-dev ] RFR: aarch64: minor improvements of atomic operations In-Reply-To: <8b527457-c371-45ae-bb54-0a048f9ee6f8@redhat.com> References: <65e93675-a3cf-53ac-6894-bb4124c55f93@redhat.com> <1f4c99ac-461c-7795-1a74-a494bdba3672@redhat.com> <1cc3ab16-eaab-d031-3df0-c9133de24f88@redhat.com> <8b527457-c371-45ae-bb54-0a048f9ee6f8@redhat.com> Message-ID: <32ea3e22-9f7a-9aaa-c86a-79ed175a1c7b@redhat.com> On 12/11/2019 09:25, Andrew Haley wrote: > On 11/12/19 8:37 AM, Yangfei (Felix) wrote: >> This has been discussed somewhere before: https://patchwork.kernel.org/patch/3575821/ >> Let's keep the current status for safe. > > Yes. > > It's been interesting to see the progress of this patch. I don't think > it's the first time that someone has been tempted to change this code > to make it "more efficient". > > I wonder if we could perhaps add a comment to that code so that it > doesn't happen again. I'm not sure exactly what the patch should say > beyond "do not touch". Perhaps something along the lines of "Do not > touch this code unless you have at least Black Belt, 4th Dan in memory > ordering." :-) > > More seriously, maybe simply "Note that memory_order_conservative > requires a full barrier after atomic stores. 
See
> https://patchwork.kernel.org/patch/3575821/"

Yes, that would be a help. It's particularly easy to get confused here because we happily omit the ordering of an stlr store wrt subsequent stores when the stlr is implementing a Java volatile write or a Java cmpxchg.

So, it might be worth adding a rider that implementing the full memory_order_conservative semantics is necessary because VM code relies on the strong ordering wrt writes that the cmpxchg is required to provide.

regards,


Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill

From patrick at os.amperecomputing.com Tue Nov 12 09:52:04 2019
From: patrick at os.amperecomputing.com (Patrick Zhang OS)
Date: Tue, 12 Nov 2019 09:52:04 +0000
Subject: [aarch64-port-dev ] RFR: 8229351: AArch64: Make the stub threshold of string_compare intrinsic tunable
In-Reply-To:
References: <9248d1b3-7341-68b9-7dd1-02dfd48567a6@bell-sw.com>
Message-ID:

Ping...

Hi Aleksei,

Does the potential regression on your test system still exist with the new patch webrev.03? If the added largeLoopExitCondition check excluded your >128-char strings from the large loop and still caused performance drops, hard-coding both the software prefetch hint distance and the check condition to 64 might be worth a try, although I don't think that would be the right thing to do, compared with the similar logic in generate_compare_long_string_different_encoding. Thanks.

http://cr.openjdk.java.net/~qpzhang/8229351/webrev.03/jdk.changeset

     if (SoftwarePrefetchHintDistance >= 0) {
-      __ bind(LARGE_LOOP_PREFETCH);
+      if (remainingLimit < largeLoopExitCondition) {
+        // there could be fewer bytes left and invalid for this large loop with prefetching
+        __ subs(rscratch2, cnt2, largeLoopExitCondition); // => subs(rscratch2, cnt2, 64); ??
+        __ br(__ LT, NO_PREFETCH);
+      }
+      __ bind(LARGE_LOOP_PREFETCH); // 64 bytes loop
         __ prfm(Address(str1, SoftwarePrefetchHintDistance)); // => __ prfm(Address(str1, 64));
         __ prfm(Address(str2, SoftwarePrefetchHintDistance)); // => __ prfm(Address(str2, 64));

Regards
Patrick

-----Original Message-----
From: aarch64-port-dev On Behalf Of Patrick Zhang OS
Sent: Thursday, November 7, 2019 6:56 PM
To: Aleksei Voitylov
Cc: aarch64-port-dev at openjdk.java.net
Subject: Re: [aarch64-port-dev ] RFR: 8229351: AArch64: Make the stub threshold of string_compare intrinsic tunable

Hi Aleksei,

Thanks for testing it and for the data. I only had the source of StringCompareBench.java [6]; my numbers (the diffs) are within 2%, while the -207.12% looks quite weird. I initially did not add the condition to control the br, since generate_compare_long_string_different_encoding has a similar unconditional br. By the way, the original logic allowed prefetching memory beyond the array boundary for the first 64 bytes. I think securing the prefetch is the right thing to do, but it could certainly stop some cases from going to the large loop with prefetching. Further comments are welcome, thanks.

http://cr.openjdk.java.net/~qpzhang/8229351/webrev.03/

Regards
Patrick

From: Aleksei Voitylov
Sent: Thursday, November 7, 2019 12:53 AM
To: Patrick Zhang OS
Cc: aarch64-port-dev at openjdk.java.net
Subject: Re: [aarch64-port-dev ] RFR: 8229351: AArch64: Make the stub threshold of string_compare intrinsic tunable

Hi Patrick,

I like the fact that this patch does not add much to the complexity of the code. Here are some experiments that you could find useful.
Cortex A73
Benchmark                                     Size  base (ns/op)  patched (ns/op)      Diff
StringCompareBench.StringCompareLL             256   14422257.98      15302300.24    -6.10%
StringCompareBench.StringCompareLL             512   27998036.21      28317818.08    -1.14%

ThunderX2
Benchmark                                     Size  base (ns/op)  patched (ns/op)      Diff
StringCompareBench.StringCompareLL             128   4265122.232      13099099.67  -207.12%
StringCompareBench.StringCompareLL             256   3539452.533      3599407.432    -1.69%
StringCompareBench.StringCompareUU             128    6899938.75      7174601.241    -3.98%
StringCompareBench.StringCompareUU             256   7654538.841      7826599.466    -2.25%
StringCompareBench.cachedStringCompareLL       128        19.673           21.242    -7.98%
StringCompareBench.cachedStringCompareLL       256        34.179           36.452    -6.65%
StringCompareBench.cachedStringCompareLL       512        59.574           64.088    -7.58%
StringCompareBench.cachedStringCompareLL      1024        110.37          118.477    -7.35%
StringCompareBench.cachedStringCompareLL   1000000    114028.907       115388.681    -1.19%
StringCompareBench.cachedStringCompareUU       128        33.752           36.922    -9.39%
StringCompareBench.cachedStringCompareUU       256        60.939           64.096    -5.18%
StringCompareBench.cachedStringCompareUU       512       115.328           118.48    -2.73%
StringCompareBench.cachedStringCompareUU      1024       239.332           242.97    -1.52%
StringCompareBench.cachedStringCompareUU   1000000    226491.096       233638.328    -3.16%

It might be the case that the newly added branch is the culprit:

+        __ subs(rscratch2, cnt2, largeLoopExitCondition);
+        __ br(__ LT, NO_PREFETCH);

Maybe you could skip it when CompareLongStringLimitLatin and CompareLongStringLimitUTF are large enough (then stub code is only called with string length large enough to skip branch above). Then (the properly commented) code would look like:

if ((stub_threshold-wordSize/(isLL ? 1 : 2)) < largeLoopExitCondition) {
  __ subs(rscratch2, cnt2, largeLoopExitCondition);
  __ br(__ LT, NO_PREFETCH);
}

and in this case we shouldn't see any performance penalties.

-Aleksei

On 29/10/2019 12:58, Patrick Zhang OS wrote:
Hi,

Could you please review this patch, thanks.
JBS: https://bugs.openjdk.java.net/browse/JDK-8229351
Webrev: http://cr.openjdk.java.net/~qpzhang/8229351/webrev.02 (this starts from .02 since there had been some internal review and updates)

Changes:
1. Split the STUB_THRESHOLD, previously hard-coded as 72, into CompareLongStringLimitLatin and CompareLongStringLimitUTF, giving more flexible control over the stub thresholds for the string_compare intrinsics, especially across different uArchs.
2. MacroAssembler::string_compare used the same threshold for LL and UU. UU may actually need only half the threshold of LL (measured in characters), because each UU character takes two bytes, whereas in compacted LL strings each character takes one byte. In addition, LU/UL may need a separate threshold, since their stub function differs from the same-encoding one and its performance may vary as well.
3. In generate_compare_long_string_same_encoding, the hard-coded 72 originally guaranteed that at least 64 bytes would always be available for the prefetch code path. Once a smaller stub threshold is set, however, a new condition is needed to decide whether this still holds or whether execution has to go to the NO_PREFETCH branch. This change ensures correctness.
4. In generate_compare_long_string_different_encoding, some temporary variables for handling the last 4 characters are no longer valid, so I cleaned up strU and strL and the related pointer initialization to the next U (cnt1) and L (tmp2).
5. In compare_string_16_x_LU, the reference to r10 (tmp1) is not needed, as tmpU or tmpL point to the same register.

Tests:
1. For functional checks, I have run:
   - jdk jtreg tier1 tests, with default VM flags;
   - hotspot jtreg tests (runtime/compiler/gc parts), with "-Xcomp -XX:-TieredCompilation";
   - jck10/api/java.lang (1609 cases) and other selected modules.
   No new failures were found, with default VM flags and with "-Xcomp -XX:-TieredCompilation" respectively. Some specific test cases were also carefully executed to double-check, i.e., TestStringCompareToDifferentLength.java [1] and TestStringCompareToDifferentLength.java [1] introduced by [2], and StrCmpTest.java [3] introduced by [4].
2. For performance checks, I have run string-density-bench/CompareToBench.java [5], StringCompareBench.java [6] and SPECjbb2015.jar; no obvious performance change was found (since the default threshold is NOT changed by this patch). FYI, on an Ampere eMAG system, microbenchmarks [5][6] show a consistent 1.5x performance gain for LU/UL comparisons of shorter strings (<72 chars, with smaller stub thresholds), and a slight improvement (5~10%) for LL/UU cases.
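The threshold arithmetic in items 2 and 3, combined with the compile-time check Aleksei proposed earlier in this thread, can be modelled roughly as below. This is a hypothetical illustration, not the webrev code: chars_for_bytes and needs_prefetch_guard are made-up names, kWordSize assumes 8-byte words, and the loop exit bounds used in the tests (64 Latin-1 chars, 32 UTF-16 chars) are illustrative values only.

```cpp
#include <cassert>

// Hypothetical model of the threshold arithmetic; names are illustrative.
constexpr int kWordSize = 8;  // assumed bytes per machine word (AArch64)

// Characters that fit in `bytes`: 1 byte per char for Latin-1 (LL),
// 2 bytes per char for UTF-16 (UU).
int chars_for_bytes(int bytes, bool isLL) {
  return bytes / (isLL ? 1 : 2);
}

// Compile-time decision modelled on the proposed check
//   if ((stub_threshold - wordSize/(isLL ? 1 : 2)) < largeLoopExitCondition)
// Returns true when the runtime length test before the prefetching loop is needed.
bool needs_prefetch_guard(int stub_threshold_chars, bool isLL,
                          int large_loop_exit_chars) {
  int chars_per_word = kWordSize / (isLL ? 1 : 2);
  int remaining_limit = stub_threshold_chars - chars_per_word;
  return remaining_limit < large_loop_exit_chars;
}
```

Under these assumptions, a threshold of 72 Latin-1 characters leaves 72 - 8 = 64, which is not below an exit bound of 64, so no runtime check would be emitted. That matches item 3's observation that the hard-coded 72 left at least 64 bytes for the prefetch path. A smaller threshold such as 24 would need the guard.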
Refs: [1] http://hg.openjdk.java.net/jdk/jdk/file/3df2bf731a87/test/hotspot/jtreg/compiler/intrinsics/string [2] https://bugs.openjdk.java.net/browse/JDK-8218966 AArch64: String.compareTo() can read memory after string [3] http://cr.openjdk.java.net/~dpochepk/8202326/StrCmpTest.java, contributed by Dmitrij Pochepko [4] https://bugs.openjdk.java.net/browse/JDK-8202326 AARCH64: optimize string compare intrinsic [5] http://cr.openjdk.java.net/~shade/density/string-density-bench.jar, contributed by Aleksey Shipilev [6] http://cr.openjdk.java.net/~dpochepk/8202326/StringCompareBench.java, contributed by Dmitrij Pochepko Regards Patrick From Joshua.Zhu at arm.com Tue Nov 12 09:55:02 2019 From: Joshua.Zhu at arm.com (Joshua Zhu (Arm Technology China)) Date: Tue, 12 Nov 2019 09:55:02 +0000 Subject: [aarch64-port-dev ] RFR: 8233948: AArch64: Incorrect mapping between OptoReg and VMReg for high 64 bits of Vector Register In-Reply-To: References: Message-ID: Hi, Please review the following patch: JBS: https://bugs.openjdk.java.net/browse/JDK-8233948 Webrev: http://cr.openjdk.java.net/~jzhu/8233948/webrev.00/ In register definition of aarch64.ad, each vector register is defined as 4 slots with its calling convention, ideal type, ... and its VMReg value. These VMReg values in reg_def are used by ADLC to generate mapping between OptoReg and VMReg: opto2vm[]. But VMReg is treated as 2 slots inconsistently for vector register [1]. This causes incorrect mapping between VMReg and OptoReg for high 64 bits of vector register. 
If we write the following code, which accesses the high 64 bits of a vector register in the same way as vector_calling_convention in the panama branch [2]:

  VMReg vmreg = v0->as_VMReg();
  VMRegPair p;
  p.set_pair(vmreg->next(3), vmreg);

and convert the VMRegPair into OptoReg [3]:

  RegMask rm;
  OptoReg::Name reg_fst = OptoReg::as_OptoReg(p.first());
  OptoReg::Name reg_snd = OptoReg::as_OptoReg(p.second());
  tty->print("fst=%d snd=%d\n", reg_fst, reg_snd);
  for (OptoReg::Name r = reg_fst; r <= reg_snd; r++) {
    rm.Insert(r);
  }

then, for V0's VMRegPair, the first VMReg's value is 64 and the second is 67. After conversion by as_OptoReg(), the first OptoReg becomes 124 and the second becomes 129. As a result, 6 bits of the RegMask are set, whereas only 4 bits (representing the 4 slots/halves) should be.

VMReg, opto2vm[] and vm2opto[] are dumped by [4] for reference:
http://cr.openjdk.java.net/~jzhu/8233948/RegDump_before_change.log

opto2vm[] has the following items:
  OptoReg: 126, VMReg: 66
  OptoReg: 127, VMReg: 67
  OptoReg: 128, VMReg: 66
  OptoReg: 129, VMReg: 67

OptoReg pairs [126, 127] and [128, 129] are both mapped to the same VMReg pair [66, 67]. vm2opto[] is then generated by traversing opto2vm[] [5]:
  VMReg: 66, OptoReg: 128
  VMReg: 67, OptoReg: 129

This produces the incorrect RegMask in the case above.

For floating-point registers, however, the bottom 64 bits of a NEON vector register overlap with the floating-point register. Their VMReg and the corresponding mapping are still consistent, so this issue is not exposed there. But I think we should still fix it to keep the code clean and avoid potential issues in future.
After fix, the dump is: http://cr.openjdk.java.net/~jzhu/8233948/RegDump_after_change.log [1] https://hg.openjdk.java.net/jdk/jdk/file/d595f1faace2/src/hotspot/cpu/aarch64/vmreg_aarch64.inline.hpp#l35 [2] https://hg.openjdk.java.net/panama/dev/file/43bc39c09590/src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp#l1140 [3] https://hg.openjdk.java.net/panama/dev/file/43bc39c09590/src/hotspot/share/opto/matcher.cpp#l1360 [4] http://cr.openjdk.java.net/~jzhu/8233948/dump.patch [5] https://hg.openjdk.java.net/jdk/jdk/file/d595f1faace2/src/hotspot/share/opto/c2compiler.cpp#l59 Best Regards, Joshua From felix.yang at huawei.com Tue Nov 12 12:02:34 2019 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Tue, 12 Nov 2019 12:02:34 +0000 Subject: [aarch64-port-dev ] RFR: aarch64: minor improvements of atomic operations In-Reply-To: <32ea3e22-9f7a-9aaa-c86a-79ed175a1c7b@redhat.com> References: <65e93675-a3cf-53ac-6894-bb4124c55f93@redhat.com> <1f4c99ac-461c-7795-1a74-a494bdba3672@redhat.com> <1cc3ab16-eaab-d031-3df0-c9133de24f88@redhat.com> <8b527457-c371-45ae-bb54-0a048f9ee6f8@redhat.com> <32ea3e22-9f7a-9aaa-c86a-79ed175a1c7b@redhat.com> Message-ID: > -----Original Message----- > From: Andrew Dinn [mailto:adinn at redhat.com] > Sent: Tuesday, November 12, 2019 5:42 PM > To: Andrew Haley ; Yangfei (Felix) > ; aarch64-port-dev at openjdk.java.net > Cc: hotspot-dev at openjdk.java.net > Subject: Re: [aarch64-port-dev ] RFR: aarch64: minor improvements of atomic > operations > > On 12/11/2019 09:25, Andrew Haley wrote: > > On 11/12/19 8:37 AM, Yangfei (Felix) wrote: > >> This has been discussed somewhere before: > >> https://patchwork.kernel.org/patch/3575821/ > >> Let's keep the current status for safe. > > > > Yes. > > > > It's been interesting to see the progress of this patch. I don't think > > it's the first time that someone has been tempted to change this code > > to make it "more efficient". 
> > > > I wonder if we could perhaps add a comment to that code so that it > > doesn't happen again. I'm not sure exactly what the patch should say > > beyond "do not touch". Perhaps something along the lines of "Do not > > touch this code unless you have at least Black Belt, 4th Dan in memory > > ordering." :-) > > > > More seriously, maybe simply "Note that memory_order_conservative > > requires a full barrier after atomic stores. See > > https://patchwork.kernel.org/patch/3575821/" > Yes, that would be a help. It's particularly easy to get confused here because > we happily omit the ordering of an stlr store wrt subsequent stores when the > strl is implementing a Java volatile write or a Java cmpxchg. > > So, it might be worth adding a rider that implementing the full > memory_order_conservative semantics is necessary because VM code relies > on the strong ordering wrt writes that the cmpxchg is required to provide. > I also suggest we implement these functions with inline assembly here. For Atomic::PlatformXchg, we may issue two consecutive full memory barriers with the current status. 
I used GCC 7.3.0 to compile the following function:

$ cat test.c
long foo(long add_value, long volatile* dest, long exchange_value) {
  long val = __sync_lock_test_and_set(dest, exchange_value);

  __sync_synchronize();

  return val;
}

$ cat test.s
        .arch armv8-a
        .file   "test.c"
        .text
        .align  2
        .p2align 3,,7
        .global foo
        .type   foo, %function
foo:
.L2:
        ldxr    x0, [x1]
        stxr    w3, x2, [x1]
        cbnz    w3, .L2
        dmb     ish             <========
        dmb     ish             <========
        ret
        .size   foo, .-foo
        .ident  "GCC: (GNU) 7.3.0"
        .section        .note.GNU-stack,"",@progbits

From felix.yang at huawei.com Tue Nov 12 12:14:48 2019
From: felix.yang at huawei.com (Yangfei (Felix))
Date: Tue, 12 Nov 2019 12:14:48 +0000
Subject: [aarch64-port-dev ] RFR: aarch64: minor improvements of atomic operations
References: <65e93675-a3cf-53ac-6894-bb4124c55f93@redhat.com> <1f4c99ac-461c-7795-1a74-a494bdba3672@redhat.com> <1cc3ab16-eaab-d031-3df0-c9133de24f88@redhat.com> <8b527457-c371-45ae-bb54-0a048f9ee6f8@redhat.com> <32ea3e22-9f7a-9aaa-c86a-79ed175a1c7b@redhat.com>
Message-ID:

> -----Original Message-----
> From: Yangfei (Felix)
> Sent: Tuesday, November 12, 2019 8:03 PM
> To: 'Andrew Dinn' ; Andrew Haley ;
> aarch64-port-dev at openjdk.java.net
> Cc: hotspot-dev at openjdk.java.net
> Subject: RE: [aarch64-port-dev ] RFR: aarch64: minor improvements of atomic
> operations
>
> > -----Original Message-----
> > From: Andrew Dinn [mailto:adinn at redhat.com]
> > Sent: Tuesday, November 12, 2019 5:42 PM
> > To: Andrew Haley ; Yangfei (Felix)
> > ; aarch64-port-dev at openjdk.java.net
> > Cc: hotspot-dev at openjdk.java.net
> > Subject: Re: [aarch64-port-dev ] RFR: aarch64: minor improvements of
> > atomic operations
> >
> > On 12/11/2019 09:25, Andrew Haley wrote:
> > > On 11/12/19 8:37 AM, Yangfei (Felix) wrote:
> > >> This has been discussed somewhere before:
> > >> https://patchwork.kernel.org/patch/3575821/
> > >> Let's keep the current status for safe.
> > >
> > > Yes.
> > >
> > > It's been interesting to see the progress of this patch.
I don't > > > think it's the first time that someone has been tempted to change > > > this code to make it "more efficient". > > > > > > I wonder if we could perhaps add a comment to that code so that it > > > doesn't happen again. I'm not sure exactly what the patch should say > > > beyond "do not touch". Perhaps something along the lines of "Do not > > > touch this code unless you have at least Black Belt, 4th Dan in > > > memory ordering." :-) > > > > > > More seriously, maybe simply "Note that memory_order_conservative > > > requires a full barrier after atomic stores. See > > > https://patchwork.kernel.org/patch/3575821/" > > Yes, that would be a help. It's particularly easy to get confused here > > because we happily omit the ordering of an stlr store wrt subsequent > > stores when the strl is implementing a Java volatile write or a Java cmpxchg. > > > > So, it might be worth adding a rider that implementing the full > > memory_order_conservative semantics is necessary because VM code > > relies on the strong ordering wrt writes that the cmpxchg is required to > provide. > > > > I also suggest we implement these functions with inline assembly here. > For Atomic::PlatformXchg, we may issue two consecutive full memory barriers > with the current status. > I used GCC 7.3.0 to compile the following function: > > $ cat test.c > long foo(long add_value, long volatile* dest, long exchange_value) { > long val = __sync_lock_test_and_set(dest, exchange_value); > > __sync_synchronize(); > > return val; > } > > $ cat test.s > .arch armv8-a > .file "test.c" > .text > .align 2 > .p2align 3,,7 > .global foo > .type foo, %function > foo: > .L2: > ldxr x0, [x1] > stxr w3, x2, [x1] > cbnz w3, .L2 > dmb ish < ======== > dmb ish < ======== > ret > .size foo, .-foo > .ident "GCC: (GNU) 7.3.0" > .section .note.GNU-stack,"", at progbits Also this is different from the following sequence (stxr instead of stlxr). 
        // atomic_op (B)
1:      ldxr    x0, [B]         // Exclusive load
        stlxr   w1, x0, [B]     // Exclusive store with release
        cbnz    w1, 1b
        dmb     ish             // Full barrier

I think the two-way memory barrier may not be ensured for this case.

Felix

From felix.yang at huawei.com Tue Nov 12 14:42:55 2019
From: felix.yang at huawei.com (felix.yang at huawei.com)
Date: Tue, 12 Nov 2019 14:42:55 +0000
Subject: [aarch64-port-dev ] hg: aarch64-port/jdk8u-shenandoah/hotspot: 8233839: aarch64: missing memory barrier in NewObjectArrayStub and NewTypeArrayStub
Message-ID: <201911121442.xACEgtVb012981@aojmv0008.oracle.com>

Changeset: 09d4b646f756
Author:    fyang
Date:      2019-11-12 17:54 +0800
URL:       https://hg.openjdk.java.net/aarch64-port/jdk8u-shenandoah/hotspot/rev/09d4b646f756

8233839: aarch64: missing memory barrier in NewObjectArrayStub and NewTypeArrayStub
Reviewed-by: adinn

! src/cpu/aarch64/vm/c1_Runtime1_aarch64.cpp

From aph at redhat.com Tue Nov 12 16:04:57 2019
From: aph at redhat.com (Andrew Haley)
Date: Tue, 12 Nov 2019 16:04:57 +0000
Subject: [aarch64-port-dev ] RFR: aarch64: minor improvements of atomic operations
In-Reply-To:
References: <65e93675-a3cf-53ac-6894-bb4124c55f93@redhat.com> <1f4c99ac-461c-7795-1a74-a494bdba3672@redhat.com> <1cc3ab16-eaab-d031-3df0-c9133de24f88@redhat.com> <8b527457-c371-45ae-bb54-0a048f9ee6f8@redhat.com> <32ea3e22-9f7a-9aaa-c86a-79ed175a1c7b@redhat.com>
Message-ID: <58ba3a50-fe49-f231-85b2-37d8f8b136f0@redhat.com>

On 11/12/19 12:02 PM, Yangfei (Felix) wrote:
> I also suggest we implement these functions with inline assembly here.

Please let's not. Long term it would be nice to migrate all of HotSpot from the current inline hackery to real C++ atomics. There has been a considerable effort to make C++ and Java memory models compatible, and we should utilize this.

> For Atomic::PlatformXchg, we may issue two consecutive full memory
> barriers with the current status.

OK, but is this actually important? What uses it?
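The direction Andrew prefers here, compiler atomics rather than inline assembly, can be sketched as a release exchange followed by one full fence, which corresponds to the ldxr/stlxr/dmb ish sequence Felix quotes. This is a hypothetical helper: conservative_xchg is not HotSpot's Atomic::PlatformXchg, and it relies on the GCC/Clang __atomic builtins. The instruction selection noted in the comments is what one would typically expect on AArch64 without LSE. Other compiler versions or flags (LSE, outline atomics) may instead produce a SWPL instruction or a helper call, so checking the disassembly is still advisable.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not HotSpot's Atomic::PlatformXchg: an exchange meant
// to give two-way ("conservative") ordering using GCC/Clang __atomic builtins
// instead of inline assembly.
template <typename T>
T conservative_xchg(T volatile* dest, T exchange_value) {
  // Release exchange: on AArch64 without LSE this is typically an
  // ldxr/stlxr retry loop.
  T old_value = __atomic_exchange_n(dest, exchange_value, __ATOMIC_RELEASE);
  // Trailing full fence: typically a single "dmb ish" on AArch64.
  __atomic_thread_fence(__ATOMIC_SEQ_CST);
  return old_value;
}

// Small self-check on a 64-bit slot: returns true if the old value was
// returned and the new value was stored.
inline bool conservative_xchg_self_check() {
  volatile std::int64_t slot = 5;
  std::int64_t old_value = conservative_xchg(&slot, static_cast<std::int64_t>(7));
  return old_value == 5 && slot == 7;
}
```

Compared with the __sync_lock_test_and_set plus __sync_synchronize version compiled with GCC 7.3.0, the intent is a store-release inside the exclusive loop and a single trailing fence, rather than a plain stxr followed by two barriers.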
--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From aleksei.voitylov at bell-sw.com Tue Nov 12 16:10:28 2019
From: aleksei.voitylov at bell-sw.com (Aleksei Voitylov)
Date: Tue, 12 Nov 2019 19:10:28 +0300
Subject: [aarch64-port-dev ] RFR: 8229351: AArch64: Make the stub threshold of string_compare intrinsic tunable
In-Reply-To:
References: <9248d1b3-7341-68b9-7dd1-02dfd48567a6@bell-sw.com>
Message-ID: <4a067ad4-a7e8-813e-2633-91bb380f24b5@bell-sw.com>

Hi Patrick,

First, I'm a trespasser, not a reviewer. Reviewers will need to look at this.

On the technical side: this additional branch in v3 is still painful. You can reduce the number of branches in the code path for lengths 128 and 256 by using the fact that CompareLongStringLimitLatin and CompareLongStringLimitUTF are at least 24. Then we don't have to jump to the NO_PREFETCH label, where the check for small string size is done. Instead we can jump to the SMALL_LOOP label (assuming the cnt2 counter is updated accordingly). In this case the NO_PREFETCH label is not needed and we have one branch fewer.

A rough sketch of the affected part, based on v3, looks as follows. This version was checked on ThunderX2 and it looks fine on lengths 128 and 256 performance-wise. I also added alignment for the small loop, which also helps a bit. Please keep in mind it's a sketch.

-Aleksei

@@ -4172,19 +4168,34 @@
     Register result = r0, str1 = r1, cnt1 = r2, str2 = r3, cnt2 = r4,
         tmp1 = r10, tmp2 = r11;
     Label SMALL_LOOP, LARGE_LOOP_PREFETCH, CHECK_LAST, DIFF2, TAIL,
-        LENGTH_DIFF, DIFF, LAST_CHECK_AND_LENGTH_DIFF,
+        LENGTH_DIFF, DIFF, LAST_CHECK_AND_LENGTH_DIFF, SMALL_LOOP_CHECK,
         DIFF_LAST_POSITION, DIFF_LAST_POSITION2;
     // exit from large loop when less than 64 bytes left to read or we're about
     // to prefetch memory behind array border
     int largeLoopExitCondition = MAX(64, SoftwarePrefetchHintDistance)/(isLL ? 1 : 2);
+    // calculate the remaining limit in chars which manages if this stub should be called,
+    // if the limit is large enough (>= largeLoopExitCondition), below large loop with prefetching
+    // can be executed at least once, and there is no need to do any extra checking at the entrance.
+    int remainingLimit = (isLL ? CompareLongStringLimitLatin : CompareLongStringLimitUTF) -
+                         (wordSize / (isLL ? 1 : 2));
     // cnt1/cnt2 contains amount of characters to compare. cnt1 can be re-used
     // update cnt2 counter with already loaded 8 bytes
-    __ sub(cnt2, cnt2, wordSize/(isLL ? 1 : 2));
+    if (SoftwarePrefetchHintDistance >= 0 && remainingLimit < largeLoopExitCondition) {
+      __ sub(cnt2, cnt2, isLL ? 24 : 12);
+    } else {
+      __ sub(cnt2, cnt2, isLL ? 8 : 4);
+    }
     // update pointers, because of previous read
     __ add(str1, str1, wordSize);
     __ add(str2, str2, wordSize);
     if (SoftwarePrefetchHintDistance >= 0) {
-      __ bind(LARGE_LOOP_PREFETCH);
+      if (remainingLimit < largeLoopExitCondition) {
+        // there could be fewer bytes left and invalid for this large loop with prefetching
+        __ subs(rscratch2, cnt2, largeLoopExitCondition - (isLL ? 16 : 8));
+        __ br(__ LT, SMALL_LOOP);
+        __ add(cnt2, cnt2, isLL ? 16 : 8);
+      }
+      __ bind(LARGE_LOOP_PREFETCH); // 64 bytes loop
         __ prfm(Address(str1, SoftwarePrefetchHintDistance));
         __ prfm(Address(str2, SoftwarePrefetchHintDistance));
         compare_string_16_bytes_same(DIFF, DIFF2);
@@ -4196,11 +4207,11 @@
         __ br(__ GT, LARGE_LOOP_PREFETCH);
         __ cbz(cnt2, LAST_CHECK_AND_LENGTH_DIFF); // no more chars left?
     }
-    // less than 16 bytes left?
-    __ subs(cnt2, cnt2, isLL ? 16 : 8);
-    __ br(__ LT, TAIL);
+    __ b(SMALL_LOOP_CHECK); // check if less than 16 bytes left
+    __ align(OptoLoopAlignment);
     __ bind(SMALL_LOOP);
       compare_string_16_bytes_same(DIFF, DIFF2);
+      __ bind(SMALL_LOOP_CHECK);
       __ subs(cnt2, cnt2, isLL ? 16 : 8);
       __ br(__ GE, SMALL_LOOP);
     __ bind(TAIL);

On 12/11/2019 12:52, Patrick Zhang OS wrote:
> Ping...
>
> Hi Aleksei,
>
> Does the potential regression on my test system still exist? with the new patch webrev.03? If the added largeLoopExitCondition condition excluded your >128 chars strings from the large loop, and still caused performance drops, maybe hacking all the software prefetch hint distance and the checking condition to hardcoded 64, can be a good try. Although I think it would not be a right thing to do, in comparison with the similar logic in generate_compare_long_string_different_encoding. Thanks.
>
> http://cr.openjdk.java.net/~qpzhang/8229351/webrev.03/jdk.changeset
>      if (SoftwarePrefetchHintDistance >= 0) {
> -      __ bind(LARGE_LOOP_PREFETCH);
> +      if (remainingLimit < largeLoopExitCondition) {
> +        // there could be fewer bytes left and invalid for this large loop with prefetching
> +        __ subs(rscratch2, cnt2, largeLoopExitCondition); // => subs(rscratch2, cnt2, 64); ??
> +        __ br(__ LT, NO_PREFETCH);
> +      }
> +      __ bind(LARGE_LOOP_PREFETCH); // 64 bytes loop
>          __ prfm(Address(str1, SoftwarePrefetchHintDistance)); // => __ prfm(Address(str1, 64));
>          __ prfm(Address(str2, SoftwarePrefetchHintDistance)); // => __ prfm(Address(str2, 64));
>
> Regards
> Patrick
>
> -----Original Message-----
> From: aarch64-port-dev On Behalf Of Patrick Zhang OS
> Sent: Thursday, November 7, 2019 6:56 PM
> To: Aleksei Voitylov
> Cc: aarch64-port-dev at openjdk.java.net
> Subject: Re: [aarch64-port-dev ] RFR: 8229351: AArch64: Make the stub threshold of string_compare intrinsic tunable
>
> Hi Aleksei,
>
> Thanks for testing it and the data. I only had the source of StringCompareBench.java [6], my numbers (the diffs) are within 2%, while the -207.12% looks quite weird.
> I initially did not add the condition to control the br, since generate_compare_long_string_different_encoding has the similar unconditional br. By the way, the original logic allowed prefetching the memory behind array border, for the first 64 bytes. I think securing the prefetch is the right thing to do, but it could certainly stop some cases from going to the large loop with prefetching. Welcome further comments, thanks.
>
> http://cr.openjdk.java.net/~qpzhang/8229351/webrev.03/
>
> Regards
> Patrick
>
> From: Aleksei Voitylov
> Sent: Thursday, November 7, 2019 12:53 AM
> To: Patrick Zhang OS
> Cc: aarch64-port-dev at openjdk.java.net
> Subject: Re: [aarch64-port-dev ] RFR: 8229351: AArch64: Make the stub threshold of string_compare intrinsic tunable
>
>
> Hi Patrick,
>
> I like the fact that this patch does not add much to the complexity of the code. Here are some experiments that you could find useful.
> Cortex A73                                   Size  base (ns/op)  patched (ns/op)      Diff
> StringCompareBench.StringCompareLL            256   14422257,98      15302300,24    -6,10%
> StringCompareBench.StringCompareLL            512   27998036,21      28317818,08    -1,14%
>
> ThunderX2                                    Size  base (ns/op)  patched (ns/op)      Diff
> StringCompareBench.StringCompareLL            128   4265122,232      13099099,67  -207,12%
> StringCompareBench.StringCompareLL            256   3539452,533      3599407,432    -1,69%
>
> StringCompareBench.StringCompareUU            128    6899938,75      7174601,241    -3,98%
> StringCompareBench.StringCompareUU            256   7654538,841      7826599,466    -2,25%
>
> StringCompareBench.cachedStringCompareLL      128        19,673           21,242    -7,98%
> StringCompareBench.cachedStringCompareLL      256        34,179           36,452    -6,65%
> StringCompareBench.cachedStringCompareLL      512        59,574           64,088    -7,58%
> StringCompareBench.cachedStringCompareLL     1024        110,37          118,477    -7,35%
> StringCompareBench.cachedStringCompareLL  1000000    114028,907       115388,681    -1,19%
>
> StringCompareBench.cachedStringCompareUU      128        33,752           36,922    -9,39%
> StringCompareBench.cachedStringCompareUU      256        60,939           64,096    -5,18%
> StringCompareBench.cachedStringCompareUU      512       115,328           118,48    -2,73%
> StringCompareBench.cachedStringCompareUU     1024       239,332           242,97    -1,52%
> StringCompareBench.cachedStringCompareUU  1000000    226491,096       233638,328    -3,16%
> It might be the case that the newly added branch is the culprit:
>
> +        __ subs(rscratch2, cnt2, largeLoopExitCondition);
> +        __ br(__ LT, NO_PREFETCH);
>
> Maybe you could skip it when CompareLongStringLimitLatin and CompareLongStringLimitUTF are large enough (then stub code is only called with string length large enough to skip branch above). Then (the properly commented) code would look like:
>
> if ((stub_threshold-wordSize/(isLL ? 1 : 2)) < largeLoopExitCondition) {
>   __ subs(rscratch2, cnt2, largeLoopExitCondition);
>   __ br(__ LT, NO_PREFETCH);
> }
>
> and in this case we shouldn't see any performance penalties.
>
> -Aleksei
>
> On 29/10/2019 12:58, Patrick Zhang OS wrote:
> > Hi,
> >
> > Could you please review this patch, thanks.
> >
> > JBS: https://bugs.openjdk.java.net/browse/JDK-8229351
> > Webrev: http://cr.openjdk.java.net/~qpzhang/8229351/webrev.02
> > (this starts from .02 since there had been some internal review and updates)
> >
> > Changes:
> >
> > 1. Split the STUB_THRESHOLD from the hard-coded 72 to be CompareLongStringLimitLatin and CompareLongStringLimitUTF as a more flexible control over the stub thresholds for string_compare intrinsics, especially for various uArchs.
> >
> > 2. MacroAssembler::string_compare LL and UU shared the same threshold, actually UU may only require the half (length of chars) of that of LL's, because one character has two-bytes for UU, while for compacted LL strings, one character means one byte. In addition, LU/UL may need a separated threshold, as the stub function is different from the same encoding one, and the performance may vary as well.
> >
> > 3. In generate_compare_long_string_same_encoding, the hard-coded 72 was originally able to ensure that there can be always 64 bytes at least for the prefetch code path.
However once a smaller stub threshold is set, a new condition is needed to tell if this would be still valid, or has to go to the NO_PREFETCH branch. This change can ensure the correctness. > > > > 4. In generate_compare_long_string_different_encoding, some temp vars for handling the last 4 characters are not valid any longer, cleaned up strU and strL, and related pointers initialization to the next U (cnt1) and L (tmp2). > > > > 5. In compare_string_16_x_LU, the reference to r10 (tmp1) is not needed, as tmpU or tmpL point to the same register. > > > > Tests: > > > > 1. For function check, I have run > > > > jdk jtreg tier1 tests, with default vm flags > > > > hotspot jtreg tests: runtime/compiler/gc parts, with "-Xcomp -XX:-TieredCompilation" > > > > jck10/api/java.lang 1609 cases and other selected modules, no new failures found, with default vm flags and "-Xcomp -XX:-TieredCompilation" respectively; > > > > some specific test cases had been carefully executed to double check, i.e., TestStringCompareToDifferentLength.java [1] and TestStringCompareToDifferentLength.java [1] introduced by [2], StrCmpTest.java [3] introduced by [4]. > > > > 1. For performance check, I have run > > > > string-density-bench/CompareToBench.java [5] and StringCompareBench.java [6] respectively, > > > > and SPECjbb2015.jar, no obvious performance change has been found (since the default threshold is NOT changed within this patch). > > > > FYI. with Ampere eMAG system, microbenchmarks [5][6] can have 1.5x consistent perf gain with LU/UL comparison for shorter strings (<72 chars, smaller stub thresholds), and slight improvement (5~10%) with LL/UU cases. 
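The compile-time condition in Aleksei's sketch, which decides whether the extra entry check before LARGE_LOOP_PREFETCH is emitted, can be modeled on the host. The C++ below is a hypothetical model, not stub code: plan_entry and its fields are illustrative names mirroring largeLoopExitCondition and remainingLimit from the diff, thresholds are in characters, and the example values (72 for the old hard-coded threshold, 24 as the minimum Aleksei mentions, a 64-byte prefetch distance) are chosen for illustration rather than taken from the port's defaults.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical host-side model of the compile-time decision in the
// webrev.03-based sketch; names mirror the diff, but this is not stub code.
struct SameEncodingEntryPlan {
  int large_loop_exit;     // largeLoopExitCondition, in chars
  int remaining_limit;     // stub threshold minus the 8 bytes already loaded, in chars
  bool needs_entry_check;  // emit the extra subs/br before LARGE_LOOP_PREFETCH?
};

SameEncodingEntryPlan plan_entry(bool isLL, int stub_threshold_chars,
                                 int prefetch_hint_distance, int word_size = 8) {
  const int bytes_per_char = isLL ? 1 : 2;  // Latin-1: 1 byte, UTF-16: 2 bytes
  SameEncodingEntryPlan plan{};
  plan.large_loop_exit = std::max(64, prefetch_hint_distance) / bytes_per_char;
  plan.remaining_limit = stub_threshold_chars - word_size / bytes_per_char;
  // A runtime check is only needed when prefetching is enabled and the
  // threshold does not already guarantee one full pass of the large loop.
  plan.needs_entry_check = prefetch_hint_distance >= 0 &&
                           plan.remaining_limit < plan.large_loop_exit;
  return plan;
}
```

With isLL, a threshold of 72 and a 64-byte prefetch distance, remaining_limit is 72 - 8 = 64, which already equals large_loop_exit, so no check is emitted; this matches item 3's point that 72 guaranteed at least 64 bytes for the prefetch path. Lowering the Latin-1 threshold to 24 gives 16 < 64, so the check is emitted.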
> >
> > Refs:
> > [1] http://hg.openjdk.java.net/jdk/jdk/file/3df2bf731a87/test/hotspot/jtreg/compiler/intrinsics/string
> > [2] https://bugs.openjdk.java.net/browse/JDK-8218966 AArch64: String.compareTo() can read memory after string
> > [3] http://cr.openjdk.java.net/~dpochepk/8202326/StrCmpTest.java, contributed by Dmitrij Pochepko
> > [4] https://bugs.openjdk.java.net/browse/JDK-8202326 AARCH64: optimize string compare intrinsic
> > [5] http://cr.openjdk.java.net/~shade/density/string-density-bench.jar, contributed by Aleksey Shipilev
> > [6] http://cr.openjdk.java.net/~dpochepk/8202326/StringCompareBench.java, contributed by Dmitrij Pochepko
> >
> > Regards
> > Patrick

From erik.osterlund at oracle.com Tue Nov 12 17:38:11 2019
From: erik.osterlund at oracle.com (Erik Österlund)
Date: Tue, 12 Nov 2019 18:38:11 +0100
Subject: [aarch64-port-dev ] RFR: aarch64: minor improvements of atomic operations
In-Reply-To:
References: <65e93675-a3cf-53ac-6894-bb4124c55f93@redhat.com> <1f4c99ac-461c-7795-1a74-a494bdba3672@redhat.com> <1cc3ab16-eaab-d031-3df0-c9133de24f88@redhat.com> <8b527457-c371-45ae-bb54-0a048f9ee6f8@redhat.com> <32ea3e22-9f7a-9aaa-c86a-79ed175a1c7b@redhat.com>
Message-ID: <83f92211-2c64-69d0-457c-c059acbccf63@oracle.com>

Hi Felix,

I was hoping to stay out of this conversation, but couldn't resist butting in unfortunately.

I have to agree with you - you are absolutely right. We have a mix of the JMM, the C++ memory model and HotSpot's memory model, which predates that. JMM and C++ memory model are indeed quite similar now in terms of semantics (yet there exists choice in implementation of it), but the old memory model used in HotSpot is kind of not similar. Ideally we would have fewer memory models and just go with the one used by C++/JMM, and then we just have to convince ourselves that the choice of implementation of seq_cst by the compiler is compatible with the one we use to implement the JMM in our JIT-compiled code.
But it seems to me that we are not there. Last time I discussed this with Andrew Haley, we disagreed and didn't really get anywhere. Andrew wanted to use the GCC intrinsics, and I was arguing that we should use inline assembly as a) the memory model we are supporting is not the same as what the intrinsic is providing, and b) we are relying on the implementation of the intrinsics to emit very specific instruction sequences to be compatible with the memory model, and it would be more clear if we could see in the inline assembly that we indeed used exactly those instructions that we expected and not something unexpected, which we would only randomly find out when disassembling the code (ahem). Now it looks like you have discovered that we sometimes have double trailing dmb ish, and sometimes lacking leading dmb ish if I am reading this right. That seems to make the case stronger, that by looking at the intrinsic calls, it's not obvious what instruction sequence will be emitted, and whether that is compatible with the memory model it is implementing or not, and you really have to disassemble it to find out what we actually got. And it looks like what we got is not at all what we wanted. My hope is that the AArch64 port should use inline assembly as you suggest, so we can see that the generated code is correct, as we wait for the glorious future where all HotSpot code has been rewritten to work with seq_cst (and we are *not* there now). Having said that, now I will try to go and hide in a corner again... 
Thanks, /Erik On 2019-11-12 13:14, Yangfei (Felix) wrote: >> -----Original Message----- >> From: Yangfei (Felix) >> Sent: Tuesday, November 12, 2019 8:03 PM >> To: 'Andrew Dinn' ; Andrew Haley ; >> aarch64-port-dev at openjdk.java.net >> Cc: hotspot-dev at openjdk.java.net >> Subject: RE: [aarch64-port-dev ] RFR: aarch64: minor improvements of atomic >> operations >> >>> -----Original Message----- >>> From: Andrew Dinn [mailto:adinn at redhat.com] >>> Sent: Tuesday, November 12, 2019 5:42 PM >>> To: Andrew Haley ; Yangfei (Felix) >>> ; aarch64-port-dev at openjdk.java.net >>> Cc: hotspot-dev at openjdk.java.net >>> Subject: Re: [aarch64-port-dev ] RFR: aarch64: minor improvements of >>> atomic operations >>> >>> On 12/11/2019 09:25, Andrew Haley wrote: >>>> On 11/12/19 8:37 AM, Yangfei (Felix) wrote: >>>>> This has been discussed somewhere before: >>>>> https://patchwork.kernel.org/patch/3575821/ >>>>> Let's keep the current status for safe. >>>> Yes. >>>> >>>> It's been interesting to see the progress of this patch. I don't >>>> think it's the first time that someone has been tempted to change >>>> this code to make it "more efficient". >>>> >>>> I wonder if we could perhaps add a comment to that code so that it >>>> doesn't happen again. I'm not sure exactly what the patch should say >>>> beyond "do not touch". Perhaps something along the lines of "Do not >>>> touch this code unless you have at least Black Belt, 4th Dan in >>>> memory ordering." :-) >>>> >>>> More seriously, maybe simply "Note that memory_order_conservative >>>> requires a full barrier after atomic stores. See >>>> https://patchwork.kernel.org/patch/3575821/" >>> Yes, that would be a help. It's particularly easy to get confused here >>> because we happily omit the ordering of an stlr store wrt subsequent >>> stores when the strl is implementing a Java volatile write or a Java cmpxchg. 
>>> >>> So, it might be worth adding a rider that implementing the full >>> memory_order_conservative semantics is necessary because VM code >>> relies on the strong ordering wrt writes that the cmpxchg is required to >> provide. >> I also suggest we implement these functions with inline assembly here. >> For Atomic::PlatformXchg, we may issue two consecutive full memory barriers >> with the current status. >> I used GCC 7.3.0 to compile the following function: >> >> $ cat test.c >> long foo(long add_value, long volatile* dest, long exchange_value) { >> long val = __sync_lock_test_and_set(dest, exchange_value); >> >> __sync_synchronize(); >> >> return val; >> } >> >> $ cat test.s >> .arch armv8-a >> .file "test.c" >> .text >> .align 2 >> .p2align 3,,7 >> .global foo >> .type foo, %function >> foo: >> .L2: >> ldxr x0, [x1] >> stxr w3, x2, [x1] >> cbnz w3, .L2 >> dmb ish < ======== >> dmb ish < ======== >> ret >> .size foo, .-foo >> .ident "GCC: (GNU) 7.3.0" >> .section .note.GNU-stack,"", at progbits > Also this is different from the following sequence (stxr instead of stlxr). > > > > // atomic_op (B) > 1: ldxr x0, [B] // Exclusive load > > stlxr w1, x0, [B] // Exclusive store with release > cbnz w1, 1b > dmb ish // Full barrier > > > > I think the two-way memory barrier may not be ensured for this case. > > Felix From Alan.Hayward at arm.com Tue Nov 12 18:03:02 2019 From: Alan.Hayward at arm.com (Alan Hayward) Date: Tue, 12 Nov 2019 18:03:02 +0000 Subject: [aarch64-port-dev ] RFR 8231841: AArch64: Add entry to pns output in help() Message-ID: Please could you review this change which adds AArch64 to the pns section of the help() output. Bug: https://bugs.openjdk.java.net/browse/JDK-8231841 Webrev: http://cr.openjdk.java.net/~smonteith/8231841/webrev.0/ Built and ran tier1 on x86 and AArch64. Thanks, Alan. 
From aph at redhat.com Tue Nov 12 19:00:20 2019
From: aph at redhat.com (Andrew Haley)
Date: Tue, 12 Nov 2019 19:00:20 +0000
Subject: [aarch64-port-dev ] RFR: aarch64: minor improvements of atomic operations
In-Reply-To: <83f92211-2c64-69d0-457c-c059acbccf63@oracle.com>
References: <65e93675-a3cf-53ac-6894-bb4124c55f93@redhat.com> <1f4c99ac-461c-7795-1a74-a494bdba3672@redhat.com> <1cc3ab16-eaab-d031-3df0-c9133de24f88@redhat.com> <8b527457-c371-45ae-bb54-0a048f9ee6f8@redhat.com> <32ea3e22-9f7a-9aaa-c86a-79ed175a1c7b@redhat.com> <83f92211-2c64-69d0-457c-c059acbccf63@oracle.com>
Message-ID: <0d718a85-d669-f4b4-ae90-db1f7bb56b45@redhat.com>

On 11/12/19 5:38 PM, Erik Österlund wrote:
> My hope is that the AArch64 port should use inline assembly as you suggest, so we can see that the generated code is correct, as we wait for the glorious future where all HotSpot code has been rewritten to work with seq_cst (and we are *not* there now).

I don't doubt it. :-)

But my arguments about the C++ intrinsics being well-enough defined, at least on AArch64 Linux, have not changed, and I'm not going to argue all that again. I'll grant you that there may well be issues on various x86 compilers, but that isn't relevant here.

> Now it looks like you have discovered that we sometimes have double trailing dmb ish, and sometimes lacking leading dmb ish if I am reading this right. That seems to make the case stronger,

Sure, we can use inline asm if there's no other way to do it, but I don't think that's necessary. All we need is to use

  T res;
  __atomic_exchange(dest, &exchange_value, &res, __ATOMIC_RELEASE);
  FULL_MEM_BARRIER;

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From ci_notify at linaro.org Tue Nov 12 19:07:37 2019 From: ci_notify at linaro.org (ci_notify at linaro.org) Date: Tue, 12 Nov 2019 19:07:37 +0000 (UTC) Subject: [aarch64-port-dev ] JTREG, JCStress, SPECjbb2015 and Hadoop/Terasort results for OpenJDK 13 on AArch64 Message-ID: <738577191.460.1573585658389.JavaMail.javamailuser@localhost> This is a summary of the JTREG test results =========================================== The build and test results are cycled every 15 days. For detailed information on the test output please refer to: http://openjdk.linaro.org/jdk13/openjdk-jtreg-nightly-tests/summary/2019/316/summary.html ------------------------------------------------------------------------------- release/hotspot ------------------------------------------------------------------------------- Build 0: aarch64/2019/jul/02 pass: 5,645; fail: 2 Build 1: aarch64/2019/jul/04 pass: 5,644; fail: 2; error: 1 Build 2: aarch64/2019/jul/09 pass: 5,643; fail: 4 Build 3: aarch64/2019/jul/16 pass: 5,646; fail: 1 Build 4: aarch64/2019/jul/18 pass: 5,644; fail: 2; error: 1 Build 5: aarch64/2019/jul/20 pass: 5,645; fail: 1; error: 1 Build 6: aarch64/2019/jul/23 pass: 5,644; fail: 3 Build 7: aarch64/2019/jul/25 pass: 5,644; fail: 3 Build 8: aarch64/2019/jul/30 pass: 5,645; fail: 2 Build 9: aarch64/2019/aug/01 pass: 5,646; fail: 1 Build 10: aarch64/2019/aug/03 pass: 5,646; fail: 1 Build 11: aarch64/2019/aug/06 pass: 5,645; fail: 2 Build 12: aarch64/2019/aug/08 pass: 5,646; fail: 1 Build 13: aarch64/2019/aug/10 pass: 5,646; fail: 1 Build 14: aarch64/2019/nov/12 pass: 5,652 ------------------------------------------------------------------------------- release/jdk ------------------------------------------------------------------------------- Build 0: aarch64/2019/jul/02 pass: 8,604; fail: 521; error: 25 Build 1: aarch64/2019/jul/04 pass: 8,601; fail: 523; error: 26 Build 2: aarch64/2019/jul/09 pass: 
8,606; fail: 515; error: 29 Build 3: aarch64/2019/jul/16 pass: 8,593; fail: 531; error: 30 Build 4: aarch64/2019/jul/18 pass: 8,618; fail: 527; error: 26 Build 5: aarch64/2019/jul/20 pass: 8,619; fail: 519; error: 33 Build 6: aarch64/2019/jul/23 pass: 8,616; fail: 525; error: 30 Build 7: aarch64/2019/jul/25 pass: 8,620; fail: 528; error: 23 Build 8: aarch64/2019/jul/30 pass: 8,610; fail: 529; error: 32 Build 9: aarch64/2019/aug/01 pass: 8,620; fail: 527; error: 24 Build 10: aarch64/2019/aug/03 pass: 8,596; fail: 552; error: 23 Build 11: aarch64/2019/aug/06 pass: 8,616; fail: 528; error: 27 Build 12: aarch64/2019/aug/08 pass: 8,649; fail: 504; error: 18 Build 13: aarch64/2019/aug/10 pass: 8,647; fail: 507; error: 17 Build 14: aarch64/2019/nov/12 pass: 8,650; fail: 513; error: 16 3 fatal errors were detected; please follow the link above for more detail. ------------------------------------------------------------------------------- release/langtools ------------------------------------------------------------------------------- Build 0: aarch64/2019/jul/02 pass: 3,962 Build 1: aarch64/2019/jul/04 pass: 3,962 Build 2: aarch64/2019/jul/09 pass: 3,962 Build 3: aarch64/2019/jul/16 pass: 3,963 Build 4: aarch64/2019/jul/18 pass: 3,964 Build 5: aarch64/2019/jul/20 pass: 3,964 Build 6: aarch64/2019/jul/23 pass: 3,964 Build 7: aarch64/2019/jul/25 pass: 3,964 Build 8: aarch64/2019/jul/30 pass: 3,964 Build 9: aarch64/2019/aug/01 pass: 3,964 Build 10: aarch64/2019/aug/03 pass: 3,964 Build 11: aarch64/2019/aug/06 pass: 3,964 Build 12: aarch64/2019/aug/08 pass: 3,964 Build 13: aarch64/2019/aug/10 pass: 3,964 Build 14: aarch64/2019/nov/12 pass: 3,964 Previous results can be found here: http://openjdk.linaro.org/jdk13/openjdk-jtreg-nightly-tests/index.html SPECjbb2015 composite regression test completed =============================================== This test measures the relative performance of the server compiler running the SPECjbb2015 composite tests and compares the 
performance against the baseline performance of the server compiler taken on 2016-11-21. In accordance with [1], the SPECjbb2015 tests are run on a system which is not production ready and does not meet all the requirements for publishing compliant results. The numbers below shall be treated as non-compliant (nc) and are for experimental purposes only. Relative performance: Server max-jOPS (nc): 7.54x Relative performance: Server critical-jOPS (nc): 8.89x Details of the test setup and historical results may be found here: http://openjdk.linaro.org/jdk13/SPECjbb2015-results/ [1] http://www.spec.org/fairuse.html#Academic Regression test Hadoop-Terasort completed ========================================= This test measures the performance of the server and client compilers running Hadoop sorting a 1GB file using Terasort and compares the performance against the baseline performance of the Zero interpreter and against the baseline performance of the server compiler on 2014-04-01. Relative performance: Zero: 1.0, Server: 204.57 Server 204.57 / Server 2014-04-01 (71.00): 2.88x Details of the test setup and historical results may be found here: http://openjdk.linaro.org/jdk13/hadoop-terasort-benchmark-results/ This is a summary of the jcstress test results ============================================== The build and test results are cycled every 15 days. 
2019-07-03 pass rate: 10488/10488, results: http://openjdk.linaro.org/jdk13/jcstress-nightly-runs/2019/183/results/
2019-07-04 pass rate: 10488/10488, results: http://openjdk.linaro.org/jdk13/jcstress-nightly-runs/2019/185/results/
2019-07-10 pass rate: 10487/10488, results: http://openjdk.linaro.org/jdk13/jcstress-nightly-runs/2019/190/results/
2019-07-16 pass rate: 10488/10488, results: http://openjdk.linaro.org/jdk13/jcstress-nightly-runs/2019/197/results/
2019-07-19 pass rate: 10488/10488, results: http://openjdk.linaro.org/jdk13/jcstress-nightly-runs/2019/199/results/
2019-07-21 pass rate: 10487/10488, results: http://openjdk.linaro.org/jdk13/jcstress-nightly-runs/2019/201/results/
2019-07-24 pass rate: 10488/10488, results: http://openjdk.linaro.org/jdk13/jcstress-nightly-runs/2019/204/results/
2019-07-26 pass rate: 10487/10488, results: http://openjdk.linaro.org/jdk13/jcstress-nightly-runs/2019/206/results/
2019-07-31 pass rate: 10488/10488, results: http://openjdk.linaro.org/jdk13/jcstress-nightly-runs/2019/211/results/
2019-08-02 pass rate: 10488/10488, results: http://openjdk.linaro.org/jdk13/jcstress-nightly-runs/2019/213/results/
2019-08-04 pass rate: 10488/10488, results: http://openjdk.linaro.org/jdk13/jcstress-nightly-runs/2019/215/results/
2019-08-07 pass rate: 10488/10488, results: http://openjdk.linaro.org/jdk13/jcstress-nightly-runs/2019/218/results/
2019-08-09 pass rate: 10487/10488, results: http://openjdk.linaro.org/jdk13/jcstress-nightly-runs/2019/220/results/
2019-08-11 pass rate: 10487/10488, results: http://openjdk.linaro.org/jdk13/jcstress-nightly-runs/2019/222/results/
2019-11-12 pass rate: 10490/10490, results: http://openjdk.linaro.org/jdk13/jcstress-nightly-runs/2019/316/results/

For detailed information on the test output please refer to:

http://openjdk.linaro.org/jdk13/jcstress-nightly-runs/

From felix.yang at huawei.com Wed Nov 13 02:35:39 2019
From: felix.yang at huawei.com (Yangfei (Felix))
Date: Wed, 13 Nov 2019 02:35:39 +0000
Subject: [aarch64-port-dev ] RFR: aarch64: minor improvements of atomic operations
In-Reply-To: <0d718a85-d669-f4b4-ae90-db1f7bb56b45@redhat.com>
References: <65e93675-a3cf-53ac-6894-bb4124c55f93@redhat.com> <1f4c99ac-461c-7795-1a74-a494bdba3672@redhat.com> <1cc3ab16-eaab-d031-3df0-c9133de24f88@redhat.com> <8b527457-c371-45ae-bb54-0a048f9ee6f8@redhat.com> <32ea3e22-9f7a-9aaa-c86a-79ed175a1c7b@redhat.com> <83f92211-2c64-69d0-457c-c059acbccf63@oracle.com> <0d718a85-d669-f4b4-ae90-db1f7bb56b45@redhat.com>
Message-ID:

> -----Original Message-----
> From: Andrew Haley [mailto:aph at redhat.com]
> Sent: Wednesday, November 13, 2019 3:00 AM
> To: Erik Österlund ; Yangfei (Felix)
> ; Andrew Dinn ;
> aarch64-port-dev at openjdk.java.net
> Cc: hotspot-dev at openjdk.java.net
> Subject: Re: [aarch64-port-dev ] RFR: aarch64: minor improvements of atomic
> operations
>
> On 11/12/19 5:38 PM, Erik Österlund wrote:
> > My hope is that the AArch64 port should use inline assembly as you suggest,
> so we can see that the generated code is correct, as we wait for the glorious
> future where all HotSpot code has been rewritten to work with seq_cst (and we
> are *not* there now).
>
> I don't doubt it. :-)
>
> But my arguments about the C++ intrinsics being well-enough defined, at least
> on AArch64 Linux, have not changed, and I'm not going to argue all that again.
> I'll grant you that there may well be issues on various x86 compilers, but that
> isn't relevant here.

Looks like I reignited an old discussion :- )

> > Now it looks like you have discovered that we sometimes have double
> > trailing dmb ish, and sometimes lacking leading dmb ish if I am
> > reading this right. That seems to make the case stronger,
>
> Sure, we can use inline asm if there's no other way to do it, but I don't think
> that's necessary.
> All we need is to use
>
> T res;
> __atomic_exchange(dest, &exchange_value, &res, __ATOMIC_RELEASE);
> FULL_MEM_BARRIER;
>

When we go the C++ intrinsics way, we should also handle Atomic::PlatformCmpxchg.
When I compile the following function with GCC 4.9.3:

long foo(long exchange_value, long volatile* dest, long compare_value) {
  long val = __sync_val_compare_and_swap(dest, compare_value, exchange_value);
  return val;
}

I got:

.L2:
        ldaxr   x0, [x1]
        cmp     x0, x2
        bne     .L3
        stlxr   w4, x3, [x1]
        cbnz    w4, .L2
.L3:

Proposed patch:
diff -r 846fee5ea75e src/hotspot/os_cpu/linux_aarch64/atomic_linux_aarch64.hpp
--- a/src/hotspot/os_cpu/linux_aarch64/atomic_linux_aarch64.hpp Wed Nov 13 10:27:06 2019 +0900
+++ b/src/hotspot/os_cpu/linux_aarch64/atomic_linux_aarch64.hpp Wed Nov 13 10:14:58 2019 +0800
@@ -52,7 +52,7 @@
                                                      T volatile* dest,
                                                      atomic_memory_order order) const {
   STATIC_ASSERT(byte_size == sizeof(T));
-  T res = __sync_lock_test_and_set(dest, exchange_value);
+  T res = __atomic_exchange_n(dest, exchange_value, __ATOMIC_RELEASE);
   FULL_MEM_BARRIER;
   return res;
 }
@@ -70,7 +70,11 @@
                               __ATOMIC_RELAXED, __ATOMIC_RELAXED);
     return value;
   } else {
-    return __sync_val_compare_and_swap(dest, compare_value, exchange_value);
+    T value = compare_value;
+    __atomic_compare_exchange(dest, &value, &exchange_value, /*weak*/false,
+                              __ATOMIC_RELEASE, __ATOMIC_RELAXED);
+    FULL_MEM_BARRIER;
+    return value;
   }
 }

From erik.osterlund at oracle.com Wed Nov 13 08:38:25 2019
From: erik.osterlund at oracle.com (Erik Österlund)
Date: Wed, 13 Nov 2019 09:38:25 +0100
Subject: [aarch64-port-dev ] RFR: aarch64: minor improvements of atomic operations
In-Reply-To: <0d718a85-d669-f4b4-ae90-db1f7bb56b45@redhat.com>
References: <65e93675-a3cf-53ac-6894-bb4124c55f93@redhat.com> <1f4c99ac-461c-7795-1a74-a494bdba3672@redhat.com> <1cc3ab16-eaab-d031-3df0-c9133de24f88@redhat.com> <8b527457-c371-45ae-bb54-0a048f9ee6f8@redhat.com> <32ea3e22-9f7a-9aaa-c86a-79ed175a1c7b@redhat.com>
 <83f92211-2c64-69d0-457c-c059acbccf63@oracle.com> <0d718a85-d669-f4b4-ae90-db1f7bb56b45@redhat.com>
Message-ID: <722749eb-12f5-16d7-f498-4147a2d32cd9@oracle.com>

Hi Andrew,

On 2019-11-12 20:00, Andrew Haley wrote:
> But my arguments about the C++ intrinsics being well-enough defined,
> at least on AArch64 Linux, have not changed, and I'm not going to
> argue all that again. I'll grant you that there may well be issues on
> various x86 compilers, but that isn't relevant here.

I also do not want to revive that discussion at this time. So I'm just
going to note the way we think about this is... intrinsically different.

With that said, I believe my work here is done. Intrinsic puzzle away. ;-)

/Erik

From felix.yang at huawei.com Wed Nov 13 08:36:41 2019
From: felix.yang at huawei.com (Yangfei (Felix))
Date: Wed, 13 Nov 2019 08:36:41 +0000
Subject: [aarch64-port-dev ] RFR: aarch64: minor improvements of atomic operations
References: <65e93675-a3cf-53ac-6894-bb4124c55f93@redhat.com> <1f4c99ac-461c-7795-1a74-a494bdba3672@redhat.com> <1cc3ab16-eaab-d031-3df0-c9133de24f88@redhat.com> <8b527457-c371-45ae-bb54-0a048f9ee6f8@redhat.com> <32ea3e22-9f7a-9aaa-c86a-79ed175a1c7b@redhat.com> <83f92211-2c64-69d0-457c-c059acbccf63@oracle.com> <0d718a85-d669-f4b4-ae90-db1f7bb56b45@redhat.com>
Message-ID:

> -----Original Message-----
> From: Yangfei (Felix)
> Sent: Wednesday, November 13, 2019 10:36 AM
> To: 'Andrew Haley' ; Erik Österlund
> ; Andrew Dinn ;
> aarch64-port-dev at openjdk.java.net
> Cc: hotspot-dev at openjdk.java.net
> Subject: RE: [aarch64-port-dev ] RFR: aarch64: minor improvements of atomic
> operations
>
> > -----Original Message-----
> > From: Andrew Haley [mailto:aph at redhat.com]
> > Sent: Wednesday, November 13, 2019 3:00 AM
> > To: Erik Österlund ; Yangfei (Felix)
> > ; Andrew Dinn ;
> > aarch64-port-dev at openjdk.java.net
> > Cc: hotspot-dev at openjdk.java.net
> > Subject: Re: [aarch64-port-dev ] RFR: aarch64: minor improvements of
> > atomic operations
> >
> > On 11/12/19 5:38 PM, Erik Österlund wrote:
> > > My hope is that the AArch64 port should use inline assembly as you
> > > suggest,
> > so we can see that the generated code is correct, as we wait for the
> > glorious future where all HotSpot code has been rewritten to work with
> > seq_cst (and we are *not* there now).
> >
> > I don't doubt it. :-)
> >
> > But my arguments about the C++ intrinsics being well-enough defined,
> > at least on AArch64 Linux, have not changed, and I'm not going to argue all
> that again.
> > I'll grant you that there may well be issues on various x86 compilers,
> > but that isn't relevant here.
>
> Looks like I reignited an old discussion :- )
>
> > > Now it looks like you have discovered that we sometimes have double
> > > trailing dmb ish, and sometimes lacking leading dmb ish if I am
> > > reading this right. That seems to make the case stronger,
> >
> > Sure, we can use inline asm if there's no other way to do it, but I
> > don't think that's necessary. All we need is to use
> >
> > T res;
> > __atomic_exchange(dest, &exchange_value, &res, __ATOMIC_RELEASE);
> > FULL_MEM_BARRIER;
> >
>
> When we go the C++ intrinsics way, we should also handle
> Atomic::PlatformCmpxchg.
> When I compile the following function with GCC 4.9.3:
>
> long foo(long exchange_value, long volatile* dest, long compare_value) {
>   long val = __sync_val_compare_and_swap(dest, compare_value, exchange_value);
>   return val;
> }
>
> I got:
>
> .L2:
>         ldaxr   x0, [x1]
>         cmp     x0, x2
>         bne     .L3
>         stlxr   w4, x3, [x1]
>         cbnz    w4, .L2
> .L3:
>
>
> Proposed patch:
> diff -r 846fee5ea75e src/hotspot/os_cpu/linux_aarch64/atomic_linux_aarch64.hpp
> --- a/src/hotspot/os_cpu/linux_aarch64/atomic_linux_aarch64.hpp Wed Nov 13 10:27:06 2019 +0900
> +++ b/src/hotspot/os_cpu/linux_aarch64/atomic_linux_aarch64.hpp Wed Nov 13 10:14:58 2019 +0800
> @@ -52,7 +52,7 @@
>                                                       T volatile* dest,
>                                                       atomic_memory_order order) const {
>    STATIC_ASSERT(byte_size == sizeof(T));
> -  T res = __sync_lock_test_and_set(dest, exchange_value);
> +  T res = __atomic_exchange_n(dest, exchange_value, __ATOMIC_RELEASE);
>    FULL_MEM_BARRIER;
>    return res;
>  }
> @@ -70,7 +70,11 @@
>                                __ATOMIC_RELAXED, __ATOMIC_RELAXED);
>      return value;
>    } else {
> -    return __sync_val_compare_and_swap(dest, compare_value, exchange_value);
> +    T value = compare_value;
> +    __atomic_compare_exchange(dest, &value, &exchange_value, /*weak*/false,
> +                              __ATOMIC_RELEASE, __ATOMIC_RELAXED);
> +    FULL_MEM_BARRIER;
> +    return value;
>    }
>  }

Still not strong enough? considering the first of ldxr of the loop may be speculated.
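The v1 cmpxchg quoted above and the v2 variant posted next differ only in where the full barriers sit relative to the compare-and-swap. The standalone sketch below restates both shapes, plus the exchange change common to each, so the placements can be compared side by side. It assumes GCC or Clang `__atomic` builtins, defines `FULL_MEM_BARRIER` locally as `__sync_synchronize()` as a stand-in for the HotSpot macro, and uses hypothetical function names; it is not the HotSpot code and does not by itself settle which placement is sufficient.

```cpp
#include <cassert>

// Local stand-in for HotSpot's FULL_MEM_BARRIER (an assumption in this sketch):
// __sync_synchronize() is a full two-way fence.
#define FULL_MEM_BARRIER __sync_synchronize()

// Exchange as changed by both versions: release-ordered swap, then a full fence.
template <typename T>
T xchg_release_fence(T volatile* dest, T exchange_value) {
  T res = __atomic_exchange_n(dest, exchange_value, __ATOMIC_RELEASE);
  FULL_MEM_BARRIER;
  return res;
}

// v1 shape: release ordering applies only to a successful store; if the
// comparison fails, no store is performed, and only the trailing fence remains.
template <typename T>
T cmpxchg_v1(T volatile* dest, T compare_value, T exchange_value) {
  T value = compare_value;
  __atomic_compare_exchange(dest, &value, &exchange_value, /*weak*/ false,
                            __ATOMIC_RELEASE, __ATOMIC_RELAXED);
  FULL_MEM_BARRIER;
  return value;  // the old value, on both success and failure
}

// v2 shape: a relaxed CAS bracketed by full fences on both sides, so the
// ordering no longer depends on whether the store happens.
template <typename T>
T cmpxchg_v2(T volatile* dest, T compare_value, T exchange_value) {
  T value = compare_value;
  FULL_MEM_BARRIER;
  __atomic_compare_exchange(dest, &value, &exchange_value, /*weak*/ false,
                            __ATOMIC_RELAXED, __ATOMIC_RELAXED);
  FULL_MEM_BARRIER;
  return value;
}
```

A single-threaded call exercises only the return-value contract: the old value comes back in both cases, and the store happens only when the comparison matches. Whether the leading barrier is needed is a question about concurrent observers, which a quick test like this cannot answer.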
v2 patch:
diff -r 846fee5ea75e src/hotspot/os_cpu/linux_aarch64/atomic_linux_aarch64.hpp
--- a/src/hotspot/os_cpu/linux_aarch64/atomic_linux_aarch64.hpp Wed Nov 13 10:27:06 2019 +0900
+++ b/src/hotspot/os_cpu/linux_aarch64/atomic_linux_aarch64.hpp Wed Nov 13 16:33:16 2019 +0800
@@ -52,7 +52,7 @@
                                                      T volatile* dest,
                                                      atomic_memory_order order) const {
   STATIC_ASSERT(byte_size == sizeof(T));
-  T res = __sync_lock_test_and_set(dest, exchange_value);
+  T res = __atomic_exchange_n(dest, exchange_value, __ATOMIC_RELEASE);
   FULL_MEM_BARRIER;
   return res;
 }
@@ -70,7 +70,12 @@
                               __ATOMIC_RELAXED, __ATOMIC_RELAXED);
     return value;
   } else {
-    return __sync_val_compare_and_swap(dest, compare_value, exchange_value);
+    T value = compare_value;
+    FULL_MEM_BARRIER;
+    __atomic_compare_exchange(dest, &value, &exchange_value, /*weak*/false,
+                              __ATOMIC_RELAXED, __ATOMIC_RELAXED);
+    FULL_MEM_BARRIER;
+    return value;
   }
 }

From adinn at redhat.com Wed Nov 13 08:55:57 2019
From: adinn at redhat.com (Andrew Dinn)
Date: Wed, 13 Nov 2019 08:55:57 +0000
Subject: [aarch64-port-dev ] RFR 8231841: AArch64: Add entry to pns output in help()
In-Reply-To:
References:
Message-ID:

On 12/11/2019 18:03, Alan Hayward wrote:
> Please could you review this change which adds AArch64 to the pns section of the help() output.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8231841
> Webrev: http://cr.openjdk.java.net/~smonteith/8231841/webrev.0/
>
>
> Built and ran tier1 on x86 and AArch64.

Yes, that's good to push thanks.

regards,


Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill

From aph at redhat.com Wed Nov 13 09:00:21 2019
From: aph at redhat.com (Andrew Haley)
Date: Wed, 13 Nov 2019 09:00:21 +0000
Subject: [aarch64-port-dev ] RFR: aarch64: minor improvements of atomic operations
In-Reply-To:
References: <65e93675-a3cf-53ac-6894-bb4124c55f93@redhat.com> <1f4c99ac-461c-7795-1a74-a494bdba3672@redhat.com> <1cc3ab16-eaab-d031-3df0-c9133de24f88@redhat.com> <8b527457-c371-45ae-bb54-0a048f9ee6f8@redhat.com> <32ea3e22-9f7a-9aaa-c86a-79ed175a1c7b@redhat.com> <83f92211-2c64-69d0-457c-c059acbccf63@oracle.com> <0d718a85-d669-f4b4-ae90-db1f7bb56b45@redhat.com>
Message-ID: <1325b063-cc1b-74fe-3b78-f4eb4518d116@redhat.com>

On 11/13/19 8:36 AM, Yangfei (Felix) wrote:
> Still not strong enough? considering the first of ldxr of the loop may be speculated.

Come on now, you must have read the thread on kernel-dev you pointed me to.

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From felix.yang at huawei.com Wed Nov 13 09:26:37 2019
From: felix.yang at huawei.com (Yangfei (Felix))
Date: Wed, 13 Nov 2019 09:26:37 +0000
Subject: [aarch64-port-dev ] RFR: aarch64: minor improvements of atomic operations
In-Reply-To: <1325b063-cc1b-74fe-3b78-f4eb4518d116@redhat.com>
References: <65e93675-a3cf-53ac-6894-bb4124c55f93@redhat.com> <1f4c99ac-461c-7795-1a74-a494bdba3672@redhat.com> <1cc3ab16-eaab-d031-3df0-c9133de24f88@redhat.com> <8b527457-c371-45ae-bb54-0a048f9ee6f8@redhat.com> <32ea3e22-9f7a-9aaa-c86a-79ed175a1c7b@redhat.com> <83f92211-2c64-69d0-457c-c059acbccf63@oracle.com> <0d718a85-d669-f4b4-ae90-db1f7bb56b45@redhat.com> <1325b063-cc1b-74fe-3b78-f4eb4518d116@redhat.com>
Message-ID:

> -----Original Message-----
> From: Andrew Haley [mailto:aph at redhat.com]
> Sent: Wednesday, November 13, 2019 5:00 PM
> To: Yangfei (Felix) ; Erik Österlund
> ; Andrew Dinn ;
> aarch64-port-dev at openjdk.java.net
> Cc: hotspot-dev at openjdk.java.net
> Subject: Re: [aarch64-port-dev ] RFR: aarch64: minor improvements of atomic
> operations
>
> On 11/13/19 8:36 AM, Yangfei (Felix) wrote:
> > Still not strong enough? considering the first of ldxr of the loop may be
> speculated.
>
> Come on now, you must have read the thread on kernel-dev you pointed me to.
>

Yes, the cmpxchg case is different here.
So the v2 patch in my previous mail approved?
Will create a bug and do necessary testing.

Thanks,
Felix

From patrick at os.amperecomputing.com Wed Nov 13 09:35:36 2019
From: patrick at os.amperecomputing.com (Patrick Zhang OS)
Date: Wed, 13 Nov 2019 09:35:36 +0000
Subject: [aarch64-port-dev ] RFR: 8229351: AArch64: Make the stub threshold of string_compare intrinsic tunable
In-Reply-To: <4a067ad4-a7e8-813e-2633-91bb380f24b5@bell-sw.com>
References: <9248d1b3-7341-68b9-7dd1-02dfd48567a6@bell-sw.com> <4a067ad4-a7e8-813e-2633-91bb380f24b5@bell-sw.com>
Message-ID:

Many thanks for the comments and the suggested updates.

I compared the numbers of touched branches, counted from the large-loop condition to the small-loop label. The early sub and branch to small-loop are nice to reduce it from 2 (v3) to 1 (v4), for StrLen=128 case. The base prefetched out of the boundary so the comparison might be unfair. By far my test result on Ampere eMAG systems looks fine with v4, the 128 LL is even a little bit better than base. In theory the additional branch for >= 200 (192 + 8) is still there, if the perf diffs were not obvious, the reason might be: the large-loop takes the majority of execution time, while the branch's time is minor.
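The 200 = 192 + 8 figure can be derived from Aleksei's v4 sketch quoted in this message: for LL, `cnt2` is pre-decremented by 24 chars and the guard branches to `SMALL_LOOP` when `cnt2 < largeLoopExitCondition - 16` (12 and 8 respectively for UU). The sketch below reproduces that arithmetic with hypothetical helper names; it assumes the guard is actually emitted, which in the sketch happens only when `remainingLimit < largeLoopExitCondition`.

```cpp
#include <algorithm>
#include <cassert>

// largeLoopExitCondition as written in the stub, in chars:
// MAX(64, SoftwarePrefetchHintDistance) / (isLL ? 1 : 2).
int large_loop_exit_condition(int prefetch_distance, bool isLL) {
  return std::max(64, prefetch_distance) / (isLL ? 1 : 2);
}

// Smallest char count that falls through the v4 guard into
// LARGE_LOOP_PREFETCH. The sketch subtracts 24 (LL) or 12 (UU) chars from
// cnt2 up front, then branches to SMALL_LOOP when
// cnt2 < largeLoopExitCondition - (isLL ? 16 : 8).
// Falling through therefore needs: count - pre_decrement >= exit - guard_bias.
int min_chars_for_large_loop(int prefetch_distance, bool isLL) {
  const int pre_decrement = isLL ? 24 : 12;
  const int guard_bias = isLL ? 16 : 8;
  return large_loop_exit_condition(prefetch_distance, isLL) - guard_bias +
         pre_decrement;
}
```

With the prefetch distance of 192 used in these measurements, LL strings need at least 200 chars (UU: 100) before the prefetching loop is entered, so StrLen=128 goes straight to SMALL_LOOP, consistent with the single branch counted for v4.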
LL, SoftwarePrefetchHintDistance=192
StrLen=128, base: 2 (prefetch out of boundary, the 1st br condition failed), patch.v3: 2 (to NO_PREFETCH), patch.v4: 1 (to SMALL_LOOP)
StrLen=256, base: 2, patch.v3: 3, patch.v4: 3 br + 1 b (SMALL_LOOP_CHECK)
StrLen=512, base: 6, patch.v3: 7, patch.v4: 7 br + 1 b (SMALL_LOOP_CHECK)

http://cr.openjdk.java.net/~qpzhang/8229351/webrev.04

The additional b (SMALL_LOOP_CHECK) makes the code cleaner, but want to keep the original subs and br (LT, TAIL), so keep as-is in v4.

Tested jtreg and strcmp microbenchs for LL/UU as smoke tests, no obvious regression. The LU/UL and other parts were not changed in v4, previous tests can cover.

Regards
Patrick

-----Original Message-----
From: Aleksei Voitylov
Sent: Wednesday, November 13, 2019 12:10 AM
To: Patrick Zhang OS
Cc: aarch64-port-dev at openjdk.java.net
Subject: Re: [aarch64-port-dev ] RFR: 8229351: AArch64: Make the stub threshold of string_compare intrinsic tunable

Hi Patrick,

First, I'm a trespasser, not a reviewer. Reviewers will need to look at this.

On the technical side: This additional branch in v3 is still painful. You can reduce the amount of branches in code path for lengths 128 and 256 by using that fact that CompareLongStringLimitLatin and CompareLongStringLimitUTF are at least 24. Then we don't have to jump to NO_PREFETCH label, where check for small string size is done. Instead we can jump to SMALL_LOOP label (assuming cnt2 counter is updated accordingly). In this case NO_PREFETCH label is not needed and we have 1 less branch.

Rough sketch of affected part based on v3 looks as follows. This version was checked on ThunderX2 and it looks fine on length 128 and 256 perf-wise. I also added alignment for small loop, which also helps a bit. Please keep in mind it's a sketch.

-Aleksei

@@ -4172,19 +4168,34 @@
     Register result = r0, str1 = r1, cnt1 = r2, str2 = r3, cnt2 = r4,
         tmp1 = r10, tmp2 = r11;
     Label SMALL_LOOP, LARGE_LOOP_PREFETCH, CHECK_LAST, DIFF2, TAIL,
-        LENGTH_DIFF, DIFF, LAST_CHECK_AND_LENGTH_DIFF,
+        LENGTH_DIFF, DIFF, LAST_CHECK_AND_LENGTH_DIFF,
+SMALL_LOOP_CHECK,
         DIFF_LAST_POSITION, DIFF_LAST_POSITION2;
     // exit from large loop when less than 64 bytes left to read or we're about
     // to prefetch memory behind array border
     int largeLoopExitCondition = MAX(64, SoftwarePrefetchHintDistance)/(isLL ? 1 : 2);
+    // calculate the remaining limit in chars which manages if this stub should be called,
+    // if the limit is large enough (>= largeLoopExitCondition), below large loop with prefetching
+    // can be executed at least once, and there is no need to do any extra checking at the entrance.
+    int remainingLimit = (isLL ? CompareLongStringLimitLatin : CompareLongStringLimitUTF) -
+                         (wordSize / (isLL ? 1 : 2));
     // cnt1/cnt2 contains amount of characters to compare. cnt1 can be re-used
     // update cnt2 counter with already loaded 8 bytes
-    __ sub(cnt2, cnt2, wordSize/(isLL ? 1 : 2));
+    if (SoftwarePrefetchHintDistance >= 0 && remainingLimit < largeLoopExitCondition) {
+      __ sub(cnt2, cnt2, isLL ? 24 : 12);
+    } else {
+      __ sub(cnt2, cnt2, isLL ? 8 : 4);
+    }
     // update pointers, because of previous read
     __ add(str1, str1, wordSize);
     __ add(str2, str2, wordSize);
     if (SoftwarePrefetchHintDistance >= 0) {
-      __ bind(LARGE_LOOP_PREFETCH);
+      if (remainingLimit < largeLoopExitCondition) {
+        // there could be fewer bytes left and invalid for this large loop with prefetching
+        __ subs(rscratch2, cnt2, largeLoopExitCondition - (isLL ? 16 : 8));
+        __ br(__ LT, SMALL_LOOP);
+        __ add(cnt2, cnt2, isLL ? 16 : 8);
+      }
+      __ bind(LARGE_LOOP_PREFETCH); // 64 bytes loop
         __ prfm(Address(str1, SoftwarePrefetchHintDistance));
         __ prfm(Address(str2, SoftwarePrefetchHintDistance));
         compare_string_16_bytes_same(DIFF, DIFF2);
@@ -4196,11 +4207,11 @@
         __ br(__ GT, LARGE_LOOP_PREFETCH);
         __ cbz(cnt2, LAST_CHECK_AND_LENGTH_DIFF); // no more chars left?
     }
-    // less than 16 bytes left?
-    __ subs(cnt2, cnt2, isLL ? 16 : 8);
-    __ br(__ LT, TAIL);
+    __ b(SMALL_LOOP_CHECK); // check if less than 16 bytes left
+    __ align(OptoLoopAlignment);
     __ bind(SMALL_LOOP);
       compare_string_16_bytes_same(DIFF, DIFF2);
+      __ bind(SMALL_LOOP_CHECK);
       __ subs(cnt2, cnt2, isLL ? 16 : 8);
       __ br(__ GE, SMALL_LOOP);
     __ bind(TAIL);

On 12/11/2019 12:52, Patrick Zhang OS wrote:
> Ping...
>
> Hi Aleksei,
>
> Does the potential regression on my test system still exist? with the new patch webrev.03? If the added largeLoopExitCondition condition excluded your >128 chars strings from the large loop, and still caused performance drops, maybe hacking all the software prefetch hint distance and the checking condition to hardcoded 64, can be a good try. Although I think it would not be a right thing to do, in comparison with the similar logic in generate_compare_long_string_different_encoding. Thanks.
>
> http://cr.openjdk.java.net/~qpzhang/8229351/webrev.03/jdk.changeset
> if (SoftwarePrefetchHintDistance >= 0) {
> - __ bind(LARGE_LOOP_PREFETCH);
> + if (remainingLimit < largeLoopExitCondition) {
> + // there could be fewer bytes left and invalid for this large loop with prefetching
> + __ subs(rscratch2, cnt2, largeLoopExitCondition); // => subs(rscratch2, cnt2, 64);
> + __ br(__ LT, NO_PREFETCH);
> + }
> + __ bind(LARGE_LOOP_PREFETCH); // 64 bytes loop
> __ prfm(Address(str1, SoftwarePrefetchHintDistance)); // => __ prfm(Address(str1, 64));
> __ prfm(Address(str2, SoftwarePrefetchHintDistance)); // => __ prfm(Address(str2, 64));
>
> Regards
> Patrick
>
> -----Original Message-----
> From: aarch64-port-dev On
> Behalf Of Patrick Zhang OS
> Sent: Thursday, November 7, 2019 6:56 PM
> To: Aleksei Voitylov
> Cc: aarch64-port-dev at openjdk.java.net
> Subject: Re: [aarch64-port-dev ] RFR: 8229351: AArch64: Make the stub
> threshold of string_compare intrinsic tunable
>
> Hi Aleksei,
>
> Thanks for testing it and the data. I only had the source of StringCompareBench.java [6], my numbers (the diffs) are within 2%, while the -207.12% looks quite weird. I initially did not add the condition to control the br, since generate_compare_long_string_different_encoding has the similar unconditional br. By the way, the original logic allowed prefetching the memory behind array border, for the first 64 bytes. I think securing the prefetch is the right thing to do, but it could certainly stop some cases from going to the large loop with prefetching. Welcome further comments, thanks.
>
> http://cr.openjdk.java.net/~qpzhang/8229351/webrev.03/
>
> Regards
> Patrick
>
> From: Aleksei Voitylov
> Sent: Thursday, November 7, 2019 12:53 AM
> To: Patrick Zhang OS
> Cc: aarch64-port-dev at openjdk.java.net
> Subject: Re: [aarch64-port-dev ] RFR: 8229351: AArch64: Make the stub
> threshold of string_compare intrinsic tunable
>
>
> Hi Patrick,
>
> I like the fact that this patch does not add much to the complexity of the code. Here are some experiments that you could find useful.
> Cortex A73                                   Size  base (ns/op)  patched (ns/op)      Diff
> StringCompareBench.StringCompareLL            256   14422257,98      15302300,24    -6,10%
> StringCompareBench.StringCompareLL            512   27998036,21      28317818,08    -1,14%
>
> ThunderX2                                    Size  base (ns/op)  patched (ns/op)      Diff
> StringCompareBench.StringCompareLL            128   4265122,232      13099099,67  -207,12%
> StringCompareBench.StringCompareLL            256   3539452,533      3599407,432    -1,69%
>
> StringCompareBench.StringCompareUU            128    6899938,75      7174601,241    -3,98%
> StringCompareBench.StringCompareUU            256   7654538,841      7826599,466    -2,25%
>
> StringCompareBench.cachedStringCompareLL      128        19,673           21,242    -7,98%
> StringCompareBench.cachedStringCompareLL      256        34,179           36,452    -6,65%
> StringCompareBench.cachedStringCompareLL      512        59,574           64,088    -7,58%
> StringCompareBench.cachedStringCompareLL     1024        110,37          118,477    -7,35%
> StringCompareBench.cachedStringCompareLL  1000000    114028,907       115388,681    -1,19%
>
> StringCompareBench.cachedStringCompareUU      128        33,752           36,922    -9,39%
> StringCompareBench.cachedStringCompareUU      256        60,939           64,096    -5,18%
> StringCompareBench.cachedStringCompareUU      512       115,328           118,48    -2,73%
> StringCompareBench.cachedStringCompareUU     1024       239,332           242,97    -1,52%
> StringCompareBench.cachedStringCompareUU  1000000    226491,096       233638,328    -3,16%
>
> It might be the case that the newly added branch is the culprit:
>
> + __ subs(rscratch2, cnt2, largeLoopExitCondition);
> + __ br(__ LT, NO_PREFETCH);
>
> Maybe you could skip it when CompareLongStringLimitLatin and CompareLongStringLimitUTF are large enough (then stub code is only called with string length large enough to skip branch above). Then (the properly commented) code would look like:
>
> if ((stub_threshold-wordSize/(isLL ? 1 : 2)) < largeLoopExitCondition) {
>   __ subs(rscratch2, cnt2, largeLoopExitCondition);
>   __ br(__ LT, NO_PREFETCH);
> }
>
> and in this case we shouldn't see any performance penalties.
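Aleksei's suggestion moves part of the decision to stub-generation time: the runtime guard is only needed when the shortest string the stub can receive, minus the word already consumed, is below `largeLoopExitCondition`. A minimal sketch of that predicate follows. It assumes an 8-byte word, treats a negative prefetch distance as "no prefetching loop generated" (mirroring the stub's `if (SoftwarePrefetchHintDistance >= 0)`), and the function name is hypothetical.

```cpp
#include <algorithm>
#include <cassert>

constexpr int kWordSize = 8;  // bytes per machine word on AArch64

// Should the stub generator emit the runtime guard in front of
// LARGE_LOOP_PREFETCH? stub_threshold_chars stands for
// CompareLongStringLimitLatin (LL) or CompareLongStringLimitUTF (UU).
bool needs_large_loop_guard(int stub_threshold_chars, int prefetch_distance,
                            bool isLL) {
  if (prefetch_distance < 0) {
    return false;  // no prefetching loop is generated, so nothing to guard
  }
  const int chars_per_word = kWordSize / (isLL ? 1 : 2);
  const int exit_condition = std::max(64, prefetch_distance) / (isLL ? 1 : 2);
  const int remaining_limit = stub_threshold_chars - chars_per_word;
  return remaining_limit < exit_condition;
}
```

With a threshold of 72 chars and a prefetch distance of 64, no guard is emitted for LL (72 - 8 = 64 is not below 64); with a distance of 192 the same threshold requires the guard, which is the case measured above.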
>
> -Aleksei
>
> On 29/10/2019 12:58, Patrick Zhang OS wrote:
> > Hi,
> >
> > Could you please review this patch, thanks.
> >
> > JBS: https://bugs.openjdk.java.net/browse/JDK-8229351
> > Webrev: http://cr.openjdk.java.net/~qpzhang/8229351/webrev.02
> > (this starts from .02 since there had been some internal review and updates)
> >
> > Changes:
> >
> > 1. Split the STUB_THRESHOLD from the hard-coded 72 to be CompareLongStringLimitLatin and CompareLongStringLimitUTF as a more flexible control over the stub thresholds for string_compare intrinsics, especially for various uArchs.
> >
> > 2. MacroAssembler::string_compare LL and UU shared the same threshold, actually UU may only require the half (length of chars) of that of LL's, because one character has two-bytes for UU, while for compacted LL strings, one character means one byte. In addition, LU/UL may need a separated threshold, as the stub function is different from the same encoding one, and the performance may vary as well.
> >
> > 3. In generate_compare_long_string_same_encoding, the hard-coded 72 was originally able to ensure that there can be always 64 bytes at least for the prefetch code path. However once a smaller stub threshold is set, a new condition is needed to tell if this would be still valid, or has to go to the NO_PREFETCH branch. This change can ensure the correctness.
> >
> > 4. In generate_compare_long_string_different_encoding, some temp vars for handling the last 4 characters are not valid any longer, cleaned up strU and strL, and related pointers initialization to the next U (cnt1) and L (tmp2).
> >
> > 5. In compare_string_16_x_LU, the reference to r10 (tmp1) is not needed, as tmpU or tmpL point to the same register.
> >
> > Tests:
> >
> > 1. For function check, I have run
> >
> > jdk jtreg tier1 tests, with default vm flags
> >
> > hotspot jtreg tests: runtime/compiler/gc parts, with "-Xcomp -XX:-TieredCompilation"
> >
> > jck10/api/java.lang 1609 cases and other selected modules, no new failures found, with default vm flags and "-Xcomp -XX:-TieredCompilation" respectively;
> >
> > some specific test cases had been carefully executed to double check, i.e., TestStringCompareToDifferentLength.java [1] and TestStringCompareToDifferentLength.java [1] introduced by [2], StrCmpTest.java [3] introduced by [4].
> >
> > 2. For performance check, I have run
> >
> > string-density-bench/CompareToBench.java [5] and StringCompareBench.java [6] respectively,
> >
> > and SPECjbb2015.jar, no obvious performance change has been found (since the default threshold is NOT changed within this patch).
> >
> > FYI. with Ampere eMAG system, microbenchmarks [5][6] can have 1.5x consistent perf gain with LU/UL comparison for shorter strings (<72 chars, smaller stub thresholds), and slight improvement (5~10%) with LL/UU cases.
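Point 2 of the RFR above argues that, for the same number of bytes, a UTF-16 (UU) string holds half as many chars as a compact Latin-1 (LL) one. The conversion is small enough to state directly; only the 1-byte and 2-byte char sizes come from the text, while the function names and the 72-byte example budget are illustrative assumptions.

```cpp
#include <cassert>

// Compact Latin-1 strings (LL) store 1 byte per char; UTF-16 strings (UU) store 2.
constexpr int bytes_per_char(bool isLL) { return isLL ? 1 : 2; }

// Char count covering a given number of bytes for each encoding. With a single
// shared char threshold (the old STUB_THRESHOLD), a UU string reaches the stub
// only after twice as many bytes as an LL string does.
constexpr int chars_for_bytes(int bytes, bool isLL) {
  return bytes / bytes_per_char(isLL);
}
```

A byte-equivalent UU limit is therefore half the LL limit in chars, which is the motivation given for separate CompareLongStringLimitLatin and CompareLongStringLimitUTF flags; the defaults chosen in the webrev are not reproduced here.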
> >
> > Refs:
> > [1] http://hg.openjdk.java.net/jdk/jdk/file/3df2bf731a87/test/hotspot/jtreg/compiler/intrinsics/string
> > [2] https://bugs.openjdk.java.net/browse/JDK-8218966 AArch64: String.compareTo() can read memory after string
> > [3] http://cr.openjdk.java.net/~dpochepk/8202326/StrCmpTest.java, contributed by Dmitrij Pochepko
> > [4] https://bugs.openjdk.java.net/browse/JDK-8202326 AARCH64: optimize string compare intrinsic
> > [5] http://cr.openjdk.java.net/~shade/density/string-density-bench.jar, contributed by Aleksey Shipilev
> > [6] http://cr.openjdk.java.net/~dpochepk/8202326/StringCompareBench.java, contributed by Dmitrij Pochepko
> >
> > Regards
> > Patrick
> >

From aph at redhat.com Wed Nov 13 09:39:05 2019
From: aph at redhat.com (Andrew Haley)
Date: Wed, 13 Nov 2019 09:39:05 +0000
Subject: [aarch64-port-dev ] RFR: aarch64: minor improvements of atomic operations
In-Reply-To:
References: <65e93675-a3cf-53ac-6894-bb4124c55f93@redhat.com> <1f4c99ac-461c-7795-1a74-a494bdba3672@redhat.com> <1cc3ab16-eaab-d031-3df0-c9133de24f88@redhat.com> <8b527457-c371-45ae-bb54-0a048f9ee6f8@redhat.com> <32ea3e22-9f7a-9aaa-c86a-79ed175a1c7b@redhat.com> <83f92211-2c64-69d0-457c-c059acbccf63@oracle.com> <0d718a85-d669-f4b4-ae90-db1f7bb56b45@redhat.com> <1325b063-cc1b-74fe-3b78-f4eb4518d116@redhat.com>
Message-ID: <9aa92f57-ea8a-7e0a-2218-bf360a13c46e@redhat.com>

On 11/13/19 9:26 AM, Yangfei (Felix) wrote:
> Yes, the cmpxchg case is different here.
> So the v2 patch in my previous mail approved?
> Will create a bug and do necessary testing.

I don't know which patch is v2, but for the reasons carefully laid out in the kernel-dev thread we don't need two full barriers. The first version of Atomic::PlatformCmpxchg you posted is OK.

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From felix.yang at huawei.com Wed Nov 13 09:46:55 2019
From: felix.yang at huawei.com (Yangfei (Felix))
Date: Wed, 13 Nov 2019 09:46:55 +0000
Subject: [aarch64-port-dev ] RFR: aarch64: minor improvements of atomic operations
In-Reply-To: <9aa92f57-ea8a-7e0a-2218-bf360a13c46e@redhat.com>
References: <65e93675-a3cf-53ac-6894-bb4124c55f93@redhat.com> <1f4c99ac-461c-7795-1a74-a494bdba3672@redhat.com> <1cc3ab16-eaab-d031-3df0-c9133de24f88@redhat.com> <8b527457-c371-45ae-bb54-0a048f9ee6f8@redhat.com> <32ea3e22-9f7a-9aaa-c86a-79ed175a1c7b@redhat.com> <83f92211-2c64-69d0-457c-c059acbccf63@oracle.com> <0d718a85-d669-f4b4-ae90-db1f7bb56b45@redhat.com> <1325b063-cc1b-74fe-3b78-f4eb4518d116@redhat.com> <9aa92f57-ea8a-7e0a-2218-bf360a13c46e@redhat.com>
Message-ID:

> -----Original Message-----
> From: Andrew Haley [mailto:aph at redhat.com]
> Sent: Wednesday, November 13, 2019 5:39 PM
> To: Yangfei (Felix) ; Erik Österlund
> ; Andrew Dinn ;
> aarch64-port-dev at openjdk.java.net
> Cc: hotspot-dev at openjdk.java.net
> Subject: Re: [aarch64-port-dev ] RFR: aarch64: minor improvements of atomic
> operations
>
> On 11/13/19 9:26 AM, Yangfei (Felix) wrote:
> > Yes, the cmpxchg case is different here.
> > So the v2 patch in my previous mail approved?
> > Will create a bug and do necessary testing.
>
> I don't know which patch is v2, but for the reasons carefully laid out in the
> kernel-dev thread we don't need two full barriers. The first version of
> Atomic::PlatformCmpxchg you posted is OK.
>

Well, I think the cmpxchg case is different: the compare in the loop may fail and then we don't get a chance to execute the stlxr instruction.
This is explicitly discussed in that thread: https://patchwork.kernel.org/patch/3575821/
As a result, aarch64 Linux plants two barriers in that patch:

@@ -112,17 +114,20 @@ static inline int atomic_cmpxchg(atomic_t *ptr, int old, int new)
     unsigned long tmp;
     int oldval;

+    smp_mb();    < ========
+
     asm volatile("// atomic_cmpxchg\n"
-"1: ldaxr %w1, %2\n"
+"1: ldxr %w1, %2\n"
 " cmp %w1, %w3\n"
 " b.ne 2f\n"
-" stlxr %w0, %w4, %2\n"
+" stxr %w0, %w4, %2\n"
 " cbnz %w0, 1b\n"
 "2:"
     : "=&r" (tmp), "=&r" (oldval), "+Q" (ptr->counter)
     : "Ir" (old), "r" (new)
     : "cc", "memory");

+    smp_mb();    < ========
     return oldval;
 }

That's why I switched to the V2 patch:

diff -r 846fee5ea75e src/hotspot/os_cpu/linux_aarch64/atomic_linux_aarch64.hpp
--- a/src/hotspot/os_cpu/linux_aarch64/atomic_linux_aarch64.hpp Wed Nov 13 10:27:06 2019 +0900
+++ b/src/hotspot/os_cpu/linux_aarch64/atomic_linux_aarch64.hpp Wed Nov 13 16:33:16 2019 +0800
@@ -52,7 +52,7 @@
                                                      T volatile* dest,
                                                      atomic_memory_order order) const {
   STATIC_ASSERT(byte_size == sizeof(T));
-  T res = __sync_lock_test_and_set(dest, exchange_value);
+  T res = __atomic_exchange_n(dest, exchange_value, __ATOMIC_RELEASE);
   FULL_MEM_BARRIER;
   return res;
 }
@@ -70,7 +70,12 @@
                               __ATOMIC_RELAXED, __ATOMIC_RELAXED);
     return value;
   } else {
-    return __sync_val_compare_and_swap(dest, compare_value, exchange_value);
+    T value = compare_value;
+    FULL_MEM_BARRIER;
+    __atomic_compare_exchange(dest, &value, &exchange_value, /*weak*/false,
+                              __ATOMIC_RELAXED, __ATOMIC_RELAXED);
+    FULL_MEM_BARRIER;
+    return value;
   }
 }

From Pengfei.Li at arm.com Wed Nov 13 09:55:48 2019
From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China))
Date: Wed, 13 Nov 2019 09:55:48 +0000
Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27 conditionally allocatable
Message-ID:

Hi,

JBS: https://bugs.openjdk.java.net/browse/JDK-8233743
Webrev: http://cr.openjdk.java.net/~pli/rfr/8233743/webrev.00/

This is a follow-up patch of JDK-8217909[1] to make the AArch64 register r27
allocatable when CompressedOops and CompressedClassPointers are both turned off. The following changes have been made:
- Massage the RegMask(s) in reg_mask_init() at C2 initialization and remove r27 from some of the masks conditionally to make it allocatable.
- Also make r29 conditionally reserved in a similar way.
- Make r29 allocatable for pointers as well as integers.
- Replace a use of rheapbase with rscratch1 in AArch64 ZGC.
- Revert JDK-8231754[2], which made r27 always reserved in JVMCI.
This patch aligns with the implementation in [1], which makes the x86_64 r12 register allocatable. Please let me know if I have missed anything for AArch64. Tests: Full jtreg with default options and with the extra options "-XX:-UseCompressedOops -XX:+PreserveFramePointer". No new failures were found. [1] https://hg.openjdk.java.net/jdk/jdk/rev/48b50573dee4 [2] https://hg.openjdk.java.net/jdk/jdk/rev/d068b1e534de -- Thanks, Pengfei From aph at redhat.com Wed Nov 13 10:38:25 2019 From: aph at redhat.com (Andrew Haley) Date: Wed, 13 Nov 2019 10:38:25 +0000 Subject: [aarch64-port-dev ] RFR: aarch64: minor improvements of atomic operations In-Reply-To: References: <65e93675-a3cf-53ac-6894-bb4124c55f93@redhat.com> <1f4c99ac-461c-7795-1a74-a494bdba3672@redhat.com> <1cc3ab16-eaab-d031-3df0-c9133de24f88@redhat.com> <8b527457-c371-45ae-bb54-0a048f9ee6f8@redhat.com> <32ea3e22-9f7a-9aaa-c86a-79ed175a1c7b@redhat.com> <83f92211-2c64-69d0-457c-c059acbccf63@oracle.com> <0d718a85-d669-f4b4-ae90-db1f7bb56b45@redhat.com> <1325b063-cc1b-74fe-3b78-f4eb4518d116@redhat.com> <9aa92f57-ea8a-7e0a-2218-bf360a13c46e@redhat.com> Message-ID: <1345fade-e1b4-42f0-c86f-9fd518431fcf@redhat.com> On 11/13/19 9:46 AM, Yangfei (Felix) wrote: > That's why I switched to the V2 patch: I see. This seems excessive.
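The shape of the V2 compare-and-exchange path (a full barrier, a relaxed strong compare-and-exchange, then another full barrier) can be sketched as a standalone C++ function using the GCC/Clang `__atomic` builtins. This is an illustration only, not the HotSpot code: `cmpxchg_fenced` is a hypothetical name, and `__atomic_thread_fence(__ATOMIC_SEQ_CST)` stands in for HotSpot's FULL_MEM_BARRIER on the assumption that the macro is an equivalent full two-way fence.

```cpp
// Hypothetical helper mirroring the V2 patch's cmpxchg shape:
// full fence, relaxed strong CAS, full fence. Like HotSpot's cmpxchg,
// it returns the value that was in *dest before the operation.
template <typename T>
T cmpxchg_fenced(T volatile* dest, T compare_value, T exchange_value) {
  T value = compare_value;
  // Stand-in for FULL_MEM_BARRIER (assumed to be a full two-way fence).
  __atomic_thread_fence(__ATOMIC_SEQ_CST);
  // Strong CAS with relaxed ordering; the surrounding fences supply the
  // ordering. On failure, `value` is overwritten with the current contents
  // of *dest; on success it still equals compare_value, i.e. the old value.
  __atomic_compare_exchange_n(dest, &value, exchange_value, /*weak=*/false,
                              __ATOMIC_RELAXED, __ATOMIC_RELAXED);
  __atomic_thread_fence(__ATOMIC_SEQ_CST);
  return value;
}
```

On AArch64 without LSE, a relaxed strong CAS typically compiles to an ldxr/stxr retry loop and the fences to `dmb ish`, which is roughly the barrier/loop/barrier pattern of the Linux patch quoted above. Whether HotSpot needs that strength at all is the question under discussion here.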
I doubt that there is any code in HotSpot that relies on such things, especially given that we've managed with mere sequential consistency for CMPXCHG for so long, but if you want to go for the full Howitzer I won't try to stop you. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Wed Nov 13 12:27:26 2019 From: aph at redhat.com (Andrew Haley) Date: Wed, 13 Nov 2019 12:27:26 +0000 Subject: [aarch64-port-dev ] RFR: 8229351: AArch64: Make the stub threshold of string_compare intrinsic tunable In-Reply-To: References: Message-ID: <21d39cdf-b38d-f40e-c8b5-2d92094cdc2a@redhat.com> On 10/29/19 9:58 AM, Patrick Zhang OS wrote: > 1. Split the STUB_THRESHOLD from the hard-coded 72 to be > CompareLongStringLimitLatin and CompareLongStringLimitUTF as a more > flexible control over the stub thresholds for string_compare > intrinsics, especially for various uArchs. > > 2. MacroAssembler::string_compare LL and UU shared the same > threshold, actually UU may only require the half (length of chars) > of that of LL's, because one character has two-bytes for UU, while > for compacted LL strings, one character means one byte. In addition, > LU/UL may need a separated threshold, as the stub function is > different from the same encoding one, and the performance may vary > as well. > > 3. In generate_compare_long_string_same_encoding, the hard-coded 72 > was originally able to ensure that there can be always 64 bytes at > least for the prefetch code path. However once a smaller stub > threshold is set, a new condition is needed to tell if this would be > still valid, or has to go to the NO_PREFETCH branch. This change can > ensure the correctness. > > 4.
In generate_compare_long_string_different_encoding, some temp > vars for handling the last 4 characters are not valid any longer, > cleaned up strU and strL, and related pointers initialization to the > next U (cnt1) and L (tmp2). > > 5. In compare_string_16_x_LU, the reference to r10 (tmp1) is not > needed, as tmpU or tmpL point to the same register. Thank you for your patch, but I'm afraid that I have some reservations. This patch seems to do rather a lot. What are the thresholds you tested? How are we supposed to test with these different thresholds? Are the thresholds bytes or characters? Why are the different thresholds not tested in this patch? But the more serious problem is the fact that we have different code paths for different microarchitectures, and somehow this has to be standard supportable software. In order to test this stuff we'll need different test parameters for SoftwarePrefetchHintDistance, CompareLongStringLimitLatin, CompareLongStringLimitUTF. Bear in mind that while manufacturers are (entirely reasonably) very keen to show their processors in the best light possible, they are not the people who will have to support this software and debug it when it goes wrong. So there is a fundamental conflict of interest between support people and CPU vendors. We already emit a great deal of in-line code in the string_compare intrinsic, with the intention that this be as fast as possible because we want to avoid having to call the intrinsic. So why is the intrinsic actually faster in your case? Could we not concentrate on that? I -- and I'm sure it's not just me -- would be tremendously grateful if all of the AArch64 developers would concentrate on improving code quality overall rather than tweaking stub parameters. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From daniel.daugherty at oracle.com Wed Nov 13 14:45:36 2019 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Wed, 13 Nov 2019 09:45:36 -0500 Subject: [aarch64-port-dev ] RFR: aarch64: minor improvements of atomic operations In-Reply-To: References: <65e93675-a3cf-53ac-6894-bb4124c55f93@redhat.com> <1f4c99ac-461c-7795-1a74-a494bdba3672@redhat.com> <1cc3ab16-eaab-d031-3df0-c9133de24f88@redhat.com> <8b527457-c371-45ae-bb54-0a048f9ee6f8@redhat.com> <32ea3e22-9f7a-9aaa-c86a-79ed175a1c7b@redhat.com> <83f92211-2c64-69d0-457c-c059acbccf63@oracle.com> <0d718a85-d669-f4b4-ae90-db1f7bb56b45@redhat.com> <1325b063-cc1b-74fe-3b78-f4eb4518d116@redhat.com> Message-ID: On 11/13/19 4:26 AM, Yangfei (Felix) wrote: >> -----Original Message----- >> From: Andrew Haley [mailto:aph at redhat.com] >> Sent: Wednesday, November 13, 2019 5:00 PM >> To: Yangfei (Felix) ; Erik Österlund >> ; Andrew Dinn ; >> aarch64-port-dev at openjdk.java.net >> Cc: hotspot-dev at openjdk.java.net >> Subject: Re: [aarch64-port-dev ] RFR: aarch64: minor improvements of atomic >> operations >> >> On 11/13/19 8:36 AM, Yangfei (Felix) wrote: >>> Still not strong enough? considering the first of ldxr of the loop may be >> speculated. >> >> Come on now, you must have read the thread on kernel-dev you pointed me to. >> > Yes, the cmpxchg case is different here. > So the v2 patch in my previous mail approved? > Will create a bug and do necessary testing.
Is there a reason to not reopen this bug: JDK-8233912 aarch64: minor improvements of atomic operations https://bugs.openjdk.java.net/browse/JDK-8233912 Dan > > Thanks, > Felix From ci_notify at linaro.org Thu Nov 14 02:36:08 2019 From: ci_notify at linaro.org (ci_notify at linaro.org) Date: Thu, 14 Nov 2019 02:36:08 +0000 (UTC) Subject: [aarch64-port-dev ] JTREG, JCStress, SPECjbb2015 and Hadoop/Terasort results for OpenJDK JDK on AArch64 Message-ID: <851383565.643.1573698968979.JavaMail.javamailuser@localhost> This is a summary of the JTREG test results =========================================== The build and test results are cycled every 15 days. For detailed information on the test output please refer to: http://openjdk.linaro.org/jdkX/openjdk-jtreg-nightly-tests/summary/2019/317/summary.html ------------------------------------------------------------------------------- client-release/hotspot ------------------------------------------------------------------------------- Build 0: aarch64/2018/oct/15 pass: 5,780; fail: 19; not run: 90 ------------------------------------------------------------------------------- client-release/jdk ------------------------------------------------------------------------------- Build 0: aarch64/2018/oct/15 pass: 8,495; fail: 670; error: 23 ------------------------------------------------------------------------------- client-release/langtools ------------------------------------------------------------------------------- Build 0: aarch64/2018/oct/15 pass: 3,970; fail: 5 ------------------------------------------------------------------------------- release/hotspot ------------------------------------------------------------------------------- Build 0: aarch64/2019/oct/09 pass: 5,747; fail: 1 Build 1: aarch64/2019/oct/11 pass: 5,751; fail: 1 Build 2: aarch64/2019/oct/14 pass: 5,753 Build 3: aarch64/2019/oct/16 pass: 5,753; fail: 1 Build 4: aarch64/2019/oct/18 pass: 5,760 Build 5: aarch64/2019/oct/21 pass: 5,716; fail: 43; error: 
1 Build 6: aarch64/2019/oct/23 pass: 5,760; fail: 1 Build 7: aarch64/2019/oct/28 pass: 5,766 Build 8: aarch64/2019/oct/30 pass: 5,768 Build 9: aarch64/2019/nov/01 pass: 5,768; fail: 1 Build 10: aarch64/2019/nov/04 pass: 5,769 Build 11: aarch64/2019/nov/06 pass: 5,766; fail: 2 Build 12: aarch64/2019/nov/08 pass: 5,761 Build 13: aarch64/2019/nov/11 pass: 5,762 Build 14: aarch64/2019/nov/13 pass: 5,764; fail: 1 ------------------------------------------------------------------------------- release/jdk ------------------------------------------------------------------------------- Build 0: aarch64/2019/oct/09 pass: 8,692; fail: 507; error: 21 Build 1: aarch64/2019/oct/11 pass: 8,693; fail: 511; error: 18 Build 2: aarch64/2019/oct/14 pass: 8,706; fail: 497; error: 20 Build 3: aarch64/2019/oct/16 pass: 8,702; fail: 509; error: 17 Build 4: aarch64/2019/oct/18 pass: 8,694; fail: 522; error: 17 Build 5: aarch64/2019/oct/21 pass: 8,705; fail: 512; error: 18 Build 6: aarch64/2019/oct/23 pass: 8,712; fail: 505; error: 18 Build 7: aarch64/2019/oct/28 pass: 8,711; fail: 509; error: 18 Build 8: aarch64/2019/oct/30 pass: 8,723; fail: 504; error: 19 Build 9: aarch64/2019/nov/01 pass: 8,774; fail: 506; error: 18 Build 10: aarch64/2019/nov/04 pass: 8,777; fail: 509; error: 17 Build 11: aarch64/2019/nov/06 pass: 8,775; fail: 507; error: 19 Build 12: aarch64/2019/nov/08 pass: 8,774; fail: 510; error: 17 Build 13: aarch64/2019/nov/11 pass: 8,777; fail: 509; error: 15 Build 14: aarch64/2019/nov/13 pass: 8,773; fail: 509; error: 21 3 fatal errors were detected; please follow the link above for more detail. 
------------------------------------------------------------------------------- release/langtools ------------------------------------------------------------------------------- Build 0: aarch64/2019/oct/09 pass: 3,979 Build 1: aarch64/2019/oct/11 pass: 3,979 Build 2: aarch64/2019/oct/14 pass: 3,979 Build 3: aarch64/2019/oct/16 pass: 3,979 Build 4: aarch64/2019/oct/18 pass: 3,979 Build 5: aarch64/2019/oct/21 pass: 3,979 Build 6: aarch64/2019/oct/23 pass: 3,980 Build 7: aarch64/2019/oct/28 pass: 3,980 Build 8: aarch64/2019/oct/30 pass: 3,980 Build 9: aarch64/2019/nov/01 pass: 3,980 Build 10: aarch64/2019/nov/04 pass: 3,980 Build 11: aarch64/2019/nov/06 pass: 3,980 Build 12: aarch64/2019/nov/08 pass: 3,980 Build 13: aarch64/2019/nov/11 pass: 3,980 Build 14: aarch64/2019/nov/13 pass: 3,980 ------------------------------------------------------------------------------- server-release/hotspot ------------------------------------------------------------------------------- Build 0: aarch64/2018/oct/15 pass: 5,787; fail: 18; not run: 90 ------------------------------------------------------------------------------- server-release/jdk ------------------------------------------------------------------------------- Build 0: aarch64/2018/oct/15 pass: 8,476; fail: 686; error: 27 ------------------------------------------------------------------------------- server-release/langtools ------------------------------------------------------------------------------- Build 0: aarch64/2018/oct/15 pass: 3,970; fail: 5 Previous results can be found here: http://openjdk.linaro.org/jdkX/openjdk-jtreg-nightly-tests/index.html SPECjbb2015 composite regression test completed =============================================== This test measures the relative performance of the server compiler running the SPECjbb2015 composite tests and compares the performance against the baseline performance of the server compiler taken on 2016-11-21. 
In accordance with [1], the SPECjbb2015 tests are run on a system which is not production ready and does not meet all the requirements for publishing compliant results. The numbers below shall be treated as non-compliant (nc) and are for experimental purposes only. Relative performance: Server max-jOPS (nc): 7.63x Relative performance: Server critical-jOPS (nc): 9.66x Details of the test setup and historical results may be found here: http://openjdk.linaro.org/jdkX/SPECjbb2015-results/ [1] http://www.spec.org/fairuse.html#Academic Regression test Hadoop-Terasort completed ========================================= This test measures the performance of the server and client compilers running Hadoop sorting a 1GB file using Terasort and compares the performance against the baseline performance of the Zero interpreter and against the baseline performance of the server compiler on 2014-04-01. Relative performance: Zero: 1.0, Server: 207.57 Server 207.57 / Server 2014-04-01 (71.00): 2.92x Details of the test setup and historical results may be found here: http://openjdk.linaro.org/jdkX/hadoop-terasort-benchmark-results/ This is a summary of the jcstress test results ============================================== The build and test results are cycled every 15 days. 
2019-10-08 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/280/results/ 2019-10-10 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/282/results/ 2019-10-12 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/284/results/ 2019-10-15 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/287/results/ 2019-10-17 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/289/results/ 2019-10-19 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/291/results/ 2019-10-22 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/294/results/ 2019-10-23 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/296/results/ 2019-10-29 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/301/results/ 2019-10-31 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/303/results/ 2019-11-02 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/305/results/ 2019-11-05 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/308/results/ 2019-11-07 pass rate: 10490/10490, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/310/results/ 2019-11-12 pass rate: 10490/10490, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/315/results/ 2019-11-14 pass rate: 10490/10490, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/317/results/ For detailed information on the test output please refer to: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/ From dean.long at oracle.com Thu Nov 14 04:15:40 2019 From: dean.long at oracle.com (dean.long at oracle.com) Date: Wed, 13 Nov 2019 20:15:40 -0800 Subject: 
[aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27 conditionally allocatable In-Reply-To: References: Message-ID: Hi Pengfei, I took a quick look and didn't notice any problems. Nice work! This seems to match the x64 approach; however, please get other reviews. dl On 11/13/19 1:55 AM, Pengfei Li (Arm Technology China) wrote: > Hi, > > JBS: https://bugs.openjdk.java.net/browse/JDK-8233743 > Webrev: http://cr.openjdk.java.net/~pli/rfr/8233743/webrev.00/ > > This is a follow-up patch of JDK-8217909[1] to make the AArch64 register > r27 allocatable when CompressedOops and CompressedClassPointers are both > turned off. > > Below changes have been made: > - Massage the RegMask(s) in reg_mask_init() at C2 initialization and > remove r27 from some of the masks conditionally to make it allocatable. > - Also make r29 conditionally reserved in this similar way. > - Make r29 allocatable for pointers as well as integers. > - Replace an rheapbase use to rscratch1 in AArch64 ZGC. > - Revert JDK-8231754[2] which makes r27 always reserved in JVMCI. > > This patch aligns with the implementation in [1] which makes the x86_64 > r12 register allocatable. Please let me know if I have missed anything > for AArch64. > > Tests: > Full jtreg with default options and extra options "-XX:-UseCompressedOops > -XX:+PreserveFramePointer". No new failure is found. > > [1] https://hg.openjdk.java.net/jdk/jdk/rev/48b50573dee4 > [2] https://hg.openjdk.java.net/jdk/jdk/rev/d068b1e534de > > -- > Thanks, > Pengfei > From patrick at os.amperecomputing.com Thu Nov 14 09:20:01 2019 From: patrick at os.amperecomputing.com (Patrick Zhang OS) Date: Thu, 14 Nov 2019 09:20:01 +0000 Subject: [aarch64-port-dev ] RFR: 8229351: AArch64: Make the stub threshold of string_compare intrinsic tunable In-Reply-To: <21d39cdf-b38d-f40e-c8b5-2d92094cdc2a@redhat.com> References: <21d39cdf-b38d-f40e-c8b5-2d92094cdc2a@redhat.com> Message-ID: Thanks for the comments, see my answers below please. >> 1.
This patch seems to do rather a lot. Yes, it makes the stub parameters tunable (without changing any of their values in this patch), fixes an out-of-boundary prefetch for LL/UU, and removes some redundant instructions in the LU/UL code path. The latter two are code-quality fixes; if splitting the patch would make the changes clearer, I'd be happy to do so. >> 2. Are the thresholds bytes or characters? All thresholds are (and should be) in characters. The current code is a little misleading: for LL/LU/UL the constant STUB_THRESHOLD means characters, while for UU it effectively means bytes. With -XX:-CompactStrings, all comparisons go down the UU path, so the threshold would mean bytes, which might confuse developers. This patch clarifies that, and the descriptions of the tunable options can provide further guidance. >> 3. How are we supposed to test with these different thresholds? There are two jtreg tests that check the impact of SoftwarePrefetchHintDistance on the intrinsics. I have locally added non-default thresholds to them and tested with many lengths (which took days on a test system). This has not been included in the proposed patch; maybe a follow-up patch could add it. Any advice? hotspot/jtreg/compiler/intrinsics/string/TestStringCompareToSameLength.java hotspot/jtreg/compiler/intrinsics/string/TestStringCompareToDifferentLength.java >> 4. What are the thresholds you tested? First, the default threshold (the hard-coded 72) was my main testing focus, since I tried hard not to cause regressions in the normal state of aarch64-port, especially for other CPU vendors. Second, I tested two extreme thresholds, 24 and 255, which mean that shorter strings (24 to 71 chars) also go to the stub code path, or that only very long strings (>=255) do, respectively. Functional tests passed (listed in the initial email), while performance results (with string-density-bench, StringCompareBench.java, and SPECjbb2015) may vary across systems (as well as microarchitectures).
Third, I tested some other non-default thresholds as a sanity check, mainly to ensure correctness. >> 5. But the more serious problem is the fact that we have different code paths for different microarchitectures, and somehow this has to be standard supportable software. In order to test this stuff we'll need different test parameters for SoftwarePrefetchHintDistance, CompareLongStringLimitLatin, CompareLongStringLimitUTF. The STUB_THRESHOLD was introduced to control when the stub code is used, and it was tested on some aarch64 systems. I think making it tunable is the way to let each microarchitecture configure its own optimal values. I would like to have a common threshold too, or no threshold at all, but we lack full-coverage tests across all systems. Maybe I misunderstood your point about "supportable"; the two new options can be left at their defaults if developers have no concerns about the string compare intrinsics. >> 6. We already emit a great deal of in-line code in the string_compare intrinsic, with the intention that this be as fast as possible because we want to avoid having to call the intrinsic. So why is the intrinsic actually faster in your case? Avoid having to call the intrinsic? According to my testing with microbenchmarks such as string-density-bench.jar, the LL cases can be up to 10x faster than the non-intrinsic path, and in some public benchmarks such as SPECjbb and Renaissance, 99% of the string_compare calls are LL, so the intrinsics definitely help a lot there as well. If you did NOT mean avoiding the intrinsic completely, but rather avoiding the stub for strings shorter than 72 chars, I would have to say "it depends". The stub functions process 16 chars per iteration where possible, while the outer logic processes 8 bytes per iteration; that is the major difference. For example, I see a consistent 1.5x speedup for lengths 24-71 in the LU/UL cases; others may not, which may be why we need an option here.
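The point that stub thresholds should be counted in characters, whereas a single byte-based limit means different character counts for Latin-1 and UTF-16 data, can be made concrete with a small model. Everything here is hypothetical: `StubLimits`, `use_stub`, `bytes_for`, the `>=` comparison and the rule that LU/UL share the UTF limit are assumptions for illustration, not the proposed patch; only the idea of separate Latin and UTF limits (CompareLongStringLimitLatin, CompareLongStringLimitUTF) comes from the thread.

```cpp
#include <cstddef>

// Operand encodings, named as in the thread:
// L = Latin-1 (1 byte per char), U = UTF-16 (2 bytes per char).
enum class Enc { LL, UU, LU, UL };

// Hypothetical stub thresholds, both expressed in characters.
struct StubLimits {
  std::size_t latin_chars;  // applied to LL
  std::size_t utf_chars;    // applied to UU, LU and UL (an assumption)
};

// Bytes occupied by `chars` characters of one operand.
constexpr std::size_t bytes_for(std::size_t chars, bool utf16) {
  return utf16 ? 2 * chars : chars;
}

// Hypothetical dispatch rule: call the out-of-line stub when the shorter
// string has at least `limit` characters; otherwise use the inline loop.
bool use_stub(Enc enc, std::size_t min_len_chars, const StubLimits& lim) {
  const std::size_t limit = (enc == Enc::LL) ? lim.latin_chars : lim.utf_chars;
  return min_len_chars >= limit;
}

// One limit of 72 read as bytes covers 72 Latin-1 characters but only
// 36 UTF-16 characters: the ambiguity that character-based limits avoid.
static_assert(bytes_for(72, /*utf16=*/false) == 72, "LL: 72 chars = 72 bytes");
static_assert(bytes_for(36, /*utf16=*/true) == 72, "UU: 36 chars = 72 bytes");
```

With example limits of 72 (Latin) and 36 (UTF), a 40-character UU comparison would go to the stub, while a 40-character LL comparison would stay inline.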
Regards Patrick -----Original Message----- From: Andrew Haley Sent: Wednesday, November 13, 2019 8:27 PM To: Patrick Zhang OS ; aarch64-port-dev at openjdk.java.net Subject: Re: [aarch64-port-dev ] RFR: 8229351: AArch64: Make the stub threshold of string_compare intrinsic tunable On 10/29/19 9:58 AM, Patrick Zhang OS wrote: > 1. Split the STUB_THRESHOLD from the hard-coded 72 to be > CompareLongStringLimitLatin and CompareLongStringLimitUTF as a more > flexible control over the stub thresholds for string_compare > intrinsics, especially for various uArchs. > > 2. MacroAssembler::string_compare LL and UU shared the same > threshold, actually UU may only require the half (length of chars) of > that of LL's, because one character has two-bytes for UU, while for > compacted LL strings, one character means one byte. In addition, LU/UL > may need a separated threshold, as the stub function is different from > the same encoding one, and the performance may vary as well. > > 3. In generate_compare_long_string_same_encoding, the hard-coded 72 > was originally able to ensure that there can be always 64 bytes at > least for the prefetch code path. However once a smaller stub > threshold is set, a new condition is needed to tell if this would be > still valid, or has to go to the NO_PREFETCH branch. This change can > ensure the correctness. > > 4. In generate_compare_long_string_different_encoding, some temp vars > for handling the last 4 characters are not valid any longer, cleaned > up strU and strL, and related pointers initialization to the next U > (cnt1) and L (tmp2). > > 5. In compare_string_16_x_LU, the reference to r10 (tmp1) is not > needed, as tmpU or tmpL point to the same register. Thank you for your patch, but I'm afraid that I have some reservations. This patch seems to do rather a lot. What are the thresholds you tested? How are we supposed to test with these different thresholds? Are the thresholds bytes or characters? 
Why are the different thresholds not tested in this patch? But the more serious problem is the fact that we have different code paths for different microarchitectures, and somehow this has to be standard supportable software. In order to test this stuff we'll need different test parameters for SoftwarePrefetchHintDistance, CompareLongStringLimitLatin, CompareLongStringLimitUTF. Bear in mind that while manufacturers are (entirely reasonably) very keen to show their processors in the best light possible, they are not the people who will have to support this software and debug it when it goes wrong. So there is a fundamental conflict of interest between support people and CPU vendors. We already emit a great deal of in-line code in the string_compare intrinsic, with the intention that this be as fast as possible because we want to avoid having to call the intrinsic. So why is the intrinsic actually faster in your case? Could we not concentrate on that? I -- and I'm sure it's not just me -- would be tremendously grateful if all of the AArch64 developers would concentrate on improving code quality overall rather than tweaking stub parameters. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From adinn at redhat.com Thu Nov 14 09:26:45 2019 From: adinn at redhat.com (Andrew Dinn) Date: Thu, 14 Nov 2019 09:26:45 +0000 Subject: [aarch64-port-dev ] RFR: 8229351: AArch64: Make the stub threshold of string_compare intrinsic tunable In-Reply-To: <21d39cdf-b38d-f40e-c8b5-2d92094cdc2a@redhat.com> References: <21d39cdf-b38d-f40e-c8b5-2d92094cdc2a@redhat.com> Message-ID: On 13/11/2019 12:27, Andrew Haley wrote: > Thank you for your patch, but I'm afraid that I have some reservations. I also have the same reservations. > This patch seems to do rather a lot. > > What are the thresholds you tested? How are we supposed to test with > these different thresholds? 
Are the thresholds bytes or characters? > Why are the different thresholds not tested in this patch? I agree that we would really need some numbers in order to determine whether to make this change. However, before we go down that path ... > But the more serious problem is the fact that we have different code > paths for different microarchitectures, and somehow this has to be > standard supportable software. In order to test this stuff we'll need > different test parameters for SoftwarePrefetchHintDistance, > CompareLongStringLimitLatin, CompareLongStringLimitUTF. The key word here is /supportable/. This proposed change is the start of a slippery slope where we can end up with a plethora of 'tuning' parameters, not just for different manufacturers but for this year's model and then next year's model and so on. As the number and, more importantly, the combinations of such parameters grow, we can easily end up in a situation where we are unable to generate a useful configuration for all combinations of tuning parameters that meet the totality of different application needs. Worse, we risk ending up in a situation where we see terrible performance in the worst cases and have no idea how we got there. This is a problem of complexity and tractability. Even if we could in principle, given enough time, arrive at a global maximum or, failing that, an optimal compromise that trades off competing needs, the danger is that in practice getting there can end up taking increasingly large amounts of development and maintenance time that we don't have. So, the gains for any addition of tuning parameters need to be significant if we are to justify the costs incurred by implementing and maintaining them. It is not enough for such a tuning feature to optimize a specific case, especially just for a specific architecture, by a noticeable amount, e.g. the 1.5x that you cite for your architecture.
For an improvement to be significant enough to merit the incurred support burden the gain ought to apply to many applications or, perhaps, to a few critical, high-value applications and needs manifestly not to risk lowering performance in all other applications. It also ought, at the least, to be shown not to hurt performance on other architectures and, preferably, provide at least some other architectures with the opportunity also to improve performance. > I -- and I'm sure it's not just me -- would be tremendously grateful > if all of the AArch64 developers would concentrate on improving code > quality overall rather than tweaking stub parameters. I can confirm that this is not just Andrew's sentiment. regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From aph at redhat.com Thu Nov 14 10:33:00 2019 From: aph at redhat.com (Andrew Haley) Date: Thu, 14 Nov 2019 10:33:00 +0000 Subject: [aarch64-port-dev ] RFR: 8229351: AArch64: Make the stub threshold of string_compare intrinsic tunable In-Reply-To: References: <21d39cdf-b38d-f40e-c8b5-2d92094cdc2a@redhat.com> Message-ID: <2890e2c6-ed15-7f8a-5e88-83ed8c6af7ce@redhat.com> On 11/14/19 9:20 AM, Patrick Zhang OS wrote: > Thanks for the comments, see my answers below please. > >>> 1. This patch seems to do rather a lot. > Yes, it enables tweaking the stub parameters (not really changed any > in this patch), fixed an out-of-boundary prefetching for LL/UU, and > fixed some redundant instructions in LU/UL code path. The latter > two are code-quality-wise, if splitting the patch could make the > changes clearer, I'd like to do. Why do we care about out-of-boundary prefetching for LL/UU? I don't think we do if it requires any extra logic. >>> 2. Are the thresholds bytes or characters? > All thresholds are (and should be) in characters. 
This was a little > bit misleading, for LL/LU/UL, the const STUB_THRESHOLD meant chars, > while for UU it could be explained as bytes. If specified > -XX:-CompactStrings, all code path going to UU would make the > threshold mean bytes, which might confuse developers. This patch can > clarify it, and the description of tunable options can provide > further guidance. It must. Without some commentary both maintainers and developers are lost. Unless there is some very strong reason, all counts must specify units. >>> 3. How are we supposed to test with these different thresholds? > There are two jtreg tests for checking the impacts of > SoftwarePrefetchHintDistance over the intrinsics, I have locally > added non-default thresholds inside and tested with many lengths > (took days on a test system). This has not been included in the > proposed patch, maybe a follow-up one would do, any advice? > hotspot/jtreg/compiler/intrinsics/string/TestStringCompareToSameLength.java > hotspot/jtreg/compiler/intrinsics/string/TestStringCompareToDifferentLength.java I won't accept this patch unless it is accompanied by test cases that properly exercise the code. >>> 4. What are the thresholds you tested? > Firstly, the default threshold, the hardcoded 72 is my testing focus > since I would try best not to bring negative impacts to aarch64-port > normal state, especially other CPU vendors. > Second, I tested two extreme thresholds: 24 and 255, which means > more shorter strings (24 to 71 chars) or only very long strings > (>=255) could go to the stub code path, respectively. Function tests > passed (listed in the initial email), while performance test results > (with string-density-bench, StringCompareBench.java, and > SPECjbb2015) could be varying with different systems (as well as > microarchitectures). > Third, some other non-default thresholds, as sanity check, > particularly for ensuring correctness. It's the extremes that really matter, I suspect. >>> 5. 
But the more serious problem is the fact that we have different >>> code paths for different microarchitectures, and somehow this has >>> to be standard supportable software. In order to test this stuff >>> we'll need different test parameters for >>> SoftwarePrefetchHintDistance, CompareLongStringLimitLatin, >>> CompareLongStringLimitUTF > The STUB_THRESHOLD was introduced to control the stub code > insertion, tested on some aarch64 systems. I think making it tunable > is the way to let different microarchitectures be able to configure > optimal ones for their own. Well, yes. The question is whether we go down this rabbit hole or try to find a compromise that is perhaps not quite optimal for anyone but good enough for everyone. > I would like to have a common threshold too, or no threshold for > all, but lacking of full-coverage tests over all systems. Maybe I > misunderstood you points here with regards to "supportable", the two > new options can be kept as default if developers have no concerns on > string compare intrinsics. I rather suspect that vendors will want to change the defaults sooner or later. And besides, we'll all have to support these options. >>> 6. We already emit a great deal of in-line code in the >>> string_compare intrinsic, with the intention that this be as fast >>> as possible because we want to avoid having to call the >>> intrinsic. So why is the intrinsic actually faster in your case? > Avoid having to call the intrinsic? I meant "the stub". > If you did NOT mean completely "avoiding intrinsic", but the strings > shorter than 72 chars, I would have to say, "it depends". The stub > functions try best to process every 16 chars, while the outer logic > processes every 8 bytes, which is the major diff. For example, I can > see consistent 1.5x faster with lengths 24-71 for LU/UL cases, maybe > others cannot, which can be reason why we need an option here. I know that strings of length 24 - 30ish are very common, so this is an important case. 
Do you have a theory that LU/UL cases are common? Why? What is it like with LL/UU? I'd need to see real timings. I'd either do all numbers < 256 or (to save time) a sequence like...

1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 17, 19, 21, 23, 25, 28, 30, 34, 37, 41, 45, 49, 54, 60, 66, 72, 80, 88, 97, 106, 117, 129, 142, 156, 171, 189, 207, 228, 251

The idea here is that we can plot a graph. The timings should ideally be monotonically increasing. And then we could see how different processors behave, and hopefully find a decent solution for all.

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From aph at redhat.com Thu Nov 14 10:40:53 2019
From: aph at redhat.com (Andrew Haley)
Date: Thu, 14 Nov 2019 10:40:53 +0000
Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27 conditionally allocatable
In-Reply-To:
References:
Message-ID: <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com>

On 11/13/19 9:55 AM, Pengfei Li (Arm Technology China) wrote:
> This patch aligns with the implementation in [1] which makes the x86_64
> r12 register allocatable. Please let me know if I have missed anything
> for AArch64.

We don't generally use r27 for compressed class pointers.

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From patrick at os.amperecomputing.com Thu Nov 14 11:13:48 2019
From: patrick at os.amperecomputing.com (Patrick Zhang OS)
Date: Thu, 14 Nov 2019 11:13:48 +0000
Subject: [aarch64-port-dev ] RFR: 8229351: AArch64: Make the stub threshold of string_compare intrinsic tunable
In-Reply-To: <2890e2c6-ed15-7f8a-5e88-83ed8c6af7ce@redhat.com>
References: <21d39cdf-b38d-f40e-c8b5-2d92094cdc2a@redhat.com> <2890e2c6-ed15-7f8a-5e88-83ed8c6af7ce@redhat.com>
Message-ID:

>> Why do we care about out-of-boundary prefetching for LL/UU?
>> I don't think we do if it requires any extra logic.

I was thinking out-of-boundary prefetching should be prevented, and UL/LU have the same condition. If that is not needed, we could simply set largeLoopExitCondition to 64 so that more cases can stay in the large loop. I don't think so.
http://hg.openjdk.java.net/jdk/jdk/file/355f4f42dda5/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#l4194

     if (SoftwarePrefetchHintDistance >= 0) {
       __ bind(LARGE_LOOP_PREFETCH);
       __ prfm(Address(str1, SoftwarePrefetchHintDistance));
       __ prfm(Address(str2, SoftwarePrefetchHintDistance));
       compare_string_16_bytes_same(DIFF, DIFF2);
       compare_string_16_bytes_same(DIFF, DIFF2);
       __ sub(cnt2, cnt2, isLL ? 64 : 32);
       compare_string_16_bytes_same(DIFF, DIFF2);
-      __ subs(rscratch2, cnt2, largeLoopExitCondition);
+      __ subs(rscratch2, cnt2, 64);
       compare_string_16_bytes_same(DIFF, DIFF2);
       __ br(__ GT, LARGE_LOOP_PREFETCH);
       __ cbz(cnt2, LAST_CHECK_AND_LENGTH_DIFF); // no more chars left?
     }

>> Do you have a theory that LU/UL cases are common? Why?

My only "theory" is that compare_string_16_x_LU (in the stub) is faster than the 8-byte main loop outside the stub (http://hg.openjdk.java.net/jdk/jdk/file/355f4f42dda5/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp#l4997). Even outside the stub's large loop, its small loop can be faster as well, since it processes more bytes with fewer instructions (http://hg.openjdk.java.net/jdk/jdk/file/355f4f42dda5/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#l4203). I can prepare a new patch with updated tests, and plot the timings soon.

Regards
Patrick

-----Original Message-----
From: Andrew Haley
Sent: Thursday, November 14, 2019 6:33 PM
To: Patrick Zhang OS ; aarch64-port-dev at openjdk.java.net
Subject: Re: [aarch64-port-dev ] RFR: 8229351: AArch64: Make the stub threshold of string_compare intrinsic tunable

On 11/14/19 9:20 AM, Patrick Zhang OS wrote:
> Thanks for the comments, see my answers below please.
>
>>> 1.
This patch seems to do rather a lot. > Yes, it enables tweaking the stub parameters (not really changed any > in this patch), fixed an out-of-boundary prefetching for LL/UU, and > fixed some redundant instructions in LU/UL code path. The latter two > are code-quality-wise, if splitting the patch could make the changes > clearer, I'd like to do. Why do we care about out-of-boundary prefetching for LL/UU? I don't think we do if it requires any extra logic. >>> 2. Are the thresholds bytes or characters? > All thresholds are (and should be) in characters. This was a little > bit misleading, for LL/LU/UL, the const STUB_THRESHOLD meant chars, > while for UU it could be explained as bytes. If specified > -XX:-CompactStrings, all code path going to UU would make the > threshold mean bytes, which might confuse developers. This patch can > clarify it, and the description of tunable options can provide further > guidance. It must. Without some commentary both maintainers and developers are lost. Unless there is some very strong reason, all counts must specify units. >>> 3. How are we supposed to test with these different thresholds? > There are two jtreg tests for checking the impacts of > SoftwarePrefetchHintDistance over the intrinsics, I have locally added > non-default thresholds inside and tested with many lengths (took days > on a test system). This has not been included in the proposed patch, > maybe a follow-up one would do, any advice? > hotspot/jtreg/compiler/intrinsics/string/TestStringCompareToSameLength > .java > hotspot/jtreg/compiler/intrinsics/string/TestStringCompareToDifferentL > ength.java I won't accept this patch unless it is accompanied by test cases that properly exercise the code. >>> 4. What are the thresholds you tested? > Firstly, the default threshold, the hardcoded 72 is my testing focus > since I would try best not to bring negative impacts to aarch64-port > normal state, especially other CPU vendors. 
> Second, I tested two extreme thresholds: 24 and 255, which means more > shorter strings (24 to 71 chars) or only very long strings > (>=255) could go to the stub code path, respectively. Function tests > passed (listed in the initial email), while performance test results > (with string-density-bench, StringCompareBench.java, and > SPECjbb2015) could be varying with different systems (as well as > microarchitectures). > Third, some other non-default thresholds, as sanity check, > particularly for ensuring correctness. It's the extremes that really matter, I suspect. >>> 5. But the more serious problem is the fact that we have different >>> code paths for different microarchitectures, and somehow this has to >>> be standard supportable software. In order to test this stuff we'll >>> need different test parameters for SoftwarePrefetchHintDistance, >>> CompareLongStringLimitLatin, CompareLongStringLimitUTF > The STUB_THRESHOLD was introduced to control the stub code insertion, > tested on some aarch64 systems. I think making it tunable is the way > to let different microarchitectures be able to configure optimal ones > for their own. Well, yes. The question is whether we go down this rabbit hole or try to find a compromise that is perhaps not quite optimal for anyone but good enough for everyone. > I would like to have a common threshold too, or no threshold for all, > but lacking of full-coverage tests over all systems. Maybe I > misunderstood you points here with regards to "supportable", the two > new options can be kept as default if developers have no concerns on > string compare intrinsics. I rather suspect that vendors will want to change the defaults sooner or later. And besides, we'll all have to support these options. >>> 6. We already emit a great deal of in-line code in the >>> string_compare intrinsic, with the intention that this be as fast as >>> possible because we want to avoid having to call the intrinsic. 
>>> So why is the intrinsic actually faster in your case?
> Avoid having to call the intrinsic?

I meant "the stub".

> If you did NOT mean completely "avoiding intrinsic", but the strings
> shorter than 72 chars, I would have to say, "it depends". The stub
> functions try best to process every 16 chars, while the outer logic
> processes every 8 bytes, which is the major diff. For example, I can
> see consistent 1.5x faster with lengths 24-71 for LU/UL cases, maybe
> others cannot, which can be reason why we need an option here.

I know that strings of length 24 - 30ish are very common, so this is an important case.

Do you have a theory that LU/UL cases are common? Why? What is it like with LL/UU? I'd need to see real timings. I'd either do all numbers < 256 or (to save time) a sequence like...

1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 17, 19, 21, 23, 25, 28, 30, 34, 37, 41, 45, 49, 54, 60, 66, 72, 80, 88, 97, 106, 117, 129, 142, 156, 171, 189, 207, 228, 251

The idea here is that we can plot a graph. The timings should ideally be monotonically increasing. And then we could see how different processors behave, and hopefully find a decent solution for all.

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From felix.yang at huawei.com Thu Nov 14 12:26:15 2019
From: felix.yang at huawei.com (Yangfei (Felix))
Date: Thu, 14 Nov 2019 12:26:15 +0000
Subject: [aarch64-port-dev ] RFR: aarch64: minor improvements of atomic operations
In-Reply-To:
References: <65e93675-a3cf-53ac-6894-bb4124c55f93@redhat.com> <1f4c99ac-461c-7795-1a74-a494bdba3672@redhat.com> <1cc3ab16-eaab-d031-3df0-c9133de24f88@redhat.com> <8b527457-c371-45ae-bb54-0a048f9ee6f8@redhat.com> <32ea3e22-9f7a-9aaa-c86a-79ed175a1c7b@redhat.com> <83f92211-2c64-69d0-457c-c059acbccf63@oracle.com> <0d718a85-d669-f4b4-ae90-db1f7bb56b45@redhat.com> <1325b063-cc1b-74fe-3b78-f4eb4518d116@redhat.com>
Message-ID: .
>
> Is there a reason to not reopen this bug:
>
> JDK-8233912 aarch64: minor improvements of atomic operations
> https://bugs.openjdk.java.net/browse/JDK-8233912
>
> Dan
>

Reopened, and modified the problem description on that bug.

Webrev: http://cr.openjdk.java.net/~fyang/8233912/webrev.00/

The webrev also adds one comment from aph.
Passed tier1, tier2 and tier3 tests. Also ran the jcstress tests.
Will do the push.

Thanks,
Felix

From ci_notify at linaro.org Fri Nov 15 06:10:51 2019
From: ci_notify at linaro.org (ci_notify at linaro.org)
Date: Fri, 15 Nov 2019 06:10:51 +0000 (UTC)
Subject: [aarch64-port-dev ] JTREG, JCStress, SPECjbb2015 and Hadoop/Terasort results for OpenJDK 13 on AArch64
Message-ID: <471749255.788.1573798251928.JavaMail.javamailuser@localhost>

This is a summary of the JTREG test results
===========================================

The build and test results are cycled every 15 days.

For detailed information on the test output please refer to:

http://openjdk.linaro.org/jdk13/openjdk-jtreg-nightly-tests/summary/2019/318/summary.html

-------------------------------------------------------------------------------
release/hotspot
-------------------------------------------------------------------------------
Build 0: aarch64/2019/jul/04 pass: 5,644; fail: 2; error: 1
Build 1: aarch64/2019/jul/09 pass: 5,643; fail: 4
Build 2: aarch64/2019/jul/16 pass: 5,646; fail: 1
Build 3: aarch64/2019/jul/18 pass: 5,644; fail: 2; error: 1
Build 4: aarch64/2019/jul/20 pass: 5,645; fail: 1; error: 1
Build 5: aarch64/2019/jul/23 pass: 5,644; fail: 3
Build 6: aarch64/2019/jul/25 pass: 5,644; fail: 3
Build 7: aarch64/2019/jul/30 pass: 5,645; fail: 2
Build 8: aarch64/2019/aug/01 pass: 5,646; fail: 1
Build 9: aarch64/2019/aug/03 pass: 5,646; fail: 1
Build 10: aarch64/2019/aug/06 pass: 5,645; fail: 2
Build 11: aarch64/2019/aug/08 pass: 5,646; fail: 1
Build 12: aarch64/2019/aug/10 pass: 5,646; fail: 1
Build 13: aarch64/2019/nov/12 pass: 5,652
Build 14: aarch64/2019/nov/14 pass: 5,650; fail: 2

1 fatal errors were detected; please follow the link above for more detail.

-------------------------------------------------------------------------------
release/jdk
-------------------------------------------------------------------------------
Build 0: aarch64/2019/jul/04 pass: 8,601; fail: 523; error: 26
Build 1: aarch64/2019/jul/09 pass: 8,606; fail: 515; error: 29
Build 2: aarch64/2019/jul/16 pass: 8,593; fail: 531; error: 30
Build 3: aarch64/2019/jul/18 pass: 8,618; fail: 527; error: 26
Build 4: aarch64/2019/jul/20 pass: 8,619; fail: 519; error: 33
Build 5: aarch64/2019/jul/23 pass: 8,616; fail: 525; error: 30
Build 6: aarch64/2019/jul/25 pass: 8,620; fail: 528; error: 23
Build 7: aarch64/2019/jul/30 pass: 8,610; fail: 529; error: 32
Build 8: aarch64/2019/aug/01 pass: 8,620; fail: 527; error: 24
Build 9: aarch64/2019/aug/03 pass: 8,596; fail: 552; error: 23
Build 10: aarch64/2019/aug/06 pass: 8,616; fail: 528; error: 27
Build 11: aarch64/2019/aug/08 pass: 8,649; fail: 504; error: 18
Build 12: aarch64/2019/aug/10 pass: 8,647; fail: 507; error: 17
Build 13: aarch64/2019/nov/12 pass: 8,650; fail: 513; error: 16
Build 14: aarch64/2019/nov/14 pass: 8,651; fail: 511; error: 17

4 fatal errors were detected; please follow the link above for more detail.
-------------------------------------------------------------------------------
release/langtools
-------------------------------------------------------------------------------
Build 0: aarch64/2019/jul/04 pass: 3,962
Build 1: aarch64/2019/jul/09 pass: 3,962
Build 2: aarch64/2019/jul/16 pass: 3,963
Build 3: aarch64/2019/jul/18 pass: 3,964
Build 4: aarch64/2019/jul/20 pass: 3,964
Build 5: aarch64/2019/jul/23 pass: 3,964
Build 6: aarch64/2019/jul/25 pass: 3,964
Build 7: aarch64/2019/jul/30 pass: 3,964
Build 8: aarch64/2019/aug/01 pass: 3,964
Build 9: aarch64/2019/aug/03 pass: 3,964
Build 10: aarch64/2019/aug/06 pass: 3,964
Build 11: aarch64/2019/aug/08 pass: 3,964
Build 12: aarch64/2019/aug/10 pass: 3,964
Build 13: aarch64/2019/nov/12 pass: 3,964
Build 14: aarch64/2019/nov/14 pass: 3,964

Previous results can be found here:

http://openjdk.linaro.org/jdk13/openjdk-jtreg-nightly-tests/index.html

SPECjbb2015 composite regression test completed
===============================================

This test measures the relative performance of the server compiler running the SPECjbb2015 composite tests and compares the performance against the baseline performance of the server compiler taken on 2016-11-21.

In accordance with [1], the SPECjbb2015 tests are run on a system which is not production ready and does not meet all the requirements for publishing compliant results. The numbers below shall be treated as non-compliant (nc) and are for experimental purposes only.
Relative performance: Server max-jOPS (nc): 7.63x
Relative performance: Server critical-jOPS (nc): 9.35x

Details of the test setup and historical results may be found here:

http://openjdk.linaro.org/jdk13/SPECjbb2015-results/

[1] http://www.spec.org/fairuse.html#Academic

Regression test Hadoop-Terasort completed
=========================================

This test measures the performance of the server and client compilers running Hadoop sorting a 1GB file using Terasort and compares the performance against the baseline performance of the Zero interpreter and against the baseline performance of the server compiler on 2014-04-01.

Relative performance: Zero: 1.0, Server: 204.57

Server 204.57 / Server 2014-04-01 (71.00): 2.88x

Details of the test setup and historical results may be found here:

http://openjdk.linaro.org/jdk13/hadoop-terasort-benchmark-results/

This is a summary of the jcstress test results
==============================================

The build and test results are cycled every 15 days.
2019-07-04 pass rate: 10488/10488, results: http://openjdk.linaro.org/jdk13/jcstress-nightly-runs/2019/185/results/
2019-07-10 pass rate: 10487/10488, results: http://openjdk.linaro.org/jdk13/jcstress-nightly-runs/2019/190/results/
2019-07-16 pass rate: 10488/10488, results: http://openjdk.linaro.org/jdk13/jcstress-nightly-runs/2019/197/results/
2019-07-19 pass rate: 10488/10488, results: http://openjdk.linaro.org/jdk13/jcstress-nightly-runs/2019/199/results/
2019-07-21 pass rate: 10487/10488, results: http://openjdk.linaro.org/jdk13/jcstress-nightly-runs/2019/201/results/
2019-07-24 pass rate: 10488/10488, results: http://openjdk.linaro.org/jdk13/jcstress-nightly-runs/2019/204/results/
2019-07-26 pass rate: 10487/10488, results: http://openjdk.linaro.org/jdk13/jcstress-nightly-runs/2019/206/results/
2019-07-31 pass rate: 10488/10488, results: http://openjdk.linaro.org/jdk13/jcstress-nightly-runs/2019/211/results/
2019-08-02 pass rate: 10488/10488, results: http://openjdk.linaro.org/jdk13/jcstress-nightly-runs/2019/213/results/
2019-08-04 pass rate: 10488/10488, results: http://openjdk.linaro.org/jdk13/jcstress-nightly-runs/2019/215/results/
2019-08-07 pass rate: 10488/10488, results: http://openjdk.linaro.org/jdk13/jcstress-nightly-runs/2019/218/results/
2019-08-09 pass rate: 10487/10488, results: http://openjdk.linaro.org/jdk13/jcstress-nightly-runs/2019/220/results/
2019-08-11 pass rate: 10487/10488, results: http://openjdk.linaro.org/jdk13/jcstress-nightly-runs/2019/222/results/
2019-11-12 pass rate: 10490/10490, results: http://openjdk.linaro.org/jdk13/jcstress-nightly-runs/2019/316/results/
2019-11-15 pass rate: 10490/10490, results: http://openjdk.linaro.org/jdk13/jcstress-nightly-runs/2019/318/results/

For detailed information on the test output please refer to:

http://openjdk.linaro.org/jdk13/jcstress-nightly-runs/

From ci_notify at linaro.org Fri Nov 15 06:15:19 2019
From: ci_notify at linaro.org (ci_notify at linaro.org)
Date: Fri, 15 Nov 2019 06:15:19 +0000 (UTC)
Subject: [aarch64-port-dev ] JTREG, JCStress, SPECjbb2015 and Hadoop/Terasort results for OpenJDK 8u on AArch64
Message-ID: <107109560.790.1573798519711.JavaMail.javamailuser@localhost>

This is a summary of the JTREG test results
===========================================

The build and test results are cycled every 15 days.

For detailed information on the test output please refer to:

http://openjdk.linaro.org/jdk8u/openjdk-jtreg-nightly-tests/summary/2019/318/summary.html

-------------------------------------------------------------------------------
release/hotspot
-------------------------------------------------------------------------------
Build 0: aarch64/2019/jul/25 pass: 802; fail: 25; error: 11
Build 1: aarch64/2019/jul/30 pass: 787; fail: 40; error: 11
Build 2: aarch64/2019/aug/01 pass: 800; fail: 26; error: 12
Build 3: aarch64/2019/aug/04 pass: 808; fail: 30; error: 2
Build 4: aarch64/2019/aug/06 pass: 799; fail: 29; error: 12
Build 5: aarch64/2019/aug/08 pass: 830; fail: 9; error: 1
Build 6: aarch64/2019/aug/11 pass: 825; fail: 14; error: 1
Build 7: aarch64/2019/aug/13 pass: 830; fail: 9; error: 1
Build 8: aarch64/2019/aug/15 pass: 837; fail: 9; error: 1
Build 9: aarch64/2019/aug/17 pass: 837; fail: 9; error: 1
Build 10: aarch64/2019/aug/22 pass: 837; fail: 9; error: 1
Build 11: aarch64/2019/sep/10 pass: 838; fail: 13; error: 1
Build 12: aarch64/2019/sep/21 pass: 838; fail: 13; error: 1
Build 13: aarch64/2019/nov/02 pass: 843; fail: 9; error: 1
Build 14: aarch64/2019/nov/14 pass: 843; fail: 9; error: 1

-------------------------------------------------------------------------------
release/jdk
-------------------------------------------------------------------------------
Build 0: aarch64/2019/jul/25 pass: 5,938; fail: 276; error: 26
Build 1: aarch64/2019/jul/30 pass: 5,942; fail: 273; error: 25
Build 2: aarch64/2019/aug/01 pass: 5,945; fail: 271; error: 24
Build 3: aarch64/2019/aug/04 pass: 5,949; fail: 270; error: 24
Build 4: aarch64/2019/aug/06 pass: 5,945; fail: 275; error: 23
Build 5: aarch64/2019/aug/08 pass: 5,953; fail: 267; error: 23
Build 6: aarch64/2019/aug/11 pass: 5,947; fail: 272; error: 25
Build 7: aarch64/2019/aug/13 pass: 5,962; fail: 258; error: 24
Build 8: aarch64/2019/aug/15 pass: 5,955; fail: 266; error: 23
Build 9: aarch64/2019/aug/17 pass: 5,951; fail: 269; error: 24
Build 10: aarch64/2019/aug/22 pass: 5,945; fail: 279; error: 20
Build 11: aarch64/2019/sep/10 pass: 5,951; fail: 273; error: 23
Build 12: aarch64/2019/sep/21 pass: 5,964; fail: 261; error: 22
Build 13: aarch64/2019/nov/02 pass: 5,956; fail: 278; error: 18
Build 14: aarch64/2019/nov/14 pass: 5,956; fail: 275; error: 21

-------------------------------------------------------------------------------
release/langtools
-------------------------------------------------------------------------------
Build 0: aarch64/2019/jul/25 pass: 3,116; fail: 2
Build 1: aarch64/2019/jul/30 pass: 3,116; fail: 2
Build 2: aarch64/2019/aug/01 pass: 3,116; fail: 2
Build 3: aarch64/2019/aug/04 pass: 3,116; fail: 2
Build 4: aarch64/2019/aug/06 pass: 3,116; fail: 2
Build 5: aarch64/2019/aug/08 pass: 3,116; fail: 2
Build 6: aarch64/2019/aug/11 pass: 3,116; fail: 2
Build 7: aarch64/2019/aug/13 pass: 3,116; fail: 2
Build 8: aarch64/2019/aug/15 pass: 3,116; fail: 2
Build 9: aarch64/2019/aug/17 pass: 3,116; fail: 2
Build 10: aarch64/2019/aug/22 pass: 3,116; fail: 2
Build 11: aarch64/2019/sep/10 pass: 3,116; fail: 2
Build 12: aarch64/2019/sep/21 pass: 3,116; fail: 2
Build 13: aarch64/2019/nov/02 pass: 3,116; fail: 2
Build 14: aarch64/2019/nov/14 pass: 3,116; fail: 2

Previous results can be found here:

http://openjdk.linaro.org/jdk8u/openjdk-jtreg-nightly-tests/index.html

SPECjbb2015 composite regression test completed
===============================================

This test measures the relative performance of the server compiler running the SPECjbb2015 composite tests and compares the performance against the baseline performance of the server compiler taken on 2016-11-21.

In accordance with [1], the SPECjbb2015 tests are run on a system which is not production ready and does not meet all the requirements for publishing compliant results. The numbers below shall be treated as non-compliant (nc) and are for experimental purposes only.

Relative performance: Server max-jOPS (nc): 6.73x
Relative performance: Server critical-jOPS (nc): 8.09x

Details of the test setup and historical results may be found here:

http://openjdk.linaro.org/jdk8u/SPECjbb2015-results/

[1] http://www.spec.org/fairuse.html#Academic

Regression test Hadoop-Terasort completed
=========================================

This test measures the performance of the server and client compilers running Hadoop sorting a 1GB file using Terasort and compares the performance against the baseline performance of the Zero interpreter and against the baseline performance of the server compiler on 2014-04-01.

Relative performance: Zero: 1.0, Server: 174.26

Server 174.26 / Server 2014-04-01 (71.00): 2.45x

Details of the test setup and historical results may be found here:

http://openjdk.linaro.org/jdk8u/hadoop-terasort-benchmark-results/

This is a summary of the jcstress test results
==============================================

The build and test results are cycled every 15 days.
2019-07-26 pass rate: 8229/8229, results: http://openjdk.linaro.org/jdk8u/jcstress-nightly-runs/2019/206/results/
2019-07-31 pass rate: 8229/8229, results: http://openjdk.linaro.org/jdk8u/jcstress-nightly-runs/2019/211/results/
2019-08-02 pass rate: 8229/8229, results: http://openjdk.linaro.org/jdk8u/jcstress-nightly-runs/2019/213/results/
2019-08-05 pass rate: 8229/8229, results: http://openjdk.linaro.org/jdk8u/jcstress-nightly-runs/2019/216/results/
2019-08-07 pass rate: 8229/8229, results: http://openjdk.linaro.org/jdk8u/jcstress-nightly-runs/2019/218/results/
2019-08-09 pass rate: 8229/8229, results: http://openjdk.linaro.org/jdk8u/jcstress-nightly-runs/2019/220/results/
2019-08-12 pass rate: 8230/8230, results: http://openjdk.linaro.org/jdk8u/jcstress-nightly-runs/2019/223/results/
2019-08-13 pass rate: 8230/8230, results: http://openjdk.linaro.org/jdk8u/jcstress-nightly-runs/2019/225/results/
2019-08-16 pass rate: 8230/8230, results: http://openjdk.linaro.org/jdk8u/jcstress-nightly-runs/2019/227/results/
2019-08-17 pass rate: 8230/8230, results: http://openjdk.linaro.org/jdk8u/jcstress-nightly-runs/2019/229/results/
2019-08-23 pass rate: 8230/8230, results: http://openjdk.linaro.org/jdk8u/jcstress-nightly-runs/2019/234/results/
2019-09-11 pass rate: 8230/8230, results: http://openjdk.linaro.org/jdk8u/jcstress-nightly-runs/2019/253/results/
2019-09-22 pass rate: 8230/8230, results: http://openjdk.linaro.org/jdk8u/jcstress-nightly-runs/2019/264/results/
2019-11-02 pass rate: 8230/8230, results: http://openjdk.linaro.org/jdk8u/jcstress-nightly-runs/2019/306/results/
2019-11-15 pass rate: 8231/8231, results: http://openjdk.linaro.org/jdk8u/jcstress-nightly-runs/2019/318/results/

For detailed information on the test output please refer to:

http://openjdk.linaro.org/jdk8u/jcstress-nightly-runs/

From patrick at os.amperecomputing.com Fri Nov 15 07:51:17 2019
From: patrick at os.amperecomputing.com (Patrick Zhang OS)
Date: Fri, 15 Nov 2019 07:51:17 +0000
Subject: [aarch64-port-dev ] RFR: 8218748: AARCH64: String::compareTo intrinsic documentation and maintenance improvement
In-Reply-To: <399186f2-5cb7-b713-33c5-b29b6eb73eb8@samersoff.net>
References: <86f91401-b7f6-b634-fef1-b0615b8fcde0@redhat.com> <399186f2-5cb7-b713-33c5-b29b6eb73eb8@samersoff.net>
Message-ID:

Hi Dmitrij,

The inline documentation in your patch is very helpful for understanding the string_compare stub code generation in depth, although I don't know why it was not pushed.
http://cr.openjdk.java.net/~dpochepk/8218748/webrev.02/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp.sdiff.html

There is one point I don't quite understand; could you please clarify it a little more? Thanks in advance!

In the large loop with prefetching logic, the different_encoding function uses "cnt2 - prefetchLoopExitCondition" to decide whether the first iteration should be executed at all, while the same_encoding function does not do this. Why?

I was thinking that "prefetching out of the string boundary" could be an invalid operation, but that seems to be a misunderstanding, doesn't it? The AArch64 ISA document says that the prfm instruction signals the memory system, which is expected to respond by "preloading the cache line containing the specified address into one or more caches", in order to speed up the memory accesses when they do occur. If this is just a hint for upcoming ldrs, and safe enough, could we stop restricting the remaining iterations (2 ~ n) with largeLoopExitCondition, and instead use 64 for LL/UU and 128 for LU/UL? That way more iterations can stay in the large loop (with SoftwarePrefetchHintDistance=192, for example), for better performance. Any comments?

Thanks

  address generate_compare_long_string_different_encoding(bool isLU) {
    ...
    if (SoftwarePrefetchHintDistance >= 0) {
      __ subs(rscratch2, cnt2, prefetchLoopExitCondition);
      __ br(__ LT, NO_PREFETCH);
      __ bind(LARGE_LOOP_PREFETCH); // 64-characters loop
      ...
      __ subs(rscratch2, cnt2, prefetchLoopExitCondition); // <-- could we use subs(rscratch2, cnt2, 128) instead?
      __ br(__ GE, LARGE_LOOP_PREFETCH);
    } // end of 64-characters loop
    ...

  address generate_compare_long_string_same_encoding(bool isLL) {
    ...
    if (SoftwarePrefetchHintDistance >= 0) {
      __ bind(LARGE_LOOP_PREFETCH);
      __ prfm(Address(str1, SoftwarePrefetchHintDistance));
      __ prfm(Address(str2, SoftwarePrefetchHintDistance));
      compare_string_16_bytes_same(DIFF, DIFF2);
      compare_string_16_bytes_same(DIFF, DIFF2);
      __ sub(cnt2, cnt2, 8 * characters_in_word);
      compare_string_16_bytes_same(DIFF, DIFF2);
      __ subs(rscratch2, cnt2, largeLoopExitCondition); // rscratch2 is not used. Use subs instead of cmp in case of potentially large constants
                                                        // <-- could we use subs(rscratch2, cnt2, 64) instead?
      compare_string_16_bytes_same(DIFF, DIFF2);
      __ br(__ GT, LARGE_LOOP_PREFETCH);
      __ cbz(cnt2, LAST_CHECK); // no more loads left
    }

Regards
Patrick

-----Original Message-----
From: hotspot-compiler-dev On Behalf Of Dmitry Samersoff
Sent: Sunday, May 19, 2019 11:42 PM
To: Dmitrij Pochepko ; Andrew Haley ; Pengfei Li (Arm Technology China)
Cc: hotspot-compiler-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net
Subject: Re: [aarch64-port-dev ] RFR: 8218748: AARCH64: String::compareTo intrinsic documentation and maintenance improvement

Dmitrij,

The changes look good to me.

-Dmitry

On 25.02.2019 19:52, Dmitrij Pochepko wrote:
> Hi Andrew, Pengfei,
>
> I created webrev.02 with all your suggestions implemented:
>
> webrev: http://cr.openjdk.java.net/~dpochepk/8218748/webrev.02/
>
> - comments are now both in separate section and inlined into code.
> - documentation mismatch mentioned by Pengfei is fixed:
> -- SHORT_LAST_INIT label name misprint changed to correct SHORT_LAST
> -- SHORT_LOOP_TAIL block now merged with last instruction.
> Documentation is updated respectively
> - minor other changes to layout and wording
>
> Newly developed tests were run as sanity and they passed.
>
> Thanks,
> Dmitrij
>
> On 22/02/2019 6:42 PM, Andrew Haley wrote:
>> On 2/22/19 10:31 AM, Pengfei Li (Arm Technology China) wrote:
>>
>>> So personally, I still prefer to inline the comments with the
>>> original code block to avoid this kind of inconsistencies. And it
>>> makes us easier to review or maintain the code together with the
>>> doc, as we don't need to scroll back and forth. I don't know the
>>> benefit of making the code documentation as a separate part. What's
>>> your opinion, Andrew Haley?
>> I agree with you. There's no harm having both inline and separate.
>>

From patrick at os.amperecomputing.com Fri Nov 15 08:04:44 2019
From: patrick at os.amperecomputing.com (Patrick Zhang OS)
Date: Fri, 15 Nov 2019 08:04:44 +0000
Subject: [aarch64-port-dev ] RFR: 8229351: AArch64: Make the stub threshold of string_compare intrinsic tunable
In-Reply-To:
References: <21d39cdf-b38d-f40e-c8b5-2d92094cdc2a@redhat.com> <2890e2c6-ed15-7f8a-5e88-83ed8c6af7ce@redhat.com>
Message-ID:

To avoid future confusion, I am going to split the patch. I will take out the updates to generate_compare_long_string_different_encoding, which drop two redundant temporary Register variables and the related unused instructions, and post them as a new patch for your review. That change has nothing to do with the proposed option. I will continue working on the remaining parts according to your comments and suggestions.

Regards
Patrick

-----Original Message-----
From: aarch64-port-dev On Behalf Of Patrick Zhang OS
Sent: Thursday, November 14, 2019 7:14 PM
To: Andrew Haley ; aarch64-port-dev at openjdk.java.net
Subject: Re: [aarch64-port-dev ] RFR: 8229351: AArch64: Make the stub threshold of string_compare intrinsic tunable

>> Why do we care about out-of-boundary prefetching for LL/UU?
>> I don't think we do if it requires any extra logic.
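Andrew's question is whether keeping prefetches inside the string is worth any extra logic. The sketch below models the same-encoding large loop quoted above, to show what the exit condition trades off. It is a model under stated assumptions, not the stub:

- It counts in bytes, which sidesteps the chars-versus-bytes ambiguity raised in the review.
- Each prefetch lands exactly `distance` bytes past the current position.
- The current exit condition is taken to amount to max(64, SoftwarePrefetchHintDistance) bytes.
- The tail-handling code after the loop is ignored.

All names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Simplified, hypothetical model of the same-encoding large prefetch loop
// quoted above; it is NOT the HotSpot stub. Every quantity is in BYTES.
// (The stub's cnt2 counts characters and drops by isLL ? 64 : 32 per
// iteration, which is 64 bytes for both LL and UU.)
struct LoopStats {
  int iterations;      // large-loop iterations executed
  int oob_prefetches;  // prefetch hints aimed at or past the string end
};

constexpr int kBytesPerIteration = 64;  // four 16-byte compares

// remaining:  bytes still to compare when the loop is entered
// distance:   prefetch distance (models SoftwarePrefetchHintDistance)
// exit_bytes: loop repeats while remaining - exit_bytes > 0 (the GT branch)
inline LoopStats simulate_large_loop(int remaining, int distance,
                                     int exit_bytes) {
  assert(remaining >= kBytesPerIteration && distance >= 0);
  LoopStats s{0, 0};
  do {  // no entry check: the first iteration always runs
    if (distance >= remaining) ++s.oob_prefetches;  // prfm at str + distance
    remaining -= kBytesPerIteration;
    ++s.iterations;
  } while (remaining - exit_bytes > 0);
  return s;
}

// Assumed shape of the current exit condition, expressed in bytes.
inline int default_exit_bytes(int distance) { return std::max(64, distance); }
```

Take 1024 bytes remaining and a 192-byte distance. The max(64, distance) exit runs 13 iterations and never prefetches past the end. A fixed 64-byte exit, as proposed above, runs 15 iterations, and the last two prefetch at or beyond the end. PRFM is only a hint, per the ISA text quoted earlier, so the model says nothing about whether those two prefetches cost anything measurable.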
I was thinking out-of-boundary prefetching should be prevented, and UL/LU have the same condition. If that is not needed, we could simply set largeLoopExitCondition to 64 so that more cases can stay in the large loop. I don't think so.
http://hg.openjdk.java.net/jdk/jdk/file/355f4f42dda5/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#l4194

     if (SoftwarePrefetchHintDistance >= 0) {
       __ bind(LARGE_LOOP_PREFETCH);
       __ prfm(Address(str1, SoftwarePrefetchHintDistance));
       __ prfm(Address(str2, SoftwarePrefetchHintDistance));
       compare_string_16_bytes_same(DIFF, DIFF2);
       compare_string_16_bytes_same(DIFF, DIFF2);
       __ sub(cnt2, cnt2, isLL ? 64 : 32);
       compare_string_16_bytes_same(DIFF, DIFF2);
-      __ subs(rscratch2, cnt2, largeLoopExitCondition);
+      __ subs(rscratch2, cnt2, 64);
       compare_string_16_bytes_same(DIFF, DIFF2);
       __ br(__ GT, LARGE_LOOP_PREFETCH);
       __ cbz(cnt2, LAST_CHECK_AND_LENGTH_DIFF); // no more chars left?
     }

>> Do you have a theory that LU/UL cases are common? Why?

My only "theory" is that compare_string_16_x_LU (in the stub) is faster than the 8-byte main loop outside the stub (http://hg.openjdk.java.net/jdk/jdk/file/355f4f42dda5/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp#l4997). Even outside the stub's large loop, its small loop can be faster as well, since it processes more bytes with fewer instructions (http://hg.openjdk.java.net/jdk/jdk/file/355f4f42dda5/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#l4203). I can prepare a new patch with updated tests, and plot the timings soon.

Regards
Patrick

-----Original Message-----
From: Andrew Haley
Sent: Thursday, November 14, 2019 6:33 PM
To: Patrick Zhang OS ; aarch64-port-dev at openjdk.java.net
Subject: Re: [aarch64-port-dev ] RFR: 8229351: AArch64: Make the stub threshold of string_compare intrinsic tunable

On 11/14/19 9:20 AM, Patrick Zhang OS wrote:
> Thanks for the comments, see my answers below please.
>
>>> 1. This patch seems to do rather a lot.
> Yes, it enables tweaking the stub parameters (none are actually changed > in this patch), fixes an out-of-boundary prefetch for LL/UU, and > removes some redundant instructions in the LU/UL code path. The latter two > are code-quality fixes; if splitting the patch would make the changes > clearer, I'd like to do so. Why do we care about out-of-boundary prefetching for LL/UU? I don't think we do if it requires any extra logic. >>> 2. Are the thresholds bytes or characters? > All thresholds are (and should be) in characters. This was a little > bit misleading: for LL/LU/UL, the const STUB_THRESHOLD meant chars, > while for UU it could be interpreted as bytes. If > -XX:-CompactStrings is specified, all code paths go to UU, which makes the > threshold mean bytes and might confuse developers. This patch > clarifies it, and the description of the tunable options can provide further > guidance. It must. Without some commentary both maintainers and developers are lost. Unless there is some very strong reason, all counts must specify units. >>> 3. How are we supposed to test with these different thresholds? > There are two jtreg tests for checking the impact of > SoftwarePrefetchHintDistance on the intrinsics. I have locally added > non-default thresholds to them and tested with many lengths (this took days > on a test system). This has not been included in the proposed patch; > maybe a follow-up one would do. Any advice? > hotspot/jtreg/compiler/intrinsics/string/TestStringCompareToSameLength.java > hotspot/jtreg/compiler/intrinsics/string/TestStringCompareToDifferentLength.java I won't accept this patch unless it is accompanied by test cases that properly exercise the code. >>> 4. What are the thresholds you tested? > Firstly, the default threshold: the hardcoded 72 is my testing focus, > since I would try my best not to bring negative impacts to the aarch64-port > normal state, especially for other CPU vendors.
> Second, I tested two extreme thresholds: 24 and 255, which means more > shorter strings (24 to 71 chars) or only very long strings > (>=255) go to the stub code path, respectively. Functional tests > passed (listed in the initial email), while performance test results > (with string-density-bench, StringCompareBench.java, and > SPECjbb2015) can vary across systems (as well as > microarchitectures). > Third, some other non-default thresholds, as a sanity check, > particularly for ensuring correctness. It's the extremes that really matter, I suspect. >>> 5. But the more serious problem is the fact that we have different >>> code paths for different microarchitectures, and somehow this has to >>> be standard supportable software. In order to test this stuff we'll >>> need different test parameters for SoftwarePrefetchHintDistance, >>> CompareLongStringLimitLatin, CompareLongStringLimitUTF > The STUB_THRESHOLD was introduced to control the stub code insertion, > and was tested on some aarch64 systems. I think making it tunable is the way > to let different microarchitectures configure optimal values > for themselves. Well, yes. The question is whether we go down this rabbit hole or try to find a compromise that is perhaps not quite optimal for anyone but good enough for everyone. > I would like to have a common threshold too, or no threshold for all, > but we lack full-coverage tests over all systems. Maybe I > misunderstood your points here with regard to "supportable"; the two > new options can be left at their defaults if developers have no concerns about > string compare intrinsics. I rather suspect that vendors will want to change the defaults sooner or later. And besides, we'll all have to support these options. >>> 6. We already emit a great deal of in-line code in the >>> string_compare intrinsic, with the intention that this be as fast as >>> possible because we want to avoid having to call the intrinsic.
So >>> why is the intrinsic actually faster in your case? > Avoid having to call the intrinsic? I meant "the stub". > If you did NOT mean completely "avoiding the intrinsic", but the strings > shorter than 72 chars, I would have to say, "it depends". The stub > functions try their best to process 16 chars at a time, while the outer logic > processes 8 bytes at a time, which is the major difference. For example, I can > see a consistent 1.5x speedup with lengths 24-71 for LU/UL cases; maybe > others cannot, which may be why we need an option here. I know that strings of length 24 - 30ish are very common, so this is an important case. Do you have a theory that LU/UL cases are common? Why? What is it like with LL/UU? I'd need to see real timings. I'd either do all numbers < 256 or (to save time) a sequence like... 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 17, 19, 21, 23, 25, 28, 30, 34, 37, 41, 45, 49, 54, 60, 66, 72, 80, 88, 97, 106, 117, 129, 142, 156, 171, 189, 207, 228, 251 The idea here is that we can plot a graph. The timings should ideally be monotonically increasing. And then we could see how different processors behave, and hopefully find a decent solution for all. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From felix.yang at huawei.com Fri Nov 15 08:33:15 2019 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Fri, 15 Nov 2019 08:33:15 +0000 Subject: [aarch64-port-dev ] Question about ISB usage in the aarch64 port Message-ID: Any comments on that? I have posted some of the hs_err_log files here: http://cr.openjdk.java.net/~fyang/sigill-crashes.tar.bz2 Thanks, Felix From: Yangfei (Felix) Sent: Tuesday, November 12, 2019 3:37 PM To: aarch64-port-dev at openjdk.java.net Subject: Question about ISB usage in the aarch64 port Hi, I am witnessing some SIGILL JVM crashes on my aarch64 platform.
I looked at the ISB usage, especially this one: https://mail.openjdk.java.net/pipermail/aarch64-port-dev/2014-September/001376.html One of the changes adds an ISB after the native call returns: static void rt_call(MacroAssembler* masm, address dest, int gpargs, int fpargs, int type) { CodeBlob *cb = CodeCache::find_blob(dest); if (cb) { __ far_call(RuntimeAddress(dest)); } else { assert((unsigned)gpargs < 256, "eek!"); assert((unsigned)fpargs < 32, "eek!"); __ lea(rscratch1, RuntimeAddress(dest)); __ blr(rscratch1); __ maybe_isb(); < ======== } } The rt_call function is used in generate_native_wrapper to make the JNI call. As I didn't see this barrier in the ppc or arm ports, I would like to know more details here. Does anyone still remember? Also, the ISB is placed only in the else block. I assume it is also necessary for the if block. Correct? Thanks for your help, Felix From Pengfei.Li at arm.com Fri Nov 15 09:15:37 2019 From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China)) Date: Fri, 15 Nov 2019 09:15:37 +0000 Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27 conditionally allocatable In-Reply-To: <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com> References: <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com> Message-ID: Hi Andrew, > > This patch aligns with the implementation in [1] which makes the > > x86_64 > > r12 register allocatable. Please let me know if I have missed anything > > for AArch64. > > We don't generally use r27 for compressed class pointers. Do you mean that r27 is only used for encoding/decoding oops but not for any klass pointers? I looked at the AArch64 code and find it also used in MacroAssembler::encode_klass_not_null() if the compressed mode is not zero-based.
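The question of whether r27 is needed for klass pointers comes down to which encoding forms need the base held in a register at run time. The sketch below is a hypothetical model, not HotSpot code. `encode_sub`, `encode_xor` and `encode_movw` are illustrative names. The condition under which XOR matches subtraction is stated here as an assumption for illustration; it is not a copy of HotSpot's `use_XOR_for_compressed_class_base` check. Only the general subtract-and-shift form needs the base available in a register. The XOR and low-word forms, which also appear in the `encode_klass_not_null()` excerpt quoted later in this thread, avoid that.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of three ways to compress a 64-bit Klass* into a
// 32-bit narrow value. Illustrative only; not HotSpot code.

// General form: subtract the base, then shift. The base must be available
// at run time, typically in a register.
uint32_t encode_sub(uint64_t klass, uint64_t base, int shift) {
  return static_cast<uint32_t>((klass - base) >> shift);
}

// XOR form (cf. the eor/lsr path): equals encode_sub only when no set bit
// of base overlaps a bit of (klass - base), e.g. when base is a single power
// of two larger than the class space. Such a single-bit base is also a valid
// AArch64 logical immediate, so eor can encode it without a base register.
uint32_t encode_xor(uint64_t klass, uint64_t base, int shift) {
  return static_cast<uint32_t>((klass ^ base) >> shift);
}

// Low-word (movw) form: if the low 32 bits of base are zero, shift is 0 and
// the offset fits in 32 bits, the narrow value is just the low word.
uint32_t encode_movw(uint64_t klass) {
  return static_cast<uint32_t>(klass);
}
```

With base = 0x800000000 and klass = base + 0x12345678, all three return 0x12345678. With an unaligned base such as 0x800001000 and klass = base + 0x1000, `encode_sub` gives 0x1000 but `encode_xor` gives 0x3000. This is why the XOR shortcut needs a guard.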
-- Thanks, Pengfei From Joshua.Zhu at arm.com Fri Nov 15 10:29:49 2019 From: Joshua.Zhu at arm.com (Joshua Zhu (Arm Technology China)) Date: Fri, 15 Nov 2019 10:29:49 +0000 Subject: [aarch64-port-dev ] 8233948: AArch64: Incorrect mapping between OptoReg and VMReg for high 64 bits of Vector Register In-Reply-To: References: Message-ID: Hi, > Please review the following patch: > JBS: https://bugs.openjdk.java.net/browse/JDK-8233948 > Webrev: http://cr.openjdk.java.net/~jzhu/8233948/webrev.00/ Please let me know if you have any comments. Thanks a lot. Best Regards, Joshua From patrick at os.amperecomputing.com Fri Nov 15 10:54:16 2019 From: patrick at os.amperecomputing.com (Patrick Zhang OS) Date: Fri, 15 Nov 2019 10:54:16 +0000 Subject: [aarch64-port-dev ] RFR (trivial): 8234228: AArch64: Clean up redundant temp vars in generate_compare_long_string_different_encoding Message-ID: Hi Reviewers, This is a simple patch which cleans up some redundant temp vars and related instructions in generate_compare_long_string_different_encoding. JBS: https://bugs.openjdk.java.net/browse/JDK-8234228 Webrev: http://cr.openjdk.java.net/~qpzhang/8234228/webrev.01 In generate_compare_long_string_different_encoding, the two Register vars strU and strL were used to record the pointers to the last 4 characters for the final comparisons. strU has been unused since the latest code updates, because the chars are pre-loaded (r12) earlier by compare_string_16_x_LU, and strL is redundant too, since the pointer is available in r11. Cleaning these up saves two add instructions and two temp vars, and replaces two sub instructions with mov. In addition, r10 in compare_string_16_x_LU is not used, so that temp var is cleaned up too. Tested jtreg tier1 and hotspot runtime/compiler; no new failures found. Double-checked with the string intrinsics cases under [1]; no regression found.
Ran [2] CompareToBench LU/UL as a performance check: no regression found, and slight gains with some input sizes. [1] http://hg.openjdk.java.net/jdk/jdk/file/3df2bf731a87/test/hotspot/jtreg/compiler/intrinsics/string [2] http://cr.openjdk.java.net/~shade/density/string-density-bench.jar Regards Patrick From aph at redhat.com Fri Nov 15 14:49:14 2019 From: aph at redhat.com (Andrew Haley) Date: Fri, 15 Nov 2019 14:49:14 +0000 Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27 conditionally allocatable In-Reply-To: References: <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com> Message-ID: <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com> On 11/15/19 9:15 AM, Pengfei Li (Arm Technology China) wrote: >>> This patch aligns with the implementation in [1] which makes the >>> x86_64 >>> r12 register allocatable. Please let me know if I have missed anything >>> for AArch64. >> >> We don't generally use r27 for compressed class pointers. > > Do you mean that r27 is only used for encoding/decoding oops but not for > any klass pointers? Almost always, yes. > I looked at the AArch64 code and find it also used in > MacroAssembler::encode_klass_not_null() if the compressed mode is > not zero-based. I see if (use_XOR_for_compressed_class_base) { if (CompressedKlassPointers::shift() != 0) { eor(dst, src, (uint64_t)CompressedKlassPointers::base()); lsr(dst, dst, LogKlassAlignmentInBytes); } else { eor(dst, src, (uint64_t)CompressedKlassPointers::base()); } return; } if (((uint64_t)CompressedKlassPointers::base() & 0xffffffff) == 0 && CompressedKlassPointers::shift() == 0) { movw(dst, src); return; } ... followed by code which does use r27. Do you ever see r27 being used? If so, I'd be interested to know how this gets triggered and what command-line arguments you use. It's rather inefficient. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd.
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From dmitrij.pochepko at bell-sw.com Fri Nov 15 15:51:42 2019 From: dmitrij.pochepko at bell-sw.com (Dmitrij Pochepko) Date: Fri, 15 Nov 2019 18:51:42 +0300 Subject: [aarch64-port-dev ] RFR: 8218748: AARCH64: String::compareTo intrinsic documentation and maintenance improvement In-Reply-To: References: <86f91401-b7f6-b634-fef1-b0615b8fcde0@redhat.com> <399186f2-5cb7-b713-33c5-b29b6eb73eb8@samersoff.net> Message-ID: <6843892a-4267-06cd-0af4-4618fce86396@bell-sw.com> Hi Patrick, My experiments back then showed that a few platforms (some of the Cortex A* series) behave unexpectedly slowly when dealing with overprefetch (probably because of CPU implementation specifics). So this code is a kind of compromise that runs relatively well on all the platforms I was able to test on (ThunderX, ThunderX2, Cortex A53, Cortex A73). That is the main reason for this code structure. It's good that you're willing to experiment and improve it, but I'm afraid changing largeLoopExitCondition to 64 for LL/UU and 128 for LU/UL will likely make some systems slower, as I experimented a lot with this. Let us see the performance results for the several systems you've got, to avoid a situation where one platform benefits by slowing down others. We could offer some help if you don't have some HW available. Thanks, Dmitrij On 15/11/2019 10:51 AM, Patrick Zhang OS wrote: > Hi Dmitrij, > > The inline documentation in this patch of yours is nice in helping me understand the string_compare stub code generation in depth, although I don't know why it was not pushed. > http://cr.openjdk.java.net/~dpochepk/8218748/webrev.02/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp.sdiff.html > > There is one point I don't quite understand; could you please clarify it a little more? Thanks in advance!
> In the large loop with prefetching logic in the different_encoding function, "cnt2 - prefetchLoopExitCondition" is used to decide whether the 1st iteration should be executed or not, while same_encoding does not do this. Why? > > I was thinking that "prefetching out of the string boundary" could be an invalid operation, but that seems to be a misunderstanding, is that right? The AArch64 ISA document says that the prfm instruction signals the memory system that "preloading the cache line containing the specified address into one or more caches" is expected, in order to speed up the memory accesses when they do occur. If this is just a hint for upcoming ldrs, and safe enough, could we avoid restricting the remaining iterations (2 ~ n) with largeLoopExitCondition, and instead use 64 for LL/UU and 128 for LU/UL? That way, more iterations can stay in the large loop (SoftwarePrefetchHintDistance=192 for example), for better performance. Any comments? > > Thanks > > 4327 address generate_compare_long_string_different_encoding(bool isLU) { > 4377 if (SoftwarePrefetchHintDistance >= 0) { > 4378 __ subs(rscratch2, cnt2, prefetchLoopExitCondition); > 4379 __ br(__ LT, NO_PREFETCH); > 4380 __ bind(LARGE_LOOP_PREFETCH); // 64-characters loop > ... ... > 4395 __ subs(rscratch2, cnt2, prefetchLoopExitCondition); // <-- could we use subs(rscratch2, cnt2, 128) instead? > 4396 __ br(__ GE, LARGE_LOOP_PREFETCH); > 4397 } // end of 64-characters loop > > 4616 address generate_compare_long_string_same_encoding(bool isLL) { > 4637 if (SoftwarePrefetchHintDistance >= 0) { > 4638 __ bind(LARGE_LOOP_PREFETCH); > 4639 __ prfm(Address(str1, SoftwarePrefetchHintDistance)); > 4640 __ prfm(Address(str2, SoftwarePrefetchHintDistance)); > 4641 compare_string_16_bytes_same(DIFF, DIFF2); > 4642 compare_string_16_bytes_same(DIFF, DIFF2); > 4643 __ sub(cnt2, cnt2, 8 * characters_in_word); > 4644 compare_string_16_bytes_same(DIFF, DIFF2); > 4645 __ subs(rscratch2, cnt2, largeLoopExitCondition); // rscratch2 is not used.
Use subs instead of cmp in case of potentially large constants // <-- could we use subs(rscratch2, cnt2, 64) instead? > 4646 compare_string_16_bytes_same(DIFF, DIFF2); > 4647 __ br(__ GT, LARGE_LOOP_PREFETCH); > 4648 __ cbz(cnt2, LAST_CHECK); // no more loads left > 4649 } > > Regards > Patrick > > -----Original Message----- > From: hotspot-compiler-dev On Behalf Of Dmitry Samersoff > Sent: Sunday, May 19, 2019 11:42 PM > To: Dmitrij Pochepko ; Andrew Haley ; Pengfei Li (Arm Technology China) > Cc: hotspot-compiler-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net > Subject: Re: [aarch64-port-dev ] RFR: 8218748: AARCH64: String::compareTo intrinsic documentation and maintenance improvement > > Dmitrij, > > The changes look good to me. > > -Dmitry > > On 25.02.2019 19:52, Dmitrij Pochepko wrote: >> Hi Andrew, Pengfei, >> >> I created webrev.02 with all your suggestions implemented: >> >> webrev: http://cr.openjdk.java.net/~dpochepk/8218748/webrev.02/ >> >> - comments are now both in a separate section and inlined into code. >> - documentation mismatch mentioned by Pengfei is fixed: >> -- SHORT_LAST_INIT label name misprint changed to correct SHORT_LAST >> -- SHORT_LOOP_TAIL block now merged with last instruction. >> Documentation is updated accordingly >> - minor other changes to layout and wording >> >> Newly developed tests were run as a sanity check and they passed. >> >> Thanks, >> Dmitrij >> >> On 22/02/2019 6:42 PM, Andrew Haley wrote: >>> On 2/22/19 10:31 AM, Pengfei Li (Arm Technology China) wrote: >>> >>>> So personally, I still prefer to inline the comments with the >>>> original code block to avoid this kind of inconsistency. And it >>>> makes it easier for us to review or maintain the code together with the >>>> doc, as we don't need to scroll back and forth. I don't know the >>>> benefit of making the code documentation a separate part. What's >>>> your opinion, Andrew Haley? >>> I agree with you. There's no harm having both inline and separate.
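Patrick's question about replacing largeLoopExitCondition with 64, and Dmitrij's caution about overprefetch, can be illustrated with a rough model of the same-encoding prefetch loop. This is a sketch under stated assumptions, not the real stub. It counts in bytes, so one iteration covers 64 LL chars or 32 UU chars, in line with `sub(cnt2, cnt2, isLL ? 64 : 32)`. It also assumes that str1/str2 advance 64 bytes per iteration and models the `subs`/`br GT` pair as "loop again while more than exit_bytes remain". `loop_iterations` and `furthest_prefetch` are hypothetical helper names.

```cpp
#include <cassert>

// Rough, hypothetical model of the LARGE_LOOP_PREFETCH loop in
// generate_compare_long_string_same_encoding, measured in bytes.
// Assumptions: the loop body always runs once (no pre-check, as in the
// same-encoding stub), each iteration compares 64 bytes and advances
// str1/str2 by 64, and the subs/br GT pair means "loop again while more
// than exit_bytes remain".
int loop_iterations(int len_bytes, int exit_bytes) {
  int remaining = len_bytes;
  int iterations = 0;
  do {
    remaining -= 64;  // four compare_string_16_bytes_same calls
    ++iterations;
  } while (remaining > exit_bytes);
  return iterations;
}

// Offset, from the start of the string, of the furthest prfm target:
// the last iteration starts at (iterations - 1) * 64 and prefetches
// distance bytes ahead of that.
int furthest_prefetch(int len_bytes, int distance, int exit_bytes) {
  return (loop_iterations(len_bytes, exit_bytes) - 1) * 64 + distance;
}
```

Take a 1024-byte string and a 192-byte prefetch distance. With an exit threshold of 192, the model runs 13 iterations and the furthest prefetch lands at offset 960, inside the string. With an exit threshold of 64, it runs 15 iterations and the furthest prefetch lands at offset 1088, beyond the end. The model only counts iterations and addresses. Whether the extra in-loop iterations outweigh the cost of overprefetching is the per-microarchitecture timing question raised above.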
>>> From ci_notify at linaro.org Sat Nov 16 01:33:31 2019 From: ci_notify at linaro.org (ci_notify at linaro.org) Date: Sat, 16 Nov 2019 01:33:31 +0000 (UTC) Subject: [aarch64-port-dev ] JTREG, JCStress, SPECjbb2015 and Hadoop/Terasort results for OpenJDK JDK on AArch64 Message-ID: <954629567.893.1573868012484.JavaMail.javamailuser@localhost> This is a summary of the JTREG test results =========================================== The build and test results are cycled every 15 days. For detailed information on the test output please refer to: http://openjdk.linaro.org/jdkX/openjdk-jtreg-nightly-tests/summary/2019/319/summary.html ------------------------------------------------------------------------------- client-release/hotspot ------------------------------------------------------------------------------- Build 0: aarch64/2018/oct/15 pass: 5,780; fail: 19; not run: 90 ------------------------------------------------------------------------------- client-release/jdk ------------------------------------------------------------------------------- Build 0: aarch64/2018/oct/15 pass: 8,495; fail: 670; error: 23 ------------------------------------------------------------------------------- client-release/langtools ------------------------------------------------------------------------------- Build 0: aarch64/2018/oct/15 pass: 3,970; fail: 5 ------------------------------------------------------------------------------- release/hotspot ------------------------------------------------------------------------------- Build 0: aarch64/2019/oct/11 pass: 5,751; fail: 1 Build 1: aarch64/2019/oct/14 pass: 5,753 Build 2: aarch64/2019/oct/16 pass: 5,753; fail: 1 Build 3: aarch64/2019/oct/18 pass: 5,760 Build 4: aarch64/2019/oct/21 pass: 5,716; fail: 43; error: 1 Build 5: aarch64/2019/oct/23 pass: 5,760; fail: 1 Build 6: aarch64/2019/oct/28 pass: 5,766 Build 7: aarch64/2019/oct/30 pass: 5,768 Build 8: aarch64/2019/nov/01 pass: 5,768; fail: 1 Build 9: aarch64/2019/nov/04 pass: 
5,769 Build 10: aarch64/2019/nov/06 pass: 5,766; fail: 2 Build 11: aarch64/2019/nov/08 pass: 5,761 Build 12: aarch64/2019/nov/11 pass: 5,762 Build 13: aarch64/2019/nov/13 pass: 5,764; fail: 1 Build 14: aarch64/2019/nov/15 pass: 5,750 ------------------------------------------------------------------------------- release/jdk ------------------------------------------------------------------------------- Build 0: aarch64/2019/oct/11 pass: 8,693; fail: 511; error: 18 Build 1: aarch64/2019/oct/14 pass: 8,706; fail: 497; error: 20 Build 2: aarch64/2019/oct/16 pass: 8,702; fail: 509; error: 17 Build 3: aarch64/2019/oct/18 pass: 8,694; fail: 522; error: 17 Build 4: aarch64/2019/oct/21 pass: 8,705; fail: 512; error: 18 Build 5: aarch64/2019/oct/23 pass: 8,712; fail: 505; error: 18 Build 6: aarch64/2019/oct/28 pass: 8,711; fail: 509; error: 18 Build 7: aarch64/2019/oct/30 pass: 8,723; fail: 504; error: 19 Build 8: aarch64/2019/nov/01 pass: 8,774; fail: 506; error: 18 Build 9: aarch64/2019/nov/04 pass: 8,777; fail: 509; error: 17 Build 10: aarch64/2019/nov/06 pass: 8,775; fail: 507; error: 19 Build 11: aarch64/2019/nov/08 pass: 8,774; fail: 510; error: 17 Build 12: aarch64/2019/nov/11 pass: 8,777; fail: 509; error: 15 Build 13: aarch64/2019/nov/13 pass: 8,773; fail: 509; error: 21 Build 14: aarch64/2019/nov/15 pass: 8,756; fail: 511; error: 19 3 fatal errors were detected; please follow the link above for more detail. 
------------------------------------------------------------------------------- release/langtools ------------------------------------------------------------------------------- Build 0: aarch64/2019/oct/11 pass: 3,979 Build 1: aarch64/2019/oct/14 pass: 3,979 Build 2: aarch64/2019/oct/16 pass: 3,979 Build 3: aarch64/2019/oct/18 pass: 3,979 Build 4: aarch64/2019/oct/21 pass: 3,979 Build 5: aarch64/2019/oct/23 pass: 3,980 Build 6: aarch64/2019/oct/28 pass: 3,980 Build 7: aarch64/2019/oct/30 pass: 3,980 Build 8: aarch64/2019/nov/01 pass: 3,980 Build 9: aarch64/2019/nov/04 pass: 3,980 Build 10: aarch64/2019/nov/06 pass: 3,980 Build 11: aarch64/2019/nov/08 pass: 3,980 Build 12: aarch64/2019/nov/11 pass: 3,980 Build 13: aarch64/2019/nov/13 pass: 3,980 Build 14: aarch64/2019/nov/15 pass: 3,981 ------------------------------------------------------------------------------- server-release/hotspot ------------------------------------------------------------------------------- Build 0: aarch64/2018/oct/15 pass: 5,787; fail: 18; not run: 90 ------------------------------------------------------------------------------- server-release/jdk ------------------------------------------------------------------------------- Build 0: aarch64/2018/oct/15 pass: 8,476; fail: 686; error: 27 ------------------------------------------------------------------------------- server-release/langtools ------------------------------------------------------------------------------- Build 0: aarch64/2018/oct/15 pass: 3,970; fail: 5 Previous results can be found here: http://openjdk.linaro.org/jdkX/openjdk-jtreg-nightly-tests/index.html SPECjbb2015 composite regression test completed =============================================== This test measures the relative performance of the server compiler running the SPECjbb2015 composite tests and compares the performance against the baseline performance of the server compiler taken on 2016-11-21. 
In accordance with [1], the SPECjbb2015 tests are run on a system which is not production ready and does not meet all the requirements for publishing compliant results. The numbers below shall be treated as non-compliant (nc) and are for experimental purposes only. Relative performance: Server max-jOPS (nc): 7.74x Relative performance: Server critical-jOPS (nc): 9.52x Details of the test setup and historical results may be found here: http://openjdk.linaro.org/jdkX/SPECjbb2015-results/ [1] http://www.spec.org/fairuse.html#Academic Regression test Hadoop-Terasort completed ========================================= This test measures the performance of the server and client compilers running Hadoop sorting a 1GB file using Terasort and compares the performance against the baseline performance of the Zero interpreter and against the baseline performance of the server compiler on 2014-04-01. Relative performance: Zero: 1.0, Server: 207.57 Server 207.57 / Server 2014-04-01 (71.00): 2.92x Details of the test setup and historical results may be found here: http://openjdk.linaro.org/jdkX/hadoop-terasort-benchmark-results/ This is a summary of the jcstress test results ============================================== The build and test results are cycled every 15 days. 
2019-10-10 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/282/results/ 2019-10-12 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/284/results/ 2019-10-15 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/287/results/ 2019-10-17 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/289/results/ 2019-10-19 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/291/results/ 2019-10-22 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/294/results/ 2019-10-23 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/296/results/ 2019-10-29 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/301/results/ 2019-10-31 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/303/results/ 2019-11-02 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/305/results/ 2019-11-05 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/308/results/ 2019-11-07 pass rate: 10490/10490, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/310/results/ 2019-11-12 pass rate: 10490/10490, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/315/results/ 2019-11-14 pass rate: 10490/10490, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/317/results/ 2019-11-16 pass rate: 10490/10490, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/319/results/ For detailed information on the test output please refer to: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/ From ci_notify at linaro.org Sun Nov 17 19:23:25 2019 From: ci_notify at linaro.org (ci_notify at linaro.org) Date: Sun, 17 Nov 2019 19:23:25 +0000 (UTC) 
Subject: [aarch64-port-dev ] JTREG, JCStress, SPECjbb2015 and Hadoop/Terasort results for OpenJDK 13 on AArch64 Message-ID: <607733841.1001.1574018606160.JavaMail.javamailuser@localhost> This is a summary of the JTREG test results =========================================== The build and test results are cycled every 15 days. For detailed information on the test output please refer to: http://openjdk.linaro.org/jdk13/openjdk-jtreg-nightly-tests/summary/2019/320/summary.html ------------------------------------------------------------------------------- release/hotspot ------------------------------------------------------------------------------- Build 0: aarch64/2019/jul/09 pass: 5,643; fail: 4 Build 1: aarch64/2019/jul/16 pass: 5,646; fail: 1 Build 2: aarch64/2019/jul/18 pass: 5,644; fail: 2; error: 1 Build 3: aarch64/2019/jul/20 pass: 5,645; fail: 1; error: 1 Build 4: aarch64/2019/jul/23 pass: 5,644; fail: 3 Build 5: aarch64/2019/jul/25 pass: 5,644; fail: 3 Build 6: aarch64/2019/jul/30 pass: 5,645; fail: 2 Build 7: aarch64/2019/aug/01 pass: 5,646; fail: 1 Build 8: aarch64/2019/aug/03 pass: 5,646; fail: 1 Build 9: aarch64/2019/aug/06 pass: 5,645; fail: 2 Build 10: aarch64/2019/aug/08 pass: 5,646; fail: 1 Build 11: aarch64/2019/aug/10 pass: 5,646; fail: 1 Build 12: aarch64/2019/nov/12 pass: 5,652 Build 13: aarch64/2019/nov/14 pass: 5,650; fail: 2 Build 14: aarch64/2019/nov/16 pass: 5,652 ------------------------------------------------------------------------------- release/jdk ------------------------------------------------------------------------------- Build 0: aarch64/2019/jul/09 pass: 8,606; fail: 515; error: 29 Build 1: aarch64/2019/jul/16 pass: 8,593; fail: 531; error: 30 Build 2: aarch64/2019/jul/18 pass: 8,618; fail: 527; error: 26 Build 3: aarch64/2019/jul/20 pass: 8,619; fail: 519; error: 33 Build 4: aarch64/2019/jul/23 pass: 8,616; fail: 525; error: 30 Build 5: aarch64/2019/jul/25 pass: 8,620; fail: 528; error: 23 Build 6: aarch64/2019/jul/30 pass: 
8,610; fail: 529; error: 32 Build 7: aarch64/2019/aug/01 pass: 8,620; fail: 527; error: 24 Build 8: aarch64/2019/aug/03 pass: 8,596; fail: 552; error: 23 Build 9: aarch64/2019/aug/06 pass: 8,616; fail: 528; error: 27 Build 10: aarch64/2019/aug/08 pass: 8,649; fail: 504; error: 18 Build 11: aarch64/2019/aug/10 pass: 8,647; fail: 507; error: 17 Build 12: aarch64/2019/nov/12 pass: 8,650; fail: 513; error: 16 Build 13: aarch64/2019/nov/14 pass: 8,651; fail: 511; error: 17 Build 14: aarch64/2019/nov/16 pass: 8,663; fail: 500; error: 17 4 fatal errors were detected; please follow the link above for more detail. ------------------------------------------------------------------------------- release/langtools ------------------------------------------------------------------------------- Build 0: aarch64/2019/jul/09 pass: 3,962 Build 1: aarch64/2019/jul/16 pass: 3,963 Build 2: aarch64/2019/jul/18 pass: 3,964 Build 3: aarch64/2019/jul/20 pass: 3,964 Build 4: aarch64/2019/jul/23 pass: 3,964 Build 5: aarch64/2019/jul/25 pass: 3,964 Build 6: aarch64/2019/jul/30 pass: 3,964 Build 7: aarch64/2019/aug/01 pass: 3,964 Build 8: aarch64/2019/aug/03 pass: 3,964 Build 9: aarch64/2019/aug/06 pass: 3,964 Build 10: aarch64/2019/aug/08 pass: 3,964 Build 11: aarch64/2019/aug/10 pass: 3,964 Build 12: aarch64/2019/nov/12 pass: 3,964 Build 13: aarch64/2019/nov/14 pass: 3,964 Build 14: aarch64/2019/nov/16 pass: 3,964 Previous results can be found here: http://openjdk.linaro.org/jdk13/openjdk-jtreg-nightly-tests/index.html SPECjbb2015 composite regression test completed =============================================== This test measures the relative performance of the server compiler running the SPECjbb2015 composite tests and compares the performance against the baseline performance of the server compiler taken on 2016-11-21. In accordance with [1], the SPECjbb2015 tests are run on a system which is not production ready and does not meet all the requirements for publishing compliant results. 
The numbers below shall be treated as non-compliant (nc) and are for experimental purposes only. Relative performance: Server max-jOPS (nc): 7.63x Relative performance: Server critical-jOPS (nc): 9.40x Details of the test setup and historical results may be found here: http://openjdk.linaro.org/jdk13/SPECjbb2015-results/ [1] http://www.spec.org/fairuse.html#Academic Regression test Hadoop-Terasort completed ========================================= This test measures the performance of the server and client compilers running Hadoop sorting a 1GB file using Terasort and compares the performance against the baseline performance of the Zero interpreter and against the baseline performance of the server compiler on 2014-04-01. Relative performance: Zero: 1.0, Server: 210.67 Server 210.67 / Server 2014-04-01 (71.00): 2.97x Details of the test setup and historical results may be found here: http://openjdk.linaro.org/jdk13/hadoop-terasort-benchmark-results/ This is a summary of the jcstress test results ============================================== The build and test results are cycled every 15 days. 
2019-07-10 pass rate: 10487/10488, results: http://openjdk.linaro.org/jdk13/jcstress-nightly-runs/2019/190/results/
2019-07-16 pass rate: 10488/10488, results: http://openjdk.linaro.org/jdk13/jcstress-nightly-runs/2019/197/results/
2019-07-19 pass rate: 10488/10488, results: http://openjdk.linaro.org/jdk13/jcstress-nightly-runs/2019/199/results/
2019-07-21 pass rate: 10487/10488, results: http://openjdk.linaro.org/jdk13/jcstress-nightly-runs/2019/201/results/
2019-07-24 pass rate: 10488/10488, results: http://openjdk.linaro.org/jdk13/jcstress-nightly-runs/2019/204/results/
2019-07-26 pass rate: 10487/10488, results: http://openjdk.linaro.org/jdk13/jcstress-nightly-runs/2019/206/results/
2019-07-31 pass rate: 10488/10488, results: http://openjdk.linaro.org/jdk13/jcstress-nightly-runs/2019/211/results/
2019-08-02 pass rate: 10488/10488, results: http://openjdk.linaro.org/jdk13/jcstress-nightly-runs/2019/213/results/
2019-08-04 pass rate: 10488/10488, results: http://openjdk.linaro.org/jdk13/jcstress-nightly-runs/2019/215/results/
2019-08-07 pass rate: 10488/10488, results: http://openjdk.linaro.org/jdk13/jcstress-nightly-runs/2019/218/results/
2019-08-09 pass rate: 10487/10488, results: http://openjdk.linaro.org/jdk13/jcstress-nightly-runs/2019/220/results/
2019-08-11 pass rate: 10487/10488, results: http://openjdk.linaro.org/jdk13/jcstress-nightly-runs/2019/222/results/
2019-11-12 pass rate: 10490/10490, results: http://openjdk.linaro.org/jdk13/jcstress-nightly-runs/2019/316/results/
2019-11-15 pass rate: 10490/10490, results: http://openjdk.linaro.org/jdk13/jcstress-nightly-runs/2019/318/results/
2019-11-17 pass rate: 10490/10490, results: http://openjdk.linaro.org/jdk13/jcstress-nightly-runs/2019/320/results/

For detailed information on the test output please refer to:

 http://openjdk.linaro.org/jdk13/jcstress-nightly-runs/

From ci_notify at linaro.org  Sun Nov 17 19:26:51 2019
From: ci_notify at linaro.org (ci_notify at linaro.org)
Date: Sun, 17 Nov 2019 19:26:51 +0000 (UTC)
Subject: [aarch64-port-dev ] JTREG, JCStress, SPECjbb2015 and Hadoop/Terasort results for OpenJDK 11u on AArch64
Message-ID: <1239889170.1003.1574018811781.JavaMail.javamailuser@localhost>

This is a summary of the JTREG test results
===========================================

The build and test results are cycled every 15 days.

For detailed information on the test output please refer to:

 http://openjdk.linaro.org/jdk11u/openjdk-jtreg-nightly-tests/summary/2019/320/summary.html

-------------------------------------------------------------------------------
 release/hotspot
-------------------------------------------------------------------------------
Build 0: aarch64/2019/jul/02 pass: 5,737; fail: 5
Build 1: aarch64/2019/aug/03 pass: 5,746; fail: 4
Build 2: aarch64/2019/aug/10 pass: 5,747; fail: 4
Build 3: aarch64/2019/aug/15 pass: 5,753; fail: 4
Build 4: aarch64/2019/aug/22 pass: 5,755; fail: 4
Build 5: aarch64/2019/sep/04 pass: 5,764; fail: 2
Build 6: aarch64/2019/sep/05 pass: 5,764; fail: 2
Build 7: aarch64/2019/sep/10 pass: 5,764; fail: 2
Build 8: aarch64/2019/sep/17 pass: 5,763; fail: 3
Build 9: aarch64/2019/sep/21 pass: 5,764; fail: 2
Build 10: aarch64/2019/oct/04 pass: 5,764; fail: 2
Build 11: aarch64/2019/oct/17 pass: 5,764; fail: 2
Build 12: aarch64/2019/oct/31 pass: 5,784; fail: 1
Build 13: aarch64/2019/nov/09 pass: 5,773; fail: 3
Build 14: aarch64/2019/nov/16 pass: 5,775; fail: 1

-------------------------------------------------------------------------------
 release/jdk
-------------------------------------------------------------------------------
Build 0: aarch64/2019/jul/02 pass: 8,407; fail: 498; error: 31
Build 1: aarch64/2019/aug/03 pass: 8,429; fail: 509; error: 18
Build 2: aarch64/2019/aug/10 pass: 8,450; fail: 485; error: 16
Build 3: aarch64/2019/aug/15 pass: 8,443; fail: 496; error: 13
Build 4: aarch64/2019/aug/22 pass: 8,446; fail: 494; error: 15
Build 5: aarch64/2019/sep/04 pass: 8,483; fail: 465; error: 10
Build 6: aarch64/2019/sep/05 pass: 8,465; fail: 479; error: 14
Build 7: aarch64/2019/sep/10 pass: 8,444; fail: 500; error: 14
Build 8: aarch64/2019/sep/17 pass: 8,462; fail: 482; error: 12
Build 9: aarch64/2019/sep/21 pass: 8,467; fail: 478; error: 13
Build 10: aarch64/2019/oct/04 pass: 8,444; fail: 498; error: 16
Build 11: aarch64/2019/oct/17 pass: 8,452; fail: 493; error: 16
Build 12: aarch64/2019/oct/31 pass: 8,468; fail: 490; error: 14
Build 13: aarch64/2019/nov/09 pass: 8,487; fail: 470; error: 16
Build 14: aarch64/2019/nov/16 pass: 8,475; fail: 484; error: 15

3 fatal errors were detected; please follow the link above for more detail.

-------------------------------------------------------------------------------
 release/langtools
-------------------------------------------------------------------------------
Build 0: aarch64/2019/jul/02 pass: 3,908
Build 1: aarch64/2019/aug/03 pass: 3,908
Build 2: aarch64/2019/aug/10 pass: 3,909
Build 3: aarch64/2019/aug/15 pass: 3,909
Build 4: aarch64/2019/aug/22 pass: 3,909
Build 5: aarch64/2019/sep/04 pass: 3,910
Build 6: aarch64/2019/sep/05 pass: 3,910
Build 7: aarch64/2019/sep/10 pass: 3,910
Build 8: aarch64/2019/sep/17 pass: 3,910
Build 9: aarch64/2019/sep/21 pass: 3,910
Build 10: aarch64/2019/oct/04 pass: 3,910
Build 11: aarch64/2019/oct/17 pass: 3,910
Build 12: aarch64/2019/oct/31 pass: 3,910
Build 13: aarch64/2019/nov/09 pass: 3,910
Build 14: aarch64/2019/nov/16 pass: 3,910

Previous results can be found here:

 http://openjdk.linaro.org/jdk11u/openjdk-jtreg-nightly-tests/index.html

SPECjbb2015 composite regression test completed
===============================================

This test measures the relative performance of the server compiler running the SPECjbb2015 composite tests and compares the performance against the baseline performance of the server compiler taken on 2016-11-21.
In accordance with [1], the SPECjbb2015 tests are run on a system which is not production ready and does not meet all the requirements for publishing compliant results. The numbers below shall be treated as non-compliant (nc) and are for experimental purposes only.

Relative performance: Server max-jOPS (nc): 7.38x
Relative performance: Server critical-jOPS (nc): 8.14x

Details of the test setup and historical results may be found here:

 http://openjdk.linaro.org/jdk11u/SPECjbb2015-results/

[1] http://www.spec.org/fairuse.html#Academic

Regression test Hadoop-Terasort completed
=========================================

This test measures the performance of the server and client compilers running Hadoop sorting a 1GB file using Terasort and compares the performance against the baseline performance of the Zero interpreter and against the baseline performance of the server compiler on 2014-04-01.

Relative performance: Zero: 1.0, Server: 207.57

Server 207.57 / Server 2014-04-01 (71.00): 2.92x

Details of the test setup and historical results may be found here:

 http://openjdk.linaro.org/jdk11u/hadoop-terasort-benchmark-results/

This is a summary of the jcstress test results
==============================================

The build and test results are cycled every 15 days.
2019-07-03 pass rate: 10488/10488, results: http://openjdk.linaro.org/jdk11u/jcstress-nightly-runs/2019/183/results/
2019-08-04 pass rate: 10488/10488, results: http://openjdk.linaro.org/jdk11u/jcstress-nightly-runs/2019/215/results/
2019-08-11 pass rate: 10488/10488, results: http://openjdk.linaro.org/jdk11u/jcstress-nightly-runs/2019/222/results/
2019-08-16 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdk11u/jcstress-nightly-runs/2019/227/results/
2019-08-23 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdk11u/jcstress-nightly-runs/2019/234/results/
2019-09-05 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdk11u/jcstress-nightly-runs/2019/247/results/
2019-09-07 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdk11u/jcstress-nightly-runs/2019/248/results/
2019-09-11 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdk11u/jcstress-nightly-runs/2019/253/results/
2019-09-18 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdk11u/jcstress-nightly-runs/2019/260/results/
2019-09-22 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdk11u/jcstress-nightly-runs/2019/264/results/
2019-10-05 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdk11u/jcstress-nightly-runs/2019/277/results/
2019-10-18 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdk11u/jcstress-nightly-runs/2019/290/results/
2019-11-01 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdk11u/jcstress-nightly-runs/2019/304/results/
2019-11-10 pass rate: 10490/10490, results: http://openjdk.linaro.org/jdk11u/jcstress-nightly-runs/2019/313/results/
2019-11-17 pass rate: 10490/10490, results: http://openjdk.linaro.org/jdk11u/jcstress-nightly-runs/2019/320/results/

For detailed information on the test output please refer to:

 http://openjdk.linaro.org/jdk11u/jcstress-nightly-runs/

From patrick at os.amperecomputing.com  Mon Nov 18 03:52:26 2019
From: patrick at os.amperecomputing.com (Patrick Zhang OS)
Date: Mon, 18 Nov 2019 03:52:26 +0000
Subject: [aarch64-port-dev ] RFR: 8218748: AARCH64: String::compareTo intrinsic documentation and maintenance improvement
In-Reply-To: <6843892a-4267-06cd-0af4-4618fce86396@bell-sw.com>
References: <86f91401-b7f6-b634-fef1-b0615b8fcde0@redhat.com> <399186f2-5cb7-b713-33c5-b29b6eb73eb8@samersoff.net> <6843892a-4267-06cd-0af4-4618fce86396@bell-sw.com>
Message-ID:

Thanks for the information.

I am interested in the inconsistency between the same_encoding and different_encoding functions. If "overprefetch" is safe enough, why do we prevent it at the end of the large loop inside same_encoding, and why do we protect against it more strictly in different_encoding, at both the beginning and the end of the large loop?

I did not mean globally updating largeLoopExitCondition to 64/128, merely the condition at the end of the large loop inside same_encoding. Supposing the large loop could be faster than the small loop (in theory), removing all "overprefetch" conditions would allow more strings to go to the large loop for better performance. Any other potential side effects?

Regards
Patrick

-----Original Message-----
From: Dmitrij Pochepko
Sent: Friday, November 15, 2019 11:52 PM
To: Patrick Zhang OS
Cc: hotspot-compiler-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net; Dmitry Samersoff; Andrew Haley; Pengfei Li (Arm Technology China)
Subject: Re: [aarch64-port-dev ] RFR: 8218748: AARCH64: String::compareTo intrinsic documentation and maintenance improvement

Hi Patrick,

My experiments back then showed that a few platforms (some of the Cortex-A* series) behave unexpectedly slowly when dealing with overprefetch (probably CPU implementation specifics). So this code is a kind of compromise that runs relatively well on all the platforms I was able to test on (ThunderX, ThunderX2, Cortex-A53, Cortex-A73). That is the main reason for this code structure.
It's good that you're willing to experiment and improve it, but I'm afraid changing largeLoopExitCondition to 64 LL/UU and 128 for LU/UL will likely make some systems slower as I experimented a lot with this. Let us see the performance results for several systems you've got, to avoid a situation where one platform benefits by slowing down others. We could offer some help if you don't have some HW available.

Thanks,
Dmitrij

On 15/11/2019 10:51 AM, Patrick Zhang OS wrote:
> Hi Dmitrij,
>
> The inline documentation inside this patch of yours is nice in helping me understand the string_compare stub code generation in depth, although I don't know why it was not pushed.
> http://cr.openjdk.java.net/~dpochepk/8218748/webrev.02/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp.sdiff.html
>
> There is one point I don't quite understand; could you please clarify it a little more? Thanks in advance!
> In the large loop with prefetching logic, the different_encoding function uses "cnt2 - prefetchLoopExitCondition" to tell whether the 1st iteration should be executed or not, while same_encoding does not do this. Why?
>
> I was thinking that "prefetching out of the string boundary" could be an invalid operation, but that seems to be a misunderstanding, right? The AArch64 ISA document says that the prfm instruction signals the memory system and expects "preloading the cache line containing the specified address into one or more caches" to be done, in order to speed up the memory accesses when they do occur. If this is just a hint for the coming ldrs, and safe enough, could we avoid restricting the remaining iterations (2 ~ n) with largeLoopExitCondition, and instead use 64 for LL/UU and 128 for LU/UL? That way more iterations can stay in the large loop (SoftwarePrefetchHintDistance=192 for example), for better performance. Any comments?
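To make the trade-off in this exchange concrete, here is a small C++ model of the control flow of the same_encoding large loop as it is quoted in this thread. It is hypothetical, not HotSpot code: the struct and function names are invented, it assumes cnt2 counts the characters still to be compared from the current str1/str2 position, and it assumes each iteration consumes 64 bytes (the four compare_string_16_bytes_same calls), i.e. 64 Latin-1 or 32 UTF-16 characters. It only counts how many iterations run and how many prfm addresses land at or past the end of the string; it says nothing about what such a prefetch costs on a given core, which is the point Dmitrij raises.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model (not HotSpot code) of the quoted same_encoding large loop:
// prfm at +prefetch_distance, 64 bytes compared, cnt2 decremented,
// then "repeat while cnt2 > exit_threshold" (subs + br GT).
struct LargeLoopModel {
  int iterations;           // large-loop iterations executed
  int64_t chars_left;       // cnt2 when the loop exits
  int prefetches_past_end;  // iterations whose prfm address is at or beyond the string end
};

// cnt2:              characters left to compare on loop entry (assumed meaning)
// bytes_per_char:    1 for LL (Latin-1), 2 for UU (UTF-16)
// exit_threshold:    the constant subtracted from cnt2 before the GT branch
// prefetch_distance: SoftwarePrefetchHintDistance, in bytes
LargeLoopModel model_same_encoding_large_loop(int64_t cnt2, int64_t bytes_per_char,
                                              int64_t exit_threshold,
                                              int64_t prefetch_distance) {
  assert(bytes_per_char == 1 || bytes_per_char == 2);
  const int64_t bytes_per_iter = 64;  // four 16-byte compares per iteration
  const int64_t chars_per_iter = bytes_per_iter / bytes_per_char;
  // The quoted loop has no test before its first iteration, so the model
  // assumes the caller guarantees at least one iteration's worth of input.
  assert(cnt2 >= chars_per_iter);

  LargeLoopModel m{0, cnt2, 0};
  do {
    // The prefetch targets current position + prefetch_distance; valid byte
    // offsets are [0, bytes_left), so the target is past the end when
    // prefetch_distance >= bytes_left.
    if (prefetch_distance >= m.chars_left * bytes_per_char) {
      m.prefetches_past_end++;
    }
    m.chars_left -= chars_per_iter;         // models sub(cnt2, cnt2, 8 * characters_in_word)
    m.iterations++;
  } while (m.chars_left > exit_threshold);  // models subs(...) + br(GT, LARGE_LOOP_PREFETCH)
  return m;
}
```

For example, with a 192-byte prefetch distance and a 512-character Latin-1 input, an exit threshold of 64 characters gives seven iterations, the last two of which prefetch at or past the end, while a threshold of 192 stops after five iterations with no prefetch past the end. Lowering the threshold keeps more work in the large loop at the price of more out-of-range prefetches, and whether that pays off is exactly what needs measuring per platform.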
>
> Thanks
>
> 4327 address generate_compare_long_string_different_encoding(bool isLU) {
> 4377   if (SoftwarePrefetchHintDistance >= 0) {
> 4378     __ subs(rscratch2, cnt2, prefetchLoopExitCondition);
> 4379     __ br(__ LT, NO_PREFETCH);
> 4380     __ bind(LARGE_LOOP_PREFETCH); // 64-characters loop
> ... ...
> 4395     __ subs(rscratch2, cnt2, prefetchLoopExitCondition); // <-- could we use subs(rscratch2, cnt2, 128) instead?
> 4396     __ br(__ GE, LARGE_LOOP_PREFETCH);
> 4397   } // end of 64-characters loop
>
> 4616 address generate_compare_long_string_same_encoding(bool isLL) {
> 4637   if (SoftwarePrefetchHintDistance >= 0) {
> 4638     __ bind(LARGE_LOOP_PREFETCH);
> 4639     __ prfm(Address(str1, SoftwarePrefetchHintDistance));
> 4640     __ prfm(Address(str2, SoftwarePrefetchHintDistance));
> 4641     compare_string_16_bytes_same(DIFF, DIFF2);
> 4642     compare_string_16_bytes_same(DIFF, DIFF2);
> 4643     __ sub(cnt2, cnt2, 8 * characters_in_word);
> 4644     compare_string_16_bytes_same(DIFF, DIFF2);
> 4645     __ subs(rscratch2, cnt2, largeLoopExitCondition); // rscratch2 is not used. Use subs instead of cmp in case of potentially large constants // <-- could we use subs(rscratch2, cnt2, 64) instead?
> 4646     compare_string_16_bytes_same(DIFF, DIFF2);
> 4647     __ br(__ GT, LARGE_LOOP_PREFETCH);
> 4648     __ cbz(cnt2, LAST_CHECK); // no more loads left
> 4649   }
>
> Regards
> Patrick
>
> -----Original Message-----
> From: hotspot-compiler-dev On Behalf Of Dmitry Samersoff
> Sent: Sunday, May 19, 2019 11:42 PM
> To: Dmitrij Pochepko; Andrew Haley; Pengfei Li (Arm Technology China)
> Cc: hotspot-compiler-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net
> Subject: Re: [aarch64-port-dev ] RFR: 8218748: AARCH64: String::compareTo intrinsic documentation and maintenance improvement
>
> Dmitrij,
>
> The changes look good to me.
>
> -Dmitry
>
> On 25.02.2019 19:52, Dmitrij Pochepko wrote:
>> Hi Andrew, Pengfei,
>>
>> I created webrev.02 with all your suggestions implemented:
>>
>> webrev: http://cr.openjdk.java.net/~dpochepk/8218748/webrev.02/
>>
>> - comments are now both in a separate section and inlined into the code.
>> - documentation mismatch mentioned by Pengfei is fixed:
>> -- SHORT_LAST_INIT label name misprint changed to correct SHORT_LAST
>> -- SHORT_LOOP_TAIL block now merged with last instruction.
>> Documentation is updated accordingly
>> - minor other changes to layout and wording
>>
>> Newly developed tests were run as sanity and they passed.
>>
>> Thanks,
>> Dmitrij
>>
>> On 22/02/2019 6:42 PM, Andrew Haley wrote:
>>> On 2/22/19 10:31 AM, Pengfei Li (Arm Technology China) wrote:
>>>
>>>> So personally, I still prefer to inline the comments with the
>>>> original code block to avoid this kind of inconsistency. And it
>>>> makes it easier for us to review or maintain the code together with
>>>> the doc, as we don't need to scroll back and forth. I don't know the
>>>> benefit of making the code documentation a separate part. What's
>>>> your opinion, Andrew Haley?
>>> I agree with you. There's no harm having both inline and separate.
>>>

From patrick at os.amperecomputing.com  Mon Nov 18 04:03:55 2019
From: patrick at os.amperecomputing.com (Patrick Zhang OS)
Date: Mon, 18 Nov 2019 04:03:55 +0000
Subject: [aarch64-port-dev ] RFR: 8218748: AARCH64: String::compareTo intrinsic documentation and maintenance improvement
In-Reply-To:
References: <86f91401-b7f6-b634-fef1-b0615b8fcde0@redhat.com> <399186f2-5cb7-b713-33c5-b29b6eb73eb8@samersoff.net> <6843892a-4267-06cd-0af4-4618fce86396@bell-sw.com>
Message-ID:

>> changing largeLoopExitCondition to 64 LL/UU and 128 for LU/UL will likely make some systems slower as I experimented a lot with this.
Sorry my second paragraph was inaccurate, it seems you experimented that there were some cases ran well with the first iteration of the large-loop but would rather quit the loop and go to the small-loop immediately for better performance (?). Please correct me if I misunderstood this. Thanks.

Regards
Patrick

From Pengfei.Li at arm.com  Mon Nov 18 09:58:11 2019
From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China))
Date: Mon, 18 Nov 2019 09:58:11 +0000
Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27 conditionally allocatable
In-Reply-To: <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com>
References: <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com> <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com>
Message-ID:

Hi Andrew,

> I see
>
>   if (use_XOR_for_compressed_class_base) {
>     if (CompressedKlassPointers::shift() != 0) {
>       eor(dst, src, (uint64_t)CompressedKlassPointers::base());
>       lsr(dst, dst, LogKlassAlignmentInBytes);
>     } else {
>       eor(dst, src, (uint64_t)CompressedKlassPointers::base());
>     }
>     return;
>   }
>
>   if (((uint64_t)CompressedKlassPointers::base() & 0xffffffff) == 0
>       && CompressedKlassPointers::shift() == 0) {
>     movw(dst, src);
>     return;
>   }
>
> ... followed by code which does use r27.
>
> Do you ever see r27 being used? If so, I'd be interested to know how this gets
> triggered and what command-line arguments you use. It's rather inefficient.

I think you're right. I tried hard with various VM options but still failed to get the code after this part triggered. The worst case I've ever found is that the encoding/decoding returns at if block

  if (((uint64_t)CompressedKlassPointers::base() & 0xffffffff) == 0
      && CompressedKlassPointers::shift() == 0) { ... }

By browsing the code, I found this is caused by a metaspace reservation trick that always tries to make AArch64 metaspace 4G-aligned. [1]

If we do have the confidence that r27 won't be used for class pointers, I will remove UseCompressedClassPointers in my if condition. Another question, shall we clean up the (almost) dead code which uses r27 for encoding/decoding class pointers?
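A minimal C++ model of the branch structure in the excerpt quoted above may help when reasoning about when r27 can be freed. It is hypothetical: the enum and function names are invented, `use_xor_for_base` stands in for `use_XOR_for_compressed_class_base`, and `base`/`shift` stand in for `CompressedKlassPointers::base()` and `CompressedKlassPointers::shift()`. It covers only the two early returns shown in the excerpt; anything the real function does outside that excerpt is not modeled, so `MayUseR27` means "falls through past the quoted branches".

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model (not HotSpot code) of the two early-return branches in
// the quoted AArch64 compressed class pointer encoding excerpt.
enum class KlassEncodePath {
  Xor,        // eor with the base, followed by lsr when shift != 0
  Movw,       // base has zero low 32 bits and shift == 0: a 32-bit move suffices
  MayUseR27   // falls through to the code that the excerpt says uses r27
};

KlassEncodePath classify_encode_path(bool use_xor_for_base, uint64_t base, int shift) {
  assert(shift >= 0 && shift < 64);
  if (use_xor_for_base) {
    return KlassEncodePath::Xor;
  }
  if ((base & 0xffffffffULL) == 0 && shift == 0) {
    return KlassEncodePath::Movw;
  }
  return KlassEncodePath::MayUseR27;
}
```

With a 4G-aligned base such as 0x800000000 and a zero shift, the model takes the movw path, matching the case Pengfei reports. Reaching the fall-through requires both that the XOR form is unavailable and that either the shift is non-zero or the base has non-zero low 32 bits, which is the condition a "does this configuration need r27" flag would have to capture.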
[1] http://hg.openjdk.java.net/jdk/jdk/file/7bdc4f073c7f/src/hotspot/share/memory/metaspace.cpp#l1048

--
Thanks,
Pengfei

From aph at redhat.com  Mon Nov 18 10:06:46 2019
From: aph at redhat.com (Andrew Haley)
Date: Mon, 18 Nov 2019 10:06:46 +0000
Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27 conditionally allocatable
In-Reply-To:
References: <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com> <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com>
Message-ID: <355b7629-2ef0-316e-8d75-0593a8a50dc8@redhat.com>

On 11/18/19 9:58 AM, Pengfei Li (Arm Technology China) wrote:
> I think you're right. I tried hard with various VM options but still failed to
> get the code after this part triggered. The worst case I've ever found is that
> the encoding/decoding returns at if block
> if (((uint64_t)CompressedKlassPointers::base() & 0xffffffff) == 0
> && CompressedKlassPointers::shift() == 0) { ... }
>
> By browsing the code, I found this is caused by a metaspace reservation trick
> that always tries to make AArch64 metaspace 4G-aligned. [1]
>
> If we do have the confidence that r27 won't be used for class pointers, I will
> remove UseCompressedClassPointers in my if condition. Another question, shall
> we clean up the (almost) dead code which uses r27 for encoding/decoding class
> pointers?
>
> [1] http://hg.openjdk.java.net/jdk/jdk/file/7bdc4f073c7f/src/hotspot/share/memory/metaspace.cpp#l1048

We should have a flag which is set if the search for nicely-aligned memory is successful, and then you can use that flag to determine if r27 is needed.

--
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From Pengfei.Li at arm.com  Mon Nov 18 10:35:18 2019
From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China))
Date: Mon, 18 Nov 2019 10:35:18 +0000
Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27 conditionally allocatable
In-Reply-To: <355b7629-2ef0-316e-8d75-0593a8a50dc8@redhat.com>
References: <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com> <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com> <355b7629-2ef0-316e-8d75-0593a8a50dc8@redhat.com>
Message-ID:

Hi Andrew,

> We should have a flag which is set if the search for nicely-aligned memory is
> successful, and then you can use that flag to determine if r27 is needed.

I just found in current HotSpot code, UseCompressedOops must be on for UseCompressedClassPointers to be on. See arguments.cpp [1].

If this is true, UseCompressedClassPointers cannot be used without UseCompressedOops. So wouldn't a single condition of UseCompressedOops be enough? But the x86_64 code which I referenced has both two conditions. Is it because the relationship of the arguments are subject to change in the future?

[1] http://hg.openjdk.java.net/jdk/jdk/file/7bdc4f073c7f/src/hotspot/share/runtime/arguments.cpp#l1715

--
Thanks,
Pengfei

From aph at redhat.com  Mon Nov 18 10:39:03 2019
From: aph at redhat.com (Andrew Haley)
Date: Mon, 18 Nov 2019 10:39:03 +0000
Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27 conditionally allocatable
In-Reply-To:
References: <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com> <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com> <355b7629-2ef0-316e-8d75-0593a8a50dc8@redhat.com>
Message-ID: <638df51f-f9e0-5ceb-6509-7d396f85d8ec@redhat.com>

On 11/18/19 10:35 AM, Pengfei Li (Arm Technology China) wrote:
> If this is true, UseCompressedClassPointers cannot be used without
> UseCompressedOops. So wouldn't a single condition of UseCompressedOops be
> enough?

Why do you think so?
UseCompressedOops doesn't usually need r27. > But the x86_64 code which I referenced has both two conditions. > Is it because the relationship of the arguments are subject to change in the > future? I have no idea why these flags depend on each other. I'd use compressed class pointers all the time, regardless of compressed oops. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From dmitrij.pochepko at bell-sw.com Mon Nov 18 12:02:13 2019 From: dmitrij.pochepko at bell-sw.com (Dmitrij Pochepko) Date: Mon, 18 Nov 2019 15:02:13 +0300 Subject: [aarch64-port-dev ] RFR: 8218748: AARCH64: String::compareTo intrinsic documentation and maintenance improvement In-Reply-To: References: <86f91401-b7f6-b634-fef1-b0615b8fcde0@redhat.com> <399186f2-5cb7-b713-33c5-b29b6eb73eb8@samersoff.net> <6843892a-4267-06cd-0af4-4618fce86396@bell-sw.com> Message-ID: <3427facc-eb05-d690-eebe-acca39b87d4a@bell-sw.com> On 18/11/2019 7:03 AM, Patrick Zhang OS wrote: >>> changing largeLoopExitCondition to 64 LL/UU and 128 for LU/UL will likely make some systems slower as I experimented a lot with this. > Sorry my second paragraph was inaccurate, it seems you experimented that there were some cases ran well with the first iteration of the large-loop but would rather quit the loop and go to the small-loop immediately for better performance (?). Please correct me if I misunderstood this. Thanks. > > Regards > Patrick Yes. That's correct. Thanks, Dmitrij > > -----Original Message----- > From: hotspot-compiler-dev On Behalf Of Patrick Zhang OS > Sent: Monday, November 18, 2019 11:52 AM > To: Dmitrij Pochepko > Cc: hotspot-compiler-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net > Subject: RE: [aarch64-port-dev ] RFR: 8218748: AARCH64: String::compareTo intrinsic documentation and maintenance improvement > > Thanks for the information. 
> I am interested in the inconsistence between same_encoding and different_encoding functions, if "overprefetch" can be safe enough, why do we prevent it at the end of large-loop inside same_encoding, why do we protect it more strictly in different_encoding at both the beginning and ending of the large-loop? > I did not mean globally updating largeLoopExitCondition to 64/128, merely the condition at the end of large-loop inside same_encoding. Suppose large-loop could be faster than small-loop (in theory), removing all "overprefetch" conditions would allow more strings go to the large-loop for better performance. Any other potential side-effects? > > Regards > Patrick > > -----Original Message----- > From: Dmitrij Pochepko > Sent: Friday, November 15, 2019 11:52 PM > To: Patrick Zhang OS > Cc: hotspot-compiler-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net; Dmitry Samersoff ; Andrew Haley ; Pengfei Li (Arm Technology China) > Subject: Re: [aarch64-port-dev ] RFR: 8218748: AARCH64: String::compareTo intrinsic documentation and maintenance improvement > > Hi Patrick, > > My experiments back then showed that few platforms (some of Cortex A* > series) behaves unexpectedly slow when dealing with overprefetch (probably CPU implementation specifics). So this code is some kind of compromise to run relatively well on all platforms I was able to test on (ThunderX, ThunderX2, Cortex A53, Cortex A73). That is the main reason for such code structure. > It's good that you're willing to experiment and improve it, but I'm afraid changing largeLoopExitCondition to 64 LL/UU and 128 for LU/UL will likely make some systems slower as I experimented a lot with this. > Let us see the performance results for several systems you've got to avoid a situation when one platform benefits by slowing down others. We could offer some help if you don't have some HW available. 
> > Thanks, > Dmitrij > > On 15/11/2019 10:51 AM, Patrick Zhang OS wrote: >> Hi Dmitrij, >> >> The inline document inside your this patch is nice in helping me understand the string_compare stub code generation in depth, although I don't know why it was not pushed. >> http://cr.openjdk.java.net/~dpochepk/8218748/webrev.02/src/hotspot/cpu >> /aarch64/stubGenerator_aarch64.cpp.sdiff.html >> >> There is one point I don't quite understand, could you please clarify it a little more? Thanks in advance! >> The large loop with prefetching logic, in different_encoding function, it uses "cnt2 - prefetchLoopExitCondition" to tell whether the 1st iteration should be executed or not, while the same_encoding does not do this, why? >> >> I was thinking that "prefetching out of the string boundary" could be an invalid operation, but it seems a misunderstanding, isn't it? The AArch64 ISA document says that prfm instruction signals the memory system and expects "preloading the cache line containing the specified address into one or more caches" would be done, in order to speed up the memory accesses when they do occur. If this is just a hint for coming ldrs, and safe enough, could we not restrict the rest iterations (2 ~ n) with largeLoopExitCondition? instead use 64 LL/UU, and 128 for LU/UL. As such more iteration can say in the large loop (SoftwarePrefetchHintDistance=192 for example), for better performance. Any comments? >> >> Thanks >> >> 4327 address generate_compare_long_string_different_encoding(bool isLU) { >> 4377 if (SoftwarePrefetchHintDistance >= 0) { >> 4378 __ subs(rscratch2, cnt2, prefetchLoopExitCondition); >> 4379 __ br(__ LT, NO_PREFETCH); >> 4380 __ bind(LARGE_LOOP_PREFETCH); // 64-characters loop >> ... ... >> 4395 __ subs(rscratch2, cnt2, prefetchLoopExitCondition); // <-- could we use subs(rscratch2, cnt2, 128) instead? 
>> 4396     __ br(__ GE, LARGE_LOOP_PREFETCH);
>> 4397   } // end of 64-characters loop
>>
>> 4616 address generate_compare_long_string_same_encoding(bool isLL) {
>> 4637   if (SoftwarePrefetchHintDistance >= 0) {
>> 4638     __ bind(LARGE_LOOP_PREFETCH);
>> 4639     __ prfm(Address(str1, SoftwarePrefetchHintDistance));
>> 4640     __ prfm(Address(str2, SoftwarePrefetchHintDistance));
>> 4641     compare_string_16_bytes_same(DIFF, DIFF2);
>> 4642     compare_string_16_bytes_same(DIFF, DIFF2);
>> 4643     __ sub(cnt2, cnt2, 8 * characters_in_word);
>> 4644     compare_string_16_bytes_same(DIFF, DIFF2);
>> 4645     __ subs(rscratch2, cnt2, largeLoopExitCondition); // rscratch2 is not used. Use subs instead of cmp in case of potentially large constants // <-- could we use subs(rscratch2, cnt2, 64) instead?
>> 4646     compare_string_16_bytes_same(DIFF, DIFF2);
>> 4647     __ br(__ GT, LARGE_LOOP_PREFETCH);
>> 4648     __ cbz(cnt2, LAST_CHECK); // no more loads left
>> 4649   }
>>
>> Regards
>> Patrick
>>
>> -----Original Message-----
>> From: hotspot-compiler-dev On Behalf Of Dmitry Samersoff
>> Sent: Sunday, May 19, 2019 11:42 PM
>> To: Dmitrij Pochepko ; Andrew Haley ; Pengfei Li (Arm Technology China)
>> Cc: hotspot-compiler-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net
>> Subject: Re: [aarch64-port-dev ] RFR: 8218748: AARCH64: String::compareTo intrinsic documentation and maintenance improvement
>>
>> Dmitrij,
>>
>> The changes look good to me.
>>
>> -Dmitry
>>
>> On 25.02.2019 19:52, Dmitrij Pochepko wrote:
>>> Hi Andrew, Pengfei,
>>>
>>> I created webrev.02 with all your suggestions implemented:
>>>
>>> webrev: http://cr.openjdk.java.net/~dpochepk/8218748/webrev.02/
>>>
>>> - comments are now both in a separate section and inlined into the code.
>>> - the documentation mismatch mentioned by Pengfei is fixed:
>>> -- SHORT_LAST_INIT label name misprint changed to the correct SHORT_LAST
>>> -- SHORT_LOOP_TAIL block now merged with last instruction.
>>> Documentation is updated accordingly.
>>> - minor other changes to layout and wording
>>>
>>> Newly developed tests were run as a sanity check and they passed.
>>>
>>> Thanks,
>>> Dmitrij
>>>
>>> On 22/02/2019 6:42 PM, Andrew Haley wrote:
>>>> On 2/22/19 10:31 AM, Pengfei Li (Arm Technology China) wrote:
>>>>
>>>>> So personally, I still prefer to inline the comments with the
>>>>> original code block to avoid this kind of inconsistency. It also
>>>>> makes it easier for us to review or maintain the code together with
>>>>> the doc, as we don't need to scroll back and forth. I don't see the
>>>>> benefit of making the code documentation a separate part. What's
>>>>> your opinion, Andrew Haley?
>>>> I agree with you. There's no harm having both inline and separate.
>>>>

From ci_notify at linaro.org Tue Nov 19 00:52:45 2019
From: ci_notify at linaro.org (ci_notify at linaro.org)
Date: Tue, 19 Nov 2019 00:52:45 +0000 (UTC)
Subject: [aarch64-port-dev ] JTREG, JCStress, SPECjbb2015 and Hadoop/Terasort results for OpenJDK JDK on AArch64
Message-ID: <1998756915.1196.1574124766267.JavaMail.javamailuser@localhost>

This is a summary of the JTREG test results
===========================================

The build and test results are cycled every 15 days.

For detailed information on the test output please refer to:

http://openjdk.linaro.org/jdkX/openjdk-jtreg-nightly-tests/summary/2019/322/summary.html

-------------------------------------------------------------------------------
client-release/hotspot
-------------------------------------------------------------------------------
Build 0: aarch64/2018/oct/15 pass: 5,780; fail: 19; not run: 90

-------------------------------------------------------------------------------
client-release/jdk
-------------------------------------------------------------------------------
Build 0: aarch64/2018/oct/15 pass: 8,495; fail: 670; error: 23

-------------------------------------------------------------------------------
client-release/langtools
-------------------------------------------------------------------------------
Build 0: aarch64/2018/oct/15 pass: 3,970; fail: 5

-------------------------------------------------------------------------------
release/hotspot
-------------------------------------------------------------------------------
Build 0: aarch64/2019/oct/14 pass: 5,753
Build 1: aarch64/2019/oct/16 pass: 5,753; fail: 1
Build 2: aarch64/2019/oct/18 pass: 5,760
Build 3: aarch64/2019/oct/21 pass: 5,716; fail: 43; error: 1
Build 4: aarch64/2019/oct/23 pass: 5,760; fail: 1
Build 5: aarch64/2019/oct/28 pass: 5,766
Build 6: aarch64/2019/oct/30 pass: 5,768
Build 7: aarch64/2019/nov/01 pass: 5,768; fail: 1
Build 8: aarch64/2019/nov/04 pass: 5,769
Build 9: aarch64/2019/nov/06 pass: 5,766; fail: 2
Build 10: aarch64/2019/nov/08 pass: 5,761
Build 11: aarch64/2019/nov/11 pass: 5,762
Build 12: aarch64/2019/nov/13 pass: 5,764; fail: 1
Build 13: aarch64/2019/nov/15 pass: 5,750
Build 14: aarch64/2019/nov/18 pass: 5,750; fail: 1

1 fatal error was detected; please follow the link above for more detail.

-------------------------------------------------------------------------------
release/jdk
-------------------------------------------------------------------------------
Build 0: aarch64/2019/oct/14 pass: 8,706; fail: 497; error: 20
Build 1: aarch64/2019/oct/16 pass: 8,702; fail: 509; error: 17
Build 2: aarch64/2019/oct/18 pass: 8,694; fail: 522; error: 17
Build 3: aarch64/2019/oct/21 pass: 8,705; fail: 512; error: 18
Build 4: aarch64/2019/oct/23 pass: 8,712; fail: 505; error: 18
Build 5: aarch64/2019/oct/28 pass: 8,711; fail: 509; error: 18
Build 6: aarch64/2019/oct/30 pass: 8,723; fail: 504; error: 19
Build 7: aarch64/2019/nov/01 pass: 8,774; fail: 506; error: 18
Build 8: aarch64/2019/nov/04 pass: 8,777; fail: 509; error: 17
Build 9: aarch64/2019/nov/06 pass: 8,775; fail: 507; error: 19
Build 10: aarch64/2019/nov/08 pass: 8,774; fail: 510; error: 17
Build 11: aarch64/2019/nov/11 pass: 8,777; fail: 509; error: 15
Build 12: aarch64/2019/nov/13 pass: 8,773; fail: 509; error: 21
Build 13: aarch64/2019/nov/15 pass: 8,756; fail: 511; error: 19
Build 14: aarch64/2019/nov/18 pass: 8,765; fail: 504; error: 18

2 fatal errors were detected; please follow the link above for more detail.

-------------------------------------------------------------------------------
release/langtools
-------------------------------------------------------------------------------
Build 0: aarch64/2019/oct/14 pass: 3,979
Build 1: aarch64/2019/oct/16 pass: 3,979
Build 2: aarch64/2019/oct/18 pass: 3,979
Build 3: aarch64/2019/oct/21 pass: 3,979
Build 4: aarch64/2019/oct/23 pass: 3,980
Build 5: aarch64/2019/oct/28 pass: 3,980
Build 6: aarch64/2019/oct/30 pass: 3,980
Build 7: aarch64/2019/nov/01 pass: 3,980
Build 8: aarch64/2019/nov/04 pass: 3,980
Build 9: aarch64/2019/nov/06 pass: 3,980
Build 10: aarch64/2019/nov/08 pass: 3,980
Build 11: aarch64/2019/nov/11 pass: 3,980
Build 12: aarch64/2019/nov/13 pass: 3,980
Build 13: aarch64/2019/nov/15 pass: 3,981
Build 14: aarch64/2019/nov/18 pass: 3,981

-------------------------------------------------------------------------------
server-release/hotspot
-------------------------------------------------------------------------------
Build 0: aarch64/2018/oct/15 pass: 5,787; fail: 18; not run: 90

-------------------------------------------------------------------------------
server-release/jdk
-------------------------------------------------------------------------------
Build 0: aarch64/2018/oct/15 pass: 8,476; fail: 686; error: 27

-------------------------------------------------------------------------------
server-release/langtools
-------------------------------------------------------------------------------
Build 0: aarch64/2018/oct/15 pass: 3,970; fail: 5

Previous results can be found here:

http://openjdk.linaro.org/jdkX/openjdk-jtreg-nightly-tests/index.html

SPECjbb2015 composite regression test completed
===============================================

This test measures the relative performance of the server compiler running the SPECjbb2015 composite tests and compares the performance against the baseline performance of the server compiler taken on 2016-11-21.

In accordance with [1], the SPECjbb2015 tests are run on a system which is not production ready and does not meet all the requirements for publishing compliant results. The numbers below shall be treated as non-compliant (nc) and are for experimental purposes only.

Relative performance: Server max-jOPS (nc): 7.63x
Relative performance: Server critical-jOPS (nc): 9.74x

Details of the test setup and historical results may be found here:

http://openjdk.linaro.org/jdkX/SPECjbb2015-results/

[1] http://www.spec.org/fairuse.html#Academic

Regression test Hadoop-Terasort completed
=========================================

This test measures the performance of the server and client compilers running Hadoop sorting a 1GB file using Terasort and compares the performance against the baseline performance of the Zero interpreter and against the baseline performance of the server compiler on 2014-04-01.

Relative performance: Zero: 1.0, Server: 207.57

Server 207.57 / Server 2014-04-01 (71.00): 2.92x

Details of the test setup and historical results may be found here:

http://openjdk.linaro.org/jdkX/hadoop-terasort-benchmark-results/

This is a summary of the jcstress test results
==============================================

The build and test results are cycled every 15 days.

2019-10-12 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/284/results/
2019-10-15 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/287/results/
2019-10-17 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/289/results/
2019-10-19 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/291/results/
2019-10-22 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/294/results/
2019-10-23 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/296/results/
2019-10-29 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/301/results/
2019-10-31 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/303/results/
2019-11-02 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/305/results/
2019-11-05 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/308/results/
2019-11-07 pass rate: 10490/10490, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/310/results/
2019-11-12 pass rate: 10490/10490, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/315/results/
2019-11-14 pass rate: 10490/10490, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/317/results/
2019-11-16 pass rate: 10490/10490, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/319/results/
2019-11-19 pass rate: 10490/10490, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/322/results/

For detailed information on the test output please refer to:

http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/

From Pengfei.Li at arm.com Tue Nov 19 10:03:50 2019
From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China))
Date: Tue, 19 Nov 2019 10:03:50 +0000
Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27 conditionally allocatable
In-Reply-To: <638df51f-f9e0-5ceb-6509-7d396f85d8ec@redhat.com>
References: <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com> <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com> <355b7629-2ef0-316e-8d75-0593a8a50dc8@redhat.com> <638df51f-f9e0-5ceb-6509-7d396f85d8ec@redhat.com>
Message-ID:

Hi Andrew,

> Why do you think so? UseCompressedOops doesn't usually need r27.

If I understand correctly, your point is to allocate r27 as well in some scenarios when UseCompressedOops or UseCompressedClassPointers is on. This optimization is much more aggressive and I will try to do it carefully.

> We should have a flag which is set if the search for nicely-aligned
> memory is successful, and then you can use that flag to determine if r27 is needed.

In which file do you think we should add the flag? Can we just check the value of CompressedKlassPointers::base() in reg_mask_init()?

--
Thanks,
Pengfei

From matthias.baesken at sap.com Wed Nov 20 08:26:17 2019
From: matthias.baesken at sap.com (Baesken, Matthias)
Date: Wed, 20 Nov 2019 08:26:17 +0000
Subject: [aarch64-port-dev ] runtime/memory/ReadFromNoaccessArea.java crashes on aarch64
Message-ID:

Hello, are you aware that the jtreg hotspot test runtime/memory/ReadFromNoaccessArea.java has been crashing on aarch64 for some days?
We have noticed the crash since 15 November.
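The failing guarantee in this report, guarantee(masm->pc() <= s->code_end()), fires after the itable stub has been emitted, when the code actually generated (actual len 180) no longer fits the space reserved from the size estimate (estimated len 176). The sketch below models only that arithmetic in plain C++. StubSizeCheck, extra_instructions and verify_stub are hypothetical names, not HotSpot's API. Because every A64 instruction is 4 bytes wide, an overrun of 4 bytes means the stub came out exactly one instruction longer than estimated. The sketch does not say which instruction that was.

```cpp
#include <cassert>

// Hypothetical model of an "estimate, then verify" stub-size check such as
// the one reported from vtableStubs.cpp. Not HotSpot's API.
struct StubSizeCheck {
  int estimated_len;  // bytes reserved for the stub before emission
  int actual_len;     // bytes actually emitted

  // Bytes written past the reserved area (0 if the estimate held).
  int overrun() const {
    return actual_len > estimated_len ? actual_len - estimated_len : 0;
  }

  bool fits() const { return overrun() == 0; }
};

// A64 instructions are a fixed 4 bytes wide, so the overrun can be
// expressed as a number of extra instructions (rounded up).
int extra_instructions(const StubSizeCheck& c) {
  const int kInsnBytes = 4;
  return (c.overrun() + kInsnBytes - 1) / kInsnBytes;
}

// Same shape as the failing guarantee: stop if the stub overflowed.
void verify_stub(const StubSizeCheck& c) {
  assert(c.fits() && "overflowed buffer");
}
```

For the reported values, StubSizeCheck{176, 180} has overrun() == 4 and extra_instructions() == 1, so verify_stub() would fail on it just as the guarantee did, while StubSizeCheck{176, 172} passes.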

The stderr output looks like this:

stdout: [#
# A fatal error has been detected by the Java Runtime Environment:
#
# Internal Error (vtableStubs.cpp:197), pid=6213, tid=6217
# guarantee(masm->pc() <= s->code_end()) failed: itable #2: overflowed buffer, estimated len: 176, actual len: 180, overrun: 4
#
# JRE version: OpenJDK Runtime Environment (14.0.0.1) (build 14.0.0.1-internal+0-adhoc.openjdk.jdk)
# Java VM: OpenJDK 64-Bit Server VM (14.0.0.1-internal+0-adhoc.openjdk.jdk, mixed mode, sharing, tiered, compressed oops, g1 gc, linux-aarch64)
# Problematic frame:
# V [libjvm.so+0xdcd534] VtableStubs::bookkeeping(MacroAssembler*, outputStream*, VtableStub*, unsigned char*, unsigned char*, bool, int, int, int)+0x114
#
# CreateCoredumpOnCrash turned off, no core file dumped
#
# An error report file with more information is saved as:
# /mytestdir/jtreg_hotspot_tier1_work/JTwork/runtime/memory/ReadFromNoaccessArea/hs_err_pid6213.log
#
# If you would like to submit a bug report, please visit:
# https://bugreport.java.com/bugreport/crash.jsp
#
];
stderr: []
exitValue = 1

java.lang.RuntimeException: 'SIGSEGV' missing from stdout/stderr
	at jdk.test.lib.process.OutputAnalyzer.shouldContain(OutputAnalyzer.java:187)
	at ReadFromNoaccessArea.main(ReadFromNoaccessArea.java:74)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:564)
	at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:127)
	at java.base/java.lang.Thread.run(Thread.java:833)

JavaTest Message: Test threw exception: java.lang.RuntimeException: 'SIGSEGV' missing from stdout/stderr
JavaTest Message: shutting down test

STATUS:Failed.`main' threw exception: java.lang.RuntimeException: 'SIGSEGV' missing from stdout/stderr

Thread info from hs_err:

--------------- T H R E A D ---------------

Current thread (0x0000ffffa4028800): JavaThread "main" [_thread_in_vm, id=6217, stack(0x0000ffffa9f3b000,0x0000ffffaa13b000)]

Stack: [0x0000ffffa9f3b000,0x0000ffffaa13b000], sp=0x0000ffffaa137f40, free space=2035k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0xdcd534] VtableStubs::bookkeeping(MacroAssembler*, outputStream*, VtableStub*, unsigned char*, unsigned char*, bool, int, int, int)+0x114
V [libjvm.so+0xdce1fc] VtableStubs::create_itable_stub(int)+0x584
V [libjvm.so+0xdccbd0] VtableStubs::find_stub(bool, int)+0x1f0
V [libjvm.so+0x4e165c] CompiledIC::set_to_megamorphic(CallInfo*, Bytecodes::Code, bool&, Thread*)+0x74
V [libjvm.so+0xb62358] SharedRuntime::handle_ic_miss_helper_internal(Handle, CompiledMethod*, frame const&, methodHandle, Bytecodes::Code, CallInfo&, bool&, Thread*)+0x1e0
V [libjvm.so+0xb63144] SharedRuntime::handle_ic_miss_helper(JavaThread*, Thread*)+0x6c4
V [libjvm.so+0xb63380] SharedRuntime::handle_wrong_method_ic_miss(JavaThread*)+0x38
v ~RuntimeStub::ic_miss_stub
j jdk.internal.module.ModuleBootstrap.boot()Ljava/lang/ModuleLayer;+1323 java.base at 14.0.0.1-internal
j java.lang.System.initPhase2(ZZ)I+0 java.base at 14.0.0.1-internal
v ~StubRoutines::call_stub
V [libjvm.so+0x712b60] JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, Thread*)+0x368
V [libjvm.so+0x711460] JavaCalls::call_static(JavaValue*, Klass*, Symbol*, Symbol*, JavaCallArguments*, Thread*)+0xf8
V [libjvm.so+0xd5a2dc] Threads::create_vm(JavaVMInitArgs*, bool*)+0x8f4
V [libjvm.so+0x7a23b0] JNI_CreateJavaVM+0x78
C [libjli.so+0x48d0] JavaMain+0x70
C [libjli.so+0x885c] ThreadJavaMain+0xc
C [libpthread.so.0+0x7060] start_thread+0xb0

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
v ~RuntimeStub::ic_miss_stub
J 41 c1 jdk.internal.module.ModuleBootstrap$2.hasNext()Z java.base at 14.0.0.1-internal (30 bytes) @ 0x0000ffff8cbf12e0 [0x0000ffff8cbf1080+0x0000000000000260]
j java.lang.Module.implAddOpensToAllUnnamed(Ljava/util/Iterator;)V+47 java.base at 14.0.0.1-internal
j java.lang.System$2.addOpensToAllUnnamed(Ljava/lang/Module;Ljava/util/Iterator;)V+2 java.base at 14.0.0.1-internal
j jdk.internal.module.ModuleBootstrap.addIllegalAccess(Ljava/lang/module/ModuleFinder;Ljava/util/Map;Ljava/util/Map;Ljava/lang/ModuleLayer;Z)V+573 java.base at 14.0.0.1-internal
j jdk.internal.module.ModuleBootstrap.boot()Ljava/lang/ModuleLayer;+1323 java.base at 14.0.0.1-internal
j java.lang.System.initPhase2(ZZ)I+0 java.base at 14.0.0.1-internal
v ~StubRoutines::call_stub

Best regards,
Matthias

From ci_notify at linaro.org Thu Nov 21 03:05:25 2019
From: ci_notify at linaro.org (ci_notify at linaro.org)
Date: Thu, 21 Nov 2019 03:05:25 +0000 (UTC)
Subject: [aarch64-port-dev ] JTREG, JCStress, SPECjbb2015 and Hadoop/Terasort results for OpenJDK JDK on AArch64
Message-ID: <1620776588.1768.1574305526541.JavaMail.javamailuser@localhost>

This is a summary of the JTREG test results
===========================================

The build and test results are cycled every 15 days.

For detailed information on the test output please refer to:

http://openjdk.linaro.org/jdkX/openjdk-jtreg-nightly-tests/summary/2019/324/summary.html

-------------------------------------------------------------------------------
client-release/hotspot
-------------------------------------------------------------------------------
Build 0: aarch64/2018/oct/15 pass: 5,780; fail: 19; not run: 90

-------------------------------------------------------------------------------
client-release/jdk
-------------------------------------------------------------------------------
Build 0: aarch64/2018/oct/15 pass: 8,495; fail: 670; error: 23

-------------------------------------------------------------------------------
client-release/langtools
-------------------------------------------------------------------------------
Build 0: aarch64/2018/oct/15 pass: 3,970; fail: 5

-------------------------------------------------------------------------------
release/hotspot
-------------------------------------------------------------------------------
Build 0: aarch64/2019/oct/16 pass: 5,753; fail: 1
Build 1: aarch64/2019/oct/18 pass: 5,760
Build 2: aarch64/2019/oct/21 pass: 5,716; fail: 43; error: 1
Build 3: aarch64/2019/oct/23 pass: 5,760; fail: 1
Build 4: aarch64/2019/oct/28 pass: 5,766
Build 5: aarch64/2019/oct/30 pass: 5,768
Build 6: aarch64/2019/nov/01 pass: 5,768; fail: 1
Build 7: aarch64/2019/nov/04 pass: 5,769
Build 8: aarch64/2019/nov/06 pass: 5,766; fail: 2
Build 9: aarch64/2019/nov/08 pass: 5,761
Build 10: aarch64/2019/nov/11 pass: 5,762
Build 11: aarch64/2019/nov/13 pass: 5,764; fail: 1
Build 12: aarch64/2019/nov/15 pass: 5,750
Build 13: aarch64/2019/nov/18 pass: 5,750; fail: 1
Build 14: aarch64/2019/nov/20 pass: 5,752

-------------------------------------------------------------------------------
release/jdk
-------------------------------------------------------------------------------
Build 0: aarch64/2019/oct/16 pass: 8,702; fail: 509; error: 17
Build 1: aarch64/2019/oct/18 pass: 8,694; fail: 522; error: 17
Build 2: aarch64/2019/oct/21 pass: 8,705; fail: 512; error: 18
Build 3: aarch64/2019/oct/23 pass: 8,712; fail: 505; error: 18
Build 4: aarch64/2019/oct/28 pass: 8,711; fail: 509; error: 18
Build 5: aarch64/2019/oct/30 pass: 8,723; fail: 504; error: 19
Build 6: aarch64/2019/nov/01 pass: 8,774; fail: 506; error: 18
Build 7: aarch64/2019/nov/04 pass: 8,777; fail: 509; error: 17
Build 8: aarch64/2019/nov/06 pass: 8,775; fail: 507; error: 19
Build 9: aarch64/2019/nov/08 pass: 8,774; fail: 510; error: 17
Build 10: aarch64/2019/nov/11 pass: 8,777; fail: 509; error: 15
Build 11: aarch64/2019/nov/13 pass: 8,773; fail: 509; error: 21
Build 12: aarch64/2019/nov/15 pass: 8,756; fail: 511; error: 19
Build 13: aarch64/2019/nov/18 pass: 8,765; fail: 504; error: 18
Build 14: aarch64/2019/nov/20 pass: 8,768; fail: 504; error: 19

2 fatal errors were detected; please follow the link above for more detail.

-------------------------------------------------------------------------------
release/langtools
-------------------------------------------------------------------------------
Build 0: aarch64/2019/oct/16 pass: 3,979
Build 1: aarch64/2019/oct/18 pass: 3,979
Build 2: aarch64/2019/oct/21 pass: 3,979
Build 3: aarch64/2019/oct/23 pass: 3,980
Build 4: aarch64/2019/oct/28 pass: 3,980
Build 5: aarch64/2019/oct/30 pass: 3,980
Build 6: aarch64/2019/nov/01 pass: 3,980
Build 7: aarch64/2019/nov/04 pass: 3,980
Build 8: aarch64/2019/nov/06 pass: 3,980
Build 9: aarch64/2019/nov/08 pass: 3,980
Build 10: aarch64/2019/nov/11 pass: 3,980
Build 11: aarch64/2019/nov/13 pass: 3,980
Build 12: aarch64/2019/nov/15 pass: 3,981
Build 13: aarch64/2019/nov/18 pass: 3,981
Build 14: aarch64/2019/nov/20 pass: 3,981

-------------------------------------------------------------------------------
server-release/hotspot
-------------------------------------------------------------------------------
Build 0: aarch64/2018/oct/15 pass: 5,787; fail: 18; not run: 90

-------------------------------------------------------------------------------
server-release/jdk
-------------------------------------------------------------------------------
Build 0: aarch64/2018/oct/15 pass: 8,476; fail: 686; error: 27

-------------------------------------------------------------------------------
server-release/langtools
-------------------------------------------------------------------------------
Build 0: aarch64/2018/oct/15 pass: 3,970; fail: 5

Previous results can be found here:

http://openjdk.linaro.org/jdkX/openjdk-jtreg-nightly-tests/index.html

SPECjbb2015 composite regression test completed
===============================================

This test measures the relative performance of the server compiler running the SPECjbb2015 composite tests and compares the performance against the baseline performance of the server compiler taken on 2016-11-21.

In accordance with [1], the SPECjbb2015 tests are run on a system which is not production ready and does not meet all the requirements for publishing compliant results. The numbers below shall be treated as non-compliant (nc) and are for experimental purposes only.

Relative performance: Server max-jOPS (nc): 8.14x
Relative performance: Server critical-jOPS (nc): 9.52x

Details of the test setup and historical results may be found here:

http://openjdk.linaro.org/jdkX/SPECjbb2015-results/

[1] http://www.spec.org/fairuse.html#Academic

Regression test Hadoop-Terasort completed
=========================================

This test measures the performance of the server and client compilers running Hadoop sorting a 1GB file using Terasort and compares the performance against the baseline performance of the Zero interpreter and against the baseline performance of the server compiler on 2014-04-01.

Relative performance: Zero: 1.0, Server: 207.57

Server 207.57 / Server 2014-04-01 (71.00): 2.92x

Details of the test setup and historical results may be found here:

http://openjdk.linaro.org/jdkX/hadoop-terasort-benchmark-results/

This is a summary of the jcstress test results
==============================================

The build and test results are cycled every 15 days.

2019-10-15 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/287/results/
2019-10-17 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/289/results/
2019-10-19 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/291/results/
2019-10-22 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/294/results/
2019-10-23 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/296/results/
2019-10-29 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/301/results/
2019-10-31 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/303/results/
2019-11-02 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/305/results/
2019-11-05 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/308/results/
2019-11-07 pass rate: 10490/10490, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/310/results/
2019-11-12 pass rate: 10490/10490, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/315/results/
2019-11-14 pass rate: 10490/10490, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/317/results/
2019-11-16 pass rate: 10490/10490, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/319/results/
2019-11-19 pass rate: 10490/10490, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/322/results/
2019-11-21 pass rate: 10490/10490, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/324/results/

For detailed information on the test output please refer to:

http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/

From Xiaohong.Gong at arm.com Thu Nov 21 08:28:16 2019
From: Xiaohong.Gong at arm.com (Xiaohong Gong (Arm Technology China))
Date: Thu, 21 Nov 2019 08:28:16 +0000
Subject: [aarch64-port-dev ] Question about ISB usage in the aarch64 port
Message-ID:

Hi Felix,

I hit a similar SIGILL on the aarch64 platform as well. Here is the related JBS issue containing the resolution:
https://bugs.openjdk.java.net/browse/JDK-8234321

Hope this helps!

Thanks,
Xiaohong Gong

From aph at redhat.com Thu Nov 21 10:03:58 2019
From: aph at redhat.com (Andrew Haley)
Date: Thu, 21 Nov 2019 10:03:58 +0000
Subject: [aarch64-port-dev ] Question about ISB usage in the aarch64 port
In-Reply-To:
References:
Message-ID: <9ce8d22e-875e-ca04-8ca0-ee7d269e4754@redhat.com>

On 11/15/19 8:33 AM, Yangfei (Felix) wrote:
> I am witnessing some SIGILL jvm crashes on my aarch64 platform.
> I looked at the ISB usage, especially this one: https://mail.openjdk.java.net/pipermail/aarch64-port-dev/2014-September/001376.html
> One of the changes is adding one ISB after the native call returns:

Yes, I did that.

> 1100 static void rt_call(MacroAssembler* masm, address dest, int gpargs, int fpargs, int type) {
> 1101   CodeBlob *cb = CodeCache::find_blob(dest);
> 1102   if (cb) {
> 1103     __ far_call(RuntimeAddress(dest));
> 1104   } else {
> 1105     assert((unsigned)gpargs < 256, "eek!");
> 1106     assert((unsigned)fpargs < 32, "eek!");
> 1107     __ lea(rscratch1, RuntimeAddress(dest));
> 1108     __ blr(rscratch1);
> 1109     __ maybe_isb(); < ========
> 1110   }
> 1111 }
>
>
> The rt_call function is used in generate_native_wrapper to make the
> JNI call.
> As I didn't see the barrier for the ppc or arm port. I would like
> to know more details here. Does anyone still remember?

What question are you asking?
The ISB is there because the callout might run concurrently with a safepoint, during which time the code cache may be changed by some other thread. While we are in native code safepoints can run in other threads without us knowing.

> Also the ISB is planted only in the else block. I assume this is
> also necessary for the if block. Correct?

No. The if block is for calls within the AArch64 Java runtime, so we stay in Java, and there shouldn't be any ISB needed. Any part of the Java runtime that loads or generates code does its own ISB.

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From felix.yang at huawei.com Thu Nov 21 11:47:05 2019
From: felix.yang at huawei.com (Yangfei (Felix))
Date: Thu, 21 Nov 2019 11:47:05 +0000
Subject: [aarch64-port-dev ] Question about ISB usage in the aarch64 port
In-Reply-To: <9ce8d22e-875e-ca04-8ca0-ee7d269e4754@redhat.com>
References: <9ce8d22e-875e-ca04-8ca0-ee7d269e4754@redhat.com>
Message-ID:

> -----Original Message-----
> From: Andrew Haley [mailto:aph at redhat.com]
> Sent: Thursday, November 21, 2019 6:04 PM
> To: Yangfei (Felix) ; aarch64-port-dev at openjdk.java.net
> Subject: Re: [aarch64-port-dev ] Question about ISB usage in the aarch64 port
>
> On 11/15/19 8:33 AM, Yangfei (Felix) wrote:
> >
> > The rt_call function is used in generate_native_wrapper to make the
> > JNI call.
> > As I didn't see the barrier for the ppc or arm port. I would like
> > to know more details here. Does anyone still remember?
>
> What question are you asking? The ISB is there because the callout might run
> concurrently with a safepoint, during which time the code cache may be
> changed by some other thread. While we are in native code safepoints can run
> in other threads without us knowing.

I didn't find this barrier in the ppc or arm ports.
My question: is it necessary to plant an instruction barrier in the same place for those ports?
Please let me know if I missed anything here.
>
> > Also the ISB is planted only in the else block. I assume this is
> > also necessary for the if block. Correct?
>
> No. The if block is for calls within the AArch64 Java runtime, so we stay in Java,
> and there shouldn't be any ISB needed. Any part of the Java runtime that loads
> or generates code does its own ISB.

I see. Does that mean the isb in LIR_Assembler::rt_call should be planted in the else block?
I mean:

diff -r dc45ed0ab083 src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp
--- a/src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp	Wed Nov 13 15:16:45 2019 -0800
+++ b/src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp	Thu Nov 21 19:25:00 2019 +0800
@@ -2906,12 +2906,12 @@
   } else {
     __ mov(rscratch1, RuntimeAddress(dest));
     __ blr(rscratch1);
+    __ maybe_isb();
   }
 
   if (info != NULL) {
     add_call_info_here(info);
   }
-  __ maybe_isb();
 }

Thanks,
Felix

From aph at redhat.com Thu Nov 21 14:04:07 2019
From: aph at redhat.com (Andrew Haley)
Date: Thu, 21 Nov 2019 14:04:07 +0000
Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27 conditionally allocatable
In-Reply-To:
References: <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com> <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com> <355b7629-2ef0-316e-8d75-0593a8a50dc8@redhat.com> <638df51f-f9e0-5ceb-6509-7d396f85d8ec@redhat.com>
Message-ID: <399b921f-2110-3ceb-24b9-e346e7733ab5@redhat.com>

On 11/19/19 10:03 AM, Pengfei Li (Arm Technology China) wrote:
>> We should have a flag which is set if the search for nicely-aligned
>> memory is successful, and then you can use that flag to determine if r27 is needed.
> In which file do you think we should add the flag? Can we just check the value of CompressedKlassPointers::base() in reg_mask_init() ?

I would call from the #ifdef AARCH64 code that allocates the memory into a static method Assembler::setCompressedBaseAndScale().

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From aph at redhat.com Thu Nov 21 14:09:52 2019
From: aph at redhat.com (Andrew Haley)
Date: Thu, 21 Nov 2019 14:09:52 +0000
Subject: [aarch64-port-dev ] Question about ISB usage in the aarch64 port
In-Reply-To:
References: <9ce8d22e-875e-ca04-8ca0-ee7d269e4754@redhat.com>
Message-ID: <931032a7-5548-f847-e271-df5e4acf81ed@redhat.com>

On 11/21/19 11:47 AM, Yangfei (Felix) wrote:
>> -----Original Message-----
>> From: Andrew Haley [mailto:aph at redhat.com]
>> Sent: Thursday, November 21, 2019 6:04 PM
>> To: Yangfei (Felix) ;
>> aarch64-port-dev at openjdk.java.net
>> Subject: Re: [aarch64-port-dev ] Question about ISB usage in the aarch64 port
>>
>> On 11/15/19 8:33 AM, Yangfei (Felix) wrote:
>>>
>>> The rt_call function is used in generate_native_wrapper to make the
>>> JNI call.
>>> As I didn't see the barrier for the ppc or arm port. I would like
>>> to know more details here. Does anyone still remember?
>>
>> What question are you asking? The ISB is there because the callout might run
>> concurrently with a safepoint, during which time the code cache may be
>> changed by some other thread. While we are in native code safepoints can run
>> in other threads without us knowing.
>
> I didn't find this barrier in the ppc or arm ports.
> My question: is it necessary to plant an instruction barrier in the same place for those ports?

Probably. There was recently some code discussed to do pipeline flushing for x86 and others. I can't find it right now...

>>> Also the ISB is planted only in the else block. I assume this is
>>> also necessary for the if block. Correct?
>>
>> No. The if block is for calls within the AArch64 Java runtime, so
>> we stay in Java, and there shouldn't be any ISB needed. Any part of
>> the Java runtime that loads or generates code does its own ISB.
>
> I see. Does that mean the isb in LIR_Assembler::rt_call should be planted in the else block?
> I mean: > > diff -r dc45ed0ab083 src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp > --- a/src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp Wed Nov 13 15:16:45 2019 -0800 > +++ b/src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp Thu Nov 21 19:25:00 2019 +0800 > @@ -2906,12 +2906,12 @@ > } else { > __ mov(rscratch1, RuntimeAddress(dest)); > __ blr(rscratch1); > + __ maybe_isb(); > } > > if (info != NULL) { > add_call_info_here(info); > } > - __ maybe_isb(); > } Quite possibly, but I wouldn't touch it without a very good reason and lots of analysis. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Thu Nov 21 15:33:04 2019 From: aph at redhat.com (Andrew Haley) Date: Thu, 21 Nov 2019 15:33:04 +0000 Subject: [aarch64-port-dev ] Question about ISB usage in the aarch64 port In-Reply-To: <931032a7-5548-f847-e271-df5e4acf81ed@redhat.com> References: <9ce8d22e-875e-ca04-8ca0-ee7d269e4754@redhat.com> <931032a7-5548-f847-e271-df5e4acf81ed@redhat.com> Message-ID: <3c756be2-4b0e-e4a4-e356-9fa673e97f81@redhat.com> On 11/21/19 2:09 PM, Andrew Haley wrote: > Quite possibly, but I wouldn't touch it without a very good reason and > lots of analysis. And I have to confess that there are probably unnecessary ISBs. It's what we in England call "belt and braces", or more politely "defence in depth". -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From ci_notify at linaro.org Fri Nov 22 02:31:03 2019 From: ci_notify at linaro.org (ci_notify at linaro.org) Date: Fri, 22 Nov 2019 02:31:03 +0000 (UTC) Subject: [aarch64-port-dev ] JTREG, JCStress, SPECjbb2015 and Hadoop/Terasort results for OpenJDK 11u on AArch64 Message-ID: <1268109911.1912.1574389864399.JavaMail.javamailuser@localhost> This is a summary of the JTREG test results =========================================== The build and test results are cycled every 15 days. For detailed information on the test output please refer to: http://openjdk.linaro.org/jdk11u/openjdk-jtreg-nightly-tests/summary/2019/325/summary.html ------------------------------------------------------------------------------- release/hotspot ------------------------------------------------------------------------------- Build 0: aarch64/2019/aug/03 pass: 5,746; fail: 4 Build 1: aarch64/2019/aug/10 pass: 5,747; fail: 4 Build 2: aarch64/2019/aug/15 pass: 5,753; fail: 4 Build 3: aarch64/2019/aug/22 pass: 5,755; fail: 4 Build 4: aarch64/2019/sep/04 pass: 5,764; fail: 2 Build 5: aarch64/2019/sep/05 pass: 5,764; fail: 2 Build 6: aarch64/2019/sep/10 pass: 5,764; fail: 2 Build 7: aarch64/2019/sep/17 pass: 5,763; fail: 3 Build 8: aarch64/2019/sep/21 pass: 5,764; fail: 2 Build 9: aarch64/2019/oct/04 pass: 5,764; fail: 2 Build 10: aarch64/2019/oct/17 pass: 5,764; fail: 2 Build 11: aarch64/2019/oct/31 pass: 5,784; fail: 1 Build 12: aarch64/2019/nov/09 pass: 5,773; fail: 3 Build 13: aarch64/2019/nov/16 pass: 5,775; fail: 1 Build 14: aarch64/2019/nov/21 pass: 5,775; fail: 1 ------------------------------------------------------------------------------- release/jdk ------------------------------------------------------------------------------- Build 0: aarch64/2019/aug/03 pass: 8,429; fail: 509; error: 18 Build 1: aarch64/2019/aug/10 pass: 8,450; fail: 485; error: 16 Build 2: aarch64/2019/aug/15 pass: 8,443; fail: 496; 
error: 13 Build 3: aarch64/2019/aug/22 pass: 8,446; fail: 494; error: 15 Build 4: aarch64/2019/sep/04 pass: 8,483; fail: 465; error: 10 Build 5: aarch64/2019/sep/05 pass: 8,465; fail: 479; error: 14 Build 6: aarch64/2019/sep/10 pass: 8,444; fail: 500; error: 14 Build 7: aarch64/2019/sep/17 pass: 8,462; fail: 482; error: 12 Build 8: aarch64/2019/sep/21 pass: 8,467; fail: 478; error: 13 Build 9: aarch64/2019/oct/04 pass: 8,444; fail: 498; error: 16 Build 10: aarch64/2019/oct/17 pass: 8,452; fail: 493; error: 16 Build 11: aarch64/2019/oct/31 pass: 8,468; fail: 490; error: 14 Build 12: aarch64/2019/nov/09 pass: 8,487; fail: 470; error: 16 Build 13: aarch64/2019/nov/16 pass: 8,475; fail: 484; error: 15 Build 14: aarch64/2019/nov/21 pass: 8,489; fail: 497; error: 13 4 fatal errors were detected; please follow the link above for more detail. ------------------------------------------------------------------------------- release/langtools ------------------------------------------------------------------------------- Build 0: aarch64/2019/aug/03 pass: 3,908 Build 1: aarch64/2019/aug/10 pass: 3,909 Build 2: aarch64/2019/aug/15 pass: 3,909 Build 3: aarch64/2019/aug/22 pass: 3,909 Build 4: aarch64/2019/sep/04 pass: 3,910 Build 5: aarch64/2019/sep/05 pass: 3,910 Build 6: aarch64/2019/sep/10 pass: 3,910 Build 7: aarch64/2019/sep/17 pass: 3,910 Build 8: aarch64/2019/sep/21 pass: 3,910 Build 9: aarch64/2019/oct/04 pass: 3,910 Build 10: aarch64/2019/oct/17 pass: 3,910 Build 11: aarch64/2019/oct/31 pass: 3,910 Build 12: aarch64/2019/nov/09 pass: 3,910 Build 13: aarch64/2019/nov/16 pass: 3,910 Build 14: aarch64/2019/nov/21 pass: 3,910 Previous results can be found here: http://openjdk.linaro.org/jdk11u/openjdk-jtreg-nightly-tests/index.html SPECjbb2015 composite regression test completed =============================================== This test measures the relative performance of the server compiler running the SPECjbb2015 composite tests and compares the performance against the 
baseline performance of the server compiler taken on 2016-11-21. In accordance with [1], the SPECjbb2015 tests are run on a system which is not production ready and does not meet all the requirements for publishing compliant results. The numbers below shall be treated as non-compliant (nc) and are for experimental purposes only. Relative performance: Server max-jOPS (nc): 7.38x Relative performance: Server critical-jOPS (nc): 8.20x Details of the test setup and historical results may be found here: http://openjdk.linaro.org/jdk11u/SPECjbb2015-results/ [1] http://www.spec.org/fairuse.html#Academic Regression test Hadoop-Terasort completed ========================================= This test measures the performance of the server and client compilers running Hadoop sorting a 1GB file using Terasort and compares the performance against the baseline performance of the Zero interpreter and against the baseline performance of the server compiler on 2014-04-01. Relative performance: Zero: 1.0, Server: 210.67 Server 210.67 / Server 2014-04-01 (71.00): 2.97x Details of the test setup and historical results may be found here: http://openjdk.linaro.org/jdk11u/hadoop-terasort-benchmark-results/ This is a summary of the jcstress test results ============================================== The build and test results are cycled every 15 days. 
2019-08-04 pass rate: 10488/10488, results: http://openjdk.linaro.org/jdk11u/jcstress-nightly-runs/2019/215/results/ 2019-08-11 pass rate: 10488/10488, results: http://openjdk.linaro.org/jdk11u/jcstress-nightly-runs/2019/222/results/ 2019-08-16 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdk11u/jcstress-nightly-runs/2019/227/results/ 2019-08-23 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdk11u/jcstress-nightly-runs/2019/234/results/ 2019-09-05 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdk11u/jcstress-nightly-runs/2019/247/results/ 2019-09-07 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdk11u/jcstress-nightly-runs/2019/248/results/ 2019-09-11 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdk11u/jcstress-nightly-runs/2019/253/results/ 2019-09-18 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdk11u/jcstress-nightly-runs/2019/260/results/ 2019-09-22 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdk11u/jcstress-nightly-runs/2019/264/results/ 2019-10-05 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdk11u/jcstress-nightly-runs/2019/277/results/ 2019-10-18 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdk11u/jcstress-nightly-runs/2019/290/results/ 2019-11-01 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdk11u/jcstress-nightly-runs/2019/304/results/ 2019-11-10 pass rate: 10490/10490, results: http://openjdk.linaro.org/jdk11u/jcstress-nightly-runs/2019/313/results/ 2019-11-17 pass rate: 10490/10490, results: http://openjdk.linaro.org/jdk11u/jcstress-nightly-runs/2019/320/results/ 2019-11-22 pass rate: 10490/10490, results: http://openjdk.linaro.org/jdk11u/jcstress-nightly-runs/2019/325/results/ For detailed information on the test output please refer to: http://openjdk.linaro.org/jdk11u/jcstress-nightly-runs/ From Pengfei.Li at arm.com Fri Nov 22 08:45:47 2019 From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China)) Date: Fri, 
22 Nov 2019 08:45:47 +0000 Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27 conditionally allocatable In-Reply-To: <399b921f-2110-3ceb-24b9-e346e7733ab5@redhat.com> References: <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com> <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com> <355b7629-2ef0-316e-8d75-0593a8a50dc8@redhat.com> <638df51f-f9e0-5ceb-6509-7d396f85d8ec@redhat.com> <399b921f-2110-3ceb-24b9-e346e7733ab5@redhat.com> Message-ID: Hi Andrew, > > In which file do you think we should add the flag? Can we just check the > value of CompressedKlassPointers::base() in reg_mask_init() ? > > I would call from the #ifdef AARCH64 code that allocates the memory into a > static method Assembler::setCompressedBaseAndScale(). Thanks for your suggestion. I did try setting a flag from the metaspace reservation code, but I'm now switching back to my other approach. My justification is below. The #ifdef code block that allocates metaspace is actually shared by AARCH64 and AIX. Of course, we could add AArch64-specific logic inside it with AARCH64_ONLY(), but that doesn't cover all the scenarios in which r27 isn't used. In klass pointer encoding and decoding, there is a special path, use_XOR_for_compressed_class_base, where the metaspace may not be nicely aligned but r27 still isn't used. [1] Your suggestion of setting the compressed base and shift values in the AArch64 assembler would solve the problem of covering the use_XOR_for_compressed_class_base path. But we would have to do it in Metaspace::set_narrow_klass_base_and_shift(), where the base and shift are finally determined, and introduce a new "#ifdef AARCH64 ... #endif" block into HotSpot shared code. In my approach, I added a method in aarch64.ad that checks the base and shift in reg_mask_init(), and moved the use_XOR_for_compressed_class_base logic into it from the MacroAssembler constructor.
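The idea behind an XOR path for compressed class pointers can be illustrated with plain arithmetic: if the class-space base has no set bits in the range that klass offsets occupy, adding an offset to the base never carries, so XOR-ing with the base gives the same result as subtracting it, and no register needs to hold the base. The C++ sketch below shows only that arithmetic under this assumption; `xor_matches_sub`, `encode_sub`, `encode_xor` and `range_bits` are hypothetical names, and HotSpot's actual use_XOR_for_compressed_class_base condition is computed in MacroAssembler and is not reproduced here.

```cpp
#include <cassert>
#include <cstdint>

using std::uint64_t;

// Hypothetical helper (not HotSpot code): true when `base` has no set bits
// below bit `range_bits`, i.e. base is a multiple of 2^range_bits. Then any
// klass address k = base + off with off < 2^range_bits is formed without
// carries, so k ^ base == k - base == off.
bool xor_matches_sub(uint64_t base, unsigned range_bits) {
  assert(range_bits < 64);
  const uint64_t low_mask = (uint64_t{1} << range_bits) - 1;
  return (base & low_mask) == 0;
}

// General encoding: subtract the base, then shift. Needs the base value
// available, which is where a dedicated register can come in.
uint64_t encode_sub(uint64_t klass, uint64_t base, unsigned shift) {
  assert(shift < 64);
  return (klass - base) >> shift;
}

// XOR-based encoding: an eor with the base followed by a logical shift right.
// Only equivalent to encode_sub when xor_matches_sub(base, ...) holds for the
// offsets actually used.
uint64_t encode_xor(uint64_t klass, uint64_t base, unsigned shift) {
  assert(shift < 64);
  return (klass ^ base) >> shift;
}
```

For example, with a base of 0x800000000 (only bit 35 set) and offsets below 4 GB, both encodings agree. With the base Nick reports later in the thread, 0xffff0b4b5000, the low 32 bits are non-zero, so an offset that carries into the base's bits makes the two results diverge.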
I know my implementation has a drawback: the logic of my new method may get out of sync with the encoding/decoding logic if someone changes the MacroAssembler code without noticing my code. So I also added a few lines of comments to help prevent this. See my updated webrev below. http://cr.openjdk.java.net/~pli/rfr/8233743/webrev.01/ Please let me know if you have any further suggestions or disagreements. [1] http://hg.openjdk.java.net/jdk/jdk/file/fcd74557a9cc/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp#l3918 -- Thanks, Pengfei From ci_notify at linaro.org Sat Nov 23 01:36:11 2019 From: ci_notify at linaro.org (ci_notify at linaro.org) Date: Sat, 23 Nov 2019 01:36:11 +0000 (UTC) Subject: [aarch64-port-dev ] JTREG, JCStress, SPECjbb2015 and Hadoop/Terasort results for OpenJDK JDK on AArch64 Message-ID: <1328467443.2022.1574472971999.JavaMail.javamailuser@localhost> This is a summary of the JTREG test results =========================================== The build and test results are cycled every 15 days.
For detailed information on the test output please refer to: http://openjdk.linaro.org/jdkX/openjdk-jtreg-nightly-tests/summary/2019/326/summary.html ------------------------------------------------------------------------------- client-release/hotspot ------------------------------------------------------------------------------- Build 0: aarch64/2018/oct/15 pass: 5,780; fail: 19; not run: 90 ------------------------------------------------------------------------------- client-release/jdk ------------------------------------------------------------------------------- Build 0: aarch64/2018/oct/15 pass: 8,495; fail: 670; error: 23 ------------------------------------------------------------------------------- client-release/langtools ------------------------------------------------------------------------------- Build 0: aarch64/2018/oct/15 pass: 3,970; fail: 5 ------------------------------------------------------------------------------- release/hotspot ------------------------------------------------------------------------------- Build 0: aarch64/2019/oct/18 pass: 5,760 Build 1: aarch64/2019/oct/21 pass: 5,716; fail: 43; error: 1 Build 2: aarch64/2019/oct/23 pass: 5,760; fail: 1 Build 3: aarch64/2019/oct/28 pass: 5,766 Build 4: aarch64/2019/oct/30 pass: 5,768 Build 5: aarch64/2019/nov/01 pass: 5,768; fail: 1 Build 6: aarch64/2019/nov/04 pass: 5,769 Build 7: aarch64/2019/nov/06 pass: 5,766; fail: 2 Build 8: aarch64/2019/nov/08 pass: 5,761 Build 9: aarch64/2019/nov/11 pass: 5,762 Build 10: aarch64/2019/nov/13 pass: 5,764; fail: 1 Build 11: aarch64/2019/nov/15 pass: 5,750 Build 12: aarch64/2019/nov/18 pass: 5,750; fail: 1 Build 13: aarch64/2019/nov/20 pass: 5,752 Build 14: aarch64/2019/nov/22 pass: 5,752; fail: 1 1 fatal errors were detected; please follow the link above for more detail. 
------------------------------------------------------------------------------- release/jdk ------------------------------------------------------------------------------- Build 0: aarch64/2019/oct/18 pass: 8,694; fail: 522; error: 17 Build 1: aarch64/2019/oct/21 pass: 8,705; fail: 512; error: 18 Build 2: aarch64/2019/oct/23 pass: 8,712; fail: 505; error: 18 Build 3: aarch64/2019/oct/28 pass: 8,711; fail: 509; error: 18 Build 4: aarch64/2019/oct/30 pass: 8,723; fail: 504; error: 19 Build 5: aarch64/2019/nov/01 pass: 8,774; fail: 506; error: 18 Build 6: aarch64/2019/nov/04 pass: 8,777; fail: 509; error: 17 Build 7: aarch64/2019/nov/06 pass: 8,775; fail: 507; error: 19 Build 8: aarch64/2019/nov/08 pass: 8,774; fail: 510; error: 17 Build 9: aarch64/2019/nov/11 pass: 8,777; fail: 509; error: 15 Build 10: aarch64/2019/nov/13 pass: 8,773; fail: 509; error: 21 Build 11: aarch64/2019/nov/15 pass: 8,756; fail: 511; error: 19 Build 12: aarch64/2019/nov/18 pass: 8,765; fail: 504; error: 18 Build 13: aarch64/2019/nov/20 pass: 8,768; fail: 504; error: 19 Build 14: aarch64/2019/nov/22 pass: 8,769; fail: 507; error: 18 3 fatal errors were detected; please follow the link above for more detail. 
------------------------------------------------------------------------------- release/langtools ------------------------------------------------------------------------------- Build 0: aarch64/2019/oct/18 pass: 3,979 Build 1: aarch64/2019/oct/21 pass: 3,979 Build 2: aarch64/2019/oct/23 pass: 3,980 Build 3: aarch64/2019/oct/28 pass: 3,980 Build 4: aarch64/2019/oct/30 pass: 3,980 Build 5: aarch64/2019/nov/01 pass: 3,980 Build 6: aarch64/2019/nov/04 pass: 3,980 Build 7: aarch64/2019/nov/06 pass: 3,980 Build 8: aarch64/2019/nov/08 pass: 3,980 Build 9: aarch64/2019/nov/11 pass: 3,980 Build 10: aarch64/2019/nov/13 pass: 3,980 Build 11: aarch64/2019/nov/15 pass: 3,981 Build 12: aarch64/2019/nov/18 pass: 3,981 Build 13: aarch64/2019/nov/20 pass: 3,981 Build 14: aarch64/2019/nov/22 pass: 3,981 ------------------------------------------------------------------------------- server-release/hotspot ------------------------------------------------------------------------------- Build 0: aarch64/2018/oct/15 pass: 5,787; fail: 18; not run: 90 ------------------------------------------------------------------------------- server-release/jdk ------------------------------------------------------------------------------- Build 0: aarch64/2018/oct/15 pass: 8,476; fail: 686; error: 27 ------------------------------------------------------------------------------- server-release/langtools ------------------------------------------------------------------------------- Build 0: aarch64/2018/oct/15 pass: 3,970; fail: 5 Previous results can be found here: http://openjdk.linaro.org/jdkX/openjdk-jtreg-nightly-tests/index.html SPECjbb2015 composite regression test completed =============================================== This test measures the relative performance of the server compiler running the SPECjbb2015 composite tests and compares the performance against the baseline performance of the server compiler taken on 2016-11-21. 
In accordance with [1], the SPECjbb2015 tests are run on a system which is not production ready and does not meet all the requirements for publishing compliant results. The numbers below shall be treated as non-compliant (nc) and are for experimental purposes only. Relative performance: Server max-jOPS (nc): 8.24x Relative performance: Server critical-jOPS (nc): 9.93x Details of the test setup and historical results may be found here: http://openjdk.linaro.org/jdkX/SPECjbb2015-results/ [1] http://www.spec.org/fairuse.html#Academic Regression test Hadoop-Terasort completed ========================================= This test measures the performance of the server and client compilers running Hadoop sorting a 1GB file using Terasort and compares the performance against the baseline performance of the Zero interpreter and against the baseline performance of the server compiler on 2014-04-01. Relative performance: Zero: 1.0, Server: 213.86 Server 213.86 / Server 2014-04-01 (71.00): 3.01x Details of the test setup and historical results may be found here: http://openjdk.linaro.org/jdkX/hadoop-terasort-benchmark-results/ This is a summary of the jcstress test results ============================================== The build and test results are cycled every 15 days. 
2019-10-17 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/289/results/ 2019-10-19 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/291/results/ 2019-10-22 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/294/results/ 2019-10-23 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/296/results/ 2019-10-29 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/301/results/ 2019-10-31 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/303/results/ 2019-11-02 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/305/results/ 2019-11-05 pass rate: 10489/10489, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/308/results/ 2019-11-07 pass rate: 10490/10490, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/310/results/ 2019-11-12 pass rate: 10490/10490, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/315/results/ 2019-11-14 pass rate: 10490/10490, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/317/results/ 2019-11-16 pass rate: 10490/10490, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/319/results/ 2019-11-19 pass rate: 10490/10490, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/322/results/ 2019-11-21 pass rate: 10490/10490, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/324/results/ 2019-11-23 pass rate: 10490/10490, results: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/2019/326/results/ For detailed information on the test output please refer to: http://openjdk.linaro.org/jdkX/jcstress-nightly-runs/ From felix.yang at huawei.com Mon Nov 25 11:33:18 2019 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Mon, 25 Nov 2019 11:33:18 +0000 Subject: 
[aarch64-port-dev ] RFR(XS): 8233466: aarch64: remove unnecessary load of mdo when profiling return and parameters type Message-ID: Ping? Any comments? Thanks, Felix From: Yangfei (Felix) Sent: Thursday, November 7, 2019 9:17 AM To: hotspot-runtime-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net Subject: RFR(XS): 8233466: aarch64: remove unnecessary load of mdo when profiling return and parameters type Hi, Please review the following patch: Bug: https://bugs.openjdk.java.net/browse/JDK-8233466 Webrev: http://cr.openjdk.java.net/~fyang/8233466/webrev.00/ When profiling return and parameter types from the interpreter on the aarch64 platform, 'mdp' is loaded by test_method_data_pointer, which is called by profile_return_type and profile_parameters_type themselves. So it is not necessary to load the mdo before calling __ profile_return_type or __ profile_parameters_type. Passed tier1-3 testing. Thanks, Felix From nick.gasson at arm.com Tue Nov 26 09:02:46 2019 From: nick.gasson at arm.com (Nick Gasson) Date: Tue, 26 Nov 2019 17:02:46 +0800 Subject: [aarch64-port-dev ] runtime/memory/ReadFromNoaccessArea.java crashes on aarch64 In-Reply-To: References: Message-ID: <19d990fc-7b62-bba2-da95-9abedfb13d37@arm.com> Hi Matthias, > Hello, are you aware that the jtreg hotspot test runtime/memory/ReadFromNoaccessArea.java crashes on aarch64 for some days ? > We notice the crash since 15. November . Thanks, I've made a Jira ticket to track this: https://bugs.openjdk.java.net/browse/JDK-8234794 It's been failing since the fix for JDK-8231610.
Nick From nick.gasson at arm.com Tue Nov 26 09:25:03 2019 From: nick.gasson at arm.com (Nick Gasson) Date: Tue, 26 Nov 2019 17:25:03 +0800 Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27 conditionally allocatable In-Reply-To: <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com> References: <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com> <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com> Message-ID: Hi Andrew, > > I see > > if (use_XOR_for_compressed_class_base) { > if (CompressedKlassPointers::shift() != 0) { > eor(dst, src, (uint64_t)CompressedKlassPointers::base()); > lsr(dst, dst, LogKlassAlignmentInBytes); > } else { > eor(dst, src, (uint64_t)CompressedKlassPointers::base()); > } > return; > } > > if (((uint64_t)CompressedKlassPointers::base() & 0xffffffff) == 0 > && CompressedKlassPointers::shift() == 0) { > movw(dst, src); > return; > } > > ... followed by code which does use r27. > > Do you ever see r27 being used? If so, I'd be interested to know how > this gets triggered and what command-line arguments you use. It's > rather inefficient. > Oddly enough the test case runtime/memory/ReadFromNoaccessArea.java now hits this. I see: CompressedKlassPointers::base() => 0xffff0b4b5000 CompressedKlassPointers::shift() => 3 The itable stub calls MacroAssembler::load_klass() twice which then calls the above decode_klass_not_null() with dst==src if UseCompressedClassPointers is true. So we do the saving/restoring rheapbase dance twice which blows up the size of the itable stub beyond the estimated 152B max size. The key is that this test passes -XX:HeapBaseMinAddress=33G. That in conjunction with the recent changes to where the CDS archive is loaded hits this code path (I don't see this with -Xshare:off). 
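The branch order in the snippet Andrew quotes can be summarised as a small decision function, which makes it easier to see why the reported values end up on the r27 path. This C++ sketch is hypothetical: `KlassEncodePath` and `classify_encode` are illustrative names, only the branches visible in the quoted snippet are modelled, and `use_xor` is taken as an input because the real use_XOR_for_compressed_class_base flag is computed elsewhere in MacroAssembler. With base 0xffff0b4b5000 and shift 3, the low 32 bits of the base are non-zero and the shift is not 0, so unless the XOR flag is set (and Nick's observation implies it is not) encoding falls through to the code that uses r27.

```cpp
#include <cassert>
#include <cstdint>

using std::uint64_t;

// Hypothetical labels for the three code shapes in the quoted snippet.
enum class KlassEncodePath {
  Xor,          // eor with the base (plus lsr when shift != 0)
  Movw,         // low 32 bits of base are zero and shift == 0: a 32-bit move
  UsesHeapBase  // general case: the code that follows uses r27 (rheapbase)
};

// `use_xor` stands in for use_XOR_for_compressed_class_base; it is an input
// here and is not derived from `base`.
KlassEncodePath classify_encode(uint64_t base, int shift, bool use_xor) {
  assert(shift >= 0 && shift < 64);
  if (use_xor) {
    return KlassEncodePath::Xor;
  }
  if ((base & 0xffffffffULL) == 0 && shift == 0) {
    return KlassEncodePath::Movw;
  }
  return KlassEncodePath::UsesHeapBase;
}
```

Since the itable stub loads the klass twice, every UsesHeapBase hit pays the rheapbase save/restore sequence twice in that stub, which is what pushes it past the estimated 152-byte size.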
Thanks, Nick From nick.gasson at arm.com Tue Nov 26 10:34:32 2019 From: nick.gasson at arm.com (Nick Gasson) Date: Tue, 26 Nov 2019 18:34:32 +0800 Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27 conditionally allocatable In-Reply-To: References: <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com> <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com> Message-ID: <34081dab-0049-dda7-dead-3b2934f8c41b@arm.com> (Not related to the original RFR.) > > The itable stub calls MacroAssembler::load_klass() twice which then > calls the above decode_klass_not_null() with dst==src if > UseCompressedClassPointers is true. So we do the saving/restoring > rheapbase dance twice which blows up the size of the itable stub beyond > the estimated 152B max size. > Actually I don't think we need to call load_klass twice on AArch64? The compiled code doesn't use callee save registers so we should have plenty spare to use as temporaries. I.e. could we do the following? --- a/src/hotspot/cpu/aarch64/vtableStubs_aarch64.cpp +++ b/src/hotspot/cpu/aarch64/vtableStubs_aarch64.cpp @@ -175,6 +175,7 @@ VtableStub* VtableStubs::create_itable_stub(int itable_index) { const Register holder_klass_reg = r16; // declaring interface klass (DECC) const Register resolved_klass_reg = rmethod; // resolved interface klass (REFC) const Register temp_reg = r11; + const Register temp_reg2 = r15; const Register icholder_reg = rscratch2; Label L_no_such_interface; @@ -193,7 +194,7 @@ VtableStub* VtableStubs::create_itable_stub(int itable_index) { __ lookup_interface_method(// inputs: rec. class, interface recv_klass_reg, resolved_klass_reg, noreg, // outputs: scan temp. reg1, scan temp. 
reg2 - recv_klass_reg, temp_reg, + temp_reg2, temp_reg, L_no_such_interface, /*return_method=*/false); @@ -201,7 +202,7 @@ VtableStub* VtableStubs::create_itable_stub(int itable_index) { start_pc = __ pc(); // Get selected method from declaring class and itable index - __ load_klass(recv_klass_reg, j_rarg0); // restore recv_klass_reg + //__ load_klass(recv_klass_reg, j_rarg0); // restore recv_klass_reg __ lookup_interface_method(// inputs: rec. class, interface, itable index recv_klass_reg, holder_klass_reg, itable_index, // outputs: method, scan temp. reg Thanks, Nick From gnu.andrew at redhat.com Wed Nov 27 05:31:21 2019 From: gnu.andrew at redhat.com (Andrew John Hughes) Date: Wed, 27 Nov 2019 05:31:21 +0000 Subject: [aarch64-port-dev ] [RFR] [8u] 8u242-b01 Upstream Sync Message-ID: <646b15f6-a7bc-b5c5-a502-83fb3df9f54d@redhat.com> Webrevs: https://cr.openjdk.java.net/~andrew/shenandoah-8/u242-b01/ Merge changesets: http://cr.openjdk.java.net/~andrew/shenandoah-8/u242-b01/corba/merge.changeset http://cr.openjdk.java.net/~andrew/shenandoah-8/u242-b01/jaxp/merge.changeset http://cr.openjdk.java.net/~andrew/shenandoah-8/u242-b01/jaxws/merge.changeset http://cr.openjdk.java.net/~andrew/shenandoah-8/u242-b01/jdk/merge.changeset http://cr.openjdk.java.net/~andrew/shenandoah-8/u242-b01/hotspot/merge.changeset http://cr.openjdk.java.net/~andrew/shenandoah-8/u242-b01/langtools/merge.changeset http://cr.openjdk.java.net/~andrew/shenandoah-8/u242-b01/nashorn/merge.changeset http://cr.openjdk.java.net/~andrew/shenandoah-8/u242-b01/root/merge.changeset Changes in aarch64-shenandoah-jdk8u242-b01: - S8010500: [parfait] Possible null pointer dereference at hotspot/src/share/vm/opto/loopnode.hpp - S8067429: java.lang.VerifyError: Inconsistent stackmap frames at branch target - S8073154: NULL-pointer dereferencing in LIR_OpProfileType::print_instr - S8077707: jdk9 b58 cannot run any graphical application on Win 8 with JAWS running - S8132249: Clean up JAB debugging code - 
S8133951: Zero interpreter asserts in stubRoutines.cpp - S8134739: compiler/loopopts/superword/TestVectorizationWithInvariant crashes in loop opts - S8209835: Aarch64: elide barriers on all volatile operations - S8212071: Need to set the FreeType LCD Filter to reduce fringing. - S8230238: Add another regression test for JDK-8134739 - S8230813: Add JDK-8010500 to compiler/loopopts/superword/TestFuzzPreLoop.java bug list - S8231398: Add time tracing for gc log rotation at safepoint cleanup - S8231988: Unexpected test result caused by C2 IdealLoopTree::do_remove_empty_loop Main issues of note: * 8209835 is already upstream but is part of this tag. * The 8073154 change to src/share/vm/c1/c1_LIR.cpp was already included in an earlier form as part of "Implement type profiling in C1." [0]. The merge conflict was resolved to use the 8u upstream version. diffstat for root b/.hgtags | 3 +++ 1 file changed, 3 insertions(+) diffstat for corba b/.hgtags | 3 +++ 1 file changed, 3 insertions(+) diffstat for jaxp b/.hgtags | 3 +++ 1 file changed, 3 insertions(+) diffstat for jaxws b/.hgtags | 3 +++ 1 file changed, 3 insertions(+) diffstat for langtools b/.hgtags | 3 b/src/share/classes/com/sun/tools/javac/jvm/Gen.java | 19 ++- b/test/tools/javac/BranchToFewerDefines.java | 111 +++++++++++++++++++ 3 files changed, 128 insertions(+), 5 deletions(-) diffstat for nashorn b/.hgtags | 3 +++ 1 file changed, 3 insertions(+) diffstat for jdk b/.hgtags | 3 b/src/share/native/sun/font/freetypeScaler.c | 3 b/src/windows/native/sun/bridge/AccessBridgeATInstance.cpp | 2 b/src/windows/native/sun/bridge/AccessBridgeJavaEntryPoints.cpp | 2 b/src/windows/native/sun/bridge/AccessBridgeJavaVMInstance.cpp | 2 b/src/windows/native/sun/bridge/AccessBridgeWindowsEntryPoints.cpp | 1 b/src/windows/native/sun/bridge/JavaAccessBridge.cpp | 51 ++-------- b/src/windows/native/sun/bridge/JavaAccessBridge.h | 2 b/src/windows/native/sun/bridge/WinAccessBridge.cpp | 4 9 files changed, 26 insertions(+), 44 deletions(-)
diffstat for hotspot b/.hgtags | 3 b/src/share/vm/c1/c1_LIR.cpp | 8 - b/src/share/vm/opto/loopTransform.cpp | 9 + b/src/share/vm/opto/loopnode.hpp | 1 b/src/share/vm/opto/superword.cpp | 26 +++- b/src/share/vm/runtime/safepoint.cpp | 1 b/src/share/vm/runtime/stubRoutines.cpp | 4 b/test/compiler/loopopts/TestRemoveEmptyLoop.java | 53 +++++++++ b/test/compiler/loopopts/superword/TestFuzzPreLoop.java | 65 +++++++++++ b/test/compiler/print/TestProfileReturnTypePrinting.java | 68 +++++++++++ b/test/runtime/RedefineTests/test8178870.sh | 87 +++++++++++++++ 11 files changed, 312 insertions(+), 13 deletions(-) Successfully built on x86, x86_64, s390, s390x, ppc, ppc64, ppc64le & aarch64. Ok to push? [0] https://hg.openjdk.java.net/aarch64-port/jdk8u-shenandoah/hotspot/rev/050fe4f6976ab67316 Thanks, -- Andrew :) Senior Free Java Software Engineer Red Hat, Inc. (http://www.redhat.com) PGP Key: ed25519/0xCFDA0F9B35964222 (hkp://keys.gnupg.net) Fingerprint = 5132 579D D154 0ED2 3E04 C5A0 CFDA 0F9B 3596 4222 https://keybase.io/gnu_andrew From aph at redhat.com Wed Nov 27 10:54:36 2019 From: aph at redhat.com (Andrew Haley) Date: Wed, 27 Nov 2019 10:54:36 +0000 Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27 conditionally allocatable In-Reply-To: References: <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com> <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com> Message-ID: <6a7e7eb1-f3a5-d361-45ab-a4b7004683ec@redhat.com> On 11/26/19 9:25 AM, Nick Gasson wrote: > Oddly enough the test case runtime/memory/ReadFromNoaccessArea.java now > hits this. I see: > > CompressedKlassPointers::base() => 0xffff0b4b5000 > CompressedKlassPointers::shift() => 3 This is bad. Can you have a look at the allocation code to see why the search for an appropriate address range fails? -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Wed Nov 27 10:56:59 2019 From: aph at redhat.com (Andrew Haley) Date: Wed, 27 Nov 2019 10:56:59 +0000 Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27 conditionally allocatable In-Reply-To: <34081dab-0049-dda7-dead-3b2934f8c41b@arm.com> References: <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com> <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com> <34081dab-0049-dda7-dead-3b2934f8c41b@arm.com> Message-ID: <97c0b309-e911-ff9f-4cea-b6f00bd4962d@redhat.com> On 11/26/19 10:34 AM, Nick Gasson wrote: > Actually I don't think we need to call load_klass twice on AArch64? The > compiled code doesn't use callee save registers so we should have plenty > spare to use as temporaries. I.e. could we do the following? I guess. It'd be nicer to fix CDS on AArch64 so that it doesn't cause performance regressions. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From boris.ulasevich at bell-sw.com Wed Nov 27 12:55:18 2019 From: boris.ulasevich at bell-sw.com (Boris Ulasevich) Date: Wed, 27 Nov 2019 15:55:18 +0300 Subject: [aarch64-port-dev ] RFR(S) 8234891: AArch64: Fix build failure after JDK-8234387 Message-ID: <847197ac-5969-900c-840a-61c475b19d84@bell-sw.com> Hi, Please review the fix in aarch64.ad to address the build issue "Ideal node missing: CmpOp" raised after a recent change in C2. The intuitive fix of correcting the case of the operand name (CmpOp->cmpOp) fixes the build, but results in a non-working JVM. Removing the match rule works well, and the jdk/hotspot tests pass. http://bugs.openjdk.java.net/browse/JDK-8234891 http://cr.openjdk.java.net/~bulasevich/8234891/webrev.00 The ARM32 build fails too; I will fix the problem in the arm32.ad file separately.
thanks, Boris From vladimir.x.ivanov at oracle.com Wed Nov 27 13:23:57 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 27 Nov 2019 16:23:57 +0300 Subject: [aarch64-port-dev ] RFR(S) 8234891: AArch64: Fix build failure after JDK-8234387 In-Reply-To: <847197ac-5969-900c-840a-61c475b19d84@bell-sw.com> References: <847197ac-5969-900c-840a-61c475b19d84@bell-sw.com> Message-ID: The fix looks good and trivial. Best regards, Vladimir Ivanov On 27.11.2019 15:55, Boris Ulasevich wrote: > Hi, > > Please review the fix in aarch64.ad to address the build issue "Ideal > node missing: CmpOp" raised after recent change in C2. The intuitive > operand name case correction CmpOp->cmpOp fixes the build, but leads to > unworkable jvm. Removing the match rule works good and jdk/hotspot tests > are Ok. > > http://bugs.openjdk.java.net/browse/JDK-8234891 > http://cr.openjdk.java.net/~bulasevich/8234891/webrev.00 > > ARM32 build fails too. I will fix the problem in arm32.ad file separately. > > thanks, > Boris From stuart.monteith at linaro.org Wed Nov 27 16:06:44 2019 From: stuart.monteith at linaro.org (Stuart Monteith) Date: Wed, 27 Nov 2019 16:06:44 +0000 Subject: [aarch64-port-dev ] RFR(S) 8234891: AArch64: Fix build failure after JDK-8234387 In-Reply-To: References: <847197ac-5969-900c-840a-61c475b19d84@bell-sw.com> Message-ID: Thanks Boris - looks good to me. Please ask me or my fellow Arm engineers if you should need any help testing in future. On Wed, 27 Nov 2019 at 13:26, Vladimir Ivanov wrote: > > The fix looks good and trivial. > > Best regards, > Vladimir Ivanov > > On 27.11.2019 15:55, Boris Ulasevich wrote: > > Hi, > > > > Please review the fix in aarch64.ad to address the build issue "Ideal > > node missing: CmpOp" raised after recent change in C2. The intuitive > > operand name case correction CmpOp->cmpOp fixes the build, but leads to > > unworkable jvm. Removing the match rule works good and jdk/hotspot tests > > are Ok. 
> > > > http://bugs.openjdk.java.net/browse/JDK-8234891 > > http://cr.openjdk.java.net/~bulasevich/8234891/webrev.00 > > > > ARM32 build fails too. I will fix the problem in arm32.ad file separately. > > > > thanks, > > Boris From patrick at os.amperecomputing.com Thu Nov 28 03:11:48 2019 From: patrick at os.amperecomputing.com (Patrick Zhang OS) Date: Thu, 28 Nov 2019 03:11:48 +0000 Subject: [aarch64-port-dev ] RFR: 8229351: AArch64: Make the stub threshold of string_compare intrinsic tunable Message-ID: Hi Andrew, I collected the timings and did a comparison; please see the spreadsheet in [1]. Following Dmitrij's comments in another thread, and having reconsidered the concerns you and Andrew Dinn raised, I revised the patch to drop the tunable flags and the extra over-prefetch checking for LL/UU. I then updated the shared STUB_THRESHOLD separately for UU and for LU/UL, according to the experimental data (note that I only have one aarch64 system, so the coverage might be limited). Please review. JBS: https://bugs.openjdk.java.net/browse/JDK-8229351 Webrev: http://cr.openjdk.java.net/~qpzhang/8229351/webrev.05 1. LL is the most common use case, especially for short strings. I will not change it, as the tests do not consistently show a clear improvement. This part is the same as what Dmitrij worked on; nothing has changed. 2. UU used the same threshold as LL, 72 (chars), which meant 144 bytes for UU but 72 bytes for LL (with -XX:+CompactStrings, the default). I reduced it from 72 to 36 chars so the limit is now fair to UU. The test shows (see figure [2]) an average gain of ~10% for lengths in [36, 71] characters; other lengths are unchanged. 3. LU/UL: updated the threshold from 72 (chars) to 24 (chars).
According to the algorithm in generate_compare_long_string_different_encoding, 24 is the minimum length that can take advantage of the compare_string_16_x_LU function. That function processes 16 chars (32 bytes) per loop iteration and can be faster than the outer code, whose main loop processes 8 bytes per iteration. See figure [3]: the perf gains are up to 60% (secondary axis, on the right). 4. Added "align(OptoLoopAlignment);" for the main loops in the stub code, per earlier suggestions from Aleksei. 5. Updated the two relevant test cases under test/hotspot/jtreg/compiler/intrinsics/string with additional string lengths that better cover the cases affected by this patch. More about the figures in the spreadsheet and the JPG files (they are the same): the lengths are the scatter points [4] suggested by Andrew. The main axis (on the left) shows the times, scaled by a constant, so do not read the absolute values as ns/op. The blue dots show the base trend, the orange dots show the patch, and the yellow dots (secondary axis) show the perf gains (patch vs base). For example, in [3], with the patch the orange curve becomes "monotonically increasing" over [24, 71] chars. That is better than the shape of the blue curve (base), and it is what I (we) want. Tests: jtreg tier1 all, hotspot_all, string-density-bench.jar; no regressions found.
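The following C++ sketch makes the bytes-versus-characters point concrete. It models the thresholds proposed in webrev.05: LL 72 chars, UU 36 chars, LU/UL 24 chars. It is a hypothetical model, not the HotSpot MacroAssembler code. The enum and function names are illustrative. The assumption that the decision compares the shorter string's length (in chars) using `>=` is also an assumption, consistent with the ranges quoted in the email.

```cpp
#include <cassert>

// Hypothetical model of the string_compare stub thresholds proposed in
// webrev.05; these names are illustrative, not HotSpot identifiers.
enum class Encoding { LL, UU, LU, UL };

// Threshold in characters at or above which the out-of-line stub is used.
int stub_threshold_chars(Encoding e) {
  switch (e) {
    case Encoding::LL: return 72;  // unchanged: 72 Latin-1 chars = 72 bytes
    case Encoding::UU: return 36;  // 36 UTF-16 chars = 72 bytes, same as LL
    case Encoding::LU:
    case Encoding::UL: return 24;  // per the email: smallest length the 16-char LU loop helps
  }
  return 72;  // unreachable; avoids a missing-return warning
}

// Bytes per character for the same-encoding cases (Latin-1 = 1, UTF-16 = 2).
int bytes_per_char_same(Encoding e) {
  assert(e == Encoding::LL || e == Encoding::UU);
  return e == Encoding::LL ? 1 : 2;
}

// Assumption: the decision is made on the shorter string's length, in chars.
bool use_stub(Encoding e, int min_len_chars) {
  return min_len_chars >= stub_threshold_chars(e);
}
```

In this model, the old shared threshold of 72 chars corresponded to 144 bytes for UU. That is the asymmetry that item 2 removes.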
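The scatter-point lengths listed as [4] grow by roughly 10% per step. They match floor(1.1^k) for k = 0, 1, 2, ..., with duplicates removed and stopping below 256. That generation rule is inferred from the numbers themselves and is not stated in the thread. A small C++ sketch that reproduces the list:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Generate benchmark lengths floor(ratio^k) for k = 0, 1, 2, ...,
// dropping duplicates and stopping once the value reaches `limit`.
std::vector<int> geometric_lengths(double ratio, int limit) {
  assert(ratio > 1.0 && "ratio must exceed 1 or the loop never terminates");
  std::vector<int> out;
  for (double x = 1.0; x < limit; x *= ratio) {
    int n = static_cast<int>(std::floor(x));
    if (out.empty() || out.back() != n) out.push_back(n);
  }
  return out;
}
```

A geometric spacing like this keeps every short length (where the inline-versus-stub decision matters most) while still sampling up to 251 with only 43 points.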
[1] http://cr.openjdk.java.net/~qpzhang/8229351/8229351-strcmp-perf.xlsx [2] http://cr.openjdk.java.net/~qpzhang/8229351/perf-strcmp-UU.JPG [3] http://cr.openjdk.java.net/~qpzhang/8229351/perf-strcmp-LU.JPG [4] 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 17, 19, 21, 23, 25, 28, 30, 34, 37, 41, 45, 49, 54, 60, 66, 72, 80, 88, 97, 106, 117, 129, 142, 156, 171, 189, 207, 228, 251 Regards Patrick -----Original Message----- From: aarch64-port-dev On Behalf Of Patrick Zhang OS Sent: Friday, November 15, 2019 4:05 PM To: Andrew Haley ; Andrew Dinn Cc: aarch64-port-dev at openjdk.java.net Subject: Re: [aarch64-port-dev ] RFR: 8229351: AArch64: Make the stub threshold of string_compare intrinsic tunable To avoid future confusion, I am going to split the patch. I will take out the updates to generate_compare_long_string_different_encoding, which drop two redundant temp Register vars and related unused instructions, and create a new one for your review. It has nothing to do with the proposed option. I will continue working on the remaining parts according to your comments and suggestions. Regards Patrick -----Original Message----- From: aarch64-port-dev On Behalf Of Patrick Zhang OS Sent: Thursday, November 14, 2019 7:14 PM To: Andrew Haley ; aarch64-port-dev at openjdk.java.net Subject: Re: [aarch64-port-dev ] RFR: 8229351: AArch64: Make the stub threshold of string_compare intrinsic tunable >> Why do we care about out-of-boundary prefetching for LL/UU? I don't think we do if it requires any extra logic. I was thinking out-of-boundary prefetching should be prevented, and UL/LU has the same condition. If it is not needed, we could simply set largeLoopExitCondition to 64 so that more cases can stay in the large loop. I don't think so.
http://hg.openjdk.java.net/jdk/jdk/file/355f4f42dda5/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#l4194

    if (SoftwarePrefetchHintDistance >= 0) {
      __ bind(LARGE_LOOP_PREFETCH);
      __ prfm(Address(str1, SoftwarePrefetchHintDistance));
      __ prfm(Address(str2, SoftwarePrefetchHintDistance));
      compare_string_16_bytes_same(DIFF, DIFF2);
      compare_string_16_bytes_same(DIFF, DIFF2);
      __ sub(cnt2, cnt2, isLL ? 64 : 32);
      compare_string_16_bytes_same(DIFF, DIFF2);
-     __ subs(rscratch2, cnt2, largeLoopExitCondition);
+     __ subs(rscratch2, cnt2, 64);
      compare_string_16_bytes_same(DIFF, DIFF2);
      __ br(__ GT, LARGE_LOOP_PREFETCH);
      __ cbz(cnt2, LAST_CHECK_AND_LENGTH_DIFF); // no more chars left?
    }

>> Do you have a theory that LU/UL cases are common? Why? The only "theory" is that compare_string_16_x_LU (in the stub) is faster than the 8-byte main loop outside the stub (http://hg.openjdk.java.net/jdk/jdk/file/355f4f42dda5/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp#l4997). Even outside the stub's large loop, the small loop can be faster as well, since it processes more bytes with fewer instructions (http://hg.openjdk.java.net/jdk/jdk/file/355f4f42dda5/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#l4203). I can prepare a new patch with updated tests, and plot the timings soon. Regards Patrick -----Original Message----- From: Andrew Haley Sent: Thursday, November 14, 2019 6:33 PM To: Patrick Zhang OS ; aarch64-port-dev at openjdk.java.net Subject: Re: [aarch64-port-dev ] RFR: 8229351: AArch64: Make the stub threshold of string_compare intrinsic tunable On 11/14/19 9:20 AM, Patrick Zhang OS wrote: > Thanks for the comments, see my answers below please. > >>> 1. This patch seems to do rather a lot. > Yes, it enables tweaking the stub parameters (not really changed any > in this patch), fixed an out-of-boundary prefetching for LL/UU, and > fixed some redundant instructions in LU/UL code path. The latter two
The latter two > are code-quality-wise, if splitting the patch could make the changes > clearer, I'd like to do. Why do we care about out-of-boundary prefetching for LL/UU? I don't think we do if it requires any extra logic. >>> 2. Are the thresholds bytes or characters? > All thresholds are (and should be) in characters. This was a little > bit misleading, for LL/LU/UL, the const STUB_THRESHOLD meant chars, > while for UU it could be explained as bytes. If specified > -XX:-CompactStrings, all code path going to UU would make the > threshold mean bytes, which might confuse developers. This patch can > clarify it, and the description of tunable options can provide further > guidance. It must. Without some commentary both maintainers and developers are lost. Unless there is some very strong reason, all counts must specify units. >>> 3. How are we supposed to test with these different thresholds? > There are two jtreg tests for checking the impacts of > SoftwarePrefetchHintDistance over the intrinsics, I have locally added > non-default thresholds inside and tested with many lengths (took days > on a test system). This has not been included in the proposed patch, > maybe a follow-up one would do, any advice? > hotspot/jtreg/compiler/intrinsics/string/TestStringCompareToSameLength > .java > hotspot/jtreg/compiler/intrinsics/string/TestStringCompareToDifferentL > ength.java I won't accept this patch unless it is accompanied by test cases that properly exercise the code. >>> 4. What are the thresholds you tested? > Firstly, the default threshold, the hardcoded 72 is my testing focus > since I would try best not to bring negative impacts to aarch64-port > normal state, especially other CPU vendors. > Second, I tested two extreme thresholds: 24 and 255, which means more > shorter strings (24 to 71 chars) or only very long strings > (>=255) could go to the stub code path, respectively. 
Function tests > passed (listed in the initial email), while performance test results > (with string-density-bench, StringCompareBench.java, and > SPECjbb2015) may vary across systems (as well as > microarchitectures). > Third, some other non-default thresholds, as sanity check, > particularly for ensuring correctness. It's the extremes that really matter, I suspect. >>> 5. But the more serious problem is the fact that we have different >>> code paths for different microarchitectures, and somehow this has to >>> be standard supportable software. In order to test this stuff we'll >>> need different test parameters for SoftwarePrefetchHintDistance, >>> CompareLongStringLimitLatin, CompareLongStringLimitUTF > The STUB_THRESHOLD was introduced to control the stub code insertion, > tested on some aarch64 systems. I think making it tunable is the way > to let different microarchitectures be able to configure optimal ones > for their own. Well, yes. The question is whether we go down this rabbit hole or try to find a compromise that is perhaps not quite optimal for anyone but good enough for everyone. > I would like to have a common threshold too, or no threshold for all, > but we lack full-coverage tests over all systems. Maybe I > misunderstood your points here with regards to "supportable", the two > new options can be kept as default if developers have no concerns on > string compare intrinsics. I rather suspect that vendors will want to change the defaults sooner or later. And besides, we'll all have to support these options. >>> 6. We already emit a great deal of in-line code in the >>> string_compare intrinsic, with the intention that this be as fast as >>> possible because we want to avoid having to call the intrinsic. So >>> why is the intrinsic actually faster in your case? > Avoid having to call the intrinsic? I meant "the stub".
> If you did NOT mean completely "avoiding intrinsic", but the strings > shorter than 72 chars, I would have to say, "it depends". The stub > functions try best to process every 16 chars, while the outer logic > processes every 8 bytes, which is the major diff. For example, I can > see consistent 1.5x faster with lengths 24-71 for LU/UL cases, maybe > others cannot, which can be reason why we need an option here. I know that strings of length 24 - 30ish are very common, so this is an important case. Do you have a theory that LU/UL cases are common? Why? What is it like with LL/UU? I'd need to see real timings. I'd either do all numbers < 256 or (to save time) a sequence like... 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 17, 19, 21, 23, 25, 28, 30, 34, 37, 41, 45, 49, 54, 60, 66, 72, 80, 88, 97, 106, 117, 129, 142, 156, 171, 189, 207, 228, 251 The idea here is that we can plot a graph. The timings should ideally be monotonically increasing. And then we could see how different processors behave, and hopefully find a decent solution for all. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From nick.gasson at arm.com Thu Nov 28 07:50:32 2019 From: nick.gasson at arm.com (Nick Gasson) Date: Thu, 28 Nov 2019 15:50:32 +0800 Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27 conditionally allocatable In-Reply-To: <6a7e7eb1-f3a5-d361-45ab-a4b7004683ec@redhat.com> References: <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com> <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com> <6a7e7eb1-f3a5-d361-45ab-a4b7004683ec@redhat.com> Message-ID: <60b30e79-78e6-cda9-a813-f4f1a58f7501@arm.com> Hi Andrew, >> >> CompressedKlassPointers::base() => 0xffff0b4b5000 >> CompressedKlassPointers::shift() => 3 > > This is bad. Can you have a look at the allocation code to see why the search > for an appropriate address range fails?
> We have a loop in Metaspace::allocate_metaspace_compressed_klass_ptrs that searches for a 4G-aligned location for the compressed class space on AArch64, but this search is not done if CDS is in use and the archive was loaded successfully, because in that case the class space has already been mapped (i.e. `metaspace_rs.is_reserved()' is true). Previously it was only possible to map the CDS archive at 0x800000000. The compressed class base is set to the start of this region, which happens to be 4G-aligned, so our MacroAssembler::load_klass optimisation applies and we emit the short code sequence. With the recent change in 8231610, if the CDS archive cannot be mapped at that address (e.g. because of ASLR or because the heap is mapped there) then the CDS archive will be relocated to an arbitrary address decided by mmap. That's where the oddly-aligned compressed klass base above comes from. This causes MacroAssembler::load_klass to emit the inefficient sequence, which then overflows the buffer for the itable stub (the worst-case size estimate there is wrong, which needs to be fixed separately). A minimal way to reproduce this is:

  $ java -XX:HeapBaseMinAddress=33G -Xshare:on -Xlog:cds=debug -version
  ...
  [0.050s][info ][cds] CDS archive was created with max heap size = 128M, and the following configuration:
  [0.050s][info ][cds] narrow_klass_base = 0x0000fffec7507000, narrow_klass_shift = 3
  ...
  # guarantee(masm->pc() <= s->code_end()) failed: itable #2: overflowed buffer, estimated len: 180, actual len: 184, overrun: 4

I suggest we move the 4G-aligned search from allocate_metaspace_compressed_klass_ptrs into its own function that can then be called from MetaspaceShared::reserve_shared_space when requested_address==NULL (i.e. the fallback path when mmap at 0x800000000 fails). If you're happy with this I'll make a patch for review?
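The following is a simplified model of why the alignment of the compressed class base matters. Decoding computes base + (narrow << shift). Suppose the base has no bits set in the range that (narrow << shift) can occupy, that is, the low 32 + shift bits of the base are zero. Then the addition can never carry, and it equals a plain bitwise OR. That permits cheaper instruction sequences than materializing the full base in a register and adding it. With shift 0 this means 4G alignment. With shift 3, as in the log above, it means 32G alignment, which 0x800000000 (exactly 32G) happens to satisfy.

The C++ sketch below models that condition and a search over 4G-aligned candidates, in the spirit of the function proposed above. All names are hypothetical, including the `try_reserve` callback, which stands in for an mmap-style reservation attempt. HotSpot's real decode_klass_not_null chooses among several sequences under conditions not modeled here.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>

constexpr uint64_t k4G = uint64_t(1) << 32;

// General decode of a narrow class pointer: base + (narrow << shift).
uint64_t decode_add(uint64_t base, int shift, uint32_t narrow) {
  return base + (uint64_t(narrow) << shift);
}

// Simplified model: if base has no bits set below bit (32 + shift), the
// shifted narrow value never overlaps base, so the add cannot carry and is
// equivalent to a bitwise OR.
bool add_equals_or(uint64_t base, int shift) {
  assert(shift >= 0 && shift <= 31);
  uint64_t offset_mask = (uint64_t(1) << (32 + shift)) - 1;
  return (base & offset_mask) == 0;
}

uint64_t decode_or(uint64_t base, int shift, uint32_t narrow) {
  return base | (uint64_t(narrow) << shift);
}

uint64_t align_up_4g(uint64_t addr) {
  return (addr + k4G - 1) & ~(k4G - 1);
}

// Hypothetical search: probe 4G-aligned candidates from `hint` up to `limit`
// and return the first one that `try_reserve` accepts.
std::optional<uint64_t> find_4g_aligned(
    uint64_t hint, uint64_t limit,
    const std::function<bool(uint64_t)>& try_reserve) {
  for (uint64_t a = align_up_4g(hint); a < limit; a += k4G) {
    if (try_reserve(a)) return a;
  }
  return std::nullopt;
}
```

For the base reported in the thread (0xffff0b4b5000, shift 3), `add_equals_or` is false, so in this model the cheap decode does not apply. Rounding that address up to the next 4G boundary gives 0x1000000000000, for which it does.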
Thanks, Nick From boris.ulasevich at bell-sw.com Thu Nov 28 08:42:43 2019 From: boris.ulasevich at bell-sw.com (Boris Ulasevich) Date: Thu, 28 Nov 2019 11:42:43 +0300 Subject: [aarch64-port-dev ] RFR(S) 8234891: AArch64: Fix build failure after JDK-8234387 In-Reply-To: References: <847197ac-5969-900c-840a-61c475b19d84@bell-sw.com> Message-ID: <6e5d8aec-538e-20c7-a035-b04ff7e8691f@bell-sw.com> Thank you! On 27.11.2019 19:06, Stuart Monteith wrote: > Thanks Boris - looks good to me. > Please ask me or my fellow Arm engineers if you should need any help > testing in future. > > On Wed, 27 Nov 2019 at 13:26, Vladimir Ivanov > wrote: >> >> The fix looks good and trivial. >> >> Best regards, >> Vladimir Ivanov >> >> On 27.11.2019 15:55, Boris Ulasevich wrote: >>> Hi, >>> >>> Please review the fix in aarch64.ad to address the build issue "Ideal >>> node missing: CmpOp" raised after recent change in C2. The intuitive >>> operand name case correction CmpOp->cmpOp fixes the build, but leads to >>> unworkable jvm. Removing the match rule works good and jdk/hotspot tests >>> are Ok. >>> >>> http://bugs.openjdk.java.net/browse/JDK-8234891 >>> http://cr.openjdk.java.net/~bulasevich/8234891/webrev.00 >>> >>> ARM32 build fails too. I will fix the problem in arm32.ad file separately. >>> >>> thanks, >>> Boris From aph at redhat.com Thu Nov 28 09:36:24 2019 From: aph at redhat.com (Andrew Haley) Date: Thu, 28 Nov 2019 09:36:24 +0000 Subject: [aarch64-port-dev ] Question about ISB usage in the aarch64 port In-Reply-To: References: Message-ID: <496dd418-02a5-6566-4f22-76d87f263926@redhat.com> See also "8220351: Cross-modifying code". That scheme is used by other ports but not AArch64. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Thu Nov 28 10:03:18 2019 From: aph at redhat.com (Andrew Haley) Date: Thu, 28 Nov 2019 10:03:18 +0000 Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27 conditionally allocatable In-Reply-To: <60b30e79-78e6-cda9-a813-f4f1a58f7501@arm.com> References: <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com> <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com> <6a7e7eb1-f3a5-d361-45ab-a4b7004683ec@redhat.com> <60b30e79-78e6-cda9-a813-f4f1a58f7501@arm.com> Message-ID: <52a223ea-13d3-d7ef-72c7-bd0f590c12be@redhat.com> On 11/28/19 7:50 AM, Nick Gasson wrote: > Hi Andrew, > >>> >>> CompressedKlassPointers::base() => 0xffff0b4b5000 >>> CompressedKlassPointers::shift() => 3 >> >> This is bad. Can you have a look at the allocation code to see why the search >> for an appropriate address range fails? > > We have a loop in Metaspace::allocate_metaspace_compressed_klass_ptrs > that searches for a 4G aligned location for the compressed class space > on AArch64, but this search is not done if CDS is in use and the archive > was loaded successfully, because in that case the class space has > already been mapped (i.e. `metaspace_rs.is_reserved()' is true). Right. At the time I wrote that code, CDS was not much used by anything, so I thought of it as a marginal use case. > Previously it was only possible to map the CDS archive at 0x800000000. > The compressed class base is set to the start of this region which > happens to be 4G aligned so our MacroAssembler::load_klass optimisation > applies and we emit the short code sequence. > > With the recent change in 8231610, if the CDS archive cannot be mapped > at that address (e.g. because of ASLR or because the heap is mapped > there) then the CDS archive will be relocated to an arbitrary address > decided by mmap. That's where the oddly-aligned compressed klass base > above comes from.
This causes MacroAssembler::load_klass to emit the > inefficient sequence which then overflows the buffer for the itable stub > (the worst-case size estimate there is wrong, which needs to be fixed > separately). Correcting the stub size is a minor tidy-up which does not really need its own Bug ID. > A minimal way to reproduce this is: > > $ java -XX:HeapBaseMinAddress=33G -Xshare:on -Xlog:cds=debug -version > ... > [0.050s][info ][cds] CDS archive was created with max heap size = 128M, > and the following configuration: > [0.050s][info ][cds] narrow_klass_base = 0x0000fffec7507000, > narrow_klass_shift = 3 > ... > # guarantee(masm->pc() <= s->code_end()) failed: itable #2: overflowed > buffer, estimated len: 180, actual len: 184, overrun: 4 > > > I suggest we move the 4G-aligned search from > allocate_metaspace_compressed_klass_ptrs into its own function that can > then be called from MetaspaceShared::reserve_shared_space when > requested_address==NULL (i.e. the fallback path when mmap at 0x800000000 > fails). If you're happy with this I'll make a patch for review? Yes, that sounds excellent. We really need it to avoid compressed class pointers becoming an expensive option. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From nick.gasson at arm.com Thu Nov 28 10:18:38 2019 From: nick.gasson at arm.com (Nick Gasson) Date: Thu, 28 Nov 2019 18:18:38 +0800 Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27 conditionally allocatable In-Reply-To: <52a223ea-13d3-d7ef-72c7-bd0f590c12be@redhat.com> References: <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com> <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com> <6a7e7eb1-f3a5-d361-45ab-a4b7004683ec@redhat.com> <60b30e79-78e6-cda9-a813-f4f1a58f7501@arm.com> <52a223ea-13d3-d7ef-72c7-bd0f590c12be@redhat.com> Message-ID: <044ce397-96c7-8b01-a6ed-d3ea2546749a@arm.com> On 28/11/2019 18:03, Andrew Haley wrote: >> (the worst-case size estimate there is wrong, which needs to be fixed >> separately). > > Correcting the stub size is a minor tidy-up which does not really need > its own Bug ID. > OK, but I'd like to also try removing the second call to __ load_klass in VtableStubs::create_itable_stub as that will shave a few instructions even in the normal case. I'll recalculate the size estimate when I do that. Thanks, Nick From aph at redhat.com Thu Nov 28 10:28:49 2019 From: aph at redhat.com (Andrew Haley) Date: Thu, 28 Nov 2019 10:28:49 +0000 Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27 conditionally allocatable In-Reply-To: <044ce397-96c7-8b01-a6ed-d3ea2546749a@arm.com> References: <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com> <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com> <6a7e7eb1-f3a5-d361-45ab-a4b7004683ec@redhat.com> <60b30e79-78e6-cda9-a813-f4f1a58f7501@arm.com> <52a223ea-13d3-d7ef-72c7-bd0f590c12be@redhat.com> <044ce397-96c7-8b01-a6ed-d3ea2546749a@arm.com> Message-ID: On 11/28/19 10:18 AM, Nick Gasson wrote: > OK, but I'd like to also try removing the second call to __ load_klass > in VtableStubs::create_itable_stub as that will shave a few instructions > even in the normal case. 
I'll recalculate the size estimate when I do that. OK. But beware of spending time on things that don't really matter. There's a risk in making any change. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From felix.yang at huawei.com Thu Nov 28 11:50:16 2019 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Thu, 28 Nov 2019 11:50:16 +0000 Subject: [aarch64-port-dev ] Question about ISB usage in the aarch64 port In-Reply-To: <496dd418-02a5-6566-4f22-76d87f263926@redhat.com> References: <496dd418-02a5-6566-4f22-76d87f263926@redhat.com> Message-ID: > -----Original Message----- > From: Andrew Haley [mailto:aph at redhat.com] > Sent: Thursday, November 28, 2019 5:36 PM > To: Yangfei (Felix) ; > aarch64-port-dev at openjdk.java.net > Subject: Re: [aarch64-port-dev ] Question about ISB usage in the aarch64 port > > See also "8220351: Cross-modifying code". That scheme is used by other ports > but not AArch64. Thanks for this helpful information. BTW: should we change aarch64 to use this scheme too? Felix From adinn at redhat.com Thu Nov 28 13:32:03 2019 From: adinn at redhat.com (Andrew Dinn) Date: Thu, 28 Nov 2019 13:32:03 +0000 Subject: [aarch64-port-dev ] RFR(S) 8234891: AArch64: Fix build failure after JDK-8234387 In-Reply-To: References: <847197ac-5969-900c-840a-61c475b19d84@bell-sw.com> Message-ID: <0960f240-6801-48c2-9664-c7509e90f4a5@redhat.com> On 27/11/2019 13:23, Vladimir Ivanov wrote: > The fix looks good and trivial. Yes, the patch is good. The CmpOp matches are not needed and perhaps never were. regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 
03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From adinn at redhat.com Thu Nov 28 13:43:09 2019 From: adinn at redhat.com (Andrew Dinn) Date: Thu, 28 Nov 2019 13:43:09 +0000 Subject: [aarch64-port-dev ] RFR(XS): 8233466: aarch64: remove unnecessary load of mdo when profiling return and parameters type In-Reply-To: References: Message-ID: Hi Felix, On 25/11/2019 11:33, Yangfei (Felix) wrote: > Ping? Any comments? Yes, that load into mdp is redundant. x86 omits the load and so should AArch64. The patch is good. regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From aph at redhat.com Thu Nov 28 14:02:28 2019 From: aph at redhat.com (Andrew Haley) Date: Thu, 28 Nov 2019 14:02:28 +0000 Subject: [aarch64-port-dev ] Question about ISB usage in the aarch64 port In-Reply-To: References: <496dd418-02a5-6566-4f22-76d87f263926@redhat.com> Message-ID: On 11/28/19 11:50 AM, Yangfei (Felix) wrote: >> -----Original Message----- >> From: Andrew Haley [mailto:aph at redhat.com] >> Sent: Thursday, November 28, 2019 5:36 PM >> To: Yangfei (Felix) ; >> aarch64-port-dev at openjdk.java.net >> Subject: Re: [aarch64-port-dev ] Question about ISB usage in the aarch64 port >> >> See also "8220351: Cross-modifying code". That scheme is used by other ports >> but not AArch64. > > Thanks for this helpful information. > BTW: should we change aarch64 to use this scheme too? Not unless we have a reason. I had a look and there seemed to be no advantage. By the way, did you find the source of your original problem? -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From felix.yang at huawei.com Fri Nov 29 02:41:27 2019 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Fri, 29 Nov 2019 02:41:27 +0000 Subject: [aarch64-port-dev ] RFR(XS): 8233466: aarch64: remove unnecessary load of mdo when profiling return and parameters type In-Reply-To: References: Message-ID: > -----Original Message----- > From: Andrew Dinn [mailto:adinn at redhat.com] > Sent: Thursday, November 28, 2019 9:43 PM > To: Yangfei (Felix) ; > hotspot-runtime-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net > Subject: Re: RFR(XS): 8233466: aarch64: remove unnecessary load of mdo when > profiling return and parameters type > > Hi Felix, > > On 25/11/2019 11:33, Yangfei (Felix) wrote: > > Ping? Any comments? > > Yes, that load into mdp is redundant. x86 omits the load and so should AArch64. > The patch is good. > Hi Andrew, Thanks for reviewing. Pushed: http://hg.openjdk.java.net/jdk/jdk/rev/fc216dcef2bb Felix From Pengfei.Li at arm.com Fri Nov 29 03:41:56 2019 From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China)) Date: Fri, 29 Nov 2019 03:41:56 +0000 Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27 conditionally allocatable In-Reply-To: <97c0b309-e911-ff9f-4cea-b6f00bd4962d@redhat.com> References: <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com> <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com> <34081dab-0049-dda7-dead-3b2934f8c41b@arm.com> <97c0b309-e911-ff9f-4cea-b6f00bd4962d@redhat.com> Message-ID: Hi Andrew, I just caught up with your discussion with Nick. > I guess. It'd be nicer to fix CDS on AArch64 so that it doesn't cause > performance regressions. The 4G alignment search may still fail after the fix. Regarding my second webrev, do you agree with checking the base and shift values in a function in aarch64.ad? See my last email [1] for a detailed explanation.
[1] https://mail.openjdk.java.net/pipermail/aarch64-port-dev/2019-November/008278.html -- Thanks, Pengfei From Pengfei.Li at arm.com Fri Nov 29 03:56:50 2019 From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China)) Date: Fri, 29 Nov 2019 03:56:50 +0000 Subject: [aarch64-port-dev ] RFR(S): 8234791: Fix Client VM build for x86_64 and AArch64 Message-ID: Hi, Please help review this small fix for the 64-bit client build. Webrev: http://cr.openjdk.java.net/~pli/rfr/8234791/webrev.00/ JBS: https://bugs.openjdk.java.net/browse/JDK-8234791 The current 64-bit client VM build fails because of errors when dumping the CDS archive. In JDK 12, we enabled "Default CDS Archives"[1], which runs "java -Xshare:dump" after linking the JDK image. But for the Client VM build on 64-bit platforms, the ergonomic flag UseCompressedOops is not set.[2] This causes the VM to exit when checking the flags for dumping the shared archive.[3] This change removes the "#if defined" macro so that the shared archive dump succeeds in the 64-bit client build. Tracking the history of the macro, I found it was initially added as "#ifndef COMPILER1"[4] 10 years ago, when C1 did not have good support for compressed oops, and was modified to its current shape[5] in the implementation of tiered compilation. It should be safe to remove it today. This patch also fixes another client build issue on AArch64.
[1] http://openjdk.java.net/jeps/341 [2] http://hg.openjdk.java.net/jdk/jdk/file/981a55672786/src/hotspot/share/runtime/arguments.cpp#l1694 [3] http://hg.openjdk.java.net/jdk/jdk/file/981a55672786/src/hotspot/share/runtime/arguments.cpp#l3551 [4] http://hg.openjdk.java.net/jdk8/jdk8/hotspot/rev/323bd24c6520#l11.7 [5] http://hg.openjdk.java.net/jdk8/jdk8/hotspot/rev/d5d065957597#l86.56 -- Thanks, Pengfei From nick.gasson at arm.com Fri Nov 29 06:40:23 2019 From: nick.gasson at arm.com (Nick Gasson) Date: Fri, 29 Nov 2019 14:40:23 +0800 Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27 conditionally allocatable In-Reply-To: References: <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com> <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com> <34081dab-0049-dda7-dead-3b2934f8c41b@arm.com> <97c0b309-e911-ff9f-4cea-b6f00bd4962d@redhat.com> Message-ID: On 29/11/2019 11:41, Pengfei Li (Arm Technology China) wrote: > > The 4G alignment search may still fail after the fix. Regarding to my second webrev, do you agree that checking the base and shift values in a function in aarch64.ad? See my last email [1] for detail explanations. > How about we exit with a fatal error if we can't find a suitably aligned region? Then we can remove the code in decode_klass_non_null that uses R27 and this patch is much simpler. That code path is poorly tested at the moment so it seems risky to leave it in. With a hard error at least users will report it to us so we can fix it. Thanks, Nick From nick.gasson at arm.com Fri Nov 29 06:56:47 2019 From: nick.gasson at arm.com (Nick Gasson) Date: Fri, 29 Nov 2019 14:56:47 +0800 Subject: [aarch64-port-dev ] Question about ISB usage in the aarch64 port In-Reply-To: References: <496dd418-02a5-6566-4f22-76d87f263926@redhat.com> Message-ID: <69910cf8-84ee-2048-796f-452d43adaaf9@arm.com> On 28/11/2019 22:02, Andrew Haley wrote: >> BTW: should we change aarch64 to use this scheme too? > > Not unless we have a reason. 
I had a look and there seemed to be no advantage. > I don't think it helps on AArch64: that OrderAccess::cross_modifying_fence() is only called when a thread is about to return from the safepoint handler. But it's possible for a safepoint with code patching to happen in the background while a thread is in native code, in which case we still need to do an ISB when returning to Java. I'm not sure how other ports that need a serialising instruction handle this? Thanks, Nick From felix.yang at huawei.com Fri Nov 29 08:06:02 2019 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Fri, 29 Nov 2019 08:06:02 +0000 Subject: [aarch64-port-dev ] Question about ISB usage in the aarch64 port In-Reply-To: References: <496dd418-02a5-6566-4f22-76d87f263926@redhat.com> Message-ID: > -----Original Message----- > From: Andrew Haley [mailto:aph at redhat.com] > Sent: Thursday, November 28, 2019 10:02 PM > To: Yangfei (Felix) ; > aarch64-port-dev at openjdk.java.net > Subject: Re: [aarch64-port-dev ] Question about ISB usage in the aarch64 port > > On 11/28/19 11:50 AM, Yangfei (Felix) wrote: > >> -----Original Message----- > >> From: Andrew Haley [mailto:aph at redhat.com] > >> Sent: Thursday, November 28, 2019 5:36 PM > >> To: Yangfei (Felix) ; > >> aarch64-port-dev at openjdk.java.net > >> Subject: Re: [aarch64-port-dev ] Question about ISB usage in the > >> aarch64 port > >> > >> See also "8220351: Cross-modifying code". That scheme is used by > >> other ports but not AArch64. > > > > Thanks for this helpful information. > > BTW: should we change aarch64 to use this scheme too? > > Not unless we have a reason. I had a look and there seemed to be no > advantage. > > By the way, did you find the source of your original problem? Not yet. It triggers randomly which makes it hard to narrow down the root cause. 
Suggestions are welcome :-) Thanks, Felix From adinn at redhat.com Fri Nov 29 09:19:12 2019 From: adinn at redhat.com (Andrew Dinn) Date: Fri, 29 Nov 2019 09:19:12 +0000 Subject: [aarch64-port-dev ] RFR(S): 8234791: Fix Client VM build for x86_64 and AArch64 In-Reply-To: References: Message-ID: <3e79b1e4-c548-dc3d-075f-e04e496d3863@redhat.com> Hi Pengfei, On 29/11/2019 03:56, Pengfei Li (Arm Technology China) wrote: > Please help review this small fix for 64-bit client build. > > Webrev: http://cr.openjdk.java.net/~pli/rfr/8234791/webrev.00/ > JBS: https://bugs.openjdk.java.net/browse/JDK-8234791 > > Current 64-bit client VM build fails because errors occurred in dumping > the CDS archive. In JDK 12, we enabled "Default CDS Archives"[1] which > runs "java -Xshare:dump" after linking the JDK image. But for Client VM > build on 64-bit platforms, the ergonomic flag UseCompressedOops is not > set.[2] This leads to VM exits in checking the flags for dumping the > shared archive.[3] > > This change removes the "#if defined" macro to make shared archive dump > successful in 64-bit client build. By tracking the history of the macro, > I found it is initially added as "#ifndef COMPILER1"[4] 10 years ago > when C1 did not have a good support of compressed oops and modified to > current shape[5] in the implementation of tiered compilation. It should > be safe to be removed today. > > This patch also fixes another client build issue on AArch64. > > [1] http://openjdk.java.net/jeps/341 > [2] http://hg.openjdk.java.net/jdk/jdk/file/981a55672786/src/hotspot/share/runtime/arguments.cpp#l1694 > [3] http://hg.openjdk.java.net/jdk/jdk/file/981a55672786/src/hotspot/share/runtime/arguments.cpp#l3551 > [4] http://hg.openjdk.java.net/jdk8/jdk8/hotspot/rev/323bd24c6520#l11.7 > [5] http://hg.openjdk.java.net/jdk8/jdk8/hotspot/rev/d5d065957597#l86.56 Your explanation sounds correct and the change to arguments.cpp looks good. 
Can you explain why you have modified sharedRuntime_aarch64.cpp to include nativeInst_aarch64.hpp? I don't see any other change in the source file that would make this necessary. regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From aph at redhat.com Fri Nov 29 09:53:14 2019 From: aph at redhat.com (Andrew Haley) Date: Fri, 29 Nov 2019 09:53:14 +0000 Subject: [aarch64-port-dev ] Question about ISB usage in the aarch64 port In-Reply-To: References: <496dd418-02a5-6566-4f22-76d87f263926@redhat.com> Message-ID: <09b127f2-95bc-5d53-9b02-a2e8e23c0deb@redhat.com> On 11/29/19 8:06 AM, Yangfei (Felix) wrote: > Not yet. It triggers randomly which makes it hard to narrow down the root cause. > Suggestions are welcome :-) I'd set things up to deoptimize and recompile continually, thrashing the life out of the code cache. Run many Java threads. If the problem really is recompilation you'll see it. That is always my recommendation: if you have a bug to diagnose, do everything you can to make the bug worse. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Fri Nov 29 09:59:18 2019 From: aph at redhat.com (Andrew Haley) Date: Fri, 29 Nov 2019 09:59:18 +0000 Subject: [aarch64-port-dev ] Question about ISB usage in the aarch64 port In-Reply-To: <69910cf8-84ee-2048-796f-452d43adaaf9@arm.com> References: <496dd418-02a5-6566-4f22-76d87f263926@redhat.com> <69910cf8-84ee-2048-796f-452d43adaaf9@arm.com> Message-ID: <7ca12202-6b65-9121-9e9d-1f3c6001124f@redhat.com> On 11/29/19 6:56 AM, Nick Gasson wrote: > I don't think it helps on AArch64: that > OrderAccess::cross_modifying_fence() is only called when a thread is > about to return from the safepoint handler. 
But it's possible for a > safepoint with code patching to happen in the background while a thread > is in native code, in which case we still need to do an ISB when > returning to Java. Indeed we do. > I'm not sure how other ports that need a serialising instruction handle > this? PPC requirements are very similar to ours. I would have thought they already had something before this patch, or they would surely have had some problems. In any case, I haven't studied the code transitions that are covered by this patch. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From Pengfei.Li at arm.com Fri Nov 29 10:01:37 2019 From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China)) Date: Fri, 29 Nov 2019 10:01:37 +0000 Subject: [aarch64-port-dev ] RFR(S): 8234791: Fix Client VM build for x86_64 and AArch64 In-Reply-To: <3e79b1e4-c548-dc3d-075f-e04e496d3863@redhat.com> References: <3e79b1e4-c548-dc3d-075f-e04e496d3863@redhat.com> Message-ID: Hi Andrew Dinn, > Your explanation sounds correct and the change to arguments.cpp looks > good. > > Can you explain why you have modified sharedRuntime_aarch64.cpp to > include nativeInst_aarch64.hpp? I don't see any other change in the source > file that would make this necessary. Thanks for review. There is another build error below after I fixed arguments.cpp. For target hotspot_variant-client_libjvm_objs_sharedRuntime_aarch64.o: ....../src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp:2836:22: error: 'NativeInstruction' has not been declared __ add(r20, r20, NativeInstruction::instruction_size); We see that sharedRuntime_aarch64.cpp uses NativeInstruction but doesn't include nativeInst_aarch64.hpp. There is no error in Server VM build because the header file is included indirectly from some C2 file. But for Client VM build where C2 files are not in, this error occurs.
-- Thanks, Pengfei From aph at redhat.com Fri Nov 29 10:07:38 2019 From: aph at redhat.com (Andrew Haley) Date: Fri, 29 Nov 2019 10:07:38 +0000 Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27 conditionally allocatable In-Reply-To: References: <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com> <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com> <34081dab-0049-dda7-dead-3b2934f8c41b@arm.com> <97c0b309-e911-ff9f-4cea-b6f00bd4962d@redhat.com> Message-ID: <8a0ae655-8544-a4fc-7551-d7634ebdaaa8@redhat.com> On 11/29/19 3:41 AM, Pengfei Li (Arm Technology China) wrote: > The 4G alignment search may still fail after the fix. It may, but it is very unlikely. Regarding to my second webrev, do you agree that checking the base and shift values in a function in aarch64.ad? See my last email [1] for detail explanations. > > [1] https://mail.openjdk.java.net/pipermail/aarch64-port-dev/2019-November/008278.html Not really, no. A method should be called exactly once from the code that does the memory allocation, and then set a flag to be read thereafter. It is not ideal to do it from the MacroAssembler constructor, because Assembler instances are created with very high frequency. I don't understand why you simply can't do what I suggested. You say > But we have to do it in Metaspace::set_narrow_klass_base_and_shift() > where the base and shift are finally determined and introduce new > code block of "#ifdef AARCH64 #endif" in HotSpot shared code. So do that, or perhaps introduce an overridable function in AbstractAssembler which does nothing on other ports. But don't keep executing the same logic again and again. Once base and shift are set they never change. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd.
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Fri Nov 29 10:10:07 2019 From: aph at redhat.com (Andrew Haley) Date: Fri, 29 Nov 2019 10:10:07 +0000 Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27 conditionally allocatable In-Reply-To: References: <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com> <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com> <34081dab-0049-dda7-dead-3b2934f8c41b@arm.com> <97c0b309-e911-ff9f-4cea-b6f00bd4962d@redhat.com> Message-ID: <70f2fb20-f72d-07f6-b8ff-7d1e467c3c12@redhat.com> On 11/29/19 6:40 AM, Nick Gasson wrote: > On 29/11/2019 11:41, Pengfei Li (Arm Technology China) wrote: >> The 4G alignment search may still fail after the fix. Regarding to my second webrev, do you agree that checking the base and shift values in a function in aarch64.ad? See my last email [1] for detail explanations. >> > How about we exit with a fatal error if we can't find a suitably aligned > region? Then we can remove the code in decode_klass_non_null that uses > R27 and this patch is much simpler. That code path is poorly tested at > the moment so it seems risky to leave it in. With a hard error at least > users will report it to us so we can fix it. That is starting to sound very attractive. With a 64-bit address space I'm finding it very hard to imagine a scenario in which we don't find a suitable address. I think AOT-compiled code would still be OK, because it generates different code, but we'd have to do some testing. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Fri Nov 29 10:11:33 2019 From: aph at redhat.com (Andrew Haley) Date: Fri, 29 Nov 2019 10:11:33 +0000 Subject: [aarch64-port-dev ] RFR(S): 8234791: Fix Client VM build for x86_64 and AArch64 In-Reply-To: References: <3e79b1e4-c548-dc3d-075f-e04e496d3863@redhat.com> Message-ID: <6af3f9d1-f1e3-7f03-6056-fd0c36af65b7@redhat.com> On 11/29/19 10:01 AM, Pengfei Li (Arm Technology China) wrote: > Thanks for review. There is another build error below after I fixed arguments.cpp. > > For target hotspot_variant-client_libjvm_objs_sharedRuntime_aarch64.o: > ....../src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp:2836:22: error: 'NativeInstruction' has not been declared > __ add(r20, r20, NativeInstruction::instruction_size); > > We see that sharedRuntime_aarch64.cpp uses NativeInstruction but doesn't include nativeInst_aarch64.hpp. > There is no error in Server VM build because the header file is included indirectly from some C2 file. > But for Client VM build where C2 files are not in, this error occurs. OK. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From adinn at redhat.com Fri Nov 29 10:20:40 2019 From: adinn at redhat.com (Andrew Dinn) Date: Fri, 29 Nov 2019 10:20:40 +0000 Subject: [aarch64-port-dev ] RFR(S): 8234791: Fix Client VM build for x86_64 and AArch64 In-Reply-To: References: <3e79b1e4-c548-dc3d-075f-e04e496d3863@redhat.com> Message-ID: Hi Pengfei, On 29/11/2019 10:01, Pengfei Li (Arm Technology China) wrote: >> Can you explain why you have modified sharedRuntime_aarch64.cpp to >> include nativeInst_aarch64.hpp? I don't see any other change in the source >> file that would make this necessary. > > Thanks for review. There is another build error below after I fixed arguments.cpp.
> > For target hotspot_variant-client_libjvm_objs_sharedRuntime_aarch64.o: > ....../src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp:2836:22: error: 'NativeInstruction' has not been declared > __ add(r20, r20, NativeInstruction::instruction_size); > > We see that sharedRuntime_aarch64.cpp uses NativeInstruction but doesn't include nativeInst_aarch64.hpp. > There is no error in Server VM build because the header file is included indirectly from some C2 file. > But for Client VM build where C2 files are not in, this error occurs. Ok, in that case the patch is good to push. regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From adinn at redhat.com Fri Nov 29 11:57:01 2019 From: adinn at redhat.com (Andrew Dinn) Date: Fri, 29 Nov 2019 11:57:01 +0000 Subject: [aarch64-port-dev ] 8233948: AArch64: Incorrect mapping between OptoReg and VMReg for high 64 bits of Vector Register In-Reply-To: References: Message-ID: Hi Joshua, Thanks for looking into this and suggesting the required cleanup. On 15/11/2019 10:29, Joshua Zhu (Arm Technology China) wrote: >> Please review the following patch: >> JBS: https://bugs.openjdk.java.net/browse/JDK-8233948 >> Webrev: http://cr.openjdk.java.net/~jzhu/8233948/webrev.00/ > > Please let me know if any comments. Thanks a lot. I think this is a good start but there is more work to do to clean up method RegisterSaver::save_live_registers defined in file sharedRuntime_aarch64.cpp. It would be good to do that clean up as part of this patch so it is all consistent. 
So, the first step is to add a couple of extra enum constants in FloatRegisterImpl: 128 class FloatRegisterImpl: public AbstractRegisterImpl { 129 public: 130 enum { 131 number_of_registers = 32, 132 max_slots_per_register = 4, save_slots_per_register = 2, extra_save_slots_per_register = 2 The 2 new tags are needed because sharedRuntime_aarch64.cpp normally only saves 2 slots per register but it occasionally needs to save all 4. The first bit of code in sharedRuntime_aarch64.cpp that needs fixing is this enum: 100 enum layout { 101 fpu_state_off = 0, 102 fpu_state_end = fpu_state_off+FPUStateSizeInWords-1, 103 // The frame sender code expects that rfp will be in 104 // the "natural" place and will override any oopMap 105 // setting for it. We must therefore force the layout 106 // so that it agrees with the frame sender code. 107 r0_off = fpu_state_off+FPUStateSizeInWords, 108 rfp_off = r0_off + 30 * 2, 109 return_off = rfp_off + 2, // slot for return address 110 reg_save_size = return_off + 2}; This information defines the layout of the data normally saved to stack (i.e. 2 slots per fp reg). These values should really be computed using the enum values you added to the definitions for RegisterImpl and FloatRegisterImpl. FPUStateSizeInWords is actually defined in assembler.hpp. It doesn't really need to be there but we put it there to follow the logic for x86 where the amount of saved state is more complicated. 
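To cross-check the cleanup Andrew Dinn proposes, the C++ sketch below derives `FPUStateSizeInWords`, the `layout` offsets and the extra vector save area from per-register slot constants. It then uses `static_assert` to confirm that the derived values reproduce the current hard-coded ones (`32 * 2`, `30 * 2`, `+ 2`). `sketch::RegisterImpl` and `sketch::FloatRegisterImpl` are stand-ins. Their values are assumptions taken from this discussion and from the existing constants, not copied from `register_aarch64.hpp`: 32 registers of each kind, 2 slots per general register, and 4 slots per vector register, of which 2 are normally saved.

```cpp
#include <cassert>

// Stand-ins for the HotSpot register classes; the values are assumptions
// chosen to match the hard-coded constants quoted in the review.
namespace sketch {

struct RegisterImpl {
  enum { number_of_registers = 32, max_slots_per_register = 2 };
};

struct FloatRegisterImpl {
  enum {
    number_of_registers = 32,
    max_slots_per_register = 4,        // full 128-bit vector register
    save_slots_per_register = 2,       // normally saved (lower 64 bits)
    extra_save_slots_per_register = 2  // upper half, saved only with save_vectors
  };
};

// Proposed replacement for "32 * 2".
constexpr int FPUStateSizeInWords =
    FloatRegisterImpl::number_of_registers *
    FloatRegisterImpl::save_slots_per_register;

// Mirrors the layout enum, with lines 108-110 derived from the constants.
enum layout {
  fpu_state_off = 0,
  fpu_state_end = fpu_state_off + FPUStateSizeInWords - 1,
  r0_off = fpu_state_off + FPUStateSizeInWords,
  // The 30 general registers saved before rfp, as in the original "30 * 2".
  rfp_off = r0_off + (RegisterImpl::number_of_registers - 2) *
                         RegisterImpl::max_slots_per_register,
  return_off = rfp_off + RegisterImpl::max_slots_per_register,  // return address
  reg_save_size = return_off + RegisterImpl::max_slots_per_register
};

// Extra frame words reserved when save_vectors is true.
constexpr int vect_words =
    FloatRegisterImpl::number_of_registers *
    FloatRegisterImpl::extra_save_slots_per_register;

// The derived values must match the hard-coded ones they replace.
static_assert(FPUStateSizeInWords == 32 * 2, "FPU save area changed");
static_assert(rfp_off == r0_off + 30 * 2, "rfp_off changed");
static_assert(return_off == rfp_off + 2, "return_off changed");
static_assert(reg_save_size == return_off + 2, "reg_save_size changed");
static_assert(FloatRegisterImpl::save_slots_per_register +
                      FloatRegisterImpl::extra_save_slots_per_register ==
                  FloatRegisterImpl::max_slots_per_register,
              "normal + extra slots must cover the whole vector register");

}  // namespace sketch
```

Under these assumptions, `r0_off` is 64, `rfp_off` is 124, `return_off` is 126 and `reg_save_size` is 128 slots. Saving vectors adds `vect_words` = 64 slots. If any constant drifted, the `static_assert`s would fail at compile time rather than silently changing the frame layout.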
The AArch64 definition at assembler.hpp:607 is this: 607 const int FPUStateSizeInWords = 32 * 2; So, that can now be redefined as 607 const int FPUStateSizeInWords = FloatRegisterImpl::number_of_registers * FloatRegisterImpl::save_slots_per_register; We then need to redefine the code at lines 108 - 110 to use the enum values: 108 rfp_off = r0_off + (RegisterImpl::number_of_registers - 2) * RegisterImpl::max_slots_per_register, 109 return_off = rfp_off + RegisterImpl::max_slots_per_register, // slot for return address 110 reg_save_size = return_off + RegisterImpl::max_slots_per_register}; Finally, we can edit method save_live_registers at the point where it allows space for the extra vector register content. That needs to be updated to use the relevant constants: 116 if (save_vectors) { 117 // Save upper half of vector registers 118 int vect_words = FloatRegisterImpl::number_of_registers * FloatRegisterImpl::extra_save_slots_per_register; 119 additional_frame_words += vect_words; Could you prepare a new webrev with these extra changes in and check it is ok? Also, could you report what testing you did before and after your change (other than checking the dump output). You will probably need to repeat it to ensure these extra changes are ok. regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No.
03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From ioi.lam at oracle.com Thu Nov 28 08:19:36 2019 From: ioi.lam at oracle.com (Ioi Lam) Date: Thu, 28 Nov 2019 00:19:36 -0800 Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27 conditionally allocatable In-Reply-To: <60b30e79-78e6-cda9-a813-f4f1a58f7501@arm.com> References: <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com> <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com> <6a7e7eb1-f3a5-d361-45ab-a4b7004683ec@redhat.com> <60b30e79-78e6-cda9-a813-f4f1a58f7501@arm.com> Message-ID: On 11/27/19 11:50 PM, Nick Gasson wrote: > Hi Andrew, > >>> >>> CompressedKlassPointers::base() => 0xffff0b4b5000 >>> CompressedKlassPointers::shift() => 3 >> >> This is bad. Can you have a look at the allocation code to see why >> the search >> for an appropriate address range fails? >> > > We have a loop in Metaspace::allocate_metaspace_compressed_klass_ptrs > that searches for a 4G aligned location for the compressed class space > on AArch64, but this search is not done if CDS is in use and the > archive was loaded successfully, because in that case the class space > has already been mapped (i.e. `metaspace_rs.is_reserved()' is true). > > Previously it was only possible to map the CDS archive at 0x800000000. > The compressed class base is set to the start of this region which > happens to be 4G aligned so our MacroAssembler::load_klass > optimisation applies and we emit the short code sequence. > > With the recent change in 8231610, if the CDS archive cannot be mapped > at that address (e.g. because of ASLR or because the heap is mapped > there) then the CDS archive will be relocated to an arbitrary address > decided by mmap. That's where the oddly-aligned compressed klass base > above comes from. This causes MacroAssembler::load_klass to emit the > inefficient sequence which then overflows the buffer for the itable > stub (the worst-case size estimate there is wrong, which needs to be > fixed separately). 
> > A minimal way to reproduce this is: > > $ java -XX:HeapBaseMinAddress=33G -Xshare:on -Xlog:cds=debug -version > ... > [0.050s][info ][cds] CDS archive was created with max heap size = > 128M, and the following configuration: > [0.050s][info ][cds]     narrow_klass_base = 0x0000fffec7507000, > narrow_klass_shift = 3 > ... > #  guarantee(masm->pc() <= s->code_end()) failed: itable #2: > overflowed buffer, estimated len: 180, actual len: 184, overrun: 4 > > > I suggest we move the 4G-aligned search from > allocate_metaspace_compressed_klass_ptrs into its own function that > can then be called from MetaspaceShared::reserve_shared_space when > requested_address==NULL (i.e. the fallback path when mmap at > 0x800000000 fails). If you're happy with this I'll make a patch for > review? > You can also force CDS archive relocation with -XX:+UnlockDiagnosticVMOptions -XX:ArchiveRelocationMode=1. That way you can test the behavior with the default heap settings. Thanks - Ioi > > Thanks, > Nick From ioi.lam at oracle.com Sat Nov 30 01:02:29 2019 From: ioi.lam at oracle.com (Ioi Lam) Date: Fri, 29 Nov 2019 17:02:29 -0800 Subject: [aarch64-port-dev ] RFR(S): 8234791: Fix Client VM build for x86_64 and AArch64 In-Reply-To: References: Message-ID: <0b33fa95-ff15-b628-1891-f990f239e60f@oracle.com> Hi Pengfei, I have cc-ed hotspot-compiler-dev at openjdk.java.net. Please do not push the patch until someone from hotspot-compiler-dev has looked at it. Many people are away due to Thanksgiving in the US. Thanks - Ioi On 11/28/19 7:56 PM, Pengfei Li (Arm Technology China) wrote: > Hi, > > Please help review this small fix for 64-bit client build. > > Webrev: http://cr.openjdk.java.net/~pli/rfr/8234791/webrev.00/ > JBS: https://bugs.openjdk.java.net/browse/JDK-8234791 > > Current 64-bit client VM build fails because errors occurred in dumping > the CDS archive. In JDK 12, we enabled "Default CDS Archives"[1] which > runs "java -Xshare:dump" after linking the JDK image.
But for Client VM > build on 64-bit platforms, the ergonomic flag UseCompressedOops is not > set.[2] This leads to VM exits in checking the flags for dumping the > shared archive.[3] > > This change removes the "#if defined" macro to make shared archive dump > successful in 64-bit client build. By tracking the history of the macro, > I found it is initially added as "#ifndef COMPILER1"[4] 10 years ago > when C1 did not have a good support of compressed oops and modified to > current shape[5] in the implementation of tiered compilation. It should > be safe to be removed today. > > This patch also fixes another client build issue on AArch64. > > [1] http://openjdk.java.net/jeps/341 > [2] http://hg.openjdk.java.net/jdk/jdk/file/981a55672786/src/hotspot/share/runtime/arguments.cpp#l1694 > [3] http://hg.openjdk.java.net/jdk/jdk/file/981a55672786/src/hotspot/share/runtime/arguments.cpp#l3551 > [4] http://hg.openjdk.java.net/jdk8/jdk8/hotspot/rev/323bd24c6520#l11.7 > [5] http://hg.openjdk.java.net/jdk8/jdk8/hotspot/rev/d5d065957597#l86.56 > > -- > Thanks, > Pengfei >
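The r27 thread above raises two ideas. Andrew Haley asks that the compressed class base be classified exactly once, at the point where base and shift are finally determined, rather than in every `MacroAssembler` constructor. Nick Gasson suggests a standalone 4G-aligned search that the CDS fallback path could also call. The C++17 sketch below models both ideas. All names are hypothetical: `sketch`, `KlassDecodeMode`, `find_4g_aligned_base` and `KlassDecodeConfig`. `try_reserve` stands in for a real reservation call such as `mmap`. The sketch classifies by base alignment only, which is a simplification: the real choice of decode sequence may also depend on the shift.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>

// Hypothetical sketch; none of these names exist in HotSpot.
namespace sketch {

using std::uint64_t;

constexpr uint64_t k4G = uint64_t(1) << 32;

// True when the low 32 bits of the base are zero.
constexpr bool is_4g_aligned(uint64_t base) { return (base & (k4G - 1)) == 0; }

enum class KlassDecodeMode { ZeroBase, Aligned4G, Unaligned };

constexpr KlassDecodeMode classify(uint64_t base) {
  return base == 0            ? KlassDecodeMode::ZeroBase
         : is_4g_aligned(base) ? KlassDecodeMode::Aligned4G
                               : KlassDecodeMode::Unaligned;
}

// Tries each 4G-aligned address in [lo, hi) in turn and returns the first one
// that try_reserve accepts. Assumes hi is well below 2^64 (no overflow checks).
std::optional<uint64_t> find_4g_aligned_base(
    uint64_t lo, uint64_t hi, const std::function<bool(uint64_t)>& try_reserve) {
  uint64_t start = (lo + k4G - 1) & ~(k4G - 1);  // round lo up to a 4G boundary
  for (uint64_t addr = start; addr < hi; addr += k4G) {
    if (try_reserve(addr)) return addr;
  }
  return std::nullopt;
}

// Holds the classification computed once; assemblers only read mode().
class KlassDecodeConfig {
 public:
  // Call exactly once, where base and shift are finally determined.
  static void set_base_and_shift(uint64_t base, int shift) {
    assert(!initialized_ && "base and shift are set only once");
    base_ = base;
    shift_ = shift;
    mode_ = classify(base);
    initialized_ = true;
  }
  static KlassDecodeMode mode() {
    assert(initialized_ && "set_base_and_shift must run first");
    return mode_;
  }
  static uint64_t base() { return base_; }
  static int shift() { return shift_; }

 private:
  static inline uint64_t base_ = 0;
  static inline int shift_ = 0;
  static inline KlassDecodeMode mode_ = KlassDecodeMode::ZeroBase;
  static inline bool initialized_ = false;
};

}  // namespace sketch
```

For example, 0x800000000 (the default CDS mapping address mentioned in the thread) classifies as `Aligned4G`. The relocated bases 0xffff0b4b5000 and 0xfffec7507000 classify as `Unaligned`. Under Nick's alternative, `find_4g_aligned_base` returning `std::nullopt` would become a fatal startup error instead of a fallback to the R27-based decode path.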