From ddong at openjdk.java.net Sat May 1 09:37:14 2021 From: ddong at openjdk.java.net (Denghui Dong) Date: Sat, 1 May 2021 09:37:14 GMT Subject: RFR: 8265129: Add intrinsic support for JVM.getClassId [v3] In-Reply-To: References: Message-ID: > 8265129: Add intrinsic support for JVM.getClassId Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: use new_pointer_register ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/3470/files - new: https://git.openjdk.java.net/jdk/pull/3470/files/8315bffd..93ae3346 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3470&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3470&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/3470.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3470/head:pull/3470 PR: https://git.openjdk.java.net/jdk/pull/3470 From ddong at openjdk.java.net Sat May 1 09:39:54 2021 From: ddong at openjdk.java.net (Denghui Dong) Date: Sat, 1 May 2021 09:39:54 GMT Subject: RFR: 8265129: Add intrinsic support for JVM.getClassId [v2] In-Reply-To: References: Message-ID: On Tue, 27 Apr 2021 09:28:46 GMT, Yi Yang wrote: >> Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: >> >> add doc > > src/hotspot/share/c1/c1_LIRGenerator.cpp line 3049: > >> 3047: >> 3048: #ifdef JFR_HAVE_INTRINSICS >> 3049: void LIRGenerator::do_ClassIDIntrinsic(Intrinsic* x) { > > Is this method really important and hot enough that we need to intrinsify it both in C1 and C2? This optimization works when user-defined events contain fields of class type, so I think it is necessary to implement it in both C1 and C2 > src/hotspot/share/c1/c1_LIRGenerator.cpp line 3095: > >> 3093: >> 3094: LIR_Opr epoch = new_register(T_INT); >> 3095: LIR_Opr epoch_address = new_register(T_LONG); > > new_pointer_register fixed, thanks. 
------------- PR: https://git.openjdk.java.net/jdk/pull/3470 From ddong at openjdk.java.net Sat May 1 09:47:51 2021 From: ddong at openjdk.java.net (Denghui Dong) Date: Sat, 1 May 2021 09:47:51 GMT Subject: RFR: 8265129: Add intrinsic support for JVM.getClassId [v3] In-Reply-To: References: Message-ID: On Sat, 1 May 2021 09:37:14 GMT, Denghui Dong wrote: >> 8265129: Add intrinsic support for JVM.getClassId > > Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: > > use new_pointer_register @egahlin Hi, could you help review this patch? ------------- PR: https://git.openjdk.java.net/jdk/pull/3470 From jiefu at openjdk.java.net Sat May 1 10:53:00 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Sat, 1 May 2021 10:53:00 GMT Subject: RFR: 8266401: mark hotspot compiler/intrinsics/sha/cli tests which ignore VM flags Message-ID: Hi all, Could you please review this small and trivial patch that adds `@requires vm.flagless` to `compiler/intrinsics/sha/cli ` tests that ignore VM flags? This change makes sense since it will fix some test failures when testing with extra VM flags. For example, the following three failures will be fixed when testing with UseAVX<2 on Intel CPUs. compiler/intrinsics/sha/cli/TestUseSHA256IntrinsicsOptionOnUnsupportedCPU.java compiler/intrinsics/sha/cli/TestUseSHAOptionOnUnsupportedCPU.java compiler/intrinsics/sha/cli/TestUseSHA512IntrinsicsOptionOnUnsupportedCPU.java Thanks. 
Best regards, Jie ------------- Commit messages: - 8266401: mark hotspot compiler/intrinsics/sha/cli tests which ignore VM flags Changes: https://git.openjdk.java.net/jdk/pull/3829/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3829&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8266401 Stats: 14 lines in 12 files changed: 14 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/3829.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3829/head:pull/3829 PR: https://git.openjdk.java.net/jdk/pull/3829 From github.com+20216587+miao-zheng at openjdk.java.net Sat May 1 11:27:06 2021 From: github.com+20216587+miao-zheng at openjdk.java.net (Miao Zheng) Date: Sat, 1 May 2021 11:27:06 GMT Subject: RFR: 8265915: adjust state_unloading_cycle compuation order in nmethod::is_unloading Message-ID: Trivial change of moving state_unloading_cycle computation after state_is_unloading checking. Avoiding useless state_unloading_cycle computation when state_is_unloading is true. 
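The reordering can be illustrated with a minimal sketch. It is written in Java for brevity and is not the HotSpot C++ code: the class name, the `Answer` enum, the bit layout of `state`, and the `cycleDecodes` counter are all assumptions made for the illustration. It only shows that the early exit happens before the cycle is decoded.

```java
/** Illustration only: names and the state layout are hypothetical, not HotSpot's. */
final class UnloadingCheckSketch {
    /** STALE means the cached answer is from another cycle and would have to be recomputed. */
    enum Answer { UNLOADING, NOT_UNLOADING, STALE }

    /** Counts how often the cycle field is decoded, to make the saving visible. */
    static int cycleDecodes = 0;

    /** Assumed layout: bit 0 holds the cached "is unloading" flag. */
    static boolean stateIsUnloading(int state) {
        return (state & 1) != 0;
    }

    /** Assumed layout: the remaining bits hold the cycle in which the flag was cached. */
    static int stateUnloadingCycle(int state) {
        cycleDecodes++;
        return state >>> 1;
    }

    static Answer check(int state, int currentCycle) {
        if (stateIsUnloading(state)) {
            return Answer.UNLOADING;            // early exit: the cycle is never decoded
        }
        int cycle = stateUnloadingCycle(state); // decoded only when it is still needed
        return cycle == currentCycle ? Answer.NOT_UNLOADING : Answer.STALE;
    }

    public static void main(String[] args) {
        System.out.println(check(0b101, 2) + ", cycle decodes so far: " + cycleDecodes);
        System.out.println(check(0b100, 2) + ", cycle decodes so far: " + cycleDecodes);
    }
}
```

With the check placed first, a state whose flag is already set returns without touching the cycle field at all.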
------------- Commit messages: - 8265915: adjust state_unloading_cycle compuation order in nmethod::is_unloading Changes: https://git.openjdk.java.net/jdk/pull/3676/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3676&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8265915 Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/3676.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3676/head:pull/3676 PR: https://git.openjdk.java.net/jdk/pull/3676 From mdoerr at openjdk.java.net Sat May 1 13:44:49 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Sat, 1 May 2021 13:44:49 GMT Subject: RFR: 8265784: [C2] Hoisting of DecodeN leaves MachTemp inputs behind In-Reply-To: <2jy2CBKOYJx9mu017Cm4M2yQYWivb2kK1gCvrZAxD7Y=.a7d2d333-a630-4956-abc6-8d461417ff92@github.com> References: <2jy2CBKOYJx9mu017Cm4M2yQYWivb2kK1gCvrZAxD7Y=.a7d2d333-a630-4956-abc6-8d461417ff92@github.com> Message-ID: On Thu, 22 Apr 2021 18:58:28 GMT, Martin Doerr wrote: > PPC64 and s390 have DecodeN implementations which use a MachTemp input. When LCM hoists the DecodeN, the MachTemp nodes reside in the old block, but should get hoisted together with the DecodeN node. > Same is true for load Base input which exists on s390 for example. Unfortunately, that's just a platform specific MachNode which is not nicely recognizable in LCM. Hi Vladimir, thanks for reviewing my PR. Yes, base is an immediate constant (CompressedOops::base()). It's not a load from memory. MachTemp nodes are added and connected by ADLC generated code (ArchDesc::defineExpand). PPC64 and s390 code contains "expand" and "postalloc_expand" rules. E.g. decodeN_Ex in ppc64.ad uses a register with TEMP effect. 
------------- PR: https://git.openjdk.java.net/jdk/pull/3637 From iignatyev at openjdk.java.net Sat May 1 15:04:49 2021 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Sat, 1 May 2021 15:04:49 GMT Subject: RFR: 8266401: mark hotspot compiler/intrinsics/sha/cli tests which ignore VM flags In-Reply-To: References: Message-ID: On Sat, 1 May 2021 10:48:37 GMT, Jie Fu wrote: > Hi all, > > Could you please review this small and trivial patch that adds `@requires vm.flagless` to `compiler/intrinsics/sha/cli ` tests that ignore VM flags? > > This change makes sense since it will fix some test failures when testing with extra VM flags. > > For example, the following three failures will be fixed when testing with UseAVX<2 on Intel CPUs. > > compiler/intrinsics/sha/cli/TestUseSHA256IntrinsicsOptionOnUnsupportedCPU.java > compiler/intrinsics/sha/cli/TestUseSHAOptionOnUnsupportedCPU.java > compiler/intrinsics/sha/cli/TestUseSHA512IntrinsicsOptionOnUnsupportedCPU.java > > > Thanks. > Best regards, > Jie Hi @DamonFool, if these tests ignore external flags, how do they fail w/ `-XX:UseAVX={0,1}`? Thanks, -- Igor ------------- PR: https://git.openjdk.java.net/jdk/pull/3829 From kvn at openjdk.java.net Sat May 1 15:15:50 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Sat, 1 May 2021 15:15:50 GMT Subject: RFR: 8265784: [C2] Hoisting of DecodeN leaves MachTemp inputs behind In-Reply-To: <2jy2CBKOYJx9mu017Cm4M2yQYWivb2kK1gCvrZAxD7Y=.a7d2d333-a630-4956-abc6-8d461417ff92@github.com> References: <2jy2CBKOYJx9mu017Cm4M2yQYWivb2kK1gCvrZAxD7Y=.a7d2d333-a630-4956-abc6-8d461417ff92@github.com> Message-ID: On Thu, 22 Apr 2021 18:58:28 GMT, Martin Doerr wrote: > PPC64 and s390 have DecodeN implementations which use a MachTemp input. When LCM hoists the DecodeN, the MachTemp nodes reside in the old block, but should get hoisted together with the DecodeN node. > Same is true for load Base input which exists on s390 for example. 
Unfortunately, that's just a platform specific MachNode which is not nicely recognizable in LCM. Good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/3637 From iignatyev at openjdk.java.net Sat May 1 15:39:50 2021 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Sat, 1 May 2021 15:39:50 GMT Subject: RFR: 8266074: Vtable-based CHA implementation In-Reply-To: References: <5GRZsKzjv3hRaV9LNgp2tQIaAmsUbIbGr4xGQDdkwus=.77bed390-12c6-4526-b8b1-c68f736b7c1c@github.com> <3Z7vXPRkLnBKfv84VVtXJI--I16HgEjydFtQflXkUBE=.44f5daab-6b9c-4e56-9aac-345b9b4d3a07@github.com> Message-ID: On Fri, 30 Apr 2021 22:10:02 GMT, Vladimir Kozlov wrote: >> I'm fine with both approaches. >> >> Explicitly setting the flag looked to me more robust and clearer communicating the intent. But if you prefer `@requires`, I'll use it. > > Let hear @iignatev opinion. from my point of view, `@requires` is clearer and also eliminates "wasted" execution (if someone tries to run this test w/ `-XX:-UseVtableBasedCHA`), so I'd prefer if we use it. I have a more generic comment about `UseVtableBasedCHA`. I understand the desire to introduce a flag to switch back to the old implementation, but I'm somewhat concern that it adds a new dimension into configuration space that won't be covered by our existing tests (w/ the test which exercises interesting parts of the related code is inapplicable) and isn't part of our regular test configurations. Can we make it an experimental flag (w/ vtable-based CHA still being enabled by default)? this way, the quality bar for the old implementation will be somewhat lower, yet the end-users will still be able to return to the old implementation if it, for some reason, works better in their use-cases. 
-- Igor ------------- PR: https://git.openjdk.java.net/jdk/pull/3727 From jiefu at openjdk.java.net Sat May 1 16:46:49 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Sat, 1 May 2021 16:46:49 GMT Subject: RFR: 8266401: mark hotspot compiler/intrinsics/sha/cli tests which ignore VM flags In-Reply-To: References: Message-ID: On Sat, 1 May 2021 15:01:44 GMT, Igor Ignatyev wrote: > if these tests ignore external flags, how do they fail w/ `-XX:UseAVX={0,1}`? Well, yes and no. These tests will first start a main test process, which won't ignore external flags. Then the main test process will create new sub-test processes, which do ignore external flags. For example, if we run TestUseSHA256IntrinsicsOptionOnUnsupportedCPU.java with UseAVX=1 on an Intel CPU, then 1) The main test process will say SHA256_INSTRUCTION is not supported [1] since -XX:UseAVX=1, and then call GenericTestCaseForUnsupportedX86CPU [2]. 2) Then a new sub-test process will be created in GenericTestCaseForUnsupportedX86CPU which expects UseSHA256Intrinsics to be false. 3) But the sub-test process will see UseSHA256Intrinsics is actually true since it ignores external flags. 4) Finally, the test fails. So these tests do ignore external flags in all the sub-test processes. And the tests may fail since the main test process still accepts external flags, which is unexpected. It would be better to mark them as vm.flagless to reduce testing noise. Thanks. 
[1] https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/compiler/testlibrary/sha/predicate/IntrinsicPredicates.java#L85 [2] https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA256IntrinsicsOptionOnUnsupportedCPU.java#L49 ------------- PR: https://git.openjdk.java.net/jdk/pull/3829 From iignatyev at openjdk.java.net Sat May 1 17:19:49 2021 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Sat, 1 May 2021 17:19:49 GMT Subject: RFR: 8266401: mark hotspot compiler/intrinsics/sha/cli tests which ignore VM flags In-Reply-To: References: Message-ID: On Sat, 1 May 2021 10:48:37 GMT, Jie Fu wrote: > Hi all, > > Could you please review this small and trivial patch that adds `@requires vm.flagless` to `compiler/intrinsics/sha/cli ` tests that ignore VM flags? > > This change makes sense since it will fix some test failures when testing with extra VM flags. > > For example, the following three failures will be fixed when testing with UseAVX<2 on Intel CPUs. > > compiler/intrinsics/sha/cli/TestUseSHA256IntrinsicsOptionOnUnsupportedCPU.java > compiler/intrinsics/sha/cli/TestUseSHAOptionOnUnsupportedCPU.java > compiler/intrinsics/sha/cli/TestUseSHA512IntrinsicsOptionOnUnsupportedCPU.java > > > Thanks. > Best regards, > Jie Right, now I remember them :) they were actually in my first [webrev](http://cr.openjdk.java.net/~iignatyev//8246497/webrev.00/) for this. I'm not clear on your decision to add the copyright line to `TestUseSHA3IntrinsicsOptionOnUnsupportedCPU.java`, but not other files. -- 
Igor ------------- PR: https://git.openjdk.java.net/jdk/pull/3829 From david.holmes at oracle.com Sat May 1 22:03:03 2021 From: david.holmes at oracle.com (David Holmes) Date: Sun, 2 May 2021 08:03:03 +1000 Subject: RFR: 8266074: Vtable-based CHA implementation In-Reply-To: References: <5GRZsKzjv3hRaV9LNgp2tQIaAmsUbIbGr4xGQDdkwus=.77bed390-12c6-4526-b8b1-c68f736b7c1c@github.com> <3Z7vXPRkLnBKfv84VVtXJI--I16HgEjydFtQflXkUBE=.44f5daab-6b9c-4e56-9aac-345b9b4d3a07@github.com> Message-ID: On 2/05/2021 1:39 am, Igor Ignatyev wrote: > On Fri, 30 Apr 2021 22:10:02 GMT, Vladimir Kozlov wrote: > >>> I'm fine with both approaches. >>> >>> Explicitly setting the flag looked to me more robust and clearer communicating the intent. But if you prefer `@requires`, I'll use it. >> >> Let hear @iignatev opinion. > > from my point of view, `@requires` is clearer and also eliminates "wasted" execution (if someone tries to run this test w/ `-XX:-UseVtableBasedCHA`), so I'd prefer if we use it. > > I have a more generic comment about `UseVtableBasedCHA`. I understand the desire to introduce a flag to switch back to the old implementation, but I'm somewhat concern that it adds a new dimension into configuration space that won't be covered by our existing tests (w/ the test which exercises interesting parts of the related code is inapplicable) and isn't part of our regular test configurations. Can we make it an experimental flag (w/ vtable-based CHA still being enabled by default)? this way, the quality bar for the old implementation will be somewhat lower, yet the end-users will still be able to return to the old implementation if it, for some reason, works better in their use-cases. Did you mean "experimental" in a generic sense or actually change it from DIAGNOSTIC to EXPERIMENTAL? If the latter then I don't agree this is an experimental flag, it is diagnostic. 
But either way the testing requirements are the same if we expect to tell end users to try this flag if they hit an problem - the flag has to be known to be functional, so we will have to expand the test coverage. Cheers, David ----- > -- Igor > > ------------- > > PR: https://git.openjdk.java.net/jdk/pull/3727 > From jiefu at openjdk.java.net Sat May 1 23:56:51 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Sat, 1 May 2021 23:56:51 GMT Subject: RFR: 8266401: mark hotspot compiler/intrinsics/sha/cli tests which ignore VM flags In-Reply-To: References: Message-ID: On Sat, 1 May 2021 17:17:25 GMT, Igor Ignatyev wrote: > Right, now I remember them :) they were actually in my first [webrev](http://cr.openjdk.java.net/~iignatyev//8246497/webrev.00/) for this. > > I'm not clear on your decision to add the copyright line to `TestUseSHA3IntrinsicsOptionOnUnsupportedCPU.java`, but not other files. > > -- Igor This is because @vnkozlov taught me that we should add a new copyright line if the original line isn't Oracle, not just modify the copyright year. Thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/3829 From hshi at openjdk.java.net Sun May 2 00:50:14 2021 From: hshi at openjdk.java.net (Hui Shi) Date: Sun, 2 May 2021 00:50:14 GMT Subject: RFR: 8265767: compiler/eliminateAutobox/TestIntBoxing.java crashes on arm32 after 8264649 in debug VMs [v2] In-Reply-To: References: Message-ID: > This patch fix failure exposed by JDK-8264649. > > compiler/eliminateAutobox/TestIntBoxing.java crashes on arm32 in Compile::check_no_dead_use assertion. > In LoadNode::eliminate_autobox, early "result" is dead after line 1450 but not added into PhaseGVN worklist for optimization. > Its out_cnt is 0. If it isn't removed, will trigger assertion in Compile::check_no_dead_use. 
> > > 1443 } else if (result->is_Add() && result->in(2)->is_Con() && > 1444 result->in(1)->Opcode() == Op_LShiftX && > 1445 result->in(1)->in(2) == phase->intcon(shift)) { > 1446 // We can't do general optimization: ((X<<Z) + Y) >> Z ==> X + (Y>>Z) > 1447 // but for boxing cache access we know that X<<Z will not overflow > 1448 // (there is range check) so we do this optimizatrion by hand here. > 1449 Node* add_con = new RShiftXNode(result->in(2), phase->intcon(shift)); > --- result before is dead and might not removed > 1450 result = new AddXNode(result->in(1)->in(1), phase->transform(add_con)); > 1451 } else > > > Detail analysis is in https://bugs.openjdk.java.net/browse/JDK-8265767 > > @mychris I have verified compiler/eliminateAutobox/TestIntBoxing.java on qemu, it failed with same assertion and now passes with this fix. Would you please help verify it on arm32 machine? > > Testing: > - Passed Tier1-3 on Linux x86_64, release and fastdebug build, default option and -XX:-TieredCompilation. > - compiler/eliminateAutobox/TestIntBoxing.java on arm32 release/fastdebug/slowdebug Hui Shi has updated the pull request incrementally with one additional commit since the last revision: Use PhaseIterGVN ptr as LoadNode::eliminate_autobox method parameter for simiplification and add comments for previous commit ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/3818/files - new: https://git.openjdk.java.net/jdk/pull/3818/files/cfc353ed..f25b3838 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3818&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3818&range=00-01 Stats: 22 lines in 2 files changed: 1 ins; 0 del; 21 mod Patch: https://git.openjdk.java.net/jdk/pull/3818.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3818/head:pull/3818 PR: https://git.openjdk.java.net/jdk/pull/3818 From hshi at openjdk.java.net Sun May 2 00:54:50 2021 From: hshi at openjdk.java.net (Hui Shi) Date: Sun, 2 May 2021 00:54:50 GMT Subject: RFR: 8265767: 
compiler/eliminateAutobox/TestIntBoxing.java crashes on arm32 after 8264649 in debug VMs [v2] In-Reply-To: References: Message-ID: <6lQmNU2HwKgdQFqkFgbSV2mOHOwFzlV9HanRcPZOHvM=.ea6ea467-1edd-4e3f-a960-09e1e07eea67@github.com> On Fri, 30 Apr 2021 17:46:41 GMT, Vladimir Kozlov wrote: > The fix is reasonable. > Add comment `// remove dead node later` > Pass `PhaseIterGVN* igvn` instead of `eliminate_autobox(PhaseGVN* phase)` to simplify code. Note, `can_reshape == true` only for PhaseIterGVN. @vnkozlov Thanks for your review! All comments are fixed and pass Tier1-3 on Linux x86_64, release and fastdebug build, default option and -XX:-TieredCompilation. ------------- PR: https://git.openjdk.java.net/jdk/pull/3818 From igor.ignatyev at oracle.com Sun May 2 04:00:50 2021 From: igor.ignatyev at oracle.com (Igor Ignatev) Date: Sun, 2 May 2021 04:00:50 +0000 Subject: RFR: 8266074: Vtable-based CHA implementation In-Reply-To: References: <5GRZsKzjv3hRaV9LNgp2tQIaAmsUbIbGr4xGQDdkwus=.77bed390-12c6-4526-b8b1-c68f736b7c1c@github.com> <3Z7vXPRkLnBKfv84VVtXJI--I16HgEjydFtQflXkUBE=.44f5daab-6b9c-4e56-9aac-345b9b4d3a07@github.com> Message-ID: Hi David, I meant both: in a generic sense, meaning we won't properly document, advertise it or claim it as supported; and changing its type to EXPERIMENTAL, so it will be somewhat harder for people to switch it. -- Igor On May 1, 2021, at 3:03 PM, David Holmes > wrote: On 2/05/2021 1:39 am, Igor Ignatyev wrote: On Fri, 30 Apr 2021 22:10:02 GMT, Vladimir Kozlov > wrote: I'm fine with both approaches. Explicitly setting the flag looked to me more robust and clearer communicating the intent. But if you prefer `@requires`, I'll use it. Let hear @iignatev opinion. from my point of view, `@requires` is clearer and also eliminates "wasted" execution (if someone tries to run this test w/ `-XX:-UseVtableBasedCHA`), so I'd prefer if we use it. I have a more generic comment about `UseVtableBasedCHA`. 
I understand the desire to introduce a flag to switch back to the old implementation, but I'm somewhat concern that it adds a new dimension into configuration space that won't be covered by our existing tests (w/ the test which exercises interesting parts of the related code is inapplicable) and isn't part of our regular test configurations. Can we make it an experimental flag (w/ vtable-based CHA still being enabled by default)? this way, the quality bar for the old implementation will be somewhat lower, yet the end-users will still be able to return to the old implementation if it, for some reason, works better in their use-cases. Did you mean "experimental" in a generic sense or actually change it from DIAGNOSTIC to EXPERIMENTAL? If the latter then I don't agree this is an experimental flag, it is diagnostic. But either way the testing requirements are the same if we expect to tell end users to try this flag if they hit an problem - the flag has to be known to be functional, so we will have to expand the test coverage. Cheers, David ----- -- Igor ------------- PR: https://git.openjdk.java.net/jdk/pull/3727 From iignatyev at openjdk.java.net Sun May 2 04:07:02 2021 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Sun, 2 May 2021 04:07:02 GMT Subject: RFR: 8266401: mark hotspot compiler/intrinsics/sha/cli tests which ignore VM flags In-Reply-To: References: Message-ID: On Sat, 1 May 2021 10:48:37 GMT, Jie Fu wrote: > Hi all, > > Could you please review this small and trivial patch that adds `@requires vm.flagless` to `compiler/intrinsics/sha/cli ` tests that ignore VM flags? > > This change makes sense since it will fix some test failures when testing with extra VM flags. > > For example, the following three failures will be fixed when testing with UseAVX<2 on Intel CPUs. 
> > compiler/intrinsics/sha/cli/TestUseSHA256IntrinsicsOptionOnUnsupportedCPU.java > compiler/intrinsics/sha/cli/TestUseSHAOptionOnUnsupportedCPU.java > compiler/intrinsics/sha/cli/TestUseSHA512IntrinsicsOptionOnUnsupportedCPU.java > > > Thanks. > Best regards, > Jie IANAL. you should never change other companies' copyright. if you feel that your changes meet your criteria to add your copyright line, you should add it regardless of the existence/absence of Oracle copyright. in any case, the patch looks good to me. -- Igor ------------- Marked as reviewed by iignatyev (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/3829 From jiefu at openjdk.java.net Sun May 2 04:14:02 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Sun, 2 May 2021 04:14:02 GMT Subject: RFR: 8266401: mark hotspot compiler/intrinsics/sha/cli tests which ignore VM flags In-Reply-To: References: Message-ID: <29m15vBbywLQgcUOL7-FQ08PY0aQDVLjOHFcACm2RVQ=.af17875d-54fb-450b-9820-10c0a240fa0b@github.com> On Sun, 2 May 2021 04:04:18 GMT, Igor Ignatyev wrote: > IANAL. you should never change other companies' copyright. if you feel that your changes meet your criteria to add your copyright line, you should add it regardless of the existence/absence of Oracle copyright. > > in any case, the patch looks good to me. > > -- Igor Thanks @iignatev for your review. 
------------- PR: https://git.openjdk.java.net/jdk/pull/3829 From david.holmes at oracle.com Sun May 2 05:54:47 2021 From: david.holmes at oracle.com (David Holmes) Date: Sun, 2 May 2021 15:54:47 +1000 Subject: RFR: 8266074: Vtable-based CHA implementation In-Reply-To: References: <5GRZsKzjv3hRaV9LNgp2tQIaAmsUbIbGr4xGQDdkwus=.77bed390-12c6-4526-b8b1-c68f736b7c1c@github.com> <3Z7vXPRkLnBKfv84VVtXJI--I16HgEjydFtQflXkUBE=.44f5daab-6b9c-4e56-9aac-345b9b4d3a07@github.com> Message-ID: On 2/05/2021 2:00 pm, Igor Ignatev wrote: > Hi David, > > I meant both: in a generic sense, meaning we won't properly document, > advertise it or claim it as supported; and changing its type to > EXPERIMENTAL, so it will be somewhat harder for people to switch it. Both forms are as hard to switch to as both must be unlocked. But semantically this is not an experimental flag IMO. Regardless, if the intent is to allow the flag to be used to restore legacy behaviour, then that legacy behaviour must be tested. We don't have to run thousands of tests with the flag disabled, just something representative enough to give us confidence that the code has not bit-rotted. Cheers, David > -- Igor > >> On May 1, 2021, at 3:03 PM, David Holmes > > wrote: >> >> On 2/05/2021 1:39 am, Igor Ignatyev wrote: >>> On Fri, 30 Apr 2021 22:10:02 GMT, Vladimir Kozlov >> > wrote: >>>>> I'm fine with both approaches. >>>>> >>>>> Explicitly setting the flag looked to me more robust and clearer >>>>> communicating the intent. But if you prefer `@requires`, I'll use it. >>>> >>>> Let hear @iignatev opinion. >>> from my point of view, `@requires` is clearer and also eliminates >>> "wasted" execution (if someone tries to run this test w/ >>> `-XX:-UseVtableBasedCHA`), so I'd prefer if we use it. >>> I have a more generic comment about `UseVtableBasedCHA`. 
I understand >>> the desire to introduce a flag to switch back to the old >>> implementation, but I'm somewhat concern that it adds a new dimension >>> into configuration space that won't be covered by our existing tests >>> (w/ the test which exercises interesting parts of the related code is >>> inapplicable) and isn't part of our regular test configurations. Can >>> we make it an experimental flag (w/ vtable-based CHA still being >>> enabled by default)? this way, the quality bar for the old >>> implementation will be somewhat lower, yet the end-users will still >>> be able to return to the old implementation if it, for some reason, >>> works better in their use-cases. >> >> Did you mean "experimental" in a generic sense or actually change it >> from DIAGNOSTIC to EXPERIMENTAL? If the latter then I don't agree this >> is an experimental flag, it is diagnostic. But either way the testing >> requirements are the same if we expect to tell end users to try this >> flag if they hit an problem - the flag has to be known to be >> functional, so we will have to expand the test coverage. 
>> >> Cheers, >> David >> ----- >> >>> -- Igor >>> ------------- >>> PR:https://git.openjdk.java.net/jdk/pull/3727 >>> > From chagedorn at openjdk.java.net Sun May 2 14:42:56 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Sun, 2 May 2021 14:42:56 GMT Subject: RFR: 8254129: IR Test Framework to support regex-based matching on the IR in JTreg compiler tests [v7] In-Reply-To: References: <2iYQOJ5yeu7SvGcScLPBOWCPMLv69e1ksOL1vW3ytL8=.0c27621d-ef3d-422c-9d8c-922078ca3160@github.com> Message-ID: On Fri, 30 Apr 2021 18:33:53 GMT, Igor Ignatyev wrote: >> Christian Hagedorn has updated the pull request incrementally with two additional commits since the last revision: >> >> - Fix XCOMP cases from old framework and turn it into new debug flag -DIgnoreCompilerControls >> - Apply review comments: Added new Compiler annotation class for @DontCompile, changed C1 into C1_SIMPLE, refactored code for ExcludeRandom and FlipC1C2, added missing flag description in README, and some other smaller refactoring/renamings > > test/lib/jdk/test/lib/hotspot/ir_framework/TestFrameworkPrepareFlags.java line 92: > >> 90: + String.join(TestFramework.TEST_VM_FLAGS_DELIMITER, flags) >> 91: + System.lineSeparator() + TestFramework.TEST_VM_FLAGS_END; >> 92: TestFrameworkSocket.write(encoding, "flag encoding"); > > I don't see a need to use socket here, it will significantly simplify the code, the failure analysis, and reproducing if we just save prepared flags into a file w/ a well-known location (e.g. passed as an argument/property to `TestFrameworkPrepareFlags`), and when used command-line argument file (`@-file`) to pass these flags to the test VM. Okay, yes it would make it easier. But the socket can be kept for the communication between the test VM and the driver VM? 
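The file-based alternative suggested in the quoted comment could look roughly like the following sketch. It is not the framework's code: `FlagFileSketch`, its method names, the file name, and the `MyTest` main class are hypothetical. It only assumes that the `java` launcher expands an `@<file>` argument into the arguments listed in that file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of the IR framework. */
final class FlagFileSketch {

    /** Writes one flag per line to flagFile, replacing any earlier content. */
    static void saveFlags(Path flagFile, List<String> flags) {
        try {
            Files.write(flagFile, flags);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the flags back, e.g. to inspect them when analysing a failure. */
    static List<String> loadFlags(Path flagFile) {
        try {
            return Files.readAllLines(flagFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds the test VM command line; the "@file" argument stands in for the flags. */
    static List<String> testVmCommand(String javaBinary, Path flagFile, String mainClass) {
        List<String> command = new ArrayList<>();
        command.add(javaBinary);
        command.add("@" + flagFile.toAbsolutePath());
        command.add(mainClass);
        return command;
    }

    public static void main(String[] args) {
        Path flagFile = Paths.get("test-vm-flags.txt"); // assumed well-known location
        saveFlags(flagFile, List.of("-XX:+UnlockDiagnosticVMOptions", "-XX:-TieredCompilation"));
        System.out.println(String.join(" ", testVmCommand("java", flagFile, "MyTest")));
    }
}
```

Because the flags sit in a plain file, a failing run can be reproduced by passing the same file to `java` by hand, which is the simplification the comment is after.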
------------- PR: https://git.openjdk.java.net/jdk/pull/3508 From iignatyev at openjdk.java.net Sun May 2 15:55:01 2021 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Sun, 2 May 2021 15:55:01 GMT Subject: RFR: 8254129: IR Test Framework to support regex-based matching on the IR in JTreg compiler tests [v7] In-Reply-To: References: <2iYQOJ5yeu7SvGcScLPBOWCPMLv69e1ksOL1vW3ytL8=.0c27621d-ef3d-422c-9d8c-922078ca3160@github.com> Message-ID: On Sun, 2 May 2021 14:39:30 GMT, Christian Hagedorn wrote: >> test/lib/jdk/test/lib/hotspot/ir_framework/TestFrameworkPrepareFlags.java line 92: >> >>> 90: + String.join(TestFramework.TEST_VM_FLAGS_DELIMITER, flags) >>> 91: + System.lineSeparator() + TestFramework.TEST_VM_FLAGS_END; >>> 92: TestFrameworkSocket.write(encoding, "flag encoding"); >> >> I don't see a need to use socket here, it will significantly simplify the code, the failure analysis, and reproducing if we just save prepared flags into a file w/ a well-known location (e.g. passed as an argument/property to `TestFrameworkPrepareFlags`), and when used command-line argument file (`@-file`) to pass these flags to the test VM. > > Okay, yes it would make it easier. But the socket can be kept for the communication between the test VM and the driver VM? I don't have a strong opinion here. let's start w/ what you have now, we can always change it later if we find that its complexity doesn't bring much value. ------------- PR: https://git.openjdk.java.net/jdk/pull/3508 From kvn at openjdk.java.net Sun May 2 17:45:53 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Sun, 2 May 2021 17:45:53 GMT Subject: RFR: 8266401: mark hotspot compiler/intrinsics/sha/cli tests which ignore VM flags In-Reply-To: References: Message-ID: On Sat, 1 May 2021 10:48:37 GMT, Jie Fu wrote: > Hi all, > > Could you please review this small and trivial patch that adds `@requires vm.flagless` to `compiler/intrinsics/sha/cli ` tests that ignore VM flags? 
> > This change makes sense since it will fix some test failures when testing with extra VM flags. > > For example, the following three failures will be fixed when testing with UseAVX<2 on Intel CPUs. > > compiler/intrinsics/sha/cli/TestUseSHA256IntrinsicsOptionOnUnsupportedCPU.java > compiler/intrinsics/sha/cli/TestUseSHAOptionOnUnsupportedCPU.java > compiler/intrinsics/sha/cli/TestUseSHA512IntrinsicsOptionOnUnsupportedCPU.java > > > Thanks. > Best regards, > Jie Marked as reviewed by kvn (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/3829 From kvn at openjdk.java.net Sun May 2 17:48:01 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Sun, 2 May 2021 17:48:01 GMT Subject: RFR: 8265767: compiler/eliminateAutobox/TestIntBoxing.java crashes on arm32 after 8264649 in debug VMs [v2] In-Reply-To: References: Message-ID: On Sun, 2 May 2021 00:50:14 GMT, Hui Shi wrote: >> This patch fix failure exposed by JDK-8264649. >> >> compiler/eliminateAutobox/TestIntBoxing.java crashes on arm32 in Compile::check_no_dead_use assertion. >> In LoadNode::eliminate_autobox, early "result" is dead after line 1450 but not added into PhaseGVN worklist for optimization. >> Its out_cnt is 0. If it isn't removed, will trigger assertion in Compile::check_no_dead_use. >> >> >> 1443 } else if (result->is_Add() && result->in(2)->is_Con() && >> 1444 result->in(1)->Opcode() == Op_LShiftX && >> 1445 result->in(1)->in(2) == phase->intcon(shift)) { >> 1446 // We can't do general optimization: ((X<<Z) + Y) >> Z ==> X + (Y>>Z) >> 1447 // but for boxing cache access we know that X<<Z will not overflow >> 1448 // (there is range check) so we do this optimizatrion by hand here. 
>> 1449 Node* add_con = new RShiftXNode(result->in(2), phase->intcon(shift)); >> --- result before is dead and might not removed >> 1450 result = new AddXNode(result->in(1)->in(1), phase->transform(add_con)); >> 1451 } else >> >> >> Detail analysis is in https://bugs.openjdk.java.net/browse/JDK-8265767 >> >> @mychris I have verified compiler/eliminateAutobox/TestIntBoxing.java on qemu, it failed with same assertion and now passes with this fix. Would you please help verify it on arm32 machine? >> >> Testing: >> - Passed Tier1-3 on Linux x86_64, release and fastdebug build, default option and -XX:-TieredCompilation. >> - compiler/eliminateAutobox/TestIntBoxing.java on arm32 release/fastdebug/slowdebug > > Hui Shi has updated the pull request incrementally with one additional commit since the last revision: > > Use PhaseIterGVN ptr as LoadNode::eliminate_autobox method parameter for simiplification and add comments for previous commit Good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/3818 From chagedorn at openjdk.java.net Sun May 2 21:03:03 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Sun, 2 May 2021 21:03:03 GMT Subject: RFR: 8254129: IR Test Framework to support regex-based matching on the IR in JTreg compiler tests [v7] In-Reply-To: References: <2iYQOJ5yeu7SvGcScLPBOWCPMLv69e1ksOL1vW3ytL8=.0c27621d-ef3d-422c-9d8c-922078ca3160@github.com> Message-ID: On Sun, 2 May 2021 15:51:43 GMT, Igor Ignatyev wrote: >> Okay, yes it would make it easier. But the socket can be kept for the communication between the test VM and the driver VM? > > I don't have a strong opinion here. let's start w/ what you have now, we can always change it later if we find that its complexity doesn't bring much value. Sounds good. I'll push an update with the changes tomorrow. Thanks for your review! 
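As a rough illustration of the file-based alternative discussed in this thread (saving the prepared flags to a file and handing them to the test VM through the launcher's `@argfile` syntax), here is a minimal sketch. The class name `FlagFileSketch` and both helper methods are hypothetical and are not part of the IR framework; the sketch also assumes each flag is a single token, since flags containing spaces would need quoting inside an argument file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the IR test framework.
final class FlagFileSketch {
    /** Writes one flag per line so that a later VM can read them via an @argfile. */
    static void saveFlags(Path flagFile, List<String> flags) throws IOException {
        Files.write(flagFile, flags);
    }

    /** Builds a command line whose VM flags come from the given argument file. */
    static List<String> testVmCommand(String javaBinary, Path flagFile, String mainClass) {
        List<String> cmd = new ArrayList<>();
        cmd.add(javaBinary);
        // The java launcher expands "@<path>" into the arguments stored in that file.
        cmd.add("@" + flagFile.toAbsolutePath());
        cmd.add(mainClass);
        return cmd;
    }
}
```

A failing run could then be reproduced by hand with the same file, which is the simplification the review comment is after.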
------------- PR: https://git.openjdk.java.net/jdk/pull/3508 From jiefu at openjdk.java.net Sun May 2 23:16:51 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Sun, 2 May 2021 23:16:51 GMT Subject: RFR: 8266401: mark hotspot compiler/intrinsics/sha/cli tests which ignore VM flags In-Reply-To: References: Message-ID: On Sun, 2 May 2021 17:43:20 GMT, Vladimir Kozlov wrote: >> Hi all, >> >> Could you please review this small and trivial patch that adds `@requires vm.flagless` to `compiler/intrinsics/sha/cli ` tests that ignore VM flags? >> >> This change makes sense since it will fix some test failures when testing with extra VM flags. >> >> For example, the following three failures will be fixed when testing with UseAVX<2 on Intel CPUs. >> >> compiler/intrinsics/sha/cli/TestUseSHA256IntrinsicsOptionOnUnsupportedCPU.java >> compiler/intrinsics/sha/cli/TestUseSHAOptionOnUnsupportedCPU.java >> compiler/intrinsics/sha/cli/TestUseSHA512IntrinsicsOptionOnUnsupportedCPU.java >> >> >> Thanks. >> Best regards, >> Jie > > Marked as reviewed by kvn (Reviewer). Thanks @vnkozlov . ------------- PR: https://git.openjdk.java.net/jdk/pull/3829 From jiefu at openjdk.java.net Sun May 2 23:16:52 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Sun, 2 May 2021 23:16:52 GMT Subject: Integrated: 8266401: mark hotspot compiler/intrinsics/sha/cli tests which ignore VM flags In-Reply-To: References: Message-ID: <6Ag5lRlisZEeq18NbRu0SdoXWVQ0ZTNcSIQfL2b8ziQ=.6ddffb7c-bd19-4121-97a5-cac33a42468c@github.com> On Sat, 1 May 2021 10:48:37 GMT, Jie Fu wrote: > Hi all, > > Could you please review this small and trivial patch that adds `@requires vm.flagless` to `compiler/intrinsics/sha/cli ` tests that ignore VM flags? > > This change makes sense since it will fix some test failures when testing with extra VM flags. > > For example, the following three failures will be fixed when testing with UseAVX<2 on Intel CPUs. 
> > compiler/intrinsics/sha/cli/TestUseSHA256IntrinsicsOptionOnUnsupportedCPU.java > compiler/intrinsics/sha/cli/TestUseSHAOptionOnUnsupportedCPU.java > compiler/intrinsics/sha/cli/TestUseSHA512IntrinsicsOptionOnUnsupportedCPU.java > > > Thanks. > Best regards, > Jie This pull request has now been integrated. Changeset: 7e30130e Author: Jie Fu URL: https://git.openjdk.java.net/jdk/commit/7e30130e354ebfed14617effd2a517ab2f4140a5 Stats: 14 lines in 12 files changed: 14 ins; 0 del; 0 mod 8266401: mark hotspot compiler/intrinsics/sha/cli tests which ignore VM flags Reviewed-by: iignatyev, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/3829 From thartmann at openjdk.java.net Mon May 3 06:26:54 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 3 May 2021 06:26:54 GMT Subject: RFR: 8252372: Check if cloning is required to move loads out of loops in PhaseIdealLoop::split_if_with_blocks_post() [v2] In-Reply-To: References: Message-ID: On Tue, 27 Apr 2021 15:57:07 GMT, Roland Westrelin wrote: >> Sinking data nodes out of a loop when all uses are out of a loop has >> several issues that this attempts to fix. >> >> 1- Only non control uses are considered which makes little sense (why >> not sink if the data node is an argument to a call or a returned >> value?) >> >> 2- Sinking of Loads is broken because of the handling of >> anti-dependence: the get_late_ctrl(n, n_ctrl) call returns a control >> in the loop because it takes all uses into account. >> >> 3- For data nodes for which a control edge can't be set, commoning of >> clones back in the loop is prevented with: >> _igvn._worklist.yank(x); >> which gives no guarantee >> >> This patch tries to address all issues: >> >> 1- it looks at all uses, not only non control uses >> >> 2- anti-dependences are computed for each use independently >> >> 3- Cast nodes are used to pin clones out of loop >> >> >> 2- requires refactoring of the PhaseIdealLoop::get_late_ctrl() >> logic. 
While working on this, I noticed a bug in anti-dependence >> analysis: when the use is a cfg node, the code sometimes looks at uses >> of the memory state of the cfg. The logic uses the use of the cfg >> which is a projection of adr_type identical to the cfg. It should >> instead look at the use of the memory projection. >> >> The existing logic for sinking loads calls clear_dom_lca_tags() for >> every load which seems like quite a waste. I added a >> _dom_lca_tags_round variable that's or'ed with the tag_node's _idx. By >> incrementing _dom_lca_tags_round, new tags that don't conflict with >> existing ones are produced and there's no need for >> clear_dom_lca_tags(). >> >> For anti-dependence analysis to return a correct result, early control >> of the load is needed. The only way to get it at this stage, AFAICT, >> is to compute it by following the load's input until a pinned node is >> reached. >> >> The existing logic pins cloned nodes next to their use. The logic I >> propose pins them right out of the loop. This could possibly avoid >> some redundant clones. It also makes some special handling for corner >> cases with loop strip mining useless. >> >> For 3-, I added extra Cast nodes for float types. If a chain of data >> nodes are sunk, the new logic tries to keep a single Cast for the >> entire chain rather than one Cast per node. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - CastVV > - Merge branch 'master' into JDK-8252372 > - extra comments > - fix Thanks, testing looks good now. I need some more time to review this though. 
------------- PR: https://git.openjdk.java.net/jdk/pull/3689 From cgo at openjdk.java.net Mon May 3 06:33:54 2021 From: cgo at openjdk.java.net (Christoph Göttschkes) Date: Mon, 3 May 2021 06:33:54 GMT Subject: RFR: 8265767: compiler/eliminateAutobox/TestIntBoxing.java crashes on arm32 after 8264649 in debug VMs [v2] In-Reply-To: References: Message-ID: <7eMVDgJZjA-pHJN833ipbKKBaydaaNDOWoEvbFZdTOE=.69b0dd41-2c6d-4b3e-8b51-37b55f94778b@github.com> On Sun, 2 May 2021 00:50:14 GMT, Hui Shi wrote: >> This patch fixes a failure exposed by JDK-8264649. >> >> compiler/eliminateAutobox/TestIntBoxing.java crashes on arm32 in the Compile::check_no_dead_use assertion. >> In LoadNode::eliminate_autobox, the earlier "result" is dead after line 1450 but is not added to the PhaseGVN worklist for optimization. >> Its out_cnt is 0. If it isn't removed, it will trigger the assertion in Compile::check_no_dead_use. >> >> >> 1443 } else if (result->is_Add() && result->in(2)->is_Con() && >> 1444 result->in(1)->Opcode() == Op_LShiftX && >> 1445 result->in(1)->in(2) == phase->intcon(shift)) { >> 1446 // We can't do general optimization: ((X<<Z) + Y) >> Z ==> X + (Y>>Z) >> 1447 // but for boxing cache access we know that X<<Z will not overflow >> 1448 // (there is range check) so we do this optimizatrion by hand here. >> 1449 Node* add_con = new RShiftXNode(result->in(2), phase->intcon(shift)); >> --- result before is dead and might not be removed >> 1450 result = new AddXNode(result->in(1)->in(1), phase->transform(add_con)); >> 1451 } else >> >> >> Detailed analysis is in https://bugs.openjdk.java.net/browse/JDK-8265767 >> >> @mychris I have verified compiler/eliminateAutobox/TestIntBoxing.java on qemu; it failed with the same assertion and now passes with this fix. Would you please help verify it on an arm32 machine? >> >> Testing: >> - Passed Tier1-3 on Linux x86_64, release and fastdebug build, default option and -XX:-TieredCompilation. 
>> - compiler/eliminateAutobox/TestIntBoxing.java on arm32 release/fastdebug/slowdebug > > Hui Shi has updated the pull request incrementally with one additional commit since the last revision: > > Use PhaseIterGVN ptr as LoadNode::eliminate_autobox method parameter for simiplification and add comments for previous commit Thanks for looking into this issue. I am currently testing on an ARMv7-A target, and `compiler/eliminateAutobox/TestIntBoxing.java` is passing now. The whole hotspot tier1 suite will take some time, so I will report back later today, or tomorrow. ------------- PR: https://git.openjdk.java.net/jdk/pull/3818 From jbhateja at openjdk.java.net Mon May 3 06:51:29 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Mon, 3 May 2021 06:51:29 GMT Subject: RFR: 8266054: VectorAPI rotate operation optimization [v3] In-Reply-To: References: Message-ID: > The current VectorAPI Java-side implementation expresses the rotateLeft and rotateRight operations using the following operations: > > vec1 = lanewise(VectorOperators.LSHL, n) > vec2 = lanewise(VectorOperators.LSHR, n) > res = lanewise(VectorOperators.OR, vec1 , vec2) > > This patch moves the above handling from the Java side to the C2 compiler, which facilitates dismantling the rotate operation if the target ISA does not support a direct rotate instruction. > > AVX512 added vector rotate instructions vpro[rl][v][dq] which operate over long and integer type vectors. For other cases (i.e. sub-word type vectors or targets which do not support direct rotate operations) an instruction sequence comprising vector SHIFT (LEFT/RIGHT) and vector OR is emitted. > > Please find below the performance data for the included JMH benchmark. 
> Machine: Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz (Cascade lake Server) > > `` > > Benchmark | (TESTSIZE) | Shift | Baseline (ops/ms) | Withopt (ops/ms) | Gain % > -- | -- | -- | -- | -- | -- > RotateBenchmark.testRotateLeftI | (size) | (shift) | 7384.747 | 7706.652 | 4.36 > RotateBenchmark.testRotateLeftI | 64 | 11 | 3723.305 | 3816.968 | 2.52 > RotateBenchmark.testRotateLeftI | 128 | 11 | 1811.521 | 1966.05 | 8.53 > RotateBenchmark.testRotateLeftI | 256 | 11 | 7133.296 | 7715.047 | 8.16 > RotateBenchmark.testRotateLeftI | 64 | 21 | 3612.144 | 3886.225 | 7.59 > RotateBenchmark.testRotateLeftI | 128 | 21 | 1815.422 | 1962.753 | 8.12 > RotateBenchmark.testRotateLeftI | 256 | 21 | 7216.353 | 7677.165 | 6.39 > RotateBenchmark.testRotateLeftI | 64 | 31 | 3602.008 | 3892.297 | 8.06 > RotateBenchmark.testRotateLeftI | 128 | 31 | 1882.163 | 1958.887 | 4.08 > RotateBenchmark.testRotateLeftI | 256 | 31 | 11819.443 | 11912.864 | 0.79 > RotateBenchmark.testRotateLeftI | 64 | 11 | 5978.475 | 6060.189 | 1.37 > RotateBenchmark.testRotateLeftI | 128 | 11 | 2965.179 | 3060.969 | 3.23 > RotateBenchmark.testRotateLeftI | 256 | 11 | 11479.579 | 11684.148 | 1.78 > RotateBenchmark.testRotateLeftI | 64 | 21 | 5904.903 | 6094.409 | 3.21 > RotateBenchmark.testRotateLeftI | 128 | 21 | 2969.879 | 3074.1 | 3.51 > RotateBenchmark.testRotateLeftI | 256 | 21 | 11531.654 | 12155.954 | 5.41 > RotateBenchmark.testRotateLeftI | 64 | 31 | 5730.918 | 6112.514 | 6.66 > RotateBenchmark.testRotateLeftI | 128 | 31 | 2937.19 | 2976.297 | 1.33 > RotateBenchmark.testRotateLeftI | 256 | 31 | 16159.184 | 16459.462 | 1.86 > RotateBenchmark.testRotateLeftI | 64 | 11 | 8154.982 | 8396.089 | 2.96 > RotateBenchmark.testRotateLeftI | 128 | 11 | 4142.224 | 4292.049 | 3.62 > RotateBenchmark.testRotateLeftI | 256 | 11 | 15958.154 | 16163.518 | 1.29 > RotateBenchmark.testRotateLeftI | 64 | 21 | 8098.805 | 8504.279 | 5.01 > RotateBenchmark.testRotateLeftI | 128 | 21 | 4137.598 | 4314.868 | 4.28 > 
RotateBenchmark.testRotateLeftI | 256 | 21 | 16201.666 | 15992.958 | -1.29 > RotateBenchmark.testRotateLeftI | 64 | 31 | 8027.169 | 8484.379 | 5.70 > RotateBenchmark.testRotateLeftI | 128 | 31 | 4146.29 | 4039.681 | -2.57 > RotateBenchmark.testRotateLeftL | 256 | 31 | 3566.176 | 3805.248 | 6.70 > RotateBenchmark.testRotateLeftL | 64 | 11 | 1820.219 | 1962.866 | 7.84 > RotateBenchmark.testRotateLeftL | 128 | 11 | 917.085 | 1007.334 | 9.84 > RotateBenchmark.testRotateLeftL | 256 | 11 | 3592.139 | 3973.698 | 10.62 > RotateBenchmark.testRotateLeftL | 64 | 21 | 1827.63 | 1999.711 | 9.42 > RotateBenchmark.testRotateLeftL | 128 | 21 | 907.104 | 1002.997 | 10.57 > RotateBenchmark.testRotateLeftL | 256 | 21 | 3780.962 | 3873.489 | 2.45 > RotateBenchmark.testRotateLeftL | 64 | 31 | 1830.121 | 1955.63 | 6.86 > RotateBenchmark.testRotateLeftL | 128 | 31 | 891.411 | 982.138 | 10.18 > RotateBenchmark.testRotateLeftL | 256 | 31 | 5890.544 | 6100.594 | 3.57 > RotateBenchmark.testRotateLeftL | 64 | 11 | 2984.329 | 3021.971 | 1.26 > RotateBenchmark.testRotateLeftL | 128 | 11 | 1485.109 | 1527.689 | 2.87 > RotateBenchmark.testRotateLeftL | 256 | 11 | 5903.411 | 6083.775 | 3.06 > RotateBenchmark.testRotateLeftL | 64 | 21 | 2925.37 | 3050.958 | 4.29 > RotateBenchmark.testRotateLeftL | 128 | 21 | 1486.432 | 1537.155 | 3.41 > RotateBenchmark.testRotateLeftL | 256 | 21 | 5853.721 | 6000.682 | 2.51 > RotateBenchmark.testRotateLeftL | 64 | 31 | 2896.116 | 3072.783 | 6.10 > RotateBenchmark.testRotateLeftL | 128 | 31 | 1483.132 | 1546.588 | 4.28 > RotateBenchmark.testRotateLeftL | 256 | 31 | 8059.206 | 8218.047 | 1.97 > RotateBenchmark.testRotateLeftL | 64 | 11 | 4022.416 | 4195.52 | 4.30 > RotateBenchmark.testRotateLeftL | 128 | 11 | 2084.296 | 2068.238 | -0.77 > RotateBenchmark.testRotateLeftL | 256 | 11 | 7971.832 | 8172.819 | 2.52 > RotateBenchmark.testRotateLeftL | 64 | 21 | 4032.036 | 4344.469 | 7.75 > RotateBenchmark.testRotateLeftL | 128 | 21 | 2068.957 | 2138.685 | 3.37 > 
RotateBenchmark.testRotateLeftL | 256 | 21 | 8140.63 | 8003.283 | -1.69 > RotateBenchmark.testRotateLeftL | 64 | 31 | 4088.621 | 4296.091 | 5.07 > RotateBenchmark.testRotateLeftL | 128 | 31 | 2007.753 | 2088.455 | 4.02 > RotateBenchmark.testRotateRightI | 256 | 31 | 7358.793 | 7548.976 | 2.58 > RotateBenchmark.testRotateRightI | 64 | 11 | 3648.868 | 3897.47 | 6.81 > RotateBenchmark.testRotateRightI | 128 | 11 | 1862.73 | 1969.964 | 5.76 > RotateBenchmark.testRotateRightI | 256 | 11 | 7268.806 | 7790.588 | 7.18 > RotateBenchmark.testRotateRightI | 64 | 21 | 3577.79 | 3979.675 | 11.23 > RotateBenchmark.testRotateRightI | 128 | 21 | 1773.243 | 1921.088 | 8.34 > RotateBenchmark.testRotateRightI | 256 | 21 | 7084.974 | 7609.912 | 7.41 > RotateBenchmark.testRotateRightI | 64 | 31 | 3688.781 | 3909.65 | 5.99 > RotateBenchmark.testRotateRightI | 128 | 31 | 1845.978 | 1928.316 | 4.46 > RotateBenchmark.testRotateRightI | 256 | 31 | 11463.228 | 12179.833 | 6.25 > RotateBenchmark.testRotateRightI | 64 | 11 | 5678.052 | 6028.573 | 6.17 > RotateBenchmark.testRotateRightI | 128 | 11 | 2990.419 | 3070.409 | 2.67 > RotateBenchmark.testRotateRightI | 256 | 11 | 11780.283 | 12105.261 | 2.76 > RotateBenchmark.testRotateRightI | 64 | 21 | 5827.8 | 6020.208 | 3.30 > RotateBenchmark.testRotateRightI | 128 | 21 | 2904.852 | 3047.154 | 4.90 > RotateBenchmark.testRotateRightI | 256 | 21 | 11359.146 | 12060.401 | 6.17 > RotateBenchmark.testRotateRightI | 64 | 31 | 5823.207 | 6079.82 | 4.41 > RotateBenchmark.testRotateRightI | 128 | 31 | 2984.484 | 3045.719 | 2.05 > RotateBenchmark.testRotateRightI | 256 | 31 | 16200.504 | 16376.475 | 1.09 > RotateBenchmark.testRotateRightI | 64 | 11 | 8118.399 | 8315.407 | 2.43 > RotateBenchmark.testRotateRightI | 128 | 11 | 4130.745 | 4092.588 | -0.92 > RotateBenchmark.testRotateRightI | 256 | 11 | 15842.168 | 16469.119 | 3.96 > RotateBenchmark.testRotateRightI | 64 | 21 | 7855.164 | 8188.913 | 4.25 > RotateBenchmark.testRotateRightI | 128 | 21 | 4114.378 | 
4035.56 | -1.92 > RotateBenchmark.testRotateRightI | 256 | 21 | 15636.117 | 16289.632 | 4.18 > RotateBenchmark.testRotateRightI | 64 | 31 | 8108.067 | 7996.517 | -1.38 > RotateBenchmark.testRotateRightI | 128 | 31 | 3997.547 | 4153.58 | 3.90 > RotateBenchmark.testRotateRightL | 256 | 31 | 3685.99 | 3814.384 | 3.48 > RotateBenchmark.testRotateRightL | 64 | 11 | 1787.875 | 1916.541 | 7.20 > RotateBenchmark.testRotateRightL | 128 | 11 | 940.141 | 990.383 | 5.34 > RotateBenchmark.testRotateRightL | 256 | 11 | 3745.968 | 3920.667 | 4.66 > RotateBenchmark.testRotateRightL | 64 | 21 | 1877.94 | 1998.072 | 6.40 > RotateBenchmark.testRotateRightL | 128 | 21 | 933.536 | 1004.61 | 7.61 > RotateBenchmark.testRotateRightL | 256 | 21 | 3744.763 | 3947.427 | 5.41 > RotateBenchmark.testRotateRightL | 64 | 31 | 1864.818 | 1978.277 | 6.08 > RotateBenchmark.testRotateRightL | 128 | 31 | 906.965 | 998.692 | 10.11 > RotateBenchmark.testRotateRightL | 256 | 31 | 5910.469 | 6062.429 | 2.57 > RotateBenchmark.testRotateRightL | 64 | 11 | 2914.64 | 3033.127 | 4.07 > RotateBenchmark.testRotateRightL | 128 | 11 | 1491.344 | 1543.936 | 3.53 > RotateBenchmark.testRotateRightL | 256 | 11 | 5801.818 | 6098.892 | 5.12 > RotateBenchmark.testRotateRightL | 64 | 21 | 2881.328 | 3089.547 | 7.23 > RotateBenchmark.testRotateRightL | 128 | 21 | 1485.969 | 1526.231 | 2.71 > RotateBenchmark.testRotateRightL | 256 | 21 | 5783.495 | 5957.649 | 3.01 > RotateBenchmark.testRotateRightL | 64 | 31 | 3008.182 | 3026.323 | 0.60 > RotateBenchmark.testRotateRightL | 128 | 31 | 1464.566 | 1546.825 | 5.62 > RotateBenchmark.testRotateRightL | 256 | 31 | 8208.124 | 8361.437 | 1.87 > RotateBenchmark.testRotateRightL | 64 | 11 | 4062.465 | 4319.412 | 6.32 > RotateBenchmark.testRotateRightL | 128 | 11 | 2029.995 | 2086.497 | 2.78 > RotateBenchmark.testRotateRightL | 256 | 11 | 8183.789 | 8193.087 | 0.11 > RotateBenchmark.testRotateRightL | 64 | 21 | 4092.686 | 4193.712 | 2.47 > RotateBenchmark.testRotateRightL | 128 | 21 | 
2036.854 | 2038.927 | 0.10 > RotateBenchmark.testRotateRightL | 256 | 21 | 8155.015 | 8175.792 | 0.25 > RotateBenchmark.testRotateRightL | 64 | 31 | 3960.629 | 4263.922 | 7.66 > RotateBenchmark.testRotateRightL | 128 | 31 | 1996.862 | 2055.486 | 2.94 > > `` Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: 8266054: Review comments resolution. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/3720/files - new: https://git.openjdk.java.net/jdk/pull/3720/files/eee407b0..f7945bff Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3720&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3720&range=01-02 Stats: 1243 lines in 45 files changed: 297 ins; 693 del; 253 mod Patch: https://git.openjdk.java.net/jdk/pull/3720.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3720/head:pull/3720 PR: https://git.openjdk.java.net/jdk/pull/3720 From jbhateja at openjdk.java.net Mon May 3 06:51:31 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Mon, 3 May 2021 06:51:31 GMT Subject: RFR: 8266054: VectorAPI rotate operation optimization [v3] In-Reply-To: References: Message-ID: On Tue, 27 Apr 2021 18:43:11 GMT, Paul Sandoz wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> 8266054: Review comments resolution. > > I noticed the tests are only updated for int and long, is that intentional? The HotSpot changes in some cases seem to imply all integral types are supported via the use of `is_integral_type`, contradicted by the use of `is_subword_type`. > > I would recommend trying to leverage Integer/Long.rotateLeft/Right implementations. They are not available for byte/short, so lets add specific methods in those cases, that should make the Java op implementation clearer. Hi @PaulSandoz , thanks your comments have been addressed. 
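To make the review suggestion about `byte` and `short` concrete: `Integer.rotateLeft`/`rotateRight` and `Long.rotateLeft`/`rotateRight` exist in the JDK, but there are no such methods for the sub-word types, so those lanes have to be composed from shifts and an OR. The following is only a scalar sketch of that composition; the class `SubwordRotate` and its methods are hypothetical names and are not taken from the patch.

```java
// Hypothetical scalar helpers, not the methods added by the patch.
final class SubwordRotate {
    /** Rotates the 8 bits of b left by n positions (n is taken modulo 8). */
    static byte rotateLeft(byte b, int n) {
        int bits = b & 0xFF;   // zero-extend so no sign bits leak into the right shift
        int s = n & 7;         // rotation distance modulo the lane width
        return (byte) ((bits << s) | (bits >>> (8 - s)));
    }

    /** Rotates the 8 bits of b right by n positions (n is taken modulo 8). */
    static byte rotateRight(byte b, int n) {
        int bits = b & 0xFF;
        int s = n & 7;
        return (byte) ((bits >>> s) | (bits << (8 - s)));
    }

    /** Rotates the 16 bits of v left by n positions (n is taken modulo 16). */
    static short rotateLeft(short v, int n) {
        int bits = v & 0xFFFF;
        int s = n & 15;
        return (short) ((bits << s) | (bits >>> (16 - s)));
    }

    /** Rotates the 16 bits of v right by n positions (n is taken modulo 16). */
    static short rotateRight(short v, int n) {
        int bits = v & 0xFFFF;
        int s = n & 15;
        return (short) ((bits >>> s) | (bits << (16 - s)));
    }
}
```

The narrowing cast discards any bits shifted past the lane width, so a distance of zero returns the input unchanged. This mirrors, in scalar form, the shift-and-OR sequence the description says is emitted for sub-word vectors.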
------------- PR: https://git.openjdk.java.net/jdk/pull/3720 From jbhateja at openjdk.java.net Mon May 3 06:51:37 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Mon, 3 May 2021 06:51:37 GMT Subject: RFR: 8266054: VectorAPI rotate operation optimization [v2] In-Reply-To: References: Message-ID: <7W-IJ7OChF3wVhl7g1JNB6m-1GPmNxdJHlFx36HjqxI=.e148a846-29dc-4a27-b15f-3a5898e6e97d@github.com> On Fri, 30 Apr 2021 15:44:41 GMT, Paul Sandoz wrote: >> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - 8266054: Review comments resolution. >> - Merge http://github.com/openjdk/jdk into JDK-8266054 >> - 8266054: Changing gen-src.sh file permissions >> - 8266054: VectorAPI rotate operation optimization > > src/hotspot/cpu/x86/x86.ad line 1652: > >> 1650: case Op_RotateRightV: >> 1651: case Op_RotateLeftV: >> 1652: if (is_subword_type(bt)) { > > Does that have the effect of not intrinsifying for `byte` or `short`? Yes, it makes sure that intrinsification is based on Shifts and Or operations instead of Rotation operation. ------------- PR: https://git.openjdk.java.net/jdk/pull/3720 From jbhateja at openjdk.java.net Mon May 3 07:08:53 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Mon, 3 May 2021 07:08:53 GMT Subject: RFR: 8265126: unified handling for VectorMask object re-materialization during de-optimization (re-submit) [v2] In-Reply-To: References: <2KlypmIXfPwTn6dp0AG9W0LMqvu6griQnWjSEQBwn2o=.6b09a942-e241-4ff6-b581-1810aa6bb124@github.com> Message-ID: On Fri, 30 Apr 2021 19:12:14 GMT, Vladimir Ivanov wrote: >> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains two additional commits since the last revision: >> >> - Merge http://github.com/openjdk/jdk into JDK-8265126 >> - 8265126:[REDO] unified handling for VectorMask object re-materialization during de-optimization > > src/hotspot/share/opto/vector.cpp line 244: > >> 242: SafePointNode* sfpt = safepoints.pop()->as_SafePoint(); >> 243: >> 244: ciInstanceKlass* iklass = vec_box->box_type()->klass()->as_instance_klass(); > > Why do you remove that code? It was added to avoid a crash with -XX:+PrintAssembly. vec_box is a loop invariant the check has just been moved out of the loop. ------------- PR: https://git.openjdk.java.net/jdk/pull/3721 From vlivanov at openjdk.java.net Mon May 3 09:13:54 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Mon, 3 May 2021 09:13:54 GMT Subject: RFR: 8265126: unified handling for VectorMask object re-materialization during de-optimization (re-submit) [v2] In-Reply-To: References: <2KlypmIXfPwTn6dp0AG9W0LMqvu6griQnWjSEQBwn2o=.6b09a942-e241-4ff6-b581-1810aa6bb124@github.com> Message-ID: On Fri, 30 Apr 2021 13:09:16 GMT, Jatin Bhateja wrote: >> Following flow describes object reconstruction for de-optimization:- >> >> 1. PhaseVector::scalarize_vbox_node() creates SafePointScalarObjectNode to captures the box type information, also it connects to node holding the boxed value. >> 2. During code emit phase (PhaseOutput) C2 process above information to dumps ObjectValue holding the box information and LocationValue to holding the value information into ScopeDescriptor corresponding to Safepoint PC. >> 3. De-optimization blobs dump the value held in registers to the stack locations using RegisterSave::save_live_registers() and a mapping b/w register and its stack location is added to RegisterMap. >> 4. 
During de-optimization, compiled frame objects are re-allocated using identity information held in ObjectValue and their fields are initialized using values held in the stack locations accessed through register-stack mappings. >> >> By inserting a VectorStoreMaskNode before stitching the mask holding node to Safepoint we make sure that value held in opmask/vector register is transferred to a byte vector. Thus rest of the flow works as it is, stack location will hold the value in the form of a byte array irrespective of the box shape. >> >> tier1-tier3 regressions are clean with UseAVX=2/3. > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge http://github.com/openjdk/jdk into JDK-8265126 > - 8265126:[REDO] unified handling for VectorMask object re-materialization during de-optimization Test results (hs-tier1 - hs-tier5) are clean. ------------- PR: https://git.openjdk.java.net/jdk/pull/3721 From vlivanov at openjdk.java.net Mon May 3 09:50:57 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Mon, 3 May 2021 09:50:57 GMT Subject: Integrated: 8266388: C2: Improve constant ShiftCntV on x86 In-Reply-To: References: Message-ID: <_apJp_NhFr0X7AIQhc8-SO2JQITlNOD99Jw5-Bm6rWE=.63815d9f-fcc2-4941-9ea9-65b38ba4720d@github.com> On Fri, 30 Apr 2021 19:18:37 GMT, Vladimir Ivanov wrote: > Clone ShiftCntV w/ a constant input during matching on x86. > It enables `vshiftI_imm`/`vshiftL_imm` when ShiftCntV is shared. > > Testing: > - [x] hs-tier1 - hs-tier4 This pull request has now been integrated. 
Changeset: b42d4969 Author: Vladimir Ivanov URL: https://git.openjdk.java.net/jdk/commit/b42d4969b1753e717a66218fd465243dfeccd455 Stats: 6 lines in 3 files changed: 4 ins; 0 del; 2 mod 8266388: C2: Improve constant ShiftCntV on x86 Reviewed-by: kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/3823 From vlivanov at openjdk.java.net Mon May 3 09:50:57 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Mon, 3 May 2021 09:50:57 GMT Subject: RFR: 8266388: C2: Improve constant ShiftCntV on x86 In-Reply-To: References: Message-ID: On Fri, 30 Apr 2021 19:18:37 GMT, Vladimir Ivanov wrote: > Clone ShiftCntV w/ a constant input during matching on x86. > It enables `vshiftI_imm`/`vshiftL_imm` when ShiftCntV is shared. > > Testing: > - [x] hs-tier1 - hs-tier4 Thanks for the review, Vladimir. ------------- PR: https://git.openjdk.java.net/jdk/pull/3823 From cgo at openjdk.java.net Mon May 3 12:58:51 2021 From: cgo at openjdk.java.net (Christoph Göttschkes) Date: Mon, 3 May 2021 12:58:51 GMT Subject: RFR: 8265767: compiler/eliminateAutobox/TestIntBoxing.java crashes on arm32 after 8264649 in debug VMs [v2] In-Reply-To: <6lQmNU2HwKgdQFqkFgbSV2mOHOwFzlV9HanRcPZOHvM=.ea6ea467-1edd-4e3f-a960-09e1e07eea67@github.com> References: <6lQmNU2HwKgdQFqkFgbSV2mOHOwFzlV9HanRcPZOHvM=.ea6ea467-1edd-4e3f-a960-09e1e07eea67@github.com> Message-ID: On Sun, 2 May 2021 00:51:35 GMT, Hui Shi wrote: >> The fix is reasonable. >> Add comment `// remove dead node later` >> Pass `PhaseIterGVN* igvn` instead of `eliminate_autobox(PhaseGVN* phase)` to simplify code. Note, `can_reshape == true` only for PhaseIterGVN. > >> The fix is reasonable. >> Add comment `// remove dead node later` >> Pass `PhaseIterGVN* igvn` instead of `eliminate_autobox(PhaseGVN* phase)` to simplify code. Note, `can_reshape == true` only for PhaseIterGVN. > > @vnkozlov Thanks for your review! 
All comments are fixed and pass Tier1-3 on Linux x86_64, release and fastdebug build, default option and -XX:-TieredCompilation. Hi @huishi-hs, hotspot tier1 passes on my ARMv7-A targets. ------------- PR: https://git.openjdk.java.net/jdk/pull/3818 From vlivanov at openjdk.java.net Mon May 3 13:09:08 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Mon, 3 May 2021 13:09:08 GMT Subject: RFR: 8266074: Vtable-based CHA implementation [v2] In-Reply-To: References: Message-ID: > As of now, Class Hierarchy Analysis (CHA) employs an approximate algorithm to enumerate all non-abstract methods in a class hierarchy. > > It served quite well for many years, but it accumulated significant complexity > to support different corner cases over time and inevitable evolution of the JVM > stretched the whole approach way too much (to the point where it became almost > impossible to extend the analysis any further). > > It turns out the root problem is the decision to reimplement method resolution > and method selection logic from scratch and to perform it on JVM internal > representation. It makes it very hard to reason about correctness and the > implementation becomes sensitive to changes in internal representation. > > So, the main motivation for the redesign is twofold: > * reduce maintenance burden and increase confidence in the code; > * unlock some long-awaited enhancements. > > Though I did experiment with relaxing existing constraints (e.g., enable default method support), > any possible enhancements are deliberately kept out of scope for the current PR. > (It does deliver a few minor enhancements, as the changes in > compiler/cha/StrengthReduceInterfaceCall.java show, but it's a side effect > of the other changes and was not the goal of the current work.) > > Proposed implementation (`LinkedConcreteMethodFinder`) mimics method invocation > and relies on vtable/itable information to detect target method for every > subclass it visits. 
It removes all the complexity associated with method > resolution and method selection logic and leaves only essential logic to prepare for method selection. > > Vtables are filled during class linkage, so the new logic doesn't work on not yet linked classes. > Instead of supporting not yet linked classes, the new logic simply ignores them. It is safe to > skip them (treat as "effectively non-concrete") since it is guaranteed there > are no instances created yet. But it requires the VM to check dependencies once a > class is linked. > > I ended up with 2 separate dependency validation passes (when a class is loaded > and when it is linked). To avoid duplicated work, only dependencies > which may be affected by class initialization state change > (`unique_concrete_method_4`) are visited. > > (I experimented with merging passes into a single pass (delay the pass until > linkage is over), but it severely affected other class-related dependencies and > relevant optimizations.) > > Compiler Interface (CI) is changed to require users to provide complete information about the call site being analyzed. > > The old implementation is kept intact for now (will be removed later) for these reasons: > - JVMCI hasn't been migrated to the new implementation yet; > - it enables verification that the 2 implementations (old and new) agree on the results; > - it temporarily keeps an option to revert to the original implementation in case any regressions show up. > > Testing: > - [x] hs-tier1 - hs-tier9 > - [x] hs-tier1 - hs-tier4 w/ `-XX:-UseVtableBasedCHA` > - [x] performance testing > > Thanks! 
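The idea described above (read each linked subclass's vtable entry instead of re-deriving method selection, and skip classes that are not linked yet) can be pictured with a toy model. This is purely an illustrative sketch and is not how `LinkedConcreteMethodFinder` is written: `Klass`, its `vtable` map keyed by a method selector string, and `uniqueConcreteMethod` are all hypothetical stand-ins.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only; names and data layout are assumptions, not HotSpot code.
final class VtableChaSketch {
    static final class Klass {
        final String name;
        boolean linked;                                      // vtable is only valid once linked
        final Map<String, String> vtable = new HashMap<>();  // selector -> selected implementation
        final List<Klass> subclasses = new ArrayList<>();

        Klass(String name) { this.name = name; }

        /** Simulates linkage: fills the vtable and marks the class as linked. */
        void link(Map<String, String> entries) {
            vtable.putAll(entries);
            linked = true;
        }
    }

    /** Returns the single implementation selected across the hierarchy, or null if there is none or more than one. */
    static String uniqueConcreteMethod(Klass root, String selector) {
        Set<String> targets = new HashSet<>();
        Deque<Klass> work = new ArrayDeque<>();
        work.push(root);
        while (!work.isEmpty()) {
            Klass k = work.pop();
            work.addAll(k.subclasses);
            if (!k.linked) {
                continue; // no instances can exist yet, so treat it as effectively non-concrete
            }
            String target = k.vtable.get(selector);
            if (target != null) {
                targets.add(target);
            }
        }
        return targets.size() == 1 ? targets.iterator().next() : null;
    }
}
```

In this model, linking a previously skipped subclass that overrides the method changes the answer, which is why the description says dependencies have to be re-checked once a class is linked.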
Vladimir Ivanov has updated the pull request incrementally with two additional commits since the last revision: - Improve comments - Use vm.opt.final.UseVtableBasedCHA ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/3727/files - new: https://git.openjdk.java.net/jdk/pull/3727/files/3063f97d..02f615b2 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3727&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3727&range=00-01 Stats: 14 lines in 2 files changed: 7 ins; 2 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/3727.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3727/head:pull/3727 PR: https://git.openjdk.java.net/jdk/pull/3727 From hshi at openjdk.java.net Mon May 3 13:17:53 2021 From: hshi at openjdk.java.net (Hui Shi) Date: Mon, 3 May 2021 13:17:53 GMT Subject: RFR: 8265767: compiler/eliminateAutobox/TestIntBoxing.java crashes on arm32 after 8264649 in debug VMs [v2] In-Reply-To: References: <6lQmNU2HwKgdQFqkFgbSV2mOHOwFzlV9HanRcPZOHvM=.ea6ea467-1edd-4e3f-a960-09e1e07eea67@github.com> Message-ID: <78Q7kcQpgsHB3NBGH4WMNTzFCl2aZ1TYqxz_R6d9k9s=.3c4ce3d4-9363-4d15-bdab-e66c292ebadb@github.com> On Mon, 3 May 2021 12:56:03 GMT, Christoph Göttschkes wrote: >>> The fix is reasonable. >>> Add comment `// remove dead node later` >>> Pass `PhaseIterGVN* igvn` instead of `eliminate_autobox(PhaseGVN* phase)` to simplify code. Note, `can_reshape == true` only for PhaseIterGVN. >> >> @vnkozlov Thanks for your review! All comments are fixed and pass Tier1-3 on Linux x86_64, release and fastdebug build, default option and -XX:-TieredCompilation. > > Hi @huishi-hs, hotspot tier1 passes on my ARMv7-A targets. @mychris Thanks for helping with the verification! @vnkozlov Thanks for the review! Does this patch need a second review before integration? 
------------- PR: https://git.openjdk.java.net/jdk/pull/3818 From goetz at openjdk.java.net Mon May 3 13:26:52 2021 From: goetz at openjdk.java.net (Goetz Lindenmaier) Date: Mon, 3 May 2021 13:26:52 GMT Subject: RFR: 8265784: [C2] Hoisting of DecodeN leaves MachTemp inputs behind In-Reply-To: <2jy2CBKOYJx9mu017Cm4M2yQYWivb2kK1gCvrZAxD7Y=.a7d2d333-a630-4956-abc6-8d461417ff92@github.com> References: <2jy2CBKOYJx9mu017Cm4M2yQYWivb2kK1gCvrZAxD7Y=.a7d2d333-a630-4956-abc6-8d461417ff92@github.com> Message-ID: On Thu, 22 Apr 2021 18:58:28 GMT, Martin Doerr wrote: > PPC64 and s390 have DecodeN implementations which use a MachTemp input. When LCM hoists the DecodeN, the MachTemp nodes reside in the old block, but should get hoisted together with the DecodeN node. > Same is true for load Base input which exists on s390 for example. Unfortunately, that's just a platform specific MachNode which is not nicely recognizable in LCM. Hi, I think temp must go into a block that dominates both tempb and block. Do you know whether block always dominates tempb (after the if)? Then you could add assert(block->dominates(tempb), "find legal position") Best regards, Goetz. ------------- PR: https://git.openjdk.java.net/jdk/pull/3637 From vlivanov at openjdk.java.net Mon May 3 13:46:11 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Mon, 3 May 2021 13:46:11 GMT Subject: RFR: 8266074: Vtable-based CHA implementation [v3] In-Reply-To: References: Message-ID: > As of now, Class Hierarchy Analysis (CHA) employs an approximate algorithm to enumerate all non-abstract methods in a class hierarchy. > > It served quite well for many years, but it accumulated significant complexity > to support different corner cases over time and inevitable evolution of the JVM > stretched the whole approach way too much (to the point where it became almost > impossible to extend the analysis any further). 
> > It turns out the root problem is the decision to reimplement method resolution > and method selection logic from scratch and to perform it on JVM internal > representation. It makes it very hard to reason about correctness and the > implementation becomes sensitive to changes in internal representation. > > So, the main motivation for the redesign is twofold: > * reduce maintenance burden and increase confidence in the code; > * unlock some long-awaited enhancements. > > Though I did experiment with relaxing existing constraints (e.g., enable default method support), > any possible enhancements are deliberately kept out of scope for the current PR. > (It does deliver a bit of minor enhancements front as the changes in > compiler/cha/StrengthReduceInterfaceCall.java manifest, but it's a side effect > of the other changes and was not the goal of the current work.) > > Proposed implementation (`LinkedConcreteMethodFinder`) mimics method invocation > and relies on vtable/itable information to detect target method for every > subclass it visits. It removes all the complexity associated with method > resolution and method selection logic and leaves only essential logic to prepare for method selection. > > Vtables are filled during class linkage, so new logic doesn't work on not yet linked classed. > Instead of supporting not yet linked case, it is simply ignored. It is safe to > skip them (treat as "effectively non-concrete") since it is guaranteed there > are no instances created yet. But it requires VM to check dependencies once a > class is linked. > > I ended up with 2 separate dependency validation passes (when class is loaded > and when it is linked). To avoid duplicated work, only dependencies > which may be affected by class initialization state change > (`unique_concrete_method_4`) are visited. 
> > (I experimented with merging passes into a single pass (delay the pass until > linkage is over), but it severely affected other class-related dependencies and > relevant optimizations.code.) > > Compiler Interface (CI) is changed to require users to provide complete information about the call site being analyzed. > > Old implementation is kept intact for now (will be removed later) to: > - JVMCI hasn't been migrated to the new implementation yet; > - enable verification that 2 implementations (old and new) agree on the results; > - temporarily keep an option to revert to the original implementation in case any regressions show up. > > Testing: > - [x] hs-tier1 - hs-tier9 > - [x] hs-tier1 - hs-tier4 w/ `-XX:-UseVtableBasedCHA` > - [x] performance testing > > Thanks! Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: Improve comments ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/3727/files - new: https://git.openjdk.java.net/jdk/pull/3727/files/02f615b2..11276e26 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3727&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3727&range=01-02 Stats: 6 lines in 1 file changed: 6 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/3727.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3727/head:pull/3727 PR: https://git.openjdk.java.net/jdk/pull/3727 From vladimir.x.ivanov at oracle.com Mon May 3 13:51:29 2021 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Mon, 3 May 2021 16:51:29 +0300 Subject: RFR: 8266074: Vtable-based CHA implementation In-Reply-To: References: <5GRZsKzjv3hRaV9LNgp2tQIaAmsUbIbGr4xGQDdkwus=.77bed390-12c6-4526-b8b1-c68f736b7c1c@github.com> <3Z7vXPRkLnBKfv84VVtXJI--I16HgEjydFtQflXkUBE=.44f5daab-6b9c-4e56-9aac-345b9b4d3a07@github.com> Message-ID: Thanks for your feedback, Igor and David. 
>> I meant both: in a generic sense, meaning we won't properly document, >> advertise it or claim it as supported; and changing its type to >> EXPERIMENTAL, so it will be somewhat harder for people to switch it. > > Both forms are as hard to switch to as both must be unlocked. But > semantically this is not an experimental flag IMO. I agree. > Regardless, if the intent is to allow the flag to be used to restore > legacy behaviour, then that legacy behaviour must be tested. We don't > have to run thousands of tests with the flag disabled, just something > representative enough to give us confidence that the code has not > bit-rotted. Yes, it makes perfect sense to test that -XX:-UseVtableBasedCHA is usable. I'd like to point out that new implementation exercise old implementation [1] to verify that there's an agreement on the result. So, maybe just a smoke test is enough here. Best regards, Vladimir Ivanov [1] src/hotspot/share/code/dependencies.cpp: Method* Dependencies::find_unique_concrete_method(InstanceKlass* ctxk, Method* m, Klass* resolved_klass, Method* resolved_method) { ... // Old CHA conservatively reports concrete methods in abstract classes // irrespective of whether they have concrete subclasses or not. #ifdef ASSERT Klass* uniqp = NULL; Method* uniqm = Dependencies::find_unique_concrete_method(ctxk, m, &uniqp); assert(uniqm == NULL || uniqm == fm || uniqm->method_holder()->is_abstract() || (fm == NULL && uniqm != NULL && uniqp != NULL && !InstanceKlass::cast(uniqp)->is_linked()), "sanity"); #endif // ASSERT >>> On May 1, 2021, at 3:03 PM, David Holmes >> > wrote: >>> >>> On 2/05/2021 1:39 am, Igor Ignatyev wrote: >>>> On Fri, 30 Apr 2021 22:10:02 GMT, Vladimir Kozlov >>> > wrote: >>>>>> I'm fine with both approaches. >>>>>> >>>>>> Explicitly setting the flag looked to me more robust and clearer >>>>>> communicating the intent. But if you prefer `@requires`, I'll use it. >>>>> >>>>> Let hear @iignatev opinion. 
>>>> from my point of view, `@requires` is clearer and also eliminates >>>> "wasted" execution (if someone tries to run this test w/ >>>> `-XX:-UseVtableBasedCHA`), so I'd prefer if we use it. >>>> I have a more generic comment about `UseVtableBasedCHA`. I >>>> understand the desire to introduce a flag to switch back to the old >>>> implementation, but I'm somewhat concern that it adds a new >>>> dimension into configuration space that won't be covered by our >>>> existing tests (w/ the test which exercises interesting parts of the >>>> related code is inapplicable) and isn't part of our regular test >>>> configurations. Can we make it an experimental flag (w/ vtable-based >>>> CHA still being enabled by default)? this way, the quality bar for >>>> the old implementation will be somewhat lower, yet the end-users >>>> will still be able to return to the old implementation if it, for >>>> some reason, works better in their use-cases. >>> >>> Did you mean "experimental" in a generic sense or actually change it >>> from DIAGNOSTIC to EXPERIMENTAL? If the latter then I don't agree >>> this is an experimental flag, it is diagnostic. But either way the >>> testing requirements are the same if we expect to tell end users to >>> try this flag if they hit an problem - the flag has to be known to be >>> functional, so we will have to expand the test coverage. 
>>> >>> Cheers, >>> David >>> ----- >>> >>>> -- Igor >>>> ------------- >>>> PR:https://git.openjdk.java.net/jdk/pull/3727 >>>> >> From mdoerr at openjdk.java.net Mon May 3 14:04:22 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Mon, 3 May 2021 14:04:22 GMT Subject: RFR: 8265784: [C2] Hoisting of DecodeN leaves MachTemp inputs behind [v2] In-Reply-To: <2jy2CBKOYJx9mu017Cm4M2yQYWivb2kK1gCvrZAxD7Y=.a7d2d333-a630-4956-abc6-8d461417ff92@github.com> References: <2jy2CBKOYJx9mu017Cm4M2yQYWivb2kK1gCvrZAxD7Y=.a7d2d333-a630-4956-abc6-8d461417ff92@github.com> Message-ID: > PPC64 and s390 have DecodeN implementations which use a MachTemp input. When LCM hoists the DecodeN, the MachTemp nodes reside in the old block, but should get hoisted together with the DecodeN node. > Same is true for load Base input which exists on s390 for example. Unfortunately, that's just a platform specific MachNode which is not nicely recognizable in LCM. Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Add temp node placement assertion. 
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/3637/files - new: https://git.openjdk.java.net/jdk/pull/3637/files/86a4a6a8..0c893cd4 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3637&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3637&range=00-01 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/3637.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3637/head:pull/3637 PR: https://git.openjdk.java.net/jdk/pull/3637 From goetz at openjdk.java.net Mon May 3 14:11:52 2021 From: goetz at openjdk.java.net (Goetz Lindenmaier) Date: Mon, 3 May 2021 14:11:52 GMT Subject: RFR: 8265784: [C2] Hoisting of DecodeN leaves MachTemp inputs behind [v2] In-Reply-To: References: <2jy2CBKOYJx9mu017Cm4M2yQYWivb2kK1gCvrZAxD7Y=.a7d2d333-a630-4956-abc6-8d461417ff92@github.com> Message-ID: On Mon, 3 May 2021 14:04:22 GMT, Martin Doerr wrote: >> PPC64 and s390 have DecodeN implementations which use a MachTemp input. When LCM hoists the DecodeN, the MachTemp nodes reside in the old block, but should get hoisted together with the DecodeN node. >> Same is true for load Base input which exists on s390 for example. Unfortunately, that's just a platform specific MachNode which is not nicely recognizable in LCM. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Add temp node placement assertion. LGTM ------------- Marked as reviewed by goetz (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/3637 From mdoerr at openjdk.java.net Mon May 3 14:11:52 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Mon, 3 May 2021 14:11:52 GMT Subject: RFR: 8265784: [C2] Hoisting of DecodeN leaves MachTemp inputs behind [v2] In-Reply-To: References: <2jy2CBKOYJx9mu017Cm4M2yQYWivb2kK1gCvrZAxD7Y=.a7d2d333-a630-4956-abc6-8d461417ff92@github.com> Message-ID: On Mon, 3 May 2021 14:04:22 GMT, Martin Doerr wrote: >> PPC64 and s390 have DecodeN implementations which use a MachTemp input. When LCM hoists the DecodeN, the MachTemp nodes reside in the old block, but should get hoisted together with the DecodeN node. >> Same is true for load Base input which exists on s390 for example. Unfortunately, that's just a platform specific MachNode which is not nicely recognizable in LCM. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Add temp node placement assertion. Hi Götz, thanks for reviewing! Yes, block must always dominate tempb after the if. Otherwise, the placement would be illegal. It's known that val (DecodeNNode) is movable to block at this place. So its inputs must be live along the same path. I've added a sanity check. Thanks for the reviews! I'll integrate it tomorrow if everything's fine. 
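The dominance condition behind the new assertion can be illustrated with a small standalone sketch. This is not HotSpot code: the `Block` class, its `idom` field and this `dominates` method are hypothetical stand-ins, assuming each block only records its immediate dominator. The names `block`, `tempb` and `other` in `main` are likewise illustrative.

```java
import java.util.Objects;

public class DominanceSketch {
    /** Hypothetical basic block that only records its immediate dominator. */
    static final class Block {
        final String name;
        final Block idom; // immediate dominator; null for the entry block

        Block(String name, Block idom) {
            this.name = Objects.requireNonNull(name);
            this.idom = idom;
        }

        /** True if this block dominates {@code other}; every block dominates itself. */
        boolean dominates(Block other) {
            // Walk up the dominator tree from 'other' until we reach 'this' or the root.
            for (Block b = other; b != null; b = b.idom) {
                if (b == this) {
                    return true;
                }
            }
            return false;
        }
    }

    public static void main(String[] args) {
        Block entry = new Block("entry", null);
        Block block = new Block("block", entry); // hoisting target
        Block tempb = new Block("tempb", block); // block currently holding the temp node
        Block other = new Block("other", entry); // unrelated sibling path

        System.out.println(block.dominates(tempb)); // true: moving the temp up to 'block' is legal
        System.out.println(other.dominates(tempb)); // false: placement there would be illegal
    }
}
```

In this toy model, hoisting a node together with its temp input is only legal when the target block dominates the block the temp currently lives in, which is the property the added sanity check expresses.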
------------- PR: https://git.openjdk.java.net/jdk/pull/3637 From yyang at openjdk.java.net Mon May 3 14:26:52 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Mon, 3 May 2021 14:26:52 GMT Subject: Integrated: 8265322: C2: Simplify control inputs for BarrierSetC2::obj_allocate In-Reply-To: <_NicB3VpqFwTiu2-bKLNhfnQAxJmbtND66g3Md5ck5g=.f9b4edcd-fa2b-4917-9092-bccc38b202df@github.com> References: <_NicB3VpqFwTiu2-bKLNhfnQAxJmbtND66g3Md5ck5g=.f9b4edcd-fa2b-4917-9092-bccc38b202df@github.com> Message-ID: On Fri, 16 Apr 2021 03:06:19 GMT, Yi Yang wrote: > While just using BarrierSetC2::obj_allocate in https://github.com/openjdk/valhalla/pull/385, I noticed the control input `ctrl` for obj_allocate is not so much necessary. This PR simplifies control inputs for BarrierSetC2::obj_allocate. In most cases, it doesn't change anything since `toobig_false` is equivalent to `ctrl`. In other cases, `toobig_false` is created for Unsafe.allocateInstance while instance size is not statically known, `ctrl` would become control input of IfNode whose projects are `toobig_false` and `toobig_true`, old eden_end and old_eden_top can simply accept `toobig_false` as their control input rather than `ctrl`. This pull request has now been integrated. Changeset: 001c5142 Author: Yi Yang Committer: Nils Eliasson URL: https://git.openjdk.java.net/jdk/commit/001c5142a6ff4c4073e651ebae9d6d7a8533eb42 Stats: 5 lines in 3 files changed: 0 ins; 0 del; 5 mod 8265322: C2: Simplify control inputs for BarrierSetC2::obj_allocate Reviewed-by: kvn, neliasso ------------- PR: https://git.openjdk.java.net/jdk/pull/3529 From enikitin at openjdk.java.net Mon May 3 14:35:51 2021 From: enikitin at openjdk.java.net (Evgeny Nikitin) Date: Mon, 3 May 2021 14:35:51 GMT Subject: Integrated: 8265349: vmTestbase/../stress/compiler/deoptimize/Test.java fails with OOME due to CodeCache exhaustion. 
In-Reply-To: References: Message-ID: <7paABVVNinu3rY7DdKaHPqtbAluanEShjubG59jPbxQ=.4a9ff9b6-bb96-4bff-86d5-3bdcee1c530b@github.com> On Wed, 28 Apr 2021 14:02:16 GMT, Evgeny Nikitin wrote: > The bug: https://bugs.openjdk.java.net/browse/JDK-8265349 > > A repetition of the JDK-8058176 (mlvm tests cause code cache exhaustion), this time with -Xcomp. My measurements show up max code cache consumption of 400-500 kb per test thread and tree (depends on random tree and other factors, of course, but still). For the whole test which usually doesn't exceed 10 threads, I've got max. value of 6.1M for one sequence/tree build (between checks for remaining free space). > > So I suggest to raise the allowances to 10M, just to be safe. Compared to the 103 Mb of required code cache space that test requests via the 'run' parameter, it doesn't look that much. > > Thanks in advance, > // Evgeny Nikitin. This pull request has now been integrated. Changeset: 880c138b Author: Evgeny Nikitin Committer: Igor Ignatyev URL: https://git.openjdk.java.net/jdk/commit/880c138b587e0902cd19c27a02baf41b57ac0bb0 Stats: 79 lines in 2 files changed: 39 ins; 18 del; 22 mod 8265349: vmTestbase/../stress/compiler/deoptimize/Test.java fails with OOME due to CodeCache exhaustion. Reviewed-by: iignatyev ------------- PR: https://git.openjdk.java.net/jdk/pull/3762 From thartmann at openjdk.java.net Mon May 3 14:58:52 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 3 May 2021 14:58:52 GMT Subject: RFR: 8265915: adjust state_unloading_cycle compuation order in nmethod::is_unloading In-Reply-To: References: Message-ID: On Sun, 25 Apr 2021 11:07:14 GMT, Miao Zheng wrote: > Trivial change of moving state_unloading_cycle computation after state_is_unloading checking. Avoiding useless state_unloading_cycle computation when state_is_unloading is true. Looks good and trivial. ------------- Marked as reviewed by thartmann (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/3676 From thartmann at openjdk.java.net Mon May 3 15:04:07 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 3 May 2021 15:04:07 GMT Subject: RFR: 8266438: Compile::remove_useless_nodes does not remove opaque nodes Message-ID: [JDK-8255026](https://bugs.openjdk.java.net/browse/JDK-8255026) refactored the code in `Compile::remove_useless_nodes` and as a result, useless nodes are no longer removed from the `_predicate_opaqs` list. Before the [change](https://github.com/openjdk/jdk/commit/27230fae#diff-f076857d7da81f56709da3de1511b1105727032186cde4d02c678667761f46eaL382), the call to `remove_macro_node` took care of this: https://github.com/openjdk/jdk/blob/194bceca3a4d13d4528b86359ee9d5eead3ce7ac/src/hotspot/share/opto/compile.hpp#L676-L684 But the new code only removes nodes from the `_macro_nodes` list. Useless nodes should be removed from the `_skeleton_predicate_opaqs` list as well. I've seen failures due to this with a change in Valhalla (where we call `remove_useless_nodes` more often) but not in mainline. I think this should still be fixed in mainline. 
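The kind of bug described above can be sketched in a few lines. This is a simplified illustration, not the actual C2 code or the actual fix: the class `UselessNodeSketch`, its integer "node ids" and the three list fields are hypothetical stand-ins for the side lists kept by `Compile`. The point is only that every side list has to be pruned, not just the macro node list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class UselessNodeSketch {
    // Hypothetical side lists, loosely mirroring the ones named in the description.
    final List<Integer> macroNodes = new ArrayList<>();
    final List<Integer> predicateOpaqs = new ArrayList<>();
    final List<Integer> skeletonPredicateOpaqs = new ArrayList<>();

    /** Drops every node id that is not in the 'useful' set from all side lists. */
    void removeUselessNodes(Set<Integer> useful) {
        // Pruning only macroNodes here would leave stale entries in the other lists.
        for (List<Integer> list : List.of(macroNodes, predicateOpaqs, skeletonPredicateOpaqs)) {
            list.removeIf(id -> !useful.contains(id));
        }
    }

    public static void main(String[] args) {
        UselessNodeSketch c = new UselessNodeSketch();
        c.macroNodes.addAll(List.of(1, 2));
        c.predicateOpaqs.addAll(List.of(3, 4));
        c.skeletonPredicateOpaqs.addAll(List.of(5, 6));

        c.removeUselessNodes(Set.of(1, 3, 5));

        // prints: [1] [3] [5]
        System.out.println(c.macroNodes + " " + c.predicateOpaqs + " " + c.skeletonPredicateOpaqs);
    }
}
```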
Thanks, Tobias ------------- Commit messages: - 8266438: Compile::remove_useless_nodes does not remove opaque nodes Changes: https://git.openjdk.java.net/jdk/pull/3840/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3840&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8266438 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/3840.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3840/head:pull/3840 PR: https://git.openjdk.java.net/jdk/pull/3840 From thartmann at openjdk.java.net Mon May 3 15:05:56 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 3 May 2021 15:05:56 GMT Subject: RFR: 8265767: compiler/eliminateAutobox/TestIntBoxing.java crashes on arm32 after 8264649 in debug VMs [v2] In-Reply-To: References: Message-ID: On Sun, 2 May 2021 00:50:14 GMT, Hui Shi wrote: >> This patch fix failure exposed by JDK-8264649. >> >> compiler/eliminateAutobox/TestIntBoxing.java crashes on arm32 in Compile::check_no_dead_use assertion. >> In LoadNode::eliminate_autobox, early "result" is dead after line 1450 but not added into PhaseGVN worklist for optimization. >> Its out_cnt is 0. If it isn't removed, will trigger assertion in Compile::check_no_dead_use. >> >> >> 1443 } else if (result->is_Add() && result->in(2)->is_Con() && >> 1444 result->in(1)->Opcode() == Op_LShiftX && >> 1445 result->in(1)->in(2) == phase->intcon(shift)) { >> 1446 // We can't do general optimization: ((X<<Z) + Y) >> Z ==> X + (Y>>Z) >> 1447 // but for boxing cache access we know that X<<Z will not overflow >> 1448 // (there is range check) so we do this optimizatrion by hand here. 
>> 1449 Node* add_con = new RShiftXNode(result->in(2), phase->intcon(shift)); >> --- result before is dead and might not removed >> 1450 result = new AddXNode(result->in(1)->in(1), phase->transform(add_con)); >> 1451 } else >> >> >> Detail analysis is in https://bugs.openjdk.java.net/browse/JDK-8265767 >> >> @mychris I have verified compiler/eliminateAutobox/TestIntBoxing.java on qemu, it failed with same assertion and now passes with this fix. Would you please help verify it on arm32 machine? >> >> Testing: >> - Passed Tier1-3 on Linux x86_64, release and fastdebug build, default option and -XX:-TieredCompilation. >> - compiler/eliminateAutobox/TestIntBoxing.java on arm32 release/fastdebug/slowdebug > > Hui Shi has updated the pull request incrementally with one additional commit since the last revision: > > Use PhaseIterGVN ptr as LoadNode::eliminate_autobox method parameter for simiplification and add comments for previous commit Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/3818 From vlivanov at openjdk.java.net Mon May 3 15:10:50 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Mon, 3 May 2021 15:10:50 GMT Subject: RFR: 8266438: Compile::remove_useless_nodes does not remove opaque nodes In-Reply-To: References: Message-ID: On Mon, 3 May 2021 14:50:12 GMT, Tobias Hartmann wrote: > [JDK-8255026](https://bugs.openjdk.java.net/browse/JDK-8255026) refactored the code in `Compile::remove_useless_nodes` and as a result, useless nodes are no longer removed from the `_predicate_opaqs` list. Before the [change](https://github.com/openjdk/jdk/commit/27230fae#diff-f076857d7da81f56709da3de1511b1105727032186cde4d02c678667761f46eaL382), the call to `remove_macro_node` took care of this: > https://github.com/openjdk/jdk/blob/194bceca3a4d13d4528b86359ee9d5eead3ce7ac/src/hotspot/share/opto/compile.hpp#L676-L684 > > But the new code only removes nodes from the `_macro_nodes` list. 
Useless nodes should be removed from the `_skeleton_predicate_opaqs` list as well. > > I've seen failures due to this with a change in Valhalla (where we call `remove_useless_nodes` more often) but not in mainline. I think this should still be fixed in mainline. > > Thanks, > Tobias Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/3840 From thartmann at openjdk.java.net Mon May 3 15:10:50 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 3 May 2021 15:10:50 GMT Subject: RFR: 8266438: Compile::remove_useless_nodes does not remove opaque nodes In-Reply-To: References: Message-ID: On Mon, 3 May 2021 14:50:12 GMT, Tobias Hartmann wrote: > [JDK-8255026](https://bugs.openjdk.java.net/browse/JDK-8255026) refactored the code in `Compile::remove_useless_nodes` and as a result, useless nodes are no longer removed from the `_predicate_opaqs` list. Before the [change](https://github.com/openjdk/jdk/commit/27230fae#diff-f076857d7da81f56709da3de1511b1105727032186cde4d02c678667761f46eaL382), the call to `remove_macro_node` took care of this: > https://github.com/openjdk/jdk/blob/194bceca3a4d13d4528b86359ee9d5eead3ce7ac/src/hotspot/share/opto/compile.hpp#L676-L684 > > But the new code only removes nodes from the `_macro_nodes` list. Useless nodes should be removed from the `_skeleton_predicate_opaqs` list as well. > > I've seen failures due to this with a change in Valhalla (where we call `remove_useless_nodes` more often) but not in mainline. I think this should still be fixed in mainline. > > Thanks, > Tobias Thanks for the review, Vladimir! 
------------- PR: https://git.openjdk.java.net/jdk/pull/3840 From chagedorn at openjdk.java.net Mon May 3 15:22:51 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Mon, 3 May 2021 15:22:51 GMT Subject: RFR: 8266438: Compile::remove_useless_nodes does not remove opaque nodes In-Reply-To: References: Message-ID: <1HXFe1Sbx9DxvO3VEhgmQaZT8FbpZ6Tp5CaT9L6tEdE=.43d0af5c-d905-41f1-9f7f-53ba94b8e77f@github.com> On Mon, 3 May 2021 14:50:12 GMT, Tobias Hartmann wrote: > [JDK-8255026](https://bugs.openjdk.java.net/browse/JDK-8255026) refactored the code in `Compile::remove_useless_nodes` and as a result, useless nodes are no longer removed from the `_predicate_opaqs` list. Before the [change](https://github.com/openjdk/jdk/commit/27230fae#diff-f076857d7da81f56709da3de1511b1105727032186cde4d02c678667761f46eaL382), the call to `remove_macro_node` took care of this: > https://github.com/openjdk/jdk/blob/194bceca3a4d13d4528b86359ee9d5eead3ce7ac/src/hotspot/share/opto/compile.hpp#L676-L684 > > But the new code only removes nodes from the `_macro_nodes` list. Useless nodes should be removed from the `_skeleton_predicate_opaqs` list as well. > > I've seen failures due to this with a change in Valhalla (where we call `remove_useless_nodes` more often) but not in mainline. I think this should still be fixed in mainline. > > Thanks, > Tobias Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/3840 From jbhateja at openjdk.java.net Mon May 3 15:30:19 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Mon, 3 May 2021 15:30:19 GMT Subject: RFR: 8265126: unified handling for VectorMask object re-materialization during de-optimization (re-submit) [v3] In-Reply-To: <2KlypmIXfPwTn6dp0AG9W0LMqvu6griQnWjSEQBwn2o=.6b09a942-e241-4ff6-b581-1810aa6bb124@github.com> References: <2KlypmIXfPwTn6dp0AG9W0LMqvu6griQnWjSEQBwn2o=.6b09a942-e241-4ff6-b581-1810aa6bb124@github.com> Message-ID: > Following flow describes object reconstruction for de-optimization:- > > 1. PhaseVector::scalarize_vbox_node() creates SafePointScalarObjectNode to captures the box type information, also it connects to node holding the boxed value. > 2. During code emit phase (PhaseOutput) C2 process above information to dumps ObjectValue holding the box information and LocationValue to holding the value information into ScopeDescriptor corresponding to Safepoint PC. > 3. De-optimization blobs dump the value held in registers to the stack locations using RegisterSave::save_live_registers() and a mapping b/w register and its stack location is added to RegisterMap. > 4. During de-optimization, compiled frame objects are re-allocated using identity information held in ObjectValue and their fields are initialized using values held in the stack locations accessed through register-stack mappings. > > By inserting a VectorStoreMaskNode before stitching the mask holding node to Safepoint we make sure that value held in opmask/vector register is transferred to a byte vector. Thus rest of the flow works as it is, stack location will hold the value in the form of a byte array irrespective of the box shape. > > tier1-tier3 regressions are clean with UseAVX=2/3. Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: 8265126: Review comments resolution. 
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/3721/files - new: https://git.openjdk.java.net/jdk/pull/3721/files/70c03aa3..a72641ed Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3721&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3721&range=01-02 Stats: 8 lines in 1 file changed: 0 ins; 1 del; 7 mod Patch: https://git.openjdk.java.net/jdk/pull/3721.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3721/head:pull/3721 PR: https://git.openjdk.java.net/jdk/pull/3721 From jbhateja at openjdk.java.net Mon May 3 15:30:21 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Mon, 3 May 2021 15:30:21 GMT Subject: RFR: 8265126: unified handling for VectorMask object re-materialization during de-optimization (re-submit) [v2] In-Reply-To: References: <2KlypmIXfPwTn6dp0AG9W0LMqvu6griQnWjSEQBwn2o=.6b09a942-e241-4ff6-b581-1810aa6bb124@github.com> Message-ID: On Mon, 3 May 2021 09:11:08 GMT, Vladimir Ivanov wrote: >> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: >> >> - Merge http://github.com/openjdk/jdk into JDK-8265126 >> - 8265126:[REDO] unified handling for VectorMask object re-materialization during de-optimization > > Test results (hs-tier1 - hs-tier5) are clean. Hi @iwanowww, your comments have been resolved. ------------- PR: https://git.openjdk.java.net/jdk/pull/3721 From iignatyev at openjdk.java.net Mon May 3 16:04:04 2021 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Mon, 3 May 2021 16:04:04 GMT Subject: RFR: 8266449: cleanup jtreg tags in compiler/intrinsics/sha/cli tests Message-ID: Hi all, could you please review this small cleanup of jtreg tags in compiler/intrinsics/sha/cli tests? 
- `@modules` tags are not needed since we have changed the used test libraries not to depend on `java.management` module and `jdk.internal.misc.*` classes; - `testcases` isn't required in `@library` b/c the files this directory contains are accessible from `/` `@library`. Thanks, -- Igor ------------- Commit messages: - 8266449 Changes: https://git.openjdk.java.net/jdk/pull/3841/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3841&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8266449 Stats: 36 lines in 12 files changed: 0 ins; 23 del; 13 mod Patch: https://git.openjdk.java.net/jdk/pull/3841.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3841/head:pull/3841 PR: https://git.openjdk.java.net/jdk/pull/3841 From vlivanov at openjdk.java.net Mon May 3 16:21:50 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Mon, 3 May 2021 16:21:50 GMT Subject: RFR: 8265126: unified handling for VectorMask object re-materialization during de-optimization (re-submit) [v3] In-Reply-To: References: <2KlypmIXfPwTn6dp0AG9W0LMqvu6griQnWjSEQBwn2o=.6b09a942-e241-4ff6-b581-1810aa6bb124@github.com> Message-ID: On Mon, 3 May 2021 15:30:19 GMT, Jatin Bhateja wrote: >> Following flow describes object reconstruction for de-optimization:- >> >> 1. PhaseVector::scalarize_vbox_node() creates SafePointScalarObjectNode to captures the box type information, also it connects to node holding the boxed value. >> 2. During code emit phase (PhaseOutput) C2 process above information to dumps ObjectValue holding the box information and LocationValue to holding the value information into ScopeDescriptor corresponding to Safepoint PC. >> 3. De-optimization blobs dump the value held in registers to the stack locations using RegisterSave::save_live_registers() and a mapping b/w register and its stack location is added to RegisterMap. >> 4. 
During de-optimization, compiled frame objects are re-allocated using identity information held in ObjectValue and their fields are initialized using values held in the stack locations accessed through register-stack mappings. >> >> By inserting a VectorStoreMaskNode before stitching the mask holding node to Safepoint we make sure that value held in opmask/vector register is transferred to a byte vector. Thus rest of the flow works as it is, stack location will hold the value in the form of a byte array irrespective of the box shape. >> >> tier1-tier3 regressions are clean with UseAVX=2/3. > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > 8265126: Review comments resolution. Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/3721 From github.com+168222+mgkwill at openjdk.java.net Mon May 3 16:31:00 2021 From: github.com+168222+mgkwill at openjdk.java.net (Marcus G K Williams) Date: Mon, 3 May 2021 16:31:00 GMT Subject: Integrated: 8265491: Math Signum optimization for x86 In-Reply-To: References: Message-ID: On Mon, 19 Apr 2021 23:00:47 GMT, Marcus G K Williams wrote: > x86 Math.Signum() uses two floating point compares and a copy sign operation involving data movement to gpr and XMM. > > We can optimize to one floating point compare and sign computation in XMM. We observe ~25% performance improvement with this optimization. 
> > Base: > > Benchmark Mode Cnt Score Error Units > Signum._1_signumFloatTest avgt 5 4.660 ± 0.040 ns/op > Signum._2_overheadFloat avgt 5 3.314 ± 0.023 ns/op > Signum._3_signumDoubleTest avgt 5 4.809 ± 0.043 ns/op > Signum._4_overheadDouble avgt 5 3.313 ± 0.015 ns/op > > > Optimized: > signum intrinsic patch > > Benchmark Mode Cnt Score Error Units > Signum._1_signumFloatTest avgt 5 3.769 ± 0.015 ns/op > Signum._2_overheadFloat avgt 5 3.312 ± 0.025 ns/op > Signum._3_signumDoubleTest avgt 5 3.765 ± 
0.005 ns/op > Signum._4_overheadDouble avgt 5 3.309 ? 0.010 ns/op > > > Signed-off-by: Marcus G K Williams This pull request has now been integrated. Changeset: ff65920c Author: Marcus G K Williams Committer: Sandhya Viswanathan URL: https://git.openjdk.java.net/jdk/commit/ff65920cd17e7e862b182524e2151784e26a079c Stats: 218 lines in 7 files changed: 212 ins; 1 del; 5 mod 8265491: Math Signum optimization for x86 Reviewed-by: jiefu, jbhateja, neliasso ------------- PR: https://git.openjdk.java.net/jdk/pull/3581 From sviswanathan at openjdk.java.net Mon May 3 16:55:57 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Mon, 3 May 2021 16:55:57 GMT Subject: RFR: 8265128: [REDO] Optimize Vector API slice and unslice operations [v3] In-Reply-To: References: Message-ID: On Fri, 30 Apr 2021 01:58:27 GMT, Sandhya Viswanathan wrote: >> All the slice and unslice variants that take more than one argument can benefit from already intrinsic methods on similar lines as slice(origin) and unslice(origin). >> >> Changes include: >> * Rewrite Vector API slice/unslice using already intrinsic methods >> * Fix in library_call.cpp:inline_preconditions_checkIndex() to not modify control if intrinsification fails >> * Vector API conversion tests thresholds adjustment >> >> Base Performance: >> Benchmark (size) Mode Cnt Score Error Units >> TestSlice.vectorSliceOrigin 1024 thrpt 5 11763.372 ? 254.580 ops/ms >> TestSlice.vectorSliceOriginVector 1024 thrpt 5 599.286 ? 326.770 ops/ms >> TestSlice.vectorSliceUnsliceOrigin 1024 thrpt 5 6627.601 ? 22.060 ops/ms >> TestSlice.vectorSliceUnsliceOriginVector 1024 thrpt 5 401.858 ? 220.340 ops/ms >> TestSlice.vectorSliceUnsliceOriginVectorPart 1024 thrpt 5 421.993 ? 231.703 ops/ms >> >> Performance with patch: >> Benchmark (size) Mode Cnt Score Error Units >> TestSlice.vectorSliceOrigin 1024 thrpt 5 11792.091 ? 37.296 ops/ms >> TestSlice.vectorSliceOriginVector 1024 thrpt 5 8388.174 ? 
115.886 ops/ms >> TestSlice.vectorSliceUnsliceOrigin 1024 thrpt 5 6662.159 ? 8.203 ops/ms >> TestSlice.vectorSliceUnsliceOriginVector 1024 thrpt 5 5206.300 ? 43.637 ops/ms >> TestSlice.vectorSliceUnsliceOriginVectorPart 1024 thrpt 5 5194.278 ? 13.376 ops/ms > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > Review comments: blendmask etc @iwanowww Could you please review if the change in library_call.cpp looks ok to you? ------------- PR: https://git.openjdk.java.net/jdk/pull/3804 From psandoz at openjdk.java.net Mon May 3 16:57:55 2021 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Mon, 3 May 2021 16:57:55 GMT Subject: RFR: 8266054: VectorAPI rotate operation optimization [v3] In-Reply-To: References: Message-ID: On Mon, 3 May 2021 06:51:29 GMT, Jatin Bhateja wrote: >> Current VectorAPI Java side implementation expresses rotateLeft and rotateRight operation using following operations:- >> >> vec1 = lanewise(VectorOperators.LSHL, n) >> vec2 = lanewise(VectorOperators.LSHR, n) >> res = lanewise(VectorOperations.OR, vec1 , vec2) >> >> This patch moves above handling from Java side to C2 compiler which facilitates dismantling the rotate operation if target ISA does not support a direct rotate instruction. >> >> AVX512 added vector rotate instructions vpro[rl][v][dq] which operate over long and integer type vectors. For other cases (i.e. sub-word type vectors or for targets which do not support direct rotate operations ) instruction sequence comprising of vector SHIFT (LEFT/RIGHT) and vector OR is emitted. >> >> Please find below the performance data for included JMH benchmark. 
>> Machine: Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz (Cascade lake Server) >> >> `` >> >> Benchmark | (TESTSIZE) | Shift | Baseline (ops/ms) | Withopt (ops/ms) | Gain % >> -- | -- | -- | -- | -- | -- >> RotateBenchmark.testRotateLeftI | (size) | (shift) | 7384.747 | 7706.652 | 4.36 >> RotateBenchmark.testRotateLeftI | 64 | 11 | 3723.305 | 3816.968 | 2.52 >> RotateBenchmark.testRotateLeftI | 128 | 11 | 1811.521 | 1966.05 | 8.53 >> RotateBenchmark.testRotateLeftI | 256 | 11 | 7133.296 | 7715.047 | 8.16 >> RotateBenchmark.testRotateLeftI | 64 | 21 | 3612.144 | 3886.225 | 7.59 >> RotateBenchmark.testRotateLeftI | 128 | 21 | 1815.422 | 1962.753 | 8.12 >> RotateBenchmark.testRotateLeftI | 256 | 21 | 7216.353 | 7677.165 | 6.39 >> RotateBenchmark.testRotateLeftI | 64 | 31 | 3602.008 | 3892.297 | 8.06 >> RotateBenchmark.testRotateLeftI | 128 | 31 | 1882.163 | 1958.887 | 4.08 >> RotateBenchmark.testRotateLeftI | 256 | 31 | 11819.443 | 11912.864 | 0.79 >> RotateBenchmark.testRotateLeftI | 64 | 11 | 5978.475 | 6060.189 | 1.37 >> RotateBenchmark.testRotateLeftI | 128 | 11 | 2965.179 | 3060.969 | 3.23 >> RotateBenchmark.testRotateLeftI | 256 | 11 | 11479.579 | 11684.148 | 1.78 >> RotateBenchmark.testRotateLeftI | 64 | 21 | 5904.903 | 6094.409 | 3.21 >> RotateBenchmark.testRotateLeftI | 128 | 21 | 2969.879 | 3074.1 | 3.51 >> RotateBenchmark.testRotateLeftI | 256 | 21 | 11531.654 | 12155.954 | 5.41 >> RotateBenchmark.testRotateLeftI | 64 | 31 | 5730.918 | 6112.514 | 6.66 >> RotateBenchmark.testRotateLeftI | 128 | 31 | 2937.19 | 2976.297 | 1.33 >> RotateBenchmark.testRotateLeftI | 256 | 31 | 16159.184 | 16459.462 | 1.86 >> RotateBenchmark.testRotateLeftI | 64 | 11 | 8154.982 | 8396.089 | 2.96 >> RotateBenchmark.testRotateLeftI | 128 | 11 | 4142.224 | 4292.049 | 3.62 >> RotateBenchmark.testRotateLeftI | 256 | 11 | 15958.154 | 16163.518 | 1.29 >> RotateBenchmark.testRotateLeftI | 64 | 21 | 8098.805 | 8504.279 | 5.01 >> RotateBenchmark.testRotateLeftI | 128 | 21 | 4137.598 | 
4314.868 | 4.28 >> RotateBenchmark.testRotateLeftI | 256 | 21 | 16201.666 | 15992.958 | -1.29 >> RotateBenchmark.testRotateLeftI | 64 | 31 | 8027.169 | 8484.379 | 5.70 >> RotateBenchmark.testRotateLeftI | 128 | 31 | 4146.29 | 4039.681 | -2.57 >> RotateBenchmark.testRotateLeftL | 256 | 31 | 3566.176 | 3805.248 | 6.70 >> RotateBenchmark.testRotateLeftL | 64 | 11 | 1820.219 | 1962.866 | 7.84 >> RotateBenchmark.testRotateLeftL | 128 | 11 | 917.085 | 1007.334 | 9.84 >> RotateBenchmark.testRotateLeftL | 256 | 11 | 3592.139 | 3973.698 | 10.62 >> RotateBenchmark.testRotateLeftL | 64 | 21 | 1827.63 | 1999.711 | 9.42 >> RotateBenchmark.testRotateLeftL | 128 | 21 | 907.104 | 1002.997 | 10.57 >> RotateBenchmark.testRotateLeftL | 256 | 21 | 3780.962 | 3873.489 | 2.45 >> RotateBenchmark.testRotateLeftL | 64 | 31 | 1830.121 | 1955.63 | 6.86 >> RotateBenchmark.testRotateLeftL | 128 | 31 | 891.411 | 982.138 | 10.18 >> RotateBenchmark.testRotateLeftL | 256 | 31 | 5890.544 | 6100.594 | 3.57 >> RotateBenchmark.testRotateLeftL | 64 | 11 | 2984.329 | 3021.971 | 1.26 >> RotateBenchmark.testRotateLeftL | 128 | 11 | 1485.109 | 1527.689 | 2.87 >> RotateBenchmark.testRotateLeftL | 256 | 11 | 5903.411 | 6083.775 | 3.06 >> RotateBenchmark.testRotateLeftL | 64 | 21 | 2925.37 | 3050.958 | 4.29 >> RotateBenchmark.testRotateLeftL | 128 | 21 | 1486.432 | 1537.155 | 3.41 >> RotateBenchmark.testRotateLeftL | 256 | 21 | 5853.721 | 6000.682 | 2.51 >> RotateBenchmark.testRotateLeftL | 64 | 31 | 2896.116 | 3072.783 | 6.10 >> RotateBenchmark.testRotateLeftL | 128 | 31 | 1483.132 | 1546.588 | 4.28 >> RotateBenchmark.testRotateLeftL | 256 | 31 | 8059.206 | 8218.047 | 1.97 >> RotateBenchmark.testRotateLeftL | 64 | 11 | 4022.416 | 4195.52 | 4.30 >> RotateBenchmark.testRotateLeftL | 128 | 11 | 2084.296 | 2068.238 | -0.77 >> RotateBenchmark.testRotateLeftL | 256 | 11 | 7971.832 | 8172.819 | 2.52 >> RotateBenchmark.testRotateLeftL | 64 | 21 | 4032.036 | 4344.469 | 7.75 >> RotateBenchmark.testRotateLeftL | 128 | 
21 | 2068.957 | 2138.685 | 3.37 >> RotateBenchmark.testRotateLeftL | 256 | 21 | 8140.63 | 8003.283 | -1.69 >> RotateBenchmark.testRotateLeftL | 64 | 31 | 4088.621 | 4296.091 | 5.07 >> RotateBenchmark.testRotateLeftL | 128 | 31 | 2007.753 | 2088.455 | 4.02 >> RotateBenchmark.testRotateRightI | 256 | 31 | 7358.793 | 7548.976 | 2.58 >> RotateBenchmark.testRotateRightI | 64 | 11 | 3648.868 | 3897.47 | 6.81 >> RotateBenchmark.testRotateRightI | 128 | 11 | 1862.73 | 1969.964 | 5.76 >> RotateBenchmark.testRotateRightI | 256 | 11 | 7268.806 | 7790.588 | 7.18 >> RotateBenchmark.testRotateRightI | 64 | 21 | 3577.79 | 3979.675 | 11.23 >> RotateBenchmark.testRotateRightI | 128 | 21 | 1773.243 | 1921.088 | 8.34 >> RotateBenchmark.testRotateRightI | 256 | 21 | 7084.974 | 7609.912 | 7.41 >> RotateBenchmark.testRotateRightI | 64 | 31 | 3688.781 | 3909.65 | 5.99 >> RotateBenchmark.testRotateRightI | 128 | 31 | 1845.978 | 1928.316 | 4.46 >> RotateBenchmark.testRotateRightI | 256 | 31 | 11463.228 | 12179.833 | 6.25 >> RotateBenchmark.testRotateRightI | 64 | 11 | 5678.052 | 6028.573 | 6.17 >> RotateBenchmark.testRotateRightI | 128 | 11 | 2990.419 | 3070.409 | 2.67 >> RotateBenchmark.testRotateRightI | 256 | 11 | 11780.283 | 12105.261 | 2.76 >> RotateBenchmark.testRotateRightI | 64 | 21 | 5827.8 | 6020.208 | 3.30 >> RotateBenchmark.testRotateRightI | 128 | 21 | 2904.852 | 3047.154 | 4.90 >> RotateBenchmark.testRotateRightI | 256 | 21 | 11359.146 | 12060.401 | 6.17 >> RotateBenchmark.testRotateRightI | 64 | 31 | 5823.207 | 6079.82 | 4.41 >> RotateBenchmark.testRotateRightI | 128 | 31 | 2984.484 | 3045.719 | 2.05 >> RotateBenchmark.testRotateRightI | 256 | 31 | 16200.504 | 16376.475 | 1.09 >> RotateBenchmark.testRotateRightI | 64 | 11 | 8118.399 | 8315.407 | 2.43 >> RotateBenchmark.testRotateRightI | 128 | 11 | 4130.745 | 4092.588 | -0.92 >> RotateBenchmark.testRotateRightI | 256 | 11 | 15842.168 | 16469.119 | 3.96 >> RotateBenchmark.testRotateRightI | 64 | 21 | 7855.164 | 8188.913 | 
4.25 >> RotateBenchmark.testRotateRightI | 128 | 21 | 4114.378 | 4035.56 | -1.92 >> RotateBenchmark.testRotateRightI | 256 | 21 | 15636.117 | 16289.632 | 4.18 >> RotateBenchmark.testRotateRightI | 64 | 31 | 8108.067 | 7996.517 | -1.38 >> RotateBenchmark.testRotateRightI | 128 | 31 | 3997.547 | 4153.58 | 3.90 >> RotateBenchmark.testRotateRightL | 256 | 31 | 3685.99 | 3814.384 | 3.48 >> RotateBenchmark.testRotateRightL | 64 | 11 | 1787.875 | 1916.541 | 7.20 >> RotateBenchmark.testRotateRightL | 128 | 11 | 940.141 | 990.383 | 5.34 >> RotateBenchmark.testRotateRightL | 256 | 11 | 3745.968 | 3920.667 | 4.66 >> RotateBenchmark.testRotateRightL | 64 | 21 | 1877.94 | 1998.072 | 6.40 >> RotateBenchmark.testRotateRightL | 128 | 21 | 933.536 | 1004.61 | 7.61 >> RotateBenchmark.testRotateRightL | 256 | 21 | 3744.763 | 3947.427 | 5.41 >> RotateBenchmark.testRotateRightL | 64 | 31 | 1864.818 | 1978.277 | 6.08 >> RotateBenchmark.testRotateRightL | 128 | 31 | 906.965 | 998.692 | 10.11 >> RotateBenchmark.testRotateRightL | 256 | 31 | 5910.469 | 6062.429 | 2.57 >> RotateBenchmark.testRotateRightL | 64 | 11 | 2914.64 | 3033.127 | 4.07 >> RotateBenchmark.testRotateRightL | 128 | 11 | 1491.344 | 1543.936 | 3.53 >> RotateBenchmark.testRotateRightL | 256 | 11 | 5801.818 | 6098.892 | 5.12 >> RotateBenchmark.testRotateRightL | 64 | 21 | 2881.328 | 3089.547 | 7.23 >> RotateBenchmark.testRotateRightL | 128 | 21 | 1485.969 | 1526.231 | 2.71 >> RotateBenchmark.testRotateRightL | 256 | 21 | 5783.495 | 5957.649 | 3.01 >> RotateBenchmark.testRotateRightL | 64 | 31 | 3008.182 | 3026.323 | 0.60 >> RotateBenchmark.testRotateRightL | 128 | 31 | 1464.566 | 1546.825 | 5.62 >> RotateBenchmark.testRotateRightL | 256 | 31 | 8208.124 | 8361.437 | 1.87 >> RotateBenchmark.testRotateRightL | 64 | 11 | 4062.465 | 4319.412 | 6.32 >> RotateBenchmark.testRotateRightL | 128 | 11 | 2029.995 | 2086.497 | 2.78 >> RotateBenchmark.testRotateRightL | 256 | 11 | 8183.789 | 8193.087 | 0.11 >> 
RotateBenchmark.testRotateRightL | 64 | 21 | 4092.686 | 4193.712 | 2.47 >> RotateBenchmark.testRotateRightL | 128 | 21 | 2036.854 | 2038.927 | 0.10 >> RotateBenchmark.testRotateRightL | 256 | 21 | 8155.015 | 8175.792 | 0.25 >> RotateBenchmark.testRotateRightL | 64 | 31 | 3960.629 | 4263.922 | 7.66 >> RotateBenchmark.testRotateRightL | 128 | 31 | 1996.862 | 2055.486 | 2.94 >> >> `` > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > 8266054: Review comments resolution. Testing-wise, can we reuse the `Kernel-Binary-*-op.template` files? hence no need for separate templates Further, i think we need to test with the vector accepting lane-wise method and the broadcast accepting method, since they go through different code paths. The broadcast method can use primitive type rather than cast to `int`, likely making it easier to reuse the binary templates. It would be good if the scalar methods for rotating left/right were identical for the main code and tests. I prefer the code in the test methods. src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java line 524: > 522: public static final /*bitwise*/ Binary LSHR = binary("LSHR", ">>>", VectorSupport.VECTOR_OP_URSHIFT, VO_SHIFT); > 523: /** Produce {@code rotateLeft(a,n)}. Integral only. */ > 524: public static final /*bitwise*/ Binary ROL = binary("ROL", "rotateLeft", VectorSupport.VECTOR_OP_LROTATE, VO_SHIFT | VO_SPECIAL); I think we can remove the `VO_SPECIAL` flag on `ROL` and `ROR` now it is uniformly managed? 
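
For readers following the thread, the shift-and-OR decomposition described in the quoted PR text can be sketched in scalar Java. This is only an illustration, not code from the patch or its tests: the class name `RotateSketch` and its methods are hypothetical. Note that the quoted pseudo-code abbreviates both shift distances as `n`; for a true rotate the second shift must use the complementary distance, which the sketch obtains for `int` and `long` from Java's masking of shift counts (`-n` behaves like `32 - n` or `64 - n`). The `byte` variant shows the kind of explicit masking a sub-word lane would need.

```java
// Hypothetical scalar sketch of rotate expressed as two shifts and an OR.
final class RotateSketch {

    // int: Java masks the shift count to 5 bits, so -n acts as (32 - n) & 31.
    static int rotateLeft(int a, int n) {
        return (a << n) | (a >>> -n);
    }

    static int rotateRight(int a, int n) {
        return (a >>> n) | (a << -n);
    }

    // long: the shift count is masked to 6 bits, so -n acts as (64 - n) & 63.
    static long rotateLeft(long a, int n) {
        return (a << n) | (a >>> -n);
    }

    static long rotateRight(long a, int n) {
        return (a >>> n) | (a << -n);
    }

    // Sub-word case: widen to an unsigned 8-bit value, rotate within 8 bits,
    // then narrow back. The distance is reduced modulo the lane width first.
    static byte rotateLeft(byte a, int n) {
        int bits = a & 0xFF;
        int s = n & 7;
        return (byte) ((bits << s) | (bits >>> (8 - s)));
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(rotateLeft(0x80000001, 1)));
        System.out.println(rotateLeft(0x12345678, 11) == Integer.rotateLeft(0x12345678, 11));
        System.out.println(rotateRight(-1L >>> 3, 21) == Long.rotateRight(-1L >>> 3, 21));
        System.out.println(rotateLeft((byte) 0x81, 1));
    }
}
```

Keeping one scalar definition like this per element type is one way to make the main code and the tests agree, as requested in the review comment above.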
-------------

PR: https://git.openjdk.java.net/jdk/pull/3720

From chagedorn at openjdk.java.net Mon May 3 17:42:53 2021
From: chagedorn at openjdk.java.net (Christian Hagedorn)
Date: Mon, 3 May 2021 17:42:53 GMT
Subject: RFR: 8254129: IR Test Framework to support regex-based matching on the IR in JTreg compiler tests [v7]
In-Reply-To:
References: <2iYQOJ5yeu7SvGcScLPBOWCPMLv69e1ksOL1vW3ytL8=.0c27621d-ef3d-422c-9d8c-922078ca3160@github.com>
Message-ID: <7n-HUaVnogF7L6lfbamFlUcvFb1jEB7zGwSf8dB5dsw=.131617ed-5a18-4615-b1d1-bbc3eb6c3cc0@github.com>

On Fri, 30 Apr 2021 19:02:37 GMT, Igor Ignatyev wrote:

>> Christian Hagedorn has updated the pull request incrementally with two additional commits since the last revision:
>>
>>  - Fix XCOMP cases from old framework and turn it into new debug flag -DIgnoreCompilerControls
>>  - Apply review comments: Added new Compiler annotation class for @DontCompile, changed C1 into C1_SIMPLE, refactored code for ExcludeRandom and FlipC1C2, added missing flag description in README, and some other smaller refactoring/renamings
>
> test/lib/jdk/test/lib/hotspot/ir_framework/TestFramework.java line 984:
>
>> 982:     private final String hotspotPidFileName;
>> 983:
>> 984:     JVMOutput(OutputAnalyzer oa, Scenario scenario, ProcessBuilder process) {
>
> instead of passing a ProcessBuilder, you can either pass a command line or, better, just pass the prepared flags, and make it JVMOutput's (you will need a better name) responsibility to execute the process, create the OutputAnalyzer, etc.
>
> Suggestion:
>
>     JVMOutput(List<String> cmds, Scenario scenario) {
>         var pb = ProcessTools.createJavaProcessBuilder(cmds);
>         try {
>             // Calls 'main' of TestFrameworkExecution to run all specified tests with commands 'cmds'.
>             // Use executeProcess instead of executeTestJvm as we have already added the JTreg VM and
>             // Java options in prepareTestVMFlags().
>             this.oa = ProcessTools.executeProcess(pb);
>         } catch (Exception e) {
>             throw new TestFrameworkException("Error while executing Test VM", e);
>         }
>         this.cmd = pb.command();
>         ...
>
>
> and then runTest will be something like
>
>     private void runTestVM(List<String> additionalFlags) {
>         List<String> cmds = prepareTestVMFlags(additionalFlags);
>         socket.start();
>         JVMOutput output = new JVMOutput(cmds, scenario);
>         ...

I did a refactoring and introduced new classes `FlagVMProcess` and `TestVMProcess`. I also renamed `TestFrameworkPrepareFlags` -> `FlagVM` and `TestFrameworkExecution` -> `TestVM` to make it clearer what's going on.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3508

From chagedorn at openjdk.java.net Mon May 3 17:42:51 2021
From: chagedorn at openjdk.java.net (Christian Hagedorn)
Date: Mon, 3 May 2021 17:42:51 GMT
Subject: RFR: 8254129: IR Test Framework to support regex-based matching on the IR in JTreg compiler tests [v8]
In-Reply-To: <2iYQOJ5yeu7SvGcScLPBOWCPMLv69e1ksOL1vW3ytL8=.0c27621d-ef3d-422c-9d8c-922078ca3160@github.com>
References: <2iYQOJ5yeu7SvGcScLPBOWCPMLv69e1ksOL1vW3ytL8=.0c27621d-ef3d-422c-9d8c-922078ca3160@github.com>
Message-ID:

> This RFE provides an IR test framework to perform regex-based checks on the C2 IR shape of test methods emitted by the VM flags `-XX:+PrintIdeal` and `-XX:+PrintOptoAssembly`. The framework can also be used for other non-IR matching (and non-compiler) tests by providing easy-to-use annotations for commonly used testing patterns and compiler control flags.
>
> The framework is based on the ideas of the currently present IR test framework in [Valhalla](https://github.com/openjdk/valhalla/blob/e9c78ce4fcfd01361c35883e0d68f9ae5a80d079/test/hotspot/jtreg/compiler/valhalla/inlinetypes/InlineTypeTest.java) (mainly implemented by @TobiHartmann) which is being used with great success. This new framework aims to replace the old one in Valhalla at some point.
> > A detailed description about how this new IR test framework works and how it is used is provided in the [README.md](https://github.com/chhagedorn/jdk/blob/aa005f384a4567c6c0b5f08f7c5df57f705dc540/test/lib/jdk/test/lib/hotspot/ir_framework/README.md) file and in the [Javadocs](https://github.com/chhagedorn/jdk/blob/aa005f384a4567c6c0b5f08f7c5df57f705dc540/test/lib/jdk/test/lib/hotspot/ir_framework/doc/jdk/test/lib/hotspot/ir_framework/package-summary.html) written for the framework classes. > > To finish a first version of this framework for JDK 17, I decided to leave some improvement possibilities and ideas to be followed up on in additional RFEs. Some ideas are mentioned in "Future Work" in [README.md](https://github.com/chhagedorn/jdk/blob/aa005f384a4567c6c0b5f08f7c5df57f705dc540/test/lib/jdk/test/lib/hotspot/ir_framework/README.md) and were also created as subtasks of this RFE. > > Testing (also described in "Internal Framework Tests in [README.md](https://github.com/chhagedorn/jdk/blob/aa005f384a4567c6c0b5f08f7c5df57f705dc540/test/lib/jdk/test/lib/hotspot/ir_framework/README.md)): > There are various tests to verify the correctness of the test framework which can be found as JTreg tests in the [tests](https://github.com/chhagedorn/jdk/tree/aa005f384a4567c6c0b5f08f7c5df57f705dc540/test/lib/jdk/test/lib/hotspot/ir_framework/tests) folder. Additional testing was performed by converting all compiler Inline Types test of project Valhalla (done by @katyapav in [JDK-8263024](https://bugs.openjdk.java.net/browse/JDK-8263024)) that used the old framework to the new framework. This provided additional testing for the framework itself. We ran the converted tests with all the flag settings used in hs-tier1-9 and hs-precheckin-comp. For sanity checking, this was also done with a sample IR test in mainline. 
> > Some stats about the framework code added to [ir_framework](https://github.com/chhagedorn/jdk/tree/aa005f384a4567c6c0b5f08f7c5df57f705dc540/test/lib/jdk/test/lib/hotspot/ir_framework): > > - without the [Javadocs files](https://github.com/chhagedorn/jdk/tree/aa005f384a4567c6c0b5f08f7c5df57f705dc540/test/lib/jdk/test/lib/hotspot/ir_framework/doc) : 60 changed files, 13212 insertions, 0 deletions. > - without the [tests](https://github.com/chhagedorn/jdk/tree/aa005f384a4567c6c0b5f08f7c5df57f705dc540/test/lib/jdk/test/lib/hotspot/ir_framework/tests) and [examples](https://github.com/chhagedorn/jdk/tree/aa005f384a4567c6c0b5f08f7c5df57f705dc540/test/lib/jdk/test/lib/hotspot/ir_framework/examples) folder: 40 files changed, 6781 insertions > - comments: 2399 insertions (calculated with `git diff --cached !(tests|examples) | grep -c -E "(^[+-]\s*(/)?*)|(^[+-]\s*//)"`) > - which leaves 4382 lines of code inserted > > Big thanks to: > - @TobiHartmann for all his help by discussing the new framework and for providing insights from his IR test framework in Valhalla. > - @katyapav for converting the Valhalla tests to use the new framework which found some harder to catch bugs in the framework and also some actual C2 bugs. > - @iignatev for helping to simplify the framework usage with JTreg and with the framework internal VM calling structure. > - and others who provided valuable feedback. 
> > Thanks, > Christian Christian Hagedorn has updated the pull request incrementally with four additional commits since the last revision: - Remove TestFramework: both runWithScenarios, both runWithHelperClasses, and one runWithFlags method to make interface cleaner/simpler, update internal tests accordingly - Minor improvements, comment fixes, and test fixes - Rename TestFrameworkPrepareFlags -> FlagVM and rename TestFrameworkExecution -> TestVM - Apply review comments: Extract Test classes into own files, extract Flag and Test VM processes into own class, replace socket-based flag VM communication with file-based and clean up socket usage of test VM, fix useless processing of hotspot-pid files if no IR rules are applied ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/3508/files - new: https://git.openjdk.java.net/jdk/pull/3508/files/90a0064d..d6c72ec6 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3508&range=07 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3508&range=06-07 Stats: 4113 lines in 31 files changed: 2056 ins; 1938 del; 119 mod Patch: https://git.openjdk.java.net/jdk/pull/3508.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3508/head:pull/3508 PR: https://git.openjdk.java.net/jdk/pull/3508 From chagedorn at openjdk.java.net Mon May 3 17:42:53 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Mon, 3 May 2021 17:42:53 GMT Subject: RFR: 8254129: IR Test Framework to support regex-based matching on the IR in JTreg compiler tests [v7] In-Reply-To: <7n-HUaVnogF7L6lfbamFlUcvFb1jEB7zGwSf8dB5dsw=.131617ed-5a18-4615-b1d1-bbc3eb6c3cc0@github.com> References: <2iYQOJ5yeu7SvGcScLPBOWCPMLv69e1ksOL1vW3ytL8=.0c27621d-ef3d-422c-9d8c-922078ca3160@github.com> <7n-HUaVnogF7L6lfbamFlUcvFb1jEB7zGwSf8dB5dsw=.131617ed-5a18-4615-b1d1-bbc3eb6c3cc0@github.com> Message-ID: On Mon, 3 May 2021 17:34:30 GMT, Christian Hagedorn wrote: >> test/lib/jdk/test/lib/hotspot/ir_framework/TestFramework.java line 984: 
>>
>>> 982:     private final String hotspotPidFileName;
>>> 983:
>>> 984:     JVMOutput(OutputAnalyzer oa, Scenario scenario, ProcessBuilder process) {
>>
>> instead of passing a ProcessBuilder, you can either pass a command line or, better, just pass the prepared flags, and make it JVMOutput's (you will need a better name) responsibility to execute the process, create the OutputAnalyzer, etc.
>>
>> Suggestion:
>>
>>     JVMOutput(List<String> cmds, Scenario scenario) {
>>         var pb = ProcessTools.createJavaProcessBuilder(cmds);
>>         try {
>>             // Calls 'main' of TestFrameworkExecution to run all specified tests with commands 'cmds'.
>>             // Use executeProcess instead of executeTestJvm as we have already added the JTreg VM and
>>             // Java options in prepareTestVMFlags().
>>             this.oa = ProcessTools.executeProcess(pb);
>>         } catch (Exception e) {
>>             throw new TestFrameworkException("Error while executing Test VM", e);
>>         }
>>         this.cmd = pb.command();
>>         ...
>>
>>
>> and then runTest will be something like
>>
>>     private void runTestVM(List<String> additionalFlags) {
>>         List<String> cmds = prepareTestVMFlags(additionalFlags);
>>         socket.start();
>>         JVMOutput output = new JVMOutput(cmds, scenario);
>>         ...
>
> I did a refactoring and introduced new classes `FlagVMProcess` and `TestVMProcess`. I also renamed `TestFrameworkPrepareFlags` -> `FlagVM` and `TestFrameworkExecution` -> `TestVM` to make it clearer what's going on.

I did a refactoring and introduced new classes `FlagVMProcess` and `TestVMProcess`. I also renamed `TestFrameworkPrepareFlags` -> `FlagVM` and `TestFrameworkExecution` -> `TestVM` to make it clearer what's going on.
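
The shape suggested in the quoted review, where one object receives the prepared flags and is itself responsible for launching the VM and collecting its output, can be sketched with plain JDK APIs. This is a rough illustration under stated assumptions, not the framework's code: `VmProcessSketch` and its accessors are hypothetical names, and it uses `java.lang.ProcessBuilder` directly instead of the `ProcessTools` and `OutputAnalyzer` test-library classes mentioned in the thread.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the object owns process creation, execution and output capture.
final class VmProcessSketch {
    private final List<String> command;
    private final String output;
    private final int exitCode;

    VmProcessSketch(List<String> flags) {
        // Assumption: launch the same JDK that runs this code, located via java.home.
        List<String> cmd = new ArrayList<>();
        cmd.add(Path.of(System.getProperty("java.home"), "bin", "java").toString());
        cmd.addAll(flags);

        // Merge stderr into stdout so a single stream holds everything the VM printed.
        ProcessBuilder pb = new ProcessBuilder(cmd).redirectErrorStream(true);
        try {
            Process p = pb.start();
            output = new String(p.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
            exitCode = p.waitFor();
        } catch (IOException e) {
            throw new IllegalStateException("Error while executing VM", e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for VM", e);
        }
        command = List.copyOf(pb.command());
    }

    List<String> command() { return command; }
    String output()        { return output; }
    int exitCode()         { return exitCode; }

    public static void main(String[] args) {
        VmProcessSketch vm = new VmProcessSketch(List.of("-version"));
        System.out.println("exit code: " + vm.exitCode());
        System.out.println(vm.output());
    }
}
```

With this arrangement the caller only prepares flags, and everything about process handling stays in one place, which is the separation the review comment asks for.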
------------- PR: https://git.openjdk.java.net/jdk/pull/3508 From chagedorn at openjdk.java.net Mon May 3 17:42:53 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Mon, 3 May 2021 17:42:53 GMT Subject: RFR: 8254129: IR Test Framework to support regex-based matching on the IR in JTreg compiler tests [v7] In-Reply-To: References: <2iYQOJ5yeu7SvGcScLPBOWCPMLv69e1ksOL1vW3ytL8=.0c27621d-ef3d-422c-9d8c-922078ca3160@github.com> Message-ID: On Sun, 2 May 2021 20:59:46 GMT, Christian Hagedorn wrote: >> I don't have a strong opinion here. let's start w/ what you have now, we can always change it later if we find that its complexity doesn't bring much value. > > Sounds good. I'll push an update with the changes tomorrow. Thanks for your review! I moved to a file-based approach for the flag VM and cleaned the socket up for the sole usage for the test VM. ------------- PR: https://git.openjdk.java.net/jdk/pull/3508 From chagedorn at openjdk.java.net Mon May 3 17:42:54 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Mon, 3 May 2021 17:42:54 GMT Subject: RFR: 8254129: IR Test Framework to support regex-based matching on the IR in JTreg compiler tests [v7] In-Reply-To: References: <2iYQOJ5yeu7SvGcScLPBOWCPMLv69e1ksOL1vW3ytL8=.0c27621d-ef3d-422c-9d8c-922078ca3160@github.com> Message-ID: On Fri, 30 Apr 2021 18:22:46 GMT, Igor Ignatyev wrote: >> test/lib/jdk/test/lib/hotspot/ir_framework/TestInfo.java line 57: >> >>> 55: * allowing a compilation on the requested level in {@link Test#compLevel()}. >>> 56: * >>> 57: * @return {@code true} if the framework compiled the test; >> >> as in `RunInfo`: >> s/compiled/skipped compilation of/ > > btw, do we really need these methods in both `RunInfo` and `TestInfo`? Yes, they are implemented slightly differently. 
------------- PR: https://git.openjdk.java.net/jdk/pull/3508 From chagedorn at openjdk.java.net Mon May 3 17:53:56 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Mon, 3 May 2021 17:53:56 GMT Subject: RFR: 8254129: IR Test Framework to support regex-based matching on the IR in JTreg compiler tests [v2] In-Reply-To: <8m3dztYNNjO9ZHKSoyH3-YOtwFbKpMyn8WPgnxZRV-Q=.a07c3878-4567-4c43-b732-515fa750a6b2@github.com> References: <2iYQOJ5yeu7SvGcScLPBOWCPMLv69e1ksOL1vW3ytL8=.0c27621d-ef3d-422c-9d8c-922078ca3160@github.com> <8m3dztYNNjO9ZHKSoyH3-YOtwFbKpMyn8WPgnxZRV-Q=.a07c3878-4567-4c43-b732-515fa750a6b2@github.com> Message-ID: On Mon, 26 Apr 2021 19:16:17 GMT, Igor Ignatyev wrote: >>> > * although having javadoc for testlibraries is highly desirable, I don't think we should check in the generated HTML files >>> > * the same goes for `README.html` generated from `README.md` >>> >>> Okay, I will remove them. Does it make sense to still have the HTML files somewhere in the web, for example, on my cr.openjdk? >>> >>> > * this library is hotspot-centric, I highly doubt that it will be used by any tests outside of the hotspot test base, hence the more appropriate location for it would be inside `test/hotspot/jtreg/testlibrary`. >>> > * I assume `test/lib/jdk/test/lib/hotspot/ir_framework/tests/` are the tests for the framework, in that case they should be in `test/lib-test`, if we stick to `test/lib` as the location for the library, or in `test/hotspot/jtreg/testlibrary_tests`, if we move it to `test/hotspot` >>> >>> That makes sense to move everything to `test/hotspot/jtreg/testlibrary`. Right, the `test/lib/jdk/test/lib/hotspot/ir_framework/tests/` are only tests for the framework itself and should not be run as part of tier testing each time (does not make much sense) but only when the framework is actually modified. Is this still the case when putting them in `test/hotspot/jtreg/testlibrary_tests` (i.e. not executed unless run manually)? 
>> >> `test/hotspot/jtreg/testlibrary_tests` are seldomly run as part of `:hotspot_misc` test group, yet I don't think it's an issue. the benefit of running testlibrary tests is to gain confidence that the tests which use these libraries are reliable. I'd also disagree that `ir_framework/tests` should be run *only* when the framework is actually, they should also be run when its dependencies are changed, and the framework highly depends on hotspot, so to be sure that the framework didn't get broken after changes in c2, whitebox, etc, you do need run these tests more often. >> >> Thanks, >> -- Igor > >> There is no decision, yet, (and open for discussion) about the location and package name for the framework and the framework internal tests. After discussing it offline with @iignatev, we think there are basically two good options: >> >> 1. Leave the framework in `/test/lib` and put the framework internal tests into `/test/lib-test`: >> >> * Pros: Only need to specify `@library /test/lib` in a JTreg test; the framework internal tests are run in tier1 before any other tests are run which depend on the framework ensuring correctness. >> * Cons: Clarity: The framework is intended to be used for HotSpot tests only (thus not exactly the right location in `/test/lib`); the framework internal tests might be run too often as part of tier1 (trade-off ensuring correctness vs. execution time). >> 2. Move the framework to `/test/hotspot/jtreg/testlibrary`, put the framework internal tests into `/test/hotspot/jtreg/testlibrary_tests`, and use package name `hotspot.test.lib.ir_framework`: >> >> * Pros: Clarity: The framework is only used for HotSpot tests (mainly compiler tests but could also be used for other tests). >> * Cons: A JTreg test needs to additionally specify `/testlibrary/ir_framework` like `@library /testlibrary/ir_framework /test/lib`; the framework internal tests are run more seldomly as part of `:hotspot_misc` (trade-off, see cons of first option). 
>> > > there is also 3rd alternative, move the framework to `/test/hotspot/jtreg/compiler/` and use `compiler.ir_framework` or `compiler.lib.ir_framework` as the package name. that will make it even clearer that this is a compiler-specific library, and its users are going to be in `/test/hotspot/jtreg/compiler/`. I understand that there can be some runtime (or other) tests that might benefit from this library, but in the end, they won't be clear runtime tests, so `t/h/j/compiler` might be a better location for them anyways. the `@library` tag will still have to contain two paths, but these will be usual paths as in many compiler tests : `@library / /test/lib`. > the tests can be placed in `/test/hotspot/jtreg/testlibrary_tests` as in the 2nd option, with the same trade-off. > > -- Igor I did a slightly bigger refactoring reflecting the review comments by @iignatev: - Extracting classes into own Java files - Class renamings - Moving to file-based communication for the flag VM to simplify the socket usage for the test VM (required some clean-ups) - Removed some `TestFramework` public methods as discussed earlier in this PR. 
Missing (I will tackle this tomorrow): - separating classes into own (sub)packages - trying to move tests to `/test/hotspot/jtreg/testlibrary_tests` and the framework to `/test/hotspot/jtreg/compiler` as suggested in an earlier comment ------------- PR: https://git.openjdk.java.net/jdk/pull/3508 From chagedorn at openjdk.java.net Mon May 3 18:10:57 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Mon, 3 May 2021 18:10:57 GMT Subject: RFR: 8254129: IR Test Framework to support regex-based matching on the IR in JTreg compiler tests [v7] In-Reply-To: References: <2iYQOJ5yeu7SvGcScLPBOWCPMLv69e1ksOL1vW3ytL8=.0c27621d-ef3d-422c-9d8c-922078ca3160@github.com> Message-ID: On Fri, 30 Apr 2021 19:12:08 GMT, Igor Ignatyev wrote: > I've reviewed all the files but `examples` and `tests`, for them I'd actually recommend you to put `examples` into `tests` (possible in a subdir), and just update readme to say that the examples can be found in test directory. Thanks a lot Igor! I'll consider to move the `examples` as well tomorrow. > test/lib/jdk/test/lib/hotspot/ir_framework/TestVMException.java line 34: > >> 32: TestVMException(String exceptionInfo) { >> 33: super("There were one or multiple errors. Please check stderr for more information."); >> 34: this.exceptionInfo = exceptionInfo; > > why can't this info be stored as `Throwable::message`? There were errors in some cases when the message was too long. Thus, I used a separate field for it which I then print to the stderr. 
------------- PR: https://git.openjdk.java.net/jdk/pull/3508 From github.com+58006833+xbzhang99 at openjdk.java.net Mon May 3 18:34:53 2021 From: github.com+58006833+xbzhang99 at openjdk.java.net (Xubo Zhang) Date: Mon, 3 May 2021 18:34:53 GMT Subject: RFR: 8266332: Adler32 intrinsic for x86 64-bit platforms [v2] In-Reply-To: References: Message-ID: On Fri, 30 Apr 2021 20:02:04 GMT, Sandhya Viswanathan wrote: >> Xubo Zhang has updated the pull request incrementally with one additional commit since the last revision: >> >> refactor adler32 algorithm to a new file x86/macroAssembler_x86_adler.cpp; added a scratch reg to vpmulld, and some other issues > > src/hotspot/cpu/x86/macroAssembler_x86.cpp line 3248: > >> 3246: >> 3247: void MacroAssembler::vpmulld(XMMRegister dst, XMMRegister nds, AddressLiteral src, int vector_len, Register scratch_reg) { >> 3248: // Used in sign-bit flipping with aligned address. > > You could remove the spurious comment here. removed ------------- PR: https://git.openjdk.java.net/jdk/pull/3806 From github.com+58006833+xbzhang99 at openjdk.java.net Mon May 3 18:41:51 2021 From: github.com+58006833+xbzhang99 at openjdk.java.net (Xubo Zhang) Date: Mon, 3 May 2021 18:41:51 GMT Subject: RFR: 8266332: Adler32 intrinsic for x86 64-bit platforms [v2] In-Reply-To: References: Message-ID: On Fri, 30 Apr 2021 20:11:44 GMT, Sandhya Viswanathan wrote: >> Xubo Zhang has updated the pull request incrementally with one additional commit since the last revision: >> >> refactor adler32 algorithm to a new file x86/macroAssembler_x86_adler.cpp; added a scratch reg to vpmulld, and some other issues > > src/hotspot/cpu/x86/macroAssembler_x86_adler.cpp line 32: > >> 30: #include "macroAssembler_x86.hpp" >> 31: >> 32: > > The updateBytesAdler32 should be under #ifdef _LP64, #endif. #ifdef _LP64 ... 
#endif added > src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 5824: > >> 5822: __ enter(); // required for proper stackwalking of RuntimeStub frame >> 5823: >> 5824: __ vmovdqu(yshuf0, ExternalAddress((address) StubRoutines::x86::_adler32_shuf0_table)); > > For vmovdqu also it is good to be explicit with scratch register. added scratch register to vmovdqu ------------- PR: https://git.openjdk.java.net/jdk/pull/3806 From github.com+58006833+xbzhang99 at openjdk.java.net Mon May 3 18:41:54 2021 From: github.com+58006833+xbzhang99 at openjdk.java.net (Xubo Zhang) Date: Mon, 3 May 2021 18:41:54 GMT Subject: RFR: 8266332: Adler32 intrinsic for x86 64-bit platforms [v3] In-Reply-To: References: Message-ID: On Fri, 30 Apr 2021 00:21:57 GMT, Sandhya Viswanathan wrote: >> Xubo Zhang has updated the pull request incrementally with one additional commit since the last revision: >> >> set flag UseAdler32Intrinsics differently to based on 64 or 32-bit modes. > > src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 5812: > >> 5810: StubCodeMark mark(this, "StubRoutines", "updateBytesAdler32"); >> 5811: >> 5812: address start = __ pc(); > > The algorithm part can go into macroAssembler_x86_adler.cpp with Intel copyright (see macroAssembler_x86_sha.cpp). macroAssembler_x86_adler.cpp added ------------- PR: https://git.openjdk.java.net/jdk/pull/3806 From vlivanov at openjdk.java.net Mon May 3 18:45:07 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Mon, 3 May 2021 18:45:07 GMT Subject: RFR: 8266074: Vtable-based CHA implementation [v4] In-Reply-To: References: Message-ID: > As of now, Class Hierarchy Analysis (CHA) employs an approximate algorithm to enumerate all non-abstract methods in a class hierarchy. 
>
> It served quite well for many years, but it accumulated significant complexity
> to support different corner cases over time, and the inevitable evolution of the JVM
> stretched the whole approach way too much (to the point where it became almost
> impossible to extend the analysis any further).
>
> It turns out the root problem is the decision to reimplement method resolution
> and method selection logic from scratch and to perform it on the JVM's internal
> representation. That makes it very hard to reason about correctness, and the
> implementation becomes sensitive to changes in the internal representation.
>
> So, the main motivation for the redesign is twofold:
> * reduce maintenance burden and increase confidence in the code;
> * unlock some long-awaited enhancements.
>
> Though I did experiment with relaxing existing constraints (e.g., enabling default method support),
> any possible enhancements are deliberately kept out of scope for the current PR.
> (It does deliver a few minor enhancements, as the changes in
> compiler/cha/StrengthReduceInterfaceCall.java show, but that is a side effect
> of the other changes and was not the goal of the current work.)
>
> The proposed implementation (`LinkedConcreteMethodFinder`) mimics method invocation
> and relies on vtable/itable information to detect the target method for every
> subclass it visits. It removes all the complexity associated with method
> resolution and method selection logic and leaves only the essential logic needed to prepare for method selection.
>
> Vtables are filled during class linkage, so the new logic doesn't work on not-yet-linked classes.
> Instead of supporting the not-yet-linked case, such classes are simply ignored. It is safe to
> skip them (treat them as "effectively non-concrete") since it is guaranteed that there
> are no instances created yet. But it requires the VM to check dependencies once a
> class is linked.
>
> I ended up with 2 separate dependency validation passes (when a class is loaded
> and when it is linked). To avoid duplicated work, only dependencies
> which may be affected by a class initialization state change
> (`unique_concrete_method_4`) are visited.
>
> (I experimented with merging the passes into a single pass (delaying the pass until
> linkage is over), but it severely affected other class-related dependencies and
> relevant optimizations.)
>
> The Compiler Interface (CI) is changed to require users to provide complete information about the call site being analyzed.
>
> The old implementation is kept intact for now (it will be removed later) because:
> - JVMCI hasn't been migrated to the new implementation yet;
> - it enables verification that the 2 implementations (old and new) agree on the results;
> - it temporarily keeps an option to revert to the original implementation in case any regressions show up.
>
> Testing:
> - [x] hs-tier1 - hs-tier9
> - [x] hs-tier1 - hs-tier4 w/ `-XX:-UseVtableBasedCHA`
> - [x] performance testing
>
> Thanks!

Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision:

  Cover abstract method case

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3727/files
  - new: https://git.openjdk.java.net/jdk/pull/3727/files/11276e26..2ee8a779

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3727&range=03
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3727&range=02-03

  Stats: 4 lines in 1 file changed: 3 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3727.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3727/head:pull/3727

PR: https://git.openjdk.java.net/jdk/pull/3727

From github.com+58006833+xbzhang99 at openjdk.java.net Mon May 3 18:45:19 2021
From: github.com+58006833+xbzhang99 at openjdk.java.net (Xubo Zhang)
Date: Mon, 3 May 2021 18:45:19 GMT
Subject: RFR: 8266332: Adler32 intrinsic for x86 64-bit platforms [v4]
In-Reply-To:
References:
Message-ID:

> Implement Adler32 intrinsic for x86 64-bit platform using vector instructions.
>
> For the following benchmark:
> http://cr.openjdk.java.net/~pli/rfr/8216259/TestAdler32.java
>
> The optimization shows ~5x improvement.
>
> Base:
> Benchmark                          (count)  Mode  Cnt   Score   Error  Units
> TestAdler32Perf.testAdler32Update       64  avgt   25   0.084 ± 0.001  us/op
> TestAdler32Perf.testAdler32Update      128  avgt   25   0.104 ± 0.001  us/op
> TestAdler32Perf.testAdler32Update      256  avgt   25   0.146 ± 0.002  us/op
> TestAdler32Perf.testAdler32Update      512  avgt   25   0.226 ± 0.002  us/op
> TestAdler32Perf.testAdler32Update     1024  avgt   25   0.390 ± 0.005  us/op
> TestAdler32Perf.testAdler32Update     2048  avgt   25   0.714 ± 0.007  us/op
> TestAdler32Perf.testAdler32Update     4096  avgt   25   1.359 ± 0.014  us/op
> TestAdler32Perf.testAdler32Update     8192  avgt   25   2.751 ± 0.023  us/op
> TestAdler32Perf.testAdler32Update    16384  avgt   25   5.494 ± 0.077  us/op
> TestAdler32Perf.testAdler32Update    32768  avgt   25  11.058 ± 0.160  us/op
> TestAdler32Perf.testAdler32Update    65536  avgt   25  22.198 ± 0.319  us/op
>
>
> With patch:
> Benchmark                          (count)  Mode  Cnt   Score   Error  Units
> TestAdler32Perf.testAdler32Update       64  avgt   25   0.020 ± 0.001  us/op
> TestAdler32Perf.testAdler32Update      128  avgt   25   0.025 ± 0.001  us/op
> TestAdler32Perf.testAdler32Update      256  avgt   25   0.031 ± 0.001  us/op
> TestAdler32Perf.testAdler32Update      512  avgt   25   0.048 ± 0.001  us/op
> TestAdler32Perf.testAdler32Update     1024  avgt   25   0.078 ± 0.001  us/op
> TestAdler32Perf.testAdler32Update     2048  avgt   25   0.139 ± 0.002  us/op
> TestAdler32Perf.testAdler32Update     4096  avgt   25   0.262 ± 0.004  us/op
> TestAdler32Perf.testAdler32Update     8192  avgt   25   0.524 ± 0.010  us/op
> TestAdler32Perf.testAdler32Update    16384  avgt   25   1.017 ± 0.022  us/op
> TestAdler32Perf.testAdler32Update    32768  avgt   25   2.058 ± 0.052  us/op
> TestAdler32Perf.testAdler32Update    65536  avgt   25   3.994 ± 0.013  us/op

Xubo Zhang has updated the pull request incrementally with one additional commit since the last revision:

  changed copyright year to 2021 in macroAssembler_x86_adler.cpp

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3806/files
  - new: https://git.openjdk.java.net/jdk/pull/3806/files/57ec0b8c..bc60f308

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3806&range=03
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3806&range=02-03

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3806.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3806/head:pull/3806

PR: https://git.openjdk.java.net/jdk/pull/3806

From github.com+2249648+johntortugo at openjdk.java.net Mon May 3 19:56:47 2021
From: github.com+2249648+johntortugo at openjdk.java.net (John Tortugo)
Date: Mon, 3 May 2021 19:56:47 GMT
Subject: RFR: 8241502: C2: Migrate x86_64.ad to MacroAssembler [v10]
In-Reply-To:
References:
Message-ID:

> Relates to: https://bugs.openjdk.java.net/browse/JDK-8241502
> Tested on: Linux tier1, 2 and 3
>
> Could you please take a look at whether these changes are going in the expected direction? If they are, I'll continue working on `JDK-8241502`, but I'd like to split it into a few PRs since it's a lot of changes.

John Tortugo has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 14 additional commits since the last revision:

 - Merge pull request #3 from JohnTortugo/master
   Updating branch with latest changes in upstream/master
 - Merge pull request #2 from openjdk/master
   Fork updating
 - Encode cdql/cdqq using MacroAssembler.
 - Fix cast of constant
 - Use cdql/cdqq implemented in MacroAssembler.
 - All conversions performed and tested.
 - More shifts; logic operations and movs.
 - Some div and shift instructions.
 - Third part of conversions. Small fix in Assembler::cmovl.
 - Second part of conversions.
 - ... and 4 more: https://git.openjdk.java.net/jdk/compare/c9ae2fa3...53a5e32b

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/2420/files
  - new: https://git.openjdk.java.net/jdk/pull/2420/files/92a242b0..53a5e32b

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2420&range=09
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2420&range=08-09

  Stats: 746030 lines in 10204 files changed: 162042 ins; 553344 del; 30644 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2420.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2420/head:pull/2420

PR: https://git.openjdk.java.net/jdk/pull/2420

From psandoz at openjdk.java.net Mon May 3 21:43:54 2021
From: psandoz at openjdk.java.net (Paul Sandoz)
Date: Mon, 3 May 2021 21:43:54 GMT
Subject: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v2]
In-Reply-To:
References:
Message-ID:

On Wed, 28 Apr 2021 21:11:26 GMT, Sandhya Viswanathan wrote:

>> Intel Short Vector Math Library (SVML) based intrinsics in native x86 assembly provide optimized implementation for Vector API transcendental and trigonometric methods.
>> These methods are built into a separate library instead of being part of libjvm.so or jvm.dll.
>>
>> The following changes are made:
>> The source for these methods is placed in the jdk.incubator.vector module under src/jdk.incubator.vector/linux/native/libsvml and src/jdk.incubator.vector/windows/native/libsvml.
>> The assembly source files are named as "*.S" and include files are named as "*.S.inc".
>> The corresponding build script is placed at make/modules/jdk.incubator.vector/Lib.gmk.
>> Changes are made to build system to support dependency tracking for assembly files with includes.
>> The built native libraries (libsvml.so/svml.dll) are placed in bin directory of JDK on Windows and lib directory of JDK on Linux.
>> The C2 JIT uses the dll_load and dll_lookup to get the addresses of optimized methods from this library. >> >> Build system changes and module library build scripts are contributed by Magnus (magnus.ihse.bursie at oracle.com). >> >> This work is part of second round of incubation of the Vector API. >> JEP: https://bugs.openjdk.java.net/browse/JDK-8261663 >> >> Please review. >> >> Performance: >> Micro benchmark Base Optimized Unit Gain(Optimized/Base) >> Double128Vector.ACOS 45.91 87.34 ops/ms 1.90 >> Double128Vector.ASIN 45.06 92.36 ops/ms 2.05 >> Double128Vector.ATAN 19.92 118.36 ops/ms 5.94 >> Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79 >> Double128Vector.CBRT 45.77 208.36 ops/ms 4.55 >> Double128Vector.COS 49.94 245.89 ops/ms 4.92 >> Double128Vector.COSH 26.91 126.00 ops/ms 4.68 >> Double128Vector.EXP 71.64 379.65 ops/ms 5.30 >> Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18 >> Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44 >> Double128Vector.LOG 61.95 279.84 ops/ms 4.52 >> Double128Vector.LOG10 59.34 239.05 ops/ms 4.03 >> Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79 >> Double128Vector.SIN 49.36 240.79 ops/ms 4.88 >> Double128Vector.SINH 26.59 103.75 ops/ms 3.90 >> Double128Vector.TAN 41.05 152.39 ops/ms 3.71 >> Double128Vector.TANH 45.29 169.53 ops/ms 3.74 >> Double256Vector.ACOS 54.21 106.39 ops/ms 1.96 >> Double256Vector.ASIN 53.60 107.99 ops/ms 2.01 >> Double256Vector.ATAN 21.53 189.11 ops/ms 8.78 >> Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44 >> Double256Vector.CBRT 56.45 397.13 ops/ms 7.04 >> Double256Vector.COS 58.26 389.77 ops/ms 6.69 >> Double256Vector.COSH 29.44 151.11 ops/ms 5.13 >> Double256Vector.EXP 86.67 564.68 ops/ms 6.52 >> Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80 >> Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62 >> Double256Vector.LOG 71.52 394.90 ops/ms 5.52 >> Double256Vector.LOG10 65.43 362.32 ops/ms 5.54 >> Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05 >> Double256Vector.SIN 57.06 380.98 ops/ms 6.68 >> 
Double256Vector.SINH 29.40 117.37 ops/ms 3.99 >> Double256Vector.TAN 44.90 279.90 ops/ms 6.23 >> Double256Vector.TANH 54.08 274.71 ops/ms 5.08 >> Double512Vector.ACOS 55.65 687.54 ops/ms 12.35 >> Double512Vector.ASIN 57.31 777.72 ops/ms 13.57 >> Double512Vector.ATAN 21.42 729.21 ops/ms 34.04 >> Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32 >> Double512Vector.CBRT 56.78 834.38 ops/ms 14.69 >> Double512Vector.COS 59.88 837.04 ops/ms 13.98 >> Double512Vector.COSH 30.34 172.76 ops/ms 5.70 >> Double512Vector.EXP 99.66 1608.12 ops/ms 16.14 >> Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34 >> Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34 >> Double512Vector.LOG 74.84 996.00 ops/ms 13.31 >> Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72 >> Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34 >> Double512Vector.POW 37.42 384.13 ops/ms 10.26 >> Double512Vector.SIN 59.74 728.45 ops/ms 12.19 >> Double512Vector.SINH 29.47 143.38 ops/ms 4.87 >> Double512Vector.TAN 46.20 587.21 ops/ms 12.71 >> Double512Vector.TANH 57.36 495.42 ops/ms 8.64 >> Double64Vector.ACOS 24.04 73.67 ops/ms 3.06 >> Double64Vector.ASIN 23.78 75.11 ops/ms 3.16 >> Double64Vector.ATAN 14.14 62.81 ops/ms 4.44 >> Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28 >> Double64Vector.CBRT 16.47 107.50 ops/ms 6.53 >> Double64Vector.COS 23.42 152.01 ops/ms 6.49 >> Double64Vector.COSH 17.34 113.34 ops/ms 6.54 >> Double64Vector.EXP 27.08 203.53 ops/ms 7.52 >> Double64Vector.EXPM1 18.77 96.73 ops/ms 5.15 >> Double64Vector.HYPOT 18.54 103.62 ops/ms 5.59 >> Double64Vector.LOG 26.75 142.63 ops/ms 5.33 >> Double64Vector.LOG10 25.85 139.71 ops/ms 5.40 >> Double64Vector.LOG1P 13.26 97.94 ops/ms 7.38 >> Double64Vector.SIN 23.28 146.91 ops/ms 6.31 >> Double64Vector.SINH 17.62 88.59 ops/ms 5.03 >> Double64Vector.TAN 21.00 86.43 ops/ms 4.12 >> Double64Vector.TANH 23.75 111.35 ops/ms 4.69 >> Float128Vector.ACOS 57.52 110.65 ops/ms 1.92 >> Float128Vector.ASIN 57.15 117.95 ops/ms 2.06 >> Float128Vector.ATAN 22.52 318.74 ops/ms 14.15 >> 
Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42 >> Float128Vector.CBRT 29.72 443.74 ops/ms 14.93 >> Float128Vector.COS 42.82 803.02 ops/ms 18.75 >> Float128Vector.COSH 31.44 118.34 ops/ms 3.76 >> Float128Vector.EXP 72.43 855.33 ops/ms 11.81 >> Float128Vector.EXPM1 37.82 127.85 ops/ms 3.38 >> Float128Vector.HYPOT 53.20 591.68 ops/ms 11.12 >> Float128Vector.LOG 52.95 877.94 ops/ms 16.58 >> Float128Vector.LOG10 49.26 603.72 ops/ms 12.26 >> Float128Vector.LOG1P 20.89 430.59 ops/ms 20.61 >> Float128Vector.SIN 43.38 745.31 ops/ms 17.18 >> Float128Vector.SINH 31.11 112.91 ops/ms 3.63 >> Float128Vector.TAN 37.25 332.13 ops/ms 8.92 >> Float128Vector.TANH 57.63 453.77 ops/ms 7.87 >> Float256Vector.ACOS 65.23 123.73 ops/ms 1.90 >> Float256Vector.ASIN 63.41 132.86 ops/ms 2.10 >> Float256Vector.ATAN 23.51 649.02 ops/ms 27.61 >> Float256Vector.ATAN2 18.19 455.95 ops/ms 25.07 >> Float256Vector.CBRT 45.99 594.81 ops/ms 12.93 >> Float256Vector.COS 43.75 926.69 ops/ms 21.18 >> Float256Vector.COSH 33.52 130.46 ops/ms 3.89 >> Float256Vector.EXP 75.70 1366.72 ops/ms 18.05 >> Float256Vector.EXPM1 39.00 149.72 ops/ms 3.84 >> Float256Vector.HYPOT 52.91 1023.18 ops/ms 19.34 >> Float256Vector.LOG 53.31 1545.77 ops/ms 29.00 >> Float256Vector.LOG10 50.31 863.80 ops/ms 17.17 >> Float256Vector.LOG1P 21.51 616.59 ops/ms 28.66 >> Float256Vector.SIN 44.07 911.04 ops/ms 20.67 >> Float256Vector.SINH 33.16 122.50 ops/ms 3.69 >> Float256Vector.TAN 37.85 497.75 ops/ms 13.15 >> Float256Vector.TANH 64.27 537.20 ops/ms 8.36 >> Float512Vector.ACOS 67.33 1718.00 ops/ms 25.52 >> Float512Vector.ASIN 66.12 1780.85 ops/ms 26.93 >> Float512Vector.ATAN 22.63 1780.31 ops/ms 78.69 >> Float512Vector.ATAN2 17.52 1113.93 ops/ms 63.57 >> Float512Vector.CBRT 54.78 2087.58 ops/ms 38.11 >> Float512Vector.COS 40.92 1567.93 ops/ms 38.32 >> Float512Vector.COSH 33.42 138.36 ops/ms 4.14 >> Float512Vector.EXP 70.51 3835.97 ops/ms 54.41 >> Float512Vector.EXPM1 38.06 279.80 ops/ms 7.35 >> Float512Vector.HYPOT 50.99 3287.55 ops/ms 
64.47 >> Float512Vector.LOG 49.61 3156.99 ops/ms 63.64 >> Float512Vector.LOG10 46.94 2489.16 ops/ms 53.02 >> Float512Vector.LOG1P 20.66 1689.86 ops/ms 81.81 >> Float512Vector.POW 32.73 1015.85 ops/ms 31.04 >> Float512Vector.SIN 41.17 1587.71 ops/ms 38.56 >> Float512Vector.SINH 33.05 129.39 ops/ms 3.91 >> Float512Vector.TAN 35.60 1336.11 ops/ms 37.53 >> Float512Vector.TANH 65.77 2295.28 ops/ms 34.90 >> Float64Vector.ACOS 48.41 89.34 ops/ms 1.85 >> Float64Vector.ASIN 47.30 95.72 ops/ms 2.02 >> Float64Vector.ATAN 20.62 49.45 ops/ms 2.40 >> Float64Vector.ATAN2 15.95 112.35 ops/ms 7.04 >> Float64Vector.CBRT 24.03 134.57 ops/ms 5.60 >> Float64Vector.COS 44.28 394.33 ops/ms 8.91 >> Float64Vector.COSH 28.35 95.27 ops/ms 3.36 >> Float64Vector.EXP 65.80 486.37 ops/ms 7.39 >> Float64Vector.EXPM1 34.61 85.99 ops/ms 2.48 >> Float64Vector.HYPOT 50.40 147.82 ops/ms 2.93 >> Float64Vector.LOG 51.93 163.25 ops/ms 3.14 >> Float64Vector.LOG10 49.53 147.98 ops/ms 2.99 >> Float64Vector.LOG1P 19.20 206.81 ops/ms 10.77 >> Float64Vector.SIN 44.41 382.09 ops/ms 8.60 >> Float64Vector.SINH 28.20 90.68 ops/ms 3.22 >> Float64Vector.TAN 36.29 160.89 ops/ms 4.43 >> Float64Vector.TANH 47.65 214.04 ops/ms 4.49 > > Sandhya Viswanathan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - Merge master > - remove whitespace > - Merge master > - Small fix > - cleanup > - x86 short vector math optimization for Vector API Tier 1 to 3 tests pass for the default set of build profiles. 
-------------

PR: https://git.openjdk.java.net/jdk/pull/3638

From kvn at openjdk.java.net Mon May 3 21:44:49 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Mon, 3 May 2021 21:44:49 GMT
Subject: RFR: 8266449: cleanup jtreg tags in compiler/intrinsics/sha/cli tests
In-Reply-To:
References:
Message-ID:

On Mon, 3 May 2021 15:57:02 GMT, Igor Ignatyev wrote:

> Hi all,
>
> could you please review this small cleanup of jtreg tags in compiler/intrinsics/sha/cli tests?
> - `@modules` tags are not needed since we have changed the used test libraries not to depend on `java.management` module and `jdk.internal.misc.*` classes;
> - `testcases` isn't required in `@library` b/c the files this directory contains are accessible from `/` `@library`.
>
> Thanks,
> -- Igor

Good.

-------------

Marked as reviewed by kvn (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/3841

From sviswanathan at openjdk.java.net Mon May 3 22:07:54 2021
From: sviswanathan at openjdk.java.net (Sandhya Viswanathan)
Date: Mon, 3 May 2021 22:07:54 GMT
Subject: RFR: 8266332: Adler32 intrinsic for x86 64-bit platforms [v4]
In-Reply-To:
References:
Message-ID:

On Mon, 3 May 2021 18:45:19 GMT, Xubo Zhang wrote:

>> Implement Adler32 intrinsic for x86 64-bit platform using vector instructions.
>>
>> For the following benchmark:
>> http://cr.openjdk.java.net/~pli/rfr/8216259/TestAdler32.java
>>
>> The optimization shows ~5x improvement.
>>
>> Base:
>> Benchmark                          (count)  Mode  Cnt   Score   Error  Units
>> TestAdler32Perf.testAdler32Update       64  avgt   25   0.084 ± 0.001  us/op
>> TestAdler32Perf.testAdler32Update      128  avgt   25   0.104 ± 0.001  us/op
>> TestAdler32Perf.testAdler32Update      256  avgt   25   0.146 ± 0.002  us/op
>> TestAdler32Perf.testAdler32Update      512  avgt   25   0.226 ± 0.002  us/op
>> TestAdler32Perf.testAdler32Update     1024  avgt   25   0.390 ± 0.005  us/op
>> TestAdler32Perf.testAdler32Update     2048  avgt   25   0.714 ± 0.007  us/op
>> TestAdler32Perf.testAdler32Update     4096  avgt   25   1.359 ± 0.014  us/op
>> TestAdler32Perf.testAdler32Update     8192  avgt   25   2.751 ± 0.023  us/op
>> TestAdler32Perf.testAdler32Update    16384  avgt   25   5.494 ± 0.077  us/op
>> TestAdler32Perf.testAdler32Update    32768  avgt   25  11.058 ± 0.160  us/op
>> TestAdler32Perf.testAdler32Update    65536  avgt   25  22.198 ± 0.319  us/op
>>
>>
>> With patch:
>> Benchmark                          (count)  Mode  Cnt   Score   Error  Units
>> TestAdler32Perf.testAdler32Update       64  avgt   25   0.020 ± 0.001  us/op
>> TestAdler32Perf.testAdler32Update      128  avgt   25   0.025 ± 0.001  us/op
>> TestAdler32Perf.testAdler32Update      256  avgt   25   0.031 ± 0.001  us/op
>> TestAdler32Perf.testAdler32Update      512  avgt   25   0.048 ± 0.001  us/op
>> TestAdler32Perf.testAdler32Update     1024  avgt   25   0.078 ± 0.001  us/op
>> TestAdler32Perf.testAdler32Update     2048  avgt   25   0.139 ± 0.002  us/op
>> TestAdler32Perf.testAdler32Update     4096  avgt   25   0.262 ± 0.004  us/op
>> TestAdler32Perf.testAdler32Update     8192  avgt   25   0.524 ± 0.010  us/op
>> TestAdler32Perf.testAdler32Update    16384  avgt   25   1.017 ± 0.022  us/op
>> TestAdler32Perf.testAdler32Update    32768  avgt   25   2.058 ± 0.052  us/op
>> TestAdler32Perf.testAdler32Update    65536  avgt   25   3.994 ± 0.013  us/op
>
> Xubo Zhang has updated the pull request incrementally with one additional commit since the last revision:
>
>   changed copyright year to 2021 in macroAssembler_x86_adler.cpp

src/hotspot/cpu/x86/macroAssembler_x86_adler.cpp line 82:

> 80: cmpptr(data, end);
> 81: jcc(Assembler::aboveEqual, SKIP_LOOP_1A);
> 82:

align(32) is needed here.

src/hotspot/cpu/x86/macroAssembler_x86_adler.cpp line 113:

> 111: vpaddd(xa, xa, xtmp0, Assembler::AVX_256bit);
> 112: vpaddd(xb, xb, xtmp1, Assembler::AVX_256bit);
> 113: vpaddd(xsa, xsa, xtmp2, Assembler::AVX_256bit);

This should be Assembler::AVX_128bit here.

src/hotspot/cpu/x86/macroAssembler_x86_adler.cpp line 179:

> 177: movdl(rax, xb);
> 178: addl(b_d, rax);
> 179:

align(32) is needed here.
src/hotspot/cpu/x86/macroAssembler_x86_adler.cpp line 183: > 181: movzbl(rax, Address(data, 0)); //movzx eax, byte[data] > 182: addl(a_d, rax); > 183: incl(data); data is a pointer, incl(data) should be either incptr(data) or addptr(data, 1); ------------- PR: https://git.openjdk.java.net/jdk/pull/3806 From sviswanathan at openjdk.java.net Mon May 3 22:11:59 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Mon, 3 May 2021 22:11:59 GMT Subject: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v2] In-Reply-To: References: Message-ID: On Mon, 3 May 2021 21:41:26 GMT, Paul Sandoz wrote: >> Sandhya Viswanathan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: >> >> - Merge master >> - remove whitespace >> - Merge master >> - Small fix >> - cleanup >> - x86 short vector math optimization for Vector API > > Tier 1 to 3 tests pass for the default set of build profiles. @PaulSandoz Thanks a lot for running through the tests. ------------- PR: https://git.openjdk.java.net/jdk/pull/3638 From hshi at openjdk.java.net Tue May 4 00:03:50 2021 From: hshi at openjdk.java.net (Hui Shi) Date: Tue, 4 May 2021 00:03:50 GMT Subject: RFR: 8265767: compiler/eliminateAutobox/TestIntBoxing.java crashes on arm32 after 8264649 in debug VMs [v2] In-Reply-To: References: Message-ID: On Mon, 3 May 2021 15:02:29 GMT, Tobias Hartmann wrote: >> Hui Shi has updated the pull request incrementally with one additional commit since the last revision: >> >> Use PhaseIterGVN ptr as LoadNode::eliminate_autobox method parameter for simiplification and add comments for previous commit > > Looks good to me too. @TobiHartmann Thanks! 
-------------

PR: https://git.openjdk.java.net/jdk/pull/3818

From github.com+58006833+xbzhang99 at openjdk.java.net Tue May 4 00:53:17 2021
From: github.com+58006833+xbzhang99 at openjdk.java.net (Xubo Zhang)
Date: Tue, 4 May 2021 00:53:17 GMT
Subject: RFR: 8266332: Adler32 intrinsic for x86 64-bit platforms [v5]
In-Reply-To:
References:
Message-ID:

> Implement Adler32 intrinsic for x86 64-bit platform using vector instructions.
>
> For the following benchmark:
> http://cr.openjdk.java.net/~pli/rfr/8216259/TestAdler32.java
>
> The optimization shows ~5x improvement.
>
> Base:
> Benchmark                          (count)  Mode  Cnt   Score   Error  Units
> TestAdler32Perf.testAdler32Update       64  avgt   25   0.084 ± 0.001  us/op
> TestAdler32Perf.testAdler32Update      128  avgt   25   0.104 ± 0.001  us/op
> TestAdler32Perf.testAdler32Update      256  avgt   25   0.146 ± 0.002  us/op
> TestAdler32Perf.testAdler32Update      512  avgt   25   0.226 ± 0.002  us/op
> TestAdler32Perf.testAdler32Update     1024  avgt   25   0.390 ± 0.005  us/op
> TestAdler32Perf.testAdler32Update     2048  avgt   25   0.714 ± 0.007  us/op
> TestAdler32Perf.testAdler32Update     4096  avgt   25   1.359 ± 0.014  us/op
> TestAdler32Perf.testAdler32Update     8192  avgt   25   2.751 ± 0.023  us/op
> TestAdler32Perf.testAdler32Update    16384  avgt   25   5.494 ± 0.077  us/op
> TestAdler32Perf.testAdler32Update    32768  avgt   25  11.058 ± 0.160  us/op
> TestAdler32Perf.testAdler32Update    65536  avgt   25  22.198 ± 0.319  us/op
>
>
> With patch:
> Benchmark                          (count)  Mode  Cnt   Score   Error  Units
> TestAdler32Perf.testAdler32Update       64  avgt   25   0.020 ± 0.001  us/op
> TestAdler32Perf.testAdler32Update      128  avgt   25   0.025 ± 0.001  us/op
> TestAdler32Perf.testAdler32Update      256  avgt   25   0.031 ± 0.001  us/op
> TestAdler32Perf.testAdler32Update      512  avgt   25   0.048 ± 0.001  us/op
> TestAdler32Perf.testAdler32Update     1024  avgt   25   0.078 ± 0.001  us/op
> TestAdler32Perf.testAdler32Update     2048  avgt   25   0.139 ± 0.002  us/op
> TestAdler32Perf.testAdler32Update     4096  avgt   25   0.262 ± 0.004  us/op
> TestAdler32Perf.testAdler32Update     8192  avgt   25   0.524 ± 0.010  us/op
> TestAdler32Perf.testAdler32Update    16384  avgt   25   1.017 ± 0.022  us/op
> TestAdler32Perf.testAdler32Update    32768  avgt   25   2.058 ± 0.052  us/op
> TestAdler32Perf.testAdler32Update    65536  avgt   25   3.994 ± 0.013  us/op

Xubo Zhang has updated the pull request incrementally with one additional commit since the last revision:

  added align; replace incl with addptr

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3806/files
  - new: https://git.openjdk.java.net/jdk/pull/3806/files/bc60f308..172d7c63

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3806&range=04
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3806&range=03-04

  Stats: 6 lines in 1 file changed: 2 ins; 0 del; 4 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3806.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3806/head:pull/3806

PR: https://git.openjdk.java.net/jdk/pull/3806

From iignatyev at openjdk.java.net Tue May 4 04:50:57 2021
From: iignatyev at openjdk.java.net (Igor Ignatyev)
Date: Tue, 4 May 2021 04:50:57 GMT
Subject: RFR: 8266449: cleanup jtreg tags in compiler/intrinsics/sha/cli tests
In-Reply-To:
References:
Message-ID: <6f6B3posCZ3rq1Q8thgGGBa2MrZkkh0W9w3SmRboGHg=.a2b8be5a-8260-4e2e-8df2-ab1dc3fad1db@github.com>

On Mon, 3 May 2021 15:57:02 GMT, Igor Ignatyev wrote:

> Hi all,
>
> could you please review this small cleanup of jtreg tags in compiler/intrinsics/sha/cli tests?
> - `@modules` tags are not needed since we have changed the used test libraries not to depend on `java.management` module and `jdk.internal.misc.*` classes;
> - `testcases` isn't required in `@library` b/c the files this directory contains are accessible from `/` `@library`.
>
> Thanks,
> -- Igor

thanks, Vladimir.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3841

From iignatyev at openjdk.java.net Tue May 4 04:50:57 2021
From: iignatyev at openjdk.java.net (Igor Ignatyev)
Date: Tue, 4 May 2021 04:50:57 GMT
Subject: Integrated: 8266449: cleanup jtreg tags in compiler/intrinsics/sha/cli tests
In-Reply-To:
References:
Message-ID: <-4CJtApRlpzFE1mVmx_XNu77PqoYacZJHOI5UtiAdjw=.b02157bc-46c2-49d0-8522-093142c20cf2@github.com>

On Mon, 3 May 2021 15:57:02 GMT, Igor Ignatyev wrote:

> Hi all,
>
> could you please review this small cleanup of jtreg tags in compiler/intrinsics/sha/cli tests?
> - `@modules` tags are not needed since we have changed the used test libraries not to depend on `java.management` module and `jdk.internal.misc.*` classes;
> - `testcases` isn't required in `@library` b/c the files this directory contains are accessible from `/` `@library`.
>
> Thanks,
> -- Igor

This pull request has now been integrated.

Changeset: cfdf4a7d
Author:    Igor Ignatyev
URL:       https://git.openjdk.java.net/jdk/commit/cfdf4a7de77ea662201a876551f52fc558bfdf84
Stats:     36 lines in 12 files changed: 0 ins; 23 del; 13 mod

8266449: cleanup jtreg tags in compiler/intrinsics/sha/cli tests

Reviewed-by: kvn

-------------

PR: https://git.openjdk.java.net/jdk/pull/3841

From jbhateja at openjdk.java.net Tue May 4 05:58:54 2021
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Tue, 4 May 2021 05:58:54 GMT
Subject: RFR: 8266332: Adler32 intrinsic for x86 64-bit platforms [v5]
In-Reply-To:
References:
Message-ID:

On Tue, 4 May 2021 00:53:17 GMT, Xubo Zhang wrote:

>> Implement Adler32 intrinsic for x86 64-bit platform using vector instructions.
>>
>> For the following benchmark:
>> http://cr.openjdk.java.net/~pli/rfr/8216259/TestAdler32.java
>>
>> The optimization shows ~5x improvement.
>>
>> Base:
>> Benchmark                          (count)  Mode  Cnt   Score   Error  Units
>> TestAdler32Perf.testAdler32Update       64  avgt   25   0.084 ± 0.001  us/op
>> TestAdler32Perf.testAdler32Update      128  avgt   25   0.104 ± 0.001  us/op
>> TestAdler32Perf.testAdler32Update      256  avgt   25   0.146 ± 0.002  us/op
>> TestAdler32Perf.testAdler32Update      512  avgt   25   0.226 ± 0.002  us/op
>> TestAdler32Perf.testAdler32Update     1024  avgt   25   0.390 ± 0.005  us/op
>> TestAdler32Perf.testAdler32Update     2048  avgt   25   0.714 ± 0.007  us/op
>> TestAdler32Perf.testAdler32Update     4096  avgt   25   1.359 ± 0.014  us/op
>> TestAdler32Perf.testAdler32Update     8192  avgt   25   2.751 ± 0.023  us/op
>> TestAdler32Perf.testAdler32Update    16384  avgt   25   5.494 ± 0.077  us/op
>> TestAdler32Perf.testAdler32Update    32768  avgt   25  11.058 ± 0.160  us/op
>> TestAdler32Perf.testAdler32Update    65536  avgt   25  22.198 ± 0.319  us/op
>>
>>
>> With patch:
>> Benchmark                          (count)  Mode  Cnt   Score   Error  Units
>> TestAdler32Perf.testAdler32Update       64  avgt   25   0.020 ± 0.001  us/op
>> TestAdler32Perf.testAdler32Update      128  avgt   25   0.025 ± 0.001  us/op
>> TestAdler32Perf.testAdler32Update      256  avgt   25   0.031 ± 0.001  us/op
>> TestAdler32Perf.testAdler32Update      512  avgt   25   0.048 ± 0.001  us/op
>> TestAdler32Perf.testAdler32Update     1024  avgt   25   0.078 ± 0.001  us/op
>> TestAdler32Perf.testAdler32Update     2048  avgt   25   0.139 ± 0.002  us/op
>> TestAdler32Perf.testAdler32Update     4096  avgt   25   0.262 ± 0.004  us/op
>> TestAdler32Perf.testAdler32Update     8192  avgt   25   0.524 ± 0.010  us/op
>> TestAdler32Perf.testAdler32Update    16384  avgt   25   1.017 ± 0.022  us/op
>> TestAdler32Perf.testAdler32Update    32768  avgt   25   2.058 ± 0.052  us/op
>> TestAdler32Perf.testAdler32Update    65536  avgt   25   3.994 ± 0.013  us/op
>
> Xubo Zhang has updated the pull request incrementally with one additional commit since the last revision:
>
>   added align; replace incl with addptr

src/hotspot/cpu/x86/assembler_x86.cpp line 7859:

> 7857: void Assembler::vbroadcastf128(XMMRegister dst, Address src, int vector_len) {
> 7858:   assert(VM_Version::supports_avx(), "");
> 7859:   assert(vector_len == AVX_256bit || vector_len == AVX_512bit, "");

If you expect a vector length of 512 bits, then the above assert should also check for EVEX mode.

src/hotspot/cpu/x86/macroAssembler_x86.cpp line 3248:

> 3246:
> 3247: void MacroAssembler::vpmulld(XMMRegister dst, XMMRegister nds, AddressLiteral src, int vector_len, Register scratch_reg) {
> 3248:   assert((UseAVX > 0), "SSE mode requires address alignment 16 bytes");

The assert message needs rewording.

src/hotspot/cpu/x86/macroAssembler_x86_adler.cpp line 85:

> 83:   align(32);
> 84:   bind(SLOOP1A);
> 85:   vbroadcastf128(ydata, Address(data, 0), Assembler::AVX_256bit);

The loop operates over integral data (double words), so shouldn't it be safe to use a double-word broadcast instruction to avoid the domain switch-over penalty? I'm not sure whether broadcasting causes a domain switch-over at all, but since this happens in the main vector loop I'm being conservative.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3806

From thartmann at openjdk.java.net Tue May 4 06:42:49 2021
From: thartmann at openjdk.java.net (Tobias Hartmann)
Date: Tue, 4 May 2021 06:42:49 GMT
Subject: RFR: 8266438: Compile::remove_useless_nodes does not remove opaque nodes
In-Reply-To:
References:
Message-ID:

On Mon, 3 May 2021 14:50:12 GMT, Tobias Hartmann wrote:

> [JDK-8255026](https://bugs.openjdk.java.net/browse/JDK-8255026) refactored the code in `Compile::remove_useless_nodes` and as a result, useless nodes are no longer removed from the `_predicate_opaqs` list. Before the [change](https://github.com/openjdk/jdk/commit/27230fae#diff-f076857d7da81f56709da3de1511b1105727032186cde4d02c678667761f46eaL382), the call to `remove_macro_node` took care of this:
> https://github.com/openjdk/jdk/blob/194bceca3a4d13d4528b86359ee9d5eead3ce7ac/src/hotspot/share/opto/compile.hpp#L676-L684
>
> But the new code only removes nodes from the `_macro_nodes` list. Useless nodes should be removed from the `_skeleton_predicate_opaqs` list as well.
>
> I've seen failures due to this with a change in Valhalla (where we call `remove_useless_nodes` more often) but not in mainline. I think this should still be fixed in mainline.
>
> Thanks,
> Tobias

Thanks for the review, Christian!

-------------

PR: https://git.openjdk.java.net/jdk/pull/3840

From mdoerr at openjdk.java.net Tue May 4 07:59:52 2021
From: mdoerr at openjdk.java.net (Martin Doerr)
Date: Tue, 4 May 2021 07:59:52 GMT
Subject: Integrated: 8265784: [C2] Hoisting of DecodeN leaves MachTemp inputs behind
In-Reply-To: <2jy2CBKOYJx9mu017Cm4M2yQYWivb2kK1gCvrZAxD7Y=.a7d2d333-a630-4956-abc6-8d461417ff92@github.com>
References: <2jy2CBKOYJx9mu017Cm4M2yQYWivb2kK1gCvrZAxD7Y=.a7d2d333-a630-4956-abc6-8d461417ff92@github.com>
Message-ID:

On Thu, 22 Apr 2021 18:58:28 GMT, Martin Doerr wrote:

> PPC64 and s390 have DecodeN implementations which use a MachTemp input. When LCM hoists the DecodeN, the MachTemp nodes reside in the old block, but should get hoisted together with the DecodeN node.
> Same is true for load Base input which exists on s390 for example. Unfortunately, that's just a platform specific MachNode which is not nicely recognizable in LCM.

This pull request has now been integrated.

Changeset: 8e071c4b
Author:    Martin Doerr
URL:       https://git.openjdk.java.net/jdk/commit/8e071c4b52e84fed5503271f051429c9740b34dd
Stats:     15 lines in 1 file changed: 14 ins; 0 del; 1 mod

8265784: [C2] Hoisting of DecodeN leaves MachTemp inputs behind

Reviewed-by: kvn, goetz

-------------

PR: https://git.openjdk.java.net/jdk/pull/3637

From aph at openjdk.java.net Tue May 4 09:00:03 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Tue, 4 May 2021 09:00:03 GMT
Subject: RFR: 8266332: Adler32 intrinsic for x86 64-bit platforms [v5]
In-Reply-To:
References:
Message-ID:

On Tue, 4 May 2021 00:53:17 GMT, Xubo Zhang wrote:

>> Implement Adler32 intrinsic for x86 64-bit platform using vector instructions.
>>
>> For the following benchmark:
>> http://cr.openjdk.java.net/~pli/rfr/8216259/TestAdler32.java
>>
>> The optimization shows ~5x improvement.
>>
>> Base:
>> Benchmark                          (count)  Mode  Cnt   Score   Error  Units
>> TestAdler32Perf.testAdler32Update       64  avgt   25   0.084 ± 0.001  us/op
>> TestAdler32Perf.testAdler32Update      128  avgt   25   0.104 ± 0.001  us/op
>> TestAdler32Perf.testAdler32Update      256  avgt   25   0.146 ± 0.002  us/op
>> TestAdler32Perf.testAdler32Update      512  avgt   25   0.226 ± 0.002  us/op
>> TestAdler32Perf.testAdler32Update     1024  avgt   25   0.390 ± 0.005  us/op
>> TestAdler32Perf.testAdler32Update     2048  avgt   25   0.714 ± 0.007  us/op
>> TestAdler32Perf.testAdler32Update     4096  avgt   25   1.359 ± 0.014  us/op
>> TestAdler32Perf.testAdler32Update     8192  avgt   25   2.751 ± 0.023  us/op
>> TestAdler32Perf.testAdler32Update    16384  avgt   25   5.494 ± 0.077  us/op
>> TestAdler32Perf.testAdler32Update    32768  avgt   25  11.058 ± 0.160  us/op
>> TestAdler32Perf.testAdler32Update    65536  avgt   25  22.198 ± 0.319  us/op
>>
>>
>> With patch:
>> Benchmark                          (count)  Mode  Cnt   Score   Error  Units
>> TestAdler32Perf.testAdler32Update       64  avgt   25   0.020 ± 0.001  us/op
>> TestAdler32Perf.testAdler32Update      128  avgt   25   0.025 ± 0.001  us/op
>> TestAdler32Perf.testAdler32Update      256  avgt   25   0.031 ± 0.001  us/op
>> TestAdler32Perf.testAdler32Update      512  avgt   25   0.048 ± 0.001  us/op
>> TestAdler32Perf.testAdler32Update     1024  avgt   25   0.078 ± 0.001  us/op
>> TestAdler32Perf.testAdler32Update     2048  avgt   25   0.139 ± 0.002  us/op
>> TestAdler32Perf.testAdler32Update     4096  avgt   25   0.262 ± 0.004  us/op
>> TestAdler32Perf.testAdler32Update     8192  avgt   25   0.524 ± 0.010  us/op
>> TestAdler32Perf.testAdler32Update    16384  avgt   25   1.017 ± 0.022  us/op
>> TestAdler32Perf.testAdler32Update    32768  avgt   25   2.058 ± 0.052  us/op
>> TestAdler32Perf.testAdler32Update    65536  avgt   25   3.994 ± 0.013  us/op
>
> Xubo Zhang has updated the pull request incrementally with one additional commit since the last revision:
>
>   added align; replace incl with addptr

So I'm wondering. With JEP 414, the Vector API, do we need to keep writing hand-carved assembly for these things? It would be very instructive to see how well we can do with Java code; and if the Vector API isn't good enough, we need to know that.
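
For readers weighing the assembly against a plain Java baseline, the following is a minimal scalar sketch of the Adler-32 update that the intrinsic vectorizes. It is not code from the pull request and it does not use the Vector API: the class name `Adler32Sketch` and the helper `adler32` are hypothetical, and the chunk size `NMAX` is only an assumption borrowed from the usual zlib-style deferred-modulo trick. The result is cross-checked against `java.util.zip.Adler32`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;
import java.util.zip.Adler32;

// Hypothetical illustration only; not the JDK implementation.
final class Adler32Sketch {
    private static final int BASE = 65521; // largest prime below 2^16
    private static final int NMAX = 5552;  // bytes processed between modulo reductions (assumed chunk size)

    // Computes Adler-32 over the whole array, starting from the initial value 1.
    static long adler32(byte[] data) {
        long a = 1; // running sum of bytes
        long b = 0; // running sum of the successive values of a
        int i = 0;
        while (i < data.length) {
            int end = Math.min(i + NMAX, data.length);
            for (; i < end; i++) {
                a += data[i] & 0xff; // treat the byte as unsigned
                b += a;
            }
            // Reduce once per chunk instead of once per byte.
            a %= BASE;
            b %= BASE;
        }
        return (b << 16) | a;
    }

    public static void main(String[] args) {
        byte[] input = new byte[65536];
        new Random(42).nextBytes(input);

        Adler32 reference = new Adler32();
        reference.update(input, 0, input.length);

        System.out.println("sketch matches java.util.zip.Adler32: "
                + (adler32(input) == reference.getValue()));
        System.out.printf("adler32(\"Wikipedia\") = 0x%08X%n",
                adler32("Wikipedia".getBytes(StandardCharsets.US_ASCII)));
    }
}
```

The inner loop carries a dependency from `a` into `b`, which is roughly what makes the checksum awkward to auto-vectorize and why a vector version has to restructure the sums.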
------------- PR: https://git.openjdk.java.net/jdk/pull/3806 From thartmann at openjdk.java.net Tue May 4 09:46:01 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 4 May 2021 09:46:01 GMT Subject: Integrated: 8266438: Compile::remove_useless_nodes does not remove opaque nodes In-Reply-To: References: Message-ID: <64h8ha4yc4En1h3g3HPaqy5bXzEwN9kv7LXzFIHhjWM=.bbe98702-c8ee-412a-a044-84965361d983@github.com> On Mon, 3 May 2021 14:50:12 GMT, Tobias Hartmann wrote: > [JDK-8255026](https://bugs.openjdk.java.net/browse/JDK-8255026) refactored the code in `Compile::remove_useless_nodes` and as a result, useless nodes are no longer removed from the `_predicate_opaqs` list. Before the [change](https://github.com/openjdk/jdk/commit/27230fae#diff-f076857d7da81f56709da3de1511b1105727032186cde4d02c678667761f46eaL382), the call to `remove_macro_node` took care of this: > https://github.com/openjdk/jdk/blob/194bceca3a4d13d4528b86359ee9d5eead3ce7ac/src/hotspot/share/opto/compile.hpp#L676-L684 > > But the new code only removes nodes from the `_macro_nodes` list. Useless nodes should be removed from the `_skeleton_predicate_opaqs` list as well. > > I've seen failures due to this with a change in Valhalla (where we call `remove_useless_nodes` more often) but not in mainline. I think this should still be fixed in mainline. > > Thanks, > Tobias This pull request has now been integrated. 

Changeset: b6519048
Author:    Tobias Hartmann
URL:       https://git.openjdk.java.net/jdk/commit/b65190483c824234b86e2e43cf85009d926713bf
Stats:     3 lines in 1 file changed: 2 ins; 0 del; 1 mod

8266438: Compile::remove_useless_nodes does not remove opaque nodes

Reviewed-by: vlivanov, chagedorn

-------------

PR: https://git.openjdk.java.net/jdk/pull/3840

From jbhateja at openjdk.java.net Tue May 4 12:49:50 2021
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Tue, 4 May 2021 12:49:50 GMT
Subject: RFR: 8265126: unified handling for VectorMask object re-materialization during de-optimization (re-submit) [v2]
In-Reply-To:
References: <2KlypmIXfPwTn6dp0AG9W0LMqvu6griQnWjSEQBwn2o=.6b09a942-e241-4ff6-b581-1810aa6bb124@github.com>
Message-ID: <6q0cjlTVE4L-iIlIZo2piRj1IcXR0y5Rvdso3YPaG-E=.5d1ee7c3-9c9f-408a-84cc-f5a67085d09e@github.com>

On Mon, 3 May 2021 09:11:08 GMT, Vladimir Ivanov wrote:

> Test results (hs-tier1 - hs-tier5) are clean.

Hi @iwanowww, I presume the following tests mentioned in the PR comments have been covered in hs-tier1-5.

Vladimir Kozlov added a comment - 2021-04-13 09:16
Before pushing implementation someone from Oracle have to run mach5 testing up-to tier3 at least.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3721

From hshi at openjdk.java.net Tue May 4 13:01:55 2021
From: hshi at openjdk.java.net (Hui Shi)
Date: Tue, 4 May 2021 13:01:55 GMT
Subject: Integrated: 8265767: compiler/eliminateAutobox/TestIntBoxing.java crashes on arm32 after 8264649 in debug VMs
In-Reply-To:
References:
Message-ID: <_VgE8azkjkvP4VRug-Sb5TjYA8YNZFc5pHcHyltH2Kk=.a185af4d-867c-4450-b3d1-5a43f768bb7d@github.com>

On Fri, 30 Apr 2021 14:17:23 GMT, Hui Shi wrote:

> This patch fixes a failure exposed by JDK-8264649.
>
> compiler/eliminateAutobox/TestIntBoxing.java crashes on arm32 in the Compile::check_no_dead_use assertion.
> In LoadNode::eliminate_autobox, the early "result" is dead after line 1450 but is not added to the PhaseGVN worklist for optimization.
> Its out_cnt is 0. If it isn't removed, it will trigger the assertion in Compile::check_no_dead_use.
>
>
> 1443       } else if (result->is_Add() && result->in(2)->is_Con() &&
> 1444                  result->in(1)->Opcode() == Op_LShiftX &&
> 1445                  result->in(1)->in(2) == phase->intcon(shift)) {
> 1446         // We can't do general optimization: ((X<<Z) + Y) >> Z ==> X + (Y>>Z)
> 1447         // but for boxing cache access we know that X<<Z will not overflow
> 1448         // (there is range check) so we do this optimizatrion by hand here.
> 1449         Node* add_con = new RShiftXNode(result->in(2), phase->intcon(shift));
> --- result before is dead and might not be removed
> 1450         result = new AddXNode(result->in(1)->in(1), phase->transform(add_con));
> 1451       } else
>
>
> Detailed analysis is in https://bugs.openjdk.java.net/browse/JDK-8265767
>
> @mychris I have verified compiler/eliminateAutobox/TestIntBoxing.java on qemu; it failed with the same assertion and now passes with this fix. Would you please help verify it on an arm32 machine?
>
> Testing:
> - Passed Tier1-3 on Linux x86_64, release and fastdebug build, default option and -XX:-TieredCompilation.
> - compiler/eliminateAutobox/TestIntBoxing.java on arm32 release/fastdebug/slowdebug

This pull request has now been integrated.

Changeset: ee5bba0d
Author:    Hui Shi
Committer: Tobias Hartmann
URL:       https://git.openjdk.java.net/jdk/commit/ee5bba0dc4cc7c2bfe633c5a3fe731c6c37adb1d
Stats:     22 lines in 2 files changed: 3 ins; 0 del; 19 mod

8265767: compiler/eliminateAutobox/TestIntBoxing.java crashes on arm32 after 8264649 in debug VMs

Reviewed-by: kvn, thartmann

-------------

PR: https://git.openjdk.java.net/jdk/pull/3818

From rkennke at openjdk.java.net Tue May 4 15:31:05 2021
From: rkennke at openjdk.java.net (Roman Kennke)
Date: Tue, 4 May 2021 15:31:05 GMT
Subject: RFR: 8266505: Cleanup LibraryCallKit::make_unsafe_address()
Message-ID:

The decorators argument to make_unsafe_address() is unused and can be removed. It's a leftover from when Shenandoah needed to resolve the target object there.
Testing: - [x] hotspot_gc_shenandoah - [x] tier1 - [x] tier2 ------------- Commit messages: - 8266505: Cleanup LibraryCallKit::make_unsafe_address() Changes: https://git.openjdk.java.net/jdk/pull/3858/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3858&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8266505 Stats: 11 lines in 3 files changed: 0 ins; 1 del; 10 mod Patch: https://git.openjdk.java.net/jdk/pull/3858.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3858/head:pull/3858 PR: https://git.openjdk.java.net/jdk/pull/3858 From sviswanathan at openjdk.java.net Tue May 4 15:37:52 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Tue, 4 May 2021 15:37:52 GMT Subject: RFR: 8266332: Adler32 intrinsic for x86 64-bit platforms [v5] In-Reply-To: References: Message-ID: On Tue, 4 May 2021 08:57:12 GMT, Andrew Haley wrote: >> Xubo Zhang has updated the pull request incrementally with one additional commit since the last revision: >> >> added align; replace incl with addptr > > So I'm wondering. With JEP 414, the Vector API, do we need to keep writing hand-carved assembly for these things? It would be very instructive to see how well we can do with Java code; and if the Vector API isn't good enough, we need to know that. @theRealAph Vector API is still incubator module and cannot be used to implement standard JRE till it is finalized and becomes a standard module. ------------- PR: https://git.openjdk.java.net/jdk/pull/3806 From roland at openjdk.java.net Tue May 4 15:47:56 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Tue, 4 May 2021 15:47:56 GMT Subject: RFR: 8266505: Cleanup LibraryCallKit::make_unsafe_address() In-Reply-To: References: Message-ID: On Tue, 4 May 2021 15:23:20 GMT, Roman Kennke wrote: > The decorators argument to make_unsafe_address() is unused and can be removed. It's a leftover from when Shenandoah needed to resolve the target object there. 
> > Testing: > - [x] hotspot_gc_shenandoah > - [x] tier1 > - [x] tier2 Looks good to me. ------------- Marked as reviewed by roland (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/3858 From aph at openjdk.java.net Tue May 4 15:48:08 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Tue, 4 May 2021 15:48:08 GMT Subject: RFR: 8266499: Delete dead code in aarch64.ad Message-ID: <9KmCxyTg-fgtylJ_MFe_PP7PvxJVWBaEt87fbXPS4l4=.8d2d02a7-2fab-464a-91c5-7720f713f170@github.com> Just dead code, nothing to see here. I had to change a few `MacroAssembler` to `C2_MacroAssembler` in ad_encode.m4, which seems not to have been updated when 8241436 was committed. ------------- Commit messages: - 8266499: AArch64: Delete dead code in aarch64.ad Changes: https://git.openjdk.java.net/jdk/pull/3860/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3860&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8266499 Stats: 54 lines in 2 files changed: 0 ins; 50 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/3860.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3860/head:pull/3860 PR: https://git.openjdk.java.net/jdk/pull/3860 From chagedorn at openjdk.java.net Tue May 4 15:53:26 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Tue, 4 May 2021 15:53:26 GMT Subject: RFR: 8254129: IR Test Framework to support regex-based matching on the IR in JTreg compiler tests [v8] In-Reply-To: References: <2iYQOJ5yeu7SvGcScLPBOWCPMLv69e1ksOL1vW3ytL8=.0c27621d-ef3d-422c-9d8c-922078ca3160@github.com> Message-ID: On Mon, 3 May 2021 17:42:51 GMT, Christian Hagedorn wrote: >> This RFE provides an IR test framework to perform regex-based checks on the C2 IR shape of test methods emitted by the VM flags `-XX:+PrintIdeal` and `-XX:+PrintOptoAssembly`. The framework can also be used for other non-IR matching (and non-compiler) tests by providing easy to use annotations for commonly used testing patterns and compiler control flags. 
>> >> The framework is based on the ideas of the currently present IR test framework in [Valhalla](https://github.com/openjdk/valhalla/blob/e9c78ce4fcfd01361c35883e0d68f9ae5a80d079/test/hotspot/jtreg/compiler/valhalla/inlinetypes/InlineTypeTest.java) (mainly implemented by @TobiHartmann) which is being used with great success. This new framework aims to replace the old one in Valhalla at some point. >> >> A detailed description about how this new IR test framework works and how it is used is provided in the [README.md](https://github.com/chhagedorn/jdk/blob/aa005f384a4567c6c0b5f08f7c5df57f705dc540/test/lib/jdk/test/lib/hotspot/ir_framework/README.md) file and in the [Javadocs](https://github.com/chhagedorn/jdk/blob/aa005f384a4567c6c0b5f08f7c5df57f705dc540/test/lib/jdk/test/lib/hotspot/ir_framework/doc/jdk/test/lib/hotspot/ir_framework/package-summary.html) written for the framework classes. >> >> To finish a first version of this framework for JDK 17, I decided to leave some improvement possibilities and ideas to be followed up on in additional RFEs. Some ideas are mentioned in "Future Work" in [README.md](https://github.com/chhagedorn/jdk/blob/aa005f384a4567c6c0b5f08f7c5df57f705dc540/test/lib/jdk/test/lib/hotspot/ir_framework/README.md) and were also created as subtasks of this RFE. >> >> Testing (also described in "Internal Framework Tests in [README.md](https://github.com/chhagedorn/jdk/blob/aa005f384a4567c6c0b5f08f7c5df57f705dc540/test/lib/jdk/test/lib/hotspot/ir_framework/README.md)): >> There are various tests to verify the correctness of the test framework which can be found as JTreg tests in the [tests](https://github.com/chhagedorn/jdk/tree/aa005f384a4567c6c0b5f08f7c5df57f705dc540/test/lib/jdk/test/lib/hotspot/ir_framework/tests) folder. 
Additional testing was performed by converting all compiler Inline Types test of project Valhalla (done by @katyapav in [JDK-8263024](https://bugs.openjdk.java.net/browse/JDK-8263024)) that used the old framework to the new framework. This provided additional testing for the framework itself. We ran the converted tests with all the flag settings used in hs-tier1-9 and hs-precheckin-comp. For sanity checking, this was also done with a sample IR test in mainline. >> >> Some stats about the framework code added to [ir_framework](https://github.com/chhagedorn/jdk/tree/aa005f384a4567c6c0b5f08f7c5df57f705dc540/test/lib/jdk/test/lib/hotspot/ir_framework): >> >> - without the [Javadocs files](https://github.com/chhagedorn/jdk/tree/aa005f384a4567c6c0b5f08f7c5df57f705dc540/test/lib/jdk/test/lib/hotspot/ir_framework/doc) : 60 changed files, 13212 insertions, 0 deletions. >> - without the [tests](https://github.com/chhagedorn/jdk/tree/aa005f384a4567c6c0b5f08f7c5df57f705dc540/test/lib/jdk/test/lib/hotspot/ir_framework/tests) and [examples](https://github.com/chhagedorn/jdk/tree/aa005f384a4567c6c0b5f08f7c5df57f705dc540/test/lib/jdk/test/lib/hotspot/ir_framework/examples) folder: 40 files changed, 6781 insertions >> - comments: 2399 insertions (calculated with `git diff --cached !(tests|examples) | grep -c -E "(^[+-]\s*(/)?*)|(^[+-]\s*//)"`) >> - which leaves 4382 lines of code inserted >> >> Big thanks to: >> - @TobiHartmann for all his help by discussing the new framework and for providing insights from his IR test framework in Valhalla. >> - @katyapav for converting the Valhalla tests to use the new framework which found some harder to catch bugs in the framework and also some actual C2 bugs. >> - @iignatev for helping to simplify the framework usage with JTreg and with the framework internal VM calling structure. >> - and others who provided valuable feedback. 
>> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with four additional commits since the last revision: > > - Remove TestFramework: both runWithScenarios, both runWithHelperClasses, and one runWithFlags method to make interface cleaner/simpler, update internal tests accordingly > - Minor improvements, comment fixes, and test fixes > - Rename TestFrameworkPrepareFlags -> FlagVM and rename TestFrameworkExecution -> TestVM > - Apply review comments: Extract Test classes into own files, extract Flag and Test VM processes into own class, replace socket-based flag VM communication with file-based and clean up socket usage of test VM, fix useless processing of hotspot-pid files if no IR rules are applied As mentioned above, I moved the framework classes and tests and updated the packages. I also split the classes into subpackages to structure the code better. A summary of the package structure can be found in the updated README file in section 5. I put this PR on hold as I'm away until the end of the month. ------------- PR: https://git.openjdk.java.net/jdk/pull/3508 From chagedorn at openjdk.java.net Tue May 4 15:53:25 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Tue, 4 May 2021 15:53:25 GMT Subject: RFR: 8254129: IR Test Framework to support regex-based matching on the IR in JTreg compiler tests [v9] In-Reply-To: <2iYQOJ5yeu7SvGcScLPBOWCPMLv69e1ksOL1vW3ytL8=.0c27621d-ef3d-422c-9d8c-922078ca3160@github.com> References: <2iYQOJ5yeu7SvGcScLPBOWCPMLv69e1ksOL1vW3ytL8=.0c27621d-ef3d-422c-9d8c-922078ca3160@github.com> Message-ID: > This RFE provides an IR test framework to perform regex-based checks on the C2 IR shape of test methods emitted by the VM flags `-XX:+PrintIdeal` and `-XX:+PrintOptoAssembly`. The framework can also be used for other non-IR matching (and non-compiler) tests by providing easy to use annotations for commonly used testing patterns and compiler control flags. 
> > The framework is based on the ideas of the currently present IR test framework in [Valhalla](https://github.com/openjdk/valhalla/blob/e9c78ce4fcfd01361c35883e0d68f9ae5a80d079/test/hotspot/jtreg/compiler/valhalla/inlinetypes/InlineTypeTest.java) (mainly implemented by @TobiHartmann) which is being used with great success. This new framework aims to replace the old one in Valhalla at some point. > > A detailed description about how this new IR test framework works and how it is used is provided in the [README.md](https://github.com/chhagedorn/jdk/blob/aa005f384a4567c6c0b5f08f7c5df57f705dc540/test/lib/jdk/test/lib/hotspot/ir_framework/README.md) file and in the [Javadocs](https://github.com/chhagedorn/jdk/blob/aa005f384a4567c6c0b5f08f7c5df57f705dc540/test/lib/jdk/test/lib/hotspot/ir_framework/doc/jdk/test/lib/hotspot/ir_framework/package-summary.html) written for the framework classes. > > To finish a first version of this framework for JDK 17, I decided to leave some improvement possibilities and ideas to be followed up on in additional RFEs. Some ideas are mentioned in "Future Work" in [README.md](https://github.com/chhagedorn/jdk/blob/aa005f384a4567c6c0b5f08f7c5df57f705dc540/test/lib/jdk/test/lib/hotspot/ir_framework/README.md) and were also created as subtasks of this RFE. > > Testing (also described in "Internal Framework Tests in [README.md](https://github.com/chhagedorn/jdk/blob/aa005f384a4567c6c0b5f08f7c5df57f705dc540/test/lib/jdk/test/lib/hotspot/ir_framework/README.md)): > There are various tests to verify the correctness of the test framework which can be found as JTreg tests in the [tests](https://github.com/chhagedorn/jdk/tree/aa005f384a4567c6c0b5f08f7c5df57f705dc540/test/lib/jdk/test/lib/hotspot/ir_framework/tests) folder. 
Additional testing was performed by converting all compiler Inline Types test of project Valhalla (done by @katyapav in [JDK-8263024](https://bugs.openjdk.java.net/browse/JDK-8263024)) that used the old framework to the new framework. This provided additional testing for the framework itself. We ran the converted tests with all the flag settings used in hs-tier1-9 and hs-precheckin-comp. For sanity checking, this was also done with a sample IR test in mainline. > > Some stats about the framework code added to [ir_framework](https://github.com/chhagedorn/jdk/tree/aa005f384a4567c6c0b5f08f7c5df57f705dc540/test/lib/jdk/test/lib/hotspot/ir_framework): > > - without the [Javadocs files](https://github.com/chhagedorn/jdk/tree/aa005f384a4567c6c0b5f08f7c5df57f705dc540/test/lib/jdk/test/lib/hotspot/ir_framework/doc) : 60 changed files, 13212 insertions, 0 deletions. > - without the [tests](https://github.com/chhagedorn/jdk/tree/aa005f384a4567c6c0b5f08f7c5df57f705dc540/test/lib/jdk/test/lib/hotspot/ir_framework/tests) and [examples](https://github.com/chhagedorn/jdk/tree/aa005f384a4567c6c0b5f08f7c5df57f705dc540/test/lib/jdk/test/lib/hotspot/ir_framework/examples) folder: 40 files changed, 6781 insertions > - comments: 2399 insertions (calculated with `git diff --cached !(tests|examples) | grep -c -E "(^[+-]\s*(/)?*)|(^[+-]\s*//)"`) > - which leaves 4382 lines of code inserted > > Big thanks to: > - @TobiHartmann for all his help by discussing the new framework and for providing insights from his IR test framework in Valhalla. > - @katyapav for converting the Valhalla tests to use the new framework which found some harder to catch bugs in the framework and also some actual C2 bugs. > - @iignatev for helping to simplify the framework usage with JTreg and with the framework internal VM calling structure. > - and others who provided valuable feedback. 
> > Thanks, > Christian Christian Hagedorn has updated the pull request incrementally with three additional commits since the last revision: - Splitting classes into subpackages and updating README accordingly, fix bug with new line matching in lookbehind on Windows - Fix package names and fixing internal tests, examples and README file accordingly - Move framework to test/hotspot/jtreg/compiler/lib and tests to test/hotspot/jtreg/testlibrary_tests/compiler/lib/ir_framework ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/3508/files - new: https://git.openjdk.java.net/jdk/pull/3508/files/d6c72ec6..4424e01f Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3508&range=08 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3508&range=07-08 Stats: 2150 lines in 78 files changed: 1009 ins; 968 del; 173 mod Patch: https://git.openjdk.java.net/jdk/pull/3508.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3508/head:pull/3508 PR: https://git.openjdk.java.net/jdk/pull/3508 From aph at openjdk.java.net Tue May 4 15:58:59 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Tue, 4 May 2021 15:58:59 GMT Subject: RFR: 8266332: Adler32 intrinsic for x86 64-bit platforms [v5] In-Reply-To: References: Message-ID: On Tue, 4 May 2021 08:57:12 GMT, Andrew Haley wrote: >> Xubo Zhang has updated the pull request incrementally with one additional commit since the last revision: >> >> added align; replace incl with addptr > > So I'm wondering. With JEP 414, the Vector API, do we need to keep writing hand-carved assembly for these things? It would be very instructive to see how well we can do with Java code; and if the Vector API isn't good enough, we need to know that. > @theRealAph Vector API is still incubator module and cannot be used to implement standard JRE till it is finalized and becomes a standard module. 
Sure, I get that, and I suppose if we're desperate to have a vectorized intrinsic for Adler32 _right now_ this PR must go ahead right now. However, I'd be delighted to see the end of exquisitely hand-carved but hard-to-understand and hard-to-maintain assembly language, and it would be an excellent test for the coverage of the Vector API. ------------- PR: https://git.openjdk.java.net/jdk/pull/3806 From kvn at openjdk.java.net Tue May 4 16:15:52 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 4 May 2021 16:15:52 GMT Subject: RFR: 8266499: Delete dead code in aarch64.ad In-Reply-To: <9KmCxyTg-fgtylJ_MFe_PP7PvxJVWBaEt87fbXPS4l4=.8d2d02a7-2fab-464a-91c5-7720f713f170@github.com> References: <9KmCxyTg-fgtylJ_MFe_PP7PvxJVWBaEt87fbXPS4l4=.8d2d02a7-2fab-464a-91c5-7720f713f170@github.com> Message-ID: On Tue, 4 May 2021 15:41:15 GMT, Andrew Haley wrote: > Just dead code, nothing to see here. > > I had to change a few `MacroAssembler` to `C2_MacroAssembler` in > ad_encode.m4, which seems not to have been updated when 8241436 was > committed. Marked as reviewed by kvn (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/3860 From kvn at openjdk.java.net Tue May 4 16:17:56 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 4 May 2021 16:17:56 GMT Subject: RFR: 8266505: Cleanup LibraryCallKit::make_unsafe_address() In-Reply-To: References: Message-ID: <7gfJedtRDjrAJlZTtGFCZ6WiIaozT05JMwy3WyTmnKA=.3404106a-0455-4d4d-8622-e42fe3f8ba10@github.com> On Tue, 4 May 2021 15:23:20 GMT, Roman Kennke wrote: > The decorators argument to make_unsafe_address() is unused and can be removed. It's a leftover from when Shenandoah needed to resolve the target object there. > > Testing: > - [x] hotspot_gc_shenandoah > - [x] tier1 > - [x] tier2 Marked as reviewed by kvn (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/3858 From neliasso at openjdk.java.net Tue May 4 17:14:51 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Tue, 4 May 2021 17:14:51 GMT Subject: RFR: 8266332: Adler32 intrinsic for x86 64-bit platforms [v5] In-Reply-To: References: Message-ID: <_HtwNK-Y3HSeXpPVNZRs7lzfvHhUrGu8uvAgVf18qsA=.1c9096c3-96bc-4217-8ee7-88cf34415d87@github.com> On Tue, 4 May 2021 15:56:18 GMT, Andrew Haley wrote: > > @theRealAph Vector API is still incubator module and cannot be used to implement standard JRE till it is finalized and becomes a standard module. > > Sure, I get that, and I suppose if we're desperate to have a vectorized intrinsic for Adler32 _right now_ this PR must go ahead right now. However, I'd be delighted to see the end of exquisitely hand-carved but hard-to-understand and hard-to-maintain assembly language, and it would be an excellent test for the coverage of the Vector API. We already have Adler32 implemented on aarch - so I think it is very reasonable to add it on x64 too. But I am very sympathetic to your general point about big chunks of handcrafted assembly. ------------- PR: https://git.openjdk.java.net/jdk/pull/3806 From kvn at openjdk.java.net Tue May 4 19:17:51 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 4 May 2021 19:17:51 GMT Subject: RFR: 8241502: C2: Migrate x86_64.ad to MacroAssembler [v8] In-Reply-To: References: Message-ID: On Thu, 29 Apr 2021 20:39:11 GMT, John Tortugo wrote: >> I got a lot of failures on Windows only. Almost all test/hotspot/jtreg/serviceability/sa/ tests failed and others. 
>> Examples:
>>
>> serviceability/sa/TestSysProps.java
>>
>> # Internal Error (t:\\workspace\\open\\src\\hotspot\\share\\runtime\\stackValue.cpp:139), pid=7572, tid=26052
>> # assert(oopDesc::is_oop_or_null(val, false)) failed: bad oop found
>>
>> compiler/c2/Test6800154.java
>> java.lang.InternalError: 9223372036854775807 / -9223372036854775808 failed: 65547 != 0
>>         at compiler.c2.Test6800154.run(Test6800154.java:111)
>>         at compiler.c2.Test6800154.main(Test6800154.java:97)
>
> @vnkozlov - thank you so much for running the tests! The cause of the problems you reported was the last set of changes I made to the div/mod instructions. I fixed the code and ran all tests again on Linux, macOS, and Windows and they are looking good (jdk tier1, 2, 3, and hotspot_all).

@JohnTortugo did you fix the last issue? Let me know when I should test it.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2420

From github.com+2249648+johntortugo at openjdk.java.net Tue May 4 19:43:07 2021
From: github.com+2249648+johntortugo at openjdk.java.net (John Tortugo)
Date: Tue, 4 May 2021 19:43:07 GMT
Subject: RFR: 8241502: C2: Migrate x86_64.ad to MacroAssembler [v8]
In-Reply-To:
References:
Message-ID: <8ewa0bij_IEceYMY_izCKm-odLdquRFse8EcoRQkdDc=.847ba359-6104-48ca-89de-18e833c66462@github.com>

On Tue, 4 May 2021 19:15:26 GMT, Vladimir Kozlov wrote:

>> @vnkozlov - thank you so much for running the tests! The cause of the problems you reported was the last set of changes I made to the div/mod instructions. I fixed the code and ran all tests again on Linux, macOS, and Windows and they are looking good (jdk tier1, 2, 3, and hotspot_all).
>
> @JohnTortugo did you fix the last issue? Let me know when I should test it.

@vnkozlov - I've fixed the code and I'm finishing running some more tests. I'll push the new changes later today. Thanks!
------------- PR: https://git.openjdk.java.net/jdk/pull/2420 From sviswanathan at openjdk.java.net Tue May 4 22:15:59 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Tue, 4 May 2021 22:15:59 GMT Subject: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v2] In-Reply-To: References: Message-ID: On Wed, 28 Apr 2021 21:11:26 GMT, Sandhya Viswanathan wrote: >> This PR contains Short Vector Math Library support related changes for [JEP-414 Vector API (Second Incubator)](https://openjdk.java.net/jeps/414), in preparation for when targeted. >> >> Intel Short Vector Math Library (SVML) based intrinsics in native x86 assembly provide optimized implementation for Vector API transcendental and trigonometric methods. >> These methods are built into a separate library instead of being part of libjvm.so or jvm.dll. >> >> The following changes are made: >> The source for these methods is placed in the jdk.incubator.vector module under src/jdk.incubator.vector/linux/native/libsvml and src/jdk.incubator.vector/windows/native/libsvml. >> The assembly source files are named as "*.S" and include files are named as "*.S.inc". >> The corresponding build script is placed at make/modules/jdk.incubator.vector/Lib.gmk. >> Changes are made to build system to support dependency tracking for assembly files with includes. >> The built native libraries (libsvml.so/svml.dll) are placed in bin directory of JDK on Windows and lib directory of JDK on Linux. >> The C2 JIT uses the dll_load and dll_lookup to get the addresses of optimized methods from this library. >> >> Build system changes and module library build scripts are contributed by Magnus (magnus.ihse.bursie at oracle.com). >> >> Looking forward to your review and feedback.
>> >> Performance: >> Micro benchmark Base Optimized Unit Gain(Optimized/Base) >> Double128Vector.ACOS 45.91 87.34 ops/ms 1.90 >> Double128Vector.ASIN 45.06 92.36 ops/ms 2.05 >> Double128Vector.ATAN 19.92 118.36 ops/ms 5.94 >> Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79 >> Double128Vector.CBRT 45.77 208.36 ops/ms 4.55 >> Double128Vector.COS 49.94 245.89 ops/ms 4.92 >> Double128Vector.COSH 26.91 126.00 ops/ms 4.68 >> Double128Vector.EXP 71.64 379.65 ops/ms 5.30 >> Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18 >> Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44 >> Double128Vector.LOG 61.95 279.84 ops/ms 4.52 >> Double128Vector.LOG10 59.34 239.05 ops/ms 4.03 >> Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79 >> Double128Vector.SIN 49.36 240.79 ops/ms 4.88 >> Double128Vector.SINH 26.59 103.75 ops/ms 3.90 >> Double128Vector.TAN 41.05 152.39 ops/ms 3.71 >> Double128Vector.TANH 45.29 169.53 ops/ms 3.74 >> Double256Vector.ACOS 54.21 106.39 ops/ms 1.96 >> Double256Vector.ASIN 53.60 107.99 ops/ms 2.01 >> Double256Vector.ATAN 21.53 189.11 ops/ms 8.78 >> Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44 >> Double256Vector.CBRT 56.45 397.13 ops/ms 7.04 >> Double256Vector.COS 58.26 389.77 ops/ms 6.69 >> Double256Vector.COSH 29.44 151.11 ops/ms 5.13 >> Double256Vector.EXP 86.67 564.68 ops/ms 6.52 >> Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80 >> Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62 >> Double256Vector.LOG 71.52 394.90 ops/ms 5.52 >> Double256Vector.LOG10 65.43 362.32 ops/ms 5.54 >> Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05 >> Double256Vector.SIN 57.06 380.98 ops/ms 6.68 >> Double256Vector.SINH 29.40 117.37 ops/ms 3.99 >> Double256Vector.TAN 44.90 279.90 ops/ms 6.23 >> Double256Vector.TANH 54.08 274.71 ops/ms 5.08 >> Double512Vector.ACOS 55.65 687.54 ops/ms 12.35 >> Double512Vector.ASIN 57.31 777.72 ops/ms 13.57 >> Double512Vector.ATAN 21.42 729.21 ops/ms 34.04 >> Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32 >> Double512Vector.CBRT 56.78 834.38 ops/ms 14.69 >> 
Double512Vector.COS 59.88 837.04 ops/ms 13.98 >> Double512Vector.COSH 30.34 172.76 ops/ms 5.70 >> Double512Vector.EXP 99.66 1608.12 ops/ms 16.14 >> Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34 >> Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34 >> Double512Vector.LOG 74.84 996.00 ops/ms 13.31 >> Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72 >> Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34 >> Double512Vector.POW 37.42 384.13 ops/ms 10.26 >> Double512Vector.SIN 59.74 728.45 ops/ms 12.19 >> Double512Vector.SINH 29.47 143.38 ops/ms 4.87 >> Double512Vector.TAN 46.20 587.21 ops/ms 12.71 >> Double512Vector.TANH 57.36 495.42 ops/ms 8.64 >> Double64Vector.ACOS 24.04 73.67 ops/ms 3.06 >> Double64Vector.ASIN 23.78 75.11 ops/ms 3.16 >> Double64Vector.ATAN 14.14 62.81 ops/ms 4.44 >> Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28 >> Double64Vector.CBRT 16.47 107.50 ops/ms 6.53 >> Double64Vector.COS 23.42 152.01 ops/ms 6.49 >> Double64Vector.COSH 17.34 113.34 ops/ms 6.54 >> Double64Vector.EXP 27.08 203.53 ops/ms 7.52 >> Double64Vector.EXPM1 18.77 96.73 ops/ms 5.15 >> Double64Vector.HYPOT 18.54 103.62 ops/ms 5.59 >> Double64Vector.LOG 26.75 142.63 ops/ms 5.33 >> Double64Vector.LOG10 25.85 139.71 ops/ms 5.40 >> Double64Vector.LOG1P 13.26 97.94 ops/ms 7.38 >> Double64Vector.SIN 23.28 146.91 ops/ms 6.31 >> Double64Vector.SINH 17.62 88.59 ops/ms 5.03 >> Double64Vector.TAN 21.00 86.43 ops/ms 4.12 >> Double64Vector.TANH 23.75 111.35 ops/ms 4.69 >> Float128Vector.ACOS 57.52 110.65 ops/ms 1.92 >> Float128Vector.ASIN 57.15 117.95 ops/ms 2.06 >> Float128Vector.ATAN 22.52 318.74 ops/ms 14.15 >> Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42 >> Float128Vector.CBRT 29.72 443.74 ops/ms 14.93 >> Float128Vector.COS 42.82 803.02 ops/ms 18.75 >> Float128Vector.COSH 31.44 118.34 ops/ms 3.76 >> Float128Vector.EXP 72.43 855.33 ops/ms 11.81 >> Float128Vector.EXPM1 37.82 127.85 ops/ms 3.38 >> Float128Vector.HYPOT 53.20 591.68 ops/ms 11.12 >> Float128Vector.LOG 52.95 877.94 ops/ms 16.58 >> 
Float128Vector.LOG10 49.26 603.72 ops/ms 12.26 >> Float128Vector.LOG1P 20.89 430.59 ops/ms 20.61 >> Float128Vector.SIN 43.38 745.31 ops/ms 17.18 >> Float128Vector.SINH 31.11 112.91 ops/ms 3.63 >> Float128Vector.TAN 37.25 332.13 ops/ms 8.92 >> Float128Vector.TANH 57.63 453.77 ops/ms 7.87 >> Float256Vector.ACOS 65.23 123.73 ops/ms 1.90 >> Float256Vector.ASIN 63.41 132.86 ops/ms 2.10 >> Float256Vector.ATAN 23.51 649.02 ops/ms 27.61 >> Float256Vector.ATAN2 18.19 455.95 ops/ms 25.07 >> Float256Vector.CBRT 45.99 594.81 ops/ms 12.93 >> Float256Vector.COS 43.75 926.69 ops/ms 21.18 >> Float256Vector.COSH 33.52 130.46 ops/ms 3.89 >> Float256Vector.EXP 75.70 1366.72 ops/ms 18.05 >> Float256Vector.EXPM1 39.00 149.72 ops/ms 3.84 >> Float256Vector.HYPOT 52.91 1023.18 ops/ms 19.34 >> Float256Vector.LOG 53.31 1545.77 ops/ms 29.00 >> Float256Vector.LOG10 50.31 863.80 ops/ms 17.17 >> Float256Vector.LOG1P 21.51 616.59 ops/ms 28.66 >> Float256Vector.SIN 44.07 911.04 ops/ms 20.67 >> Float256Vector.SINH 33.16 122.50 ops/ms 3.69 >> Float256Vector.TAN 37.85 497.75 ops/ms 13.15 >> Float256Vector.TANH 64.27 537.20 ops/ms 8.36 >> Float512Vector.ACOS 67.33 1718.00 ops/ms 25.52 >> Float512Vector.ASIN 66.12 1780.85 ops/ms 26.93 >> Float512Vector.ATAN 22.63 1780.31 ops/ms 78.69 >> Float512Vector.ATAN2 17.52 1113.93 ops/ms 63.57 >> Float512Vector.CBRT 54.78 2087.58 ops/ms 38.11 >> Float512Vector.COS 40.92 1567.93 ops/ms 38.32 >> Float512Vector.COSH 33.42 138.36 ops/ms 4.14 >> Float512Vector.EXP 70.51 3835.97 ops/ms 54.41 >> Float512Vector.EXPM1 38.06 279.80 ops/ms 7.35 >> Float512Vector.HYPOT 50.99 3287.55 ops/ms 64.47 >> Float512Vector.LOG 49.61 3156.99 ops/ms 63.64 >> Float512Vector.LOG10 46.94 2489.16 ops/ms 53.02 >> Float512Vector.LOG1P 20.66 1689.86 ops/ms 81.81 >> Float512Vector.POW 32.73 1015.85 ops/ms 31.04 >> Float512Vector.SIN 41.17 1587.71 ops/ms 38.56 >> Float512Vector.SINH 33.05 129.39 ops/ms 3.91 >> Float512Vector.TAN 35.60 1336.11 ops/ms 37.53 >> Float512Vector.TANH 65.77 2295.28 
ops/ms 34.90 >> Float64Vector.ACOS 48.41 89.34 ops/ms 1.85 >> Float64Vector.ASIN 47.30 95.72 ops/ms 2.02 >> Float64Vector.ATAN 20.62 49.45 ops/ms 2.40 >> Float64Vector.ATAN2 15.95 112.35 ops/ms 7.04 >> Float64Vector.CBRT 24.03 134.57 ops/ms 5.60 >> Float64Vector.COS 44.28 394.33 ops/ms 8.91 >> Float64Vector.COSH 28.35 95.27 ops/ms 3.36 >> Float64Vector.EXP 65.80 486.37 ops/ms 7.39 >> Float64Vector.EXPM1 34.61 85.99 ops/ms 2.48 >> Float64Vector.HYPOT 50.40 147.82 ops/ms 2.93 >> Float64Vector.LOG 51.93 163.25 ops/ms 3.14 >> Float64Vector.LOG10 49.53 147.98 ops/ms 2.99 >> Float64Vector.LOG1P 19.20 206.81 ops/ms 10.77 >> Float64Vector.SIN 44.41 382.09 ops/ms 8.60 >> Float64Vector.SINH 28.20 90.68 ops/ms 3.22 >> Float64Vector.TAN 36.29 160.89 ops/ms 4.43 >> Float64Vector.TANH 47.65 214.04 ops/ms 4.49 > > Sandhya Viswanathan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - Merge master > - remove whitespace > - Merge master > - Small fix > - cleanup > - x86 short vector math optimization for Vector API @vnkozlov @AlanBateman @rose00 Looking forward to your review and feedback. This PR contains Short Vector Math Library support related changes for [JEP-414 Vector API (Second Incubator)](https://openjdk.java.net/jeps/414), in preparation for when targeted. ------------- PR: https://git.openjdk.java.net/jdk/pull/3638 From vlivanov at openjdk.java.net Wed May 5 10:43:58 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Wed, 5 May 2021 10:43:58 GMT Subject: RFR: 8265126: unified handling for VectorMask object re-materialization during de-optimization (re-submit) [v2] In-Reply-To: References: <2KlypmIXfPwTn6dp0AG9W0LMqvu6griQnWjSEQBwn2o=.6b09a942-e241-4ff6-b581-1810aa6bb124@github.com> Message-ID: On Mon, 3 May 2021 09:11:08 GMT, Vladimir Ivanov wrote: >> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase.
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: >> >> - Merge http://github.com/openjdk/jdk into JDK-8265126 >> - 8265126:[REDO] unified handling for VectorMask object re-materialization during de-optimization > > Test results (hs-tier1 - hs-tier5) are clean. > Hi @iwanowww I presume following tests mention in PR comments have been cover in hs-tier1-5. Yes, that's the case. ------------- PR: https://git.openjdk.java.net/jdk/pull/3721 From yyang at openjdk.java.net Wed May 5 11:16:52 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Wed, 5 May 2021 11:16:52 GMT Subject: RFR: 8265711: C1: Intrinsify Class.getModifier method [v3] In-Reply-To: References: Message-ID: <4hGopHpNOfBqzwMmI2zq9MOUxLKyLi2axV89Gc6TFHg=.d14c44b7-940c-47f8-8c15-a940ba00840b@github.com> On Wed, 28 Apr 2021 06:43:19 GMT, Yi Yang wrote: >> It's relatively a common case to get modifiers from a constant Class instance, i.e. ThirdPartyClass.class.getModifiers(). Currently, C1 Canonicalizer missed the opportunity of replacing Class.getModifiers intrinsic calls with compile-time constants. > > Yi Yang has updated the pull request incrementally with one additional commit since the last revision: > > rename; redundant reloading Can I have a second review of this PR? Thanks, Yang ------------- PR: https://git.openjdk.java.net/jdk/pull/3616 From roland at openjdk.java.net Wed May 5 11:52:06 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Wed, 5 May 2021 11:52:06 GMT Subject: RFR: 8266550: C2: mirror TypeOopPtr/TypeInstPtr/TypeAryPtr with TypeKlassPtr/TypeInstKlassPtr/TypeAryKlassPtr Message-ID: This is some refactoring in another attempt to fix JDK-6312651 (Compiler should only use verified interface types for optimization). Rather than propose the patch from: https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-May/033803.html as a single big patch. 
I've been working on splitting it. The plan is to have this and another refactoring patch, which include no change to the way interfaces are handled, as preparation. Only then, in a third patch, interface support along the lines of the patch I proposed 2 years ago would be introduced. This patch changes the class hierarchy of types that C2 uses and introduces TypeInstKlassPtr/TypeAryKlassPtr that mirror TypeInstPtr/TypeAryPtr. The motivation for this is that a single: ciKlass* _klass; is no longer sufficient to describe a type (a set of interfaces must also be carried around). That's not possible with TypeKlassPtr. The meet methods for TypeInstPtr/TypeInstKlassPtr and TypeAryPtr/TypeAryKlassPtr are largely similar. I moved the most complicated logic into helper methods: meet_instptr() and meet_aryptr() (Thanks to Vladimir I for testing the refactoring patches) ------------- Commit messages: - whitespaces - instklassptr/aryklassptr Changes: https://git.openjdk.java.net/jdk/pull/3880/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3880&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8266550 Stats: 1187 lines in 15 files changed: 762 ins; 242 del; 183 mod Patch: https://git.openjdk.java.net/jdk/pull/3880.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3880/head:pull/3880 PR: https://git.openjdk.java.net/jdk/pull/3880 From aph at openjdk.java.net Wed May 5 12:20:55 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Wed, 5 May 2021 12:20:55 GMT Subject: Integrated: 8266499: Delete dead code in aarch64.ad In-Reply-To: <9KmCxyTg-fgtylJ_MFe_PP7PvxJVWBaEt87fbXPS4l4=.8d2d02a7-2fab-464a-91c5-7720f713f170@github.com> References: <9KmCxyTg-fgtylJ_MFe_PP7PvxJVWBaEt87fbXPS4l4=.8d2d02a7-2fab-464a-91c5-7720f713f170@github.com> Message-ID: On Tue, 4 May 2021 15:41:15 GMT, Andrew Haley wrote: > Just dead code, nothing to see here.
> > I had to change a few `MacroAssembler` to `C2_MacroAssembler` in > ad_encode.m4, which seems not to have been updated when 8241436 was > committed. This pull request has now been integrated. Changeset: ef0f6930 Author: Andrew Haley URL: https://git.openjdk.java.net/jdk/commit/ef0f693065eddd5c86b9e0fc52d57eafb0b1dc50 Stats: 54 lines in 2 files changed: 0 ins; 50 del; 4 mod 8266499: Delete dead code in aarch64.ad Reviewed-by: kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/3860 From redestad at openjdk.java.net Wed May 5 15:20:11 2021 From: redestad at openjdk.java.net (Claes Redestad) Date: Wed, 5 May 2021 15:20:11 GMT Subject: RFR: 8266561: Remove Compile::_save_argument_registers Message-ID: <32Comu0GA92wndESU2ocZ5sLCn-TlUSNulRINICLOWM=.e10d71b5-81e9-4c4d-84b5-5b4f77cdc479@github.com> Compile::_save_argument_registers is always false, so I suggest removing it. It was used in the past for certain stubs, but the last use appears to have been removed with [JDK-8136406](https://bugs.openjdk.java.net/browse/JDK-8136406) ------------- Commit messages: - Minor cleanups - C->save_argument_registers() is effectively always false and can be removed Changes: https://git.openjdk.java.net/jdk/pull/3884/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3884&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8266561 Stats: 65 lines in 8 files changed: 0 ins; 33 del; 32 mod Patch: https://git.openjdk.java.net/jdk/pull/3884.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3884/head:pull/3884 PR: https://git.openjdk.java.net/jdk/pull/3884 From shade at openjdk.java.net Wed May 5 16:52:06 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 5 May 2021 16:52:06 GMT Subject: RFR: JDK-8266573: Make sure blackholes are tagged for all JVMCI paths Message-ID: In post-integration https://github.com/openjdk/jdk/pull/2024#issuecomment-832834830, Tom says: "In this code, 
6018336#diff-fa2433a762244542fec57f9d58dd3092bae74f354acf0ef33603a5f8306fd7daR995, the call to `CompilerOracle::tag_blackhole_if_possible` should be outside of the if. As written, methods won't be properly tagged for libgraal, only when used with the pure Java graal." I am doing this blindly, because Graal is already removed from the codebase with JEP 410. ------------- Commit messages: - Move the tag upwards Changes: https://git.openjdk.java.net/jdk/pull/3887/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3887&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8266573 Stats: 3 lines in 1 file changed: 2 ins; 1 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/3887.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3887/head:pull/3887 PR: https://git.openjdk.java.net/jdk/pull/3887 From never at openjdk.java.net Wed May 5 17:10:53 2021 From: never at openjdk.java.net (Tom Rodriguez) Date: Wed, 5 May 2021 17:10:53 GMT Subject: RFR: JDK-8266573: Make sure blackholes are tagged for all JVMCI paths In-Reply-To: References: Message-ID: On Wed, 5 May 2021 16:44:08 GMT, Aleksey Shipilev wrote: > In post-integration https://github.com/openjdk/jdk/pull/2024#issuecomment-832834830, Tom says: > "In this code, 6018336#diff-fa2433a762244542fec57f9d58dd3092bae74f354acf0ef33603a5f8306fd7daR995, the call to `CompilerOracle::tag_blackhole_if_possible` should be outside of the if. As written, methods won't be properly tagged for libgraal, only when used with the pure Java graal." > > I am doing this blindly, because Graal is already removed from the codebase with JEP 410. Sorry I screwed up reporting this in the other PR. ------------- Marked as reviewed by never (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/3887 From sviswanathan at openjdk.java.net Wed May 5 18:05:50 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Wed, 5 May 2021 18:05:50 GMT Subject: RFR: 8266332: Adler32 intrinsic for x86 64-bit platforms In-Reply-To: References: Message-ID: On Thu, 29 Apr 2021 23:53:58 GMT, Xubo Zhang wrote: >> Implement Adler32 intrinsic for x86 64-bit platform using vector instructions. >> >> For the following benchmark: >> http://cr.openjdk.java.net/~pli/rfr/8216259/TestAdler32.java >> >> The optimization shows ~5x improvement. >> >> Base: >> Benchmark (count) Mode Cnt Score Error Units >> TestAdler32Perf.testAdler32Update 64 avgt 25 0.084 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 128 avgt 25 0.104 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 256 avgt 25 0.146 ± 0.002 us/op >> TestAdler32Perf.testAdler32Update 512 avgt 25 0.226 ± 0.002 us/op >> TestAdler32Perf.testAdler32Update 1024 avgt 25 0.390 ± 0.005 us/op >> TestAdler32Perf.testAdler32Update 2048 avgt 25 0.714 ± 0.007 us/op >> TestAdler32Perf.testAdler32Update 4096 avgt 25 1.359 ± 0.014 us/op >> TestAdler32Perf.testAdler32Update 8192 avgt 25 2.751 ± 0.023 us/op >> TestAdler32Perf.testAdler32Update 16384 avgt 25 5.494 ± 0.077 us/op >> TestAdler32Perf.testAdler32Update 32768 avgt 25 11.058 ± 0.160 us/op >> TestAdler32Perf.testAdler32Update 65536 avgt 25 22.198 ± 0.319 us/op >> >> >> With patch: >> Benchmark (count) Mode Cnt Score Error Units >> TestAdler32Perf.testAdler32Update 64 avgt 25 0.020 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 128 avgt 25 0.025 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 256 avgt 25 0.031 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 512 avgt 25 0.048 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 1024 avgt 25 0.078 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 2048 avgt 25 0.139 ± 0.002 us/op >> TestAdler32Perf.testAdler32Update 4096 avgt 25 0.262 ±
0.004 us/op >> TestAdler32Perf.testAdler32Update 8192 avgt 25 0.524 ± 0.010 us/op >> TestAdler32Perf.testAdler32Update 16384 avgt 25 1.017 ± 0.022 us/op >> TestAdler32Perf.testAdler32Update 32768 avgt 25 2.058 ± 0.052 us/op >> TestAdler32Perf.testAdler32Update 65536 avgt 25 3.994 ± 0.013 us/op > > Currently, only 64-bit Linux is supported @xbzhang99 The patch looks good to me now. Thanks for making all the changes. ------------- PR: https://git.openjdk.java.net/jdk/pull/3806 From kvn at openjdk.java.net Wed May 5 18:43:53 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 5 May 2021 18:43:53 GMT Subject: RFR: 8266561: Remove Compile::_save_argument_registers In-Reply-To: <32Comu0GA92wndESU2ocZ5sLCn-TlUSNulRINICLOWM=.e10d71b5-81e9-4c4d-84b5-5b4f77cdc479@github.com> References: <32Comu0GA92wndESU2ocZ5sLCn-TlUSNulRINICLOWM=.e10d71b5-81e9-4c4d-84b5-5b4f77cdc479@github.com> Message-ID: <-56lQNTycCCV1ifACwF4LSVUwpIFyC4i_abxfvXPHwo=.4f278233-4370-47e5-866c-314dba383049@github.com> On Wed, 5 May 2021 14:37:18 GMT, Claes Redestad wrote: > Compile::_save_argument_registers is always false, so I suggest removing it. > > It was used in the past for certain stubs, but the last use appears to have been removed with [JDK-8136406](https://bugs.openjdk.java.net/browse/JDK-8136406) These are C2 changes and they look fine. I thought we may do more clean up but C1 may pass must_gc_arguments to `RuntimeStub::new_runtime_stub()`. ------------- Marked as reviewed by kvn (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/3884 From redestad at openjdk.java.net Wed May 5 18:55:52 2021 From: redestad at openjdk.java.net (Claes Redestad) Date: Wed, 5 May 2021 18:55:52 GMT Subject: RFR: 8266561: Remove Compile::_save_argument_registers In-Reply-To: <-56lQNTycCCV1ifACwF4LSVUwpIFyC4i_abxfvXPHwo=.4f278233-4370-47e5-866c-314dba383049@github.com> References: <32Comu0GA92wndESU2ocZ5sLCn-TlUSNulRINICLOWM=.e10d71b5-81e9-4c4d-84b5-5b4f77cdc479@github.com> <-56lQNTycCCV1ifACwF4LSVUwpIFyC4i_abxfvXPHwo=.4f278233-4370-47e5-866c-314dba383049@github.com> Message-ID: On Wed, 5 May 2021 18:40:54 GMT, Vladimir Kozlov wrote: > These are C2 changes and they look fine. Thanks! > I thought we may do more clean up but C1 may pass must_gc_arguments to `RuntimeStub::new_runtime_stub()`. Yes, I started pulling at that thread, but it wasn't obvious if it would unravel nicely. And even if so it seemed more appropriate for a follow-up. ------------- PR: https://git.openjdk.java.net/jdk/pull/3884 From rkennke at openjdk.java.net Wed May 5 19:20:55 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Wed, 5 May 2021 19:20:55 GMT Subject: Integrated: 8266505: Cleanup LibraryCallKit::make_unsafe_address() In-Reply-To: References: Message-ID: On Tue, 4 May 2021 15:23:20 GMT, Roman Kennke wrote: > The decorators argument to make_unsafe_address() is unused and can be removed. It's a leftover from when Shenandoah needed to resolve the target object there. > > Testing: > - [x] hotspot_gc_shenandoah > - [x] tier1 > - [x] tier2 This pull request has now been integrated. 
Changeset: 9de62a45 Author: Roman Kennke URL: https://git.openjdk.java.net/jdk/commit/9de62a454f2ff7da62ce13e8ea9009645af72c14 Stats: 11 lines in 3 files changed: 0 ins; 1 del; 10 mod 8266505: Cleanup LibraryCallKit::make_unsafe_address() Reviewed-by: roland, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/3858 From hshi at openjdk.java.net Thu May 6 01:44:13 2021 From: hshi at openjdk.java.net (Hui Shi) Date: Thu, 6 May 2021 01:44:13 GMT Subject: RFR: 8266528: Optimize C2 VerifyIterativeGVN execution time Message-ID: Optimization for VerifyIterativeGVN; the motivation is that running with -XX:+VerifyIterativeGVN is extremely slow. In a simple test "-Xcomp -XX:+VerifyIterativeGVN -XX:-TieredCompilation -version", time is reduced from 8.67s to 2.4s. In the extreme case hotspot/test/jtreg/compiler/escapeAnalysis/Test6689060.java, time is reduced from 20000s to 92s. Detailed data is in the JBS description. Optimizations include: 1. Optimize redundant verifications in PhaseIterGVN::verify_step. Nodes might be verified multiple times. Redundant verifications between full pass and _verify_window single node process. Redundant verifications between different nodes in _verify_window 2. Optimize def-use edge checking: Skip multiple checks for same x->n input edges. Skip redundant check in inner loop when counting how many x in n's input edges, skip current index. 3. Optimize field access Replace "n->in(j)" with "n->_in[j]", skipping the unnecessary assert when invoking Node::in(int index). Optimizations #2 and #3 decrease execution time with no other overhead. Optimization #1 adds 3 fields to class Node in debug builds; they can be squeezed into an "int/long" if needed.
jint _igvn_verify_depth_cur; jint _igvn_verify_depth_prev; julong _igvn_verify_epoch; ------------- Commit messages: - 8266528: Optimize C2 VerifyIterativeGVN execution time Changes: https://git.openjdk.java.net/jdk/pull/3872/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3872&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8266528 Stats: 117 lines in 4 files changed: 83 ins; 5 del; 29 mod Patch: https://git.openjdk.java.net/jdk/pull/3872.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3872/head:pull/3872 PR: https://git.openjdk.java.net/jdk/pull/3872 From hshi at openjdk.java.net Thu May 6 01:44:13 2021 From: hshi at openjdk.java.net (Hui Shi) Date: Thu, 6 May 2021 01:44:13 GMT Subject: RFR: 8266528: Optimize C2 VerifyIterativeGVN execution time In-Reply-To: References: Message-ID: On Wed, 5 May 2021 07:30:46 GMT, Hui Shi wrote: > Optimization for VerifyIterativeGVN; the motivation is that running with -XX:+VerifyIterativeGVN is extremely slow. > > In a simple test "-Xcomp -XX:+VerifyIterativeGVN -XX:-TieredCompilation -version", time is reduced from 8.67s to 2.4s. > In the extreme case hotspot/test/jtreg/compiler/escapeAnalysis/Test6689060.java, time is reduced from 20000s to 92s. > Detailed data is in the JBS description. > > Optimizations include: > 1. Optimize redundant verifications in PhaseIterGVN::verify_step. Nodes might be verified multiple times. > Redundant verifications between full pass and _verify_window single node process. > Redundant verifications between different nodes in _verify_window > > 2. Optimize def-use edge checking: > Skip multiple checks for same x->n input edges. > Skip redundant check in inner loop when counting how many x in n's input edges, skip current index. > > 3. Optimize field access > Replace "n->in(j)" with "n->_in[j]", skipping the unnecessary assert when invoking Node::in(int index). > > Optimizations #2 and #3 decrease execution time with no other overhead.
> Optimization #1 adds 3 fields to class Node in debug builds; they can be squeezed into an "int/long" if needed. > > jint _igvn_verify_depth_cur; > jint _igvn_verify_depth_prev; > julong _igvn_verify_epoch; Not sure if adding fields to class Node in debug builds is acceptable for verification purposes. Would appreciate any comments. ------------- PR: https://git.openjdk.java.net/jdk/pull/3872 From dongbo at openjdk.java.net Thu May 6 01:51:51 2021 From: dongbo at openjdk.java.net (Dong Bo) Date: Thu, 6 May 2021 01:51:51 GMT Subject: RFR: 8264973: AArch64: Optimize vector max/min/add reduction of two integers with NEON pairwise instructions [v2] In-Reply-To: <1ZfSsFKnXwnkqtxIGeyZyb6L9yFghYh-sTZUWTY3A5U=.1a518678-3fd9-4dcb-be04-20c0060bfe91@github.com> References: <1ZfSsFKnXwnkqtxIGeyZyb6L9yFghYh-sTZUWTY3A5U=.1a518678-3fd9-4dcb-be04-20c0060bfe91@github.com> Message-ID: On Mon, 26 Apr 2021 11:16:00 GMT, Dong Bo wrote: >> On aarch64, current implementations of vector reduce_add2I, reduce_max2I, reduce_min2I can be optimized with NEON pairwise instructions: >> >> >> ## reduce_add2I, before >> mov w10, v19.s[0] >> mov w2, v19.s[1] >> add w10, w0, w10 >> add w10, w10, w2 >> ## reduce_add2I, optimized >> addp v23.2s, v24.2s, v24.2s >> mov w10, v23.s[0] >> add w10, w10, w2 >> >> ## reduce_max2I, before >> dup v16.2d, v23.d[0] >> sminv s16, v16.4s >> mov w10, v16.s[0] >> cmp w10, w0 >> csel w10, w10, w0, lt >> ## reduce_max2I, optimized >> sminp v16.2s, v23.2s, v23.2s >> mov w10, v16.s[0] >> cmp w10, w0 >> csel w10, w10, w0, lt >> >> >> I don't expect this to change anything of SuperWord, vectorizing reductions of two integers is disabled by [1]. >> This is useful for VectorAPI, tested benchmarks in [2], performance can improve ~51% and ~8% for `Int64Vector.ADD` and `Int64Vector.MAX` respectively. >> >> >> Benchmark (size) Mode Cnt Score Error Units >> # optimized >> Int64Vector.ADDLanes 1024 thrpt 10 2492.123 ±
23.561 ops/ms >> Int64Vector.ADDMaskedLanes 1024 thrpt 10 1825.882 ± 5.261 ops/ms >> Int64Vector.MAXLanes 1024 thrpt 10 1921.028 ± 3.253 ops/ms >> Int64Vector.MAXMaskedLanes 1024 thrpt 10 1588.575 ± 3.903 ops/ms >> Int64Vector.MINLanes 1024 thrpt 10 1923.913 ± 2.117 ops/ms >> Int64Vector.MINMaskedLanes 1024 thrpt 10 1596.875 ± 2.163 ops/ms >> # default >> Int64Vector.ADDLanes 1024 thrpt 10 1644.223 ± 1.885 ops/ms >> Int64Vector.ADDMaskedLanes 1024 thrpt 10 1491.502 ± 26.436 ops/ms >> Int64Vector.MAXLanes 1024 thrpt 10 1784.066 ± 3.816 ops/ms >> Int64Vector.MAXMaskedLanes 1024 thrpt 10 1494.750 ± 3.451 ops/ms >> Int64Vector.MINLanes 1024 thrpt 10 1785.266 ± 8.893 ops/ms >> Int64Vector.MINMaskedLanes 1024 thrpt 10 1499.233 ± 3.498 ops/ms >> >> >> Verified correctness with tests `test/jdk/jdk/incubator/vector/`. Also tested linux-aarch64-server-fastdebug tier1-3. >> >> [1] https://github.com/openjdk/jdk/blob/3bf4c904fbbd87d4db18db22c1be384616483eed/src/hotspot/share/opto/superword.cpp#L2004 >> [2] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/jdk/jdk/incubator/vector/benchmark/src/main/java/benchmark/jdk/incubator/vector/Int64Vector.java > > Dong Bo has updated the pull request incrementally with one additional commit since the last revision: > > add assembler tests for smaxp/sminp Hi, @theRealAph. Could you please take a look at this PR? Thanks.
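As a reading aid for the two-integer reductions discussed in the 8264973 thread above, here is a scalar Java model of what such a reduction computes: the two lanes are combined with each other (the step a pairwise instruction performs) and the result is then folded into the incoming scalar. This is a sketch of the semantics only; the class and method names are hypothetical and it says nothing about the code C2 actually emits.

```java
// Hypothetical scalar model of 2-lane int reductions; for illustration only.
final class TwoLaneReductionModel {
    // Combine the two lanes with each other: the role of a pairwise add.
    static int pairwiseAdd(int[] lanes) {
        return lanes[0] + lanes[1];
    }

    // Add reduction: fold both lanes into the scalar input (wraps on overflow).
    static int reduceAdd(int scalar, int[] lanes) {
        return scalar + pairwiseAdd(lanes);
    }

    // Max reduction: pairwise max of the lanes, then max with the scalar input.
    static int reduceMax(int scalar, int[] lanes) {
        return Math.max(scalar, Math.max(lanes[0], lanes[1]));
    }

    // Min reduction: pairwise min of the lanes, then min with the scalar input.
    static int reduceMin(int scalar, int[] lanes) {
        return Math.min(scalar, Math.min(lanes[0], lanes[1]));
    }

    public static void main(String[] args) {
        int[] lanes = {1, 2};
        System.out.println(reduceAdd(10, lanes)); // 13
        System.out.println(reduceMax(0, lanes));  // 2
        System.out.println(reduceMin(0, lanes));  // 0
    }
}
```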
------------- PR: https://git.openjdk.java.net/jdk/pull/3683 From github.com+20216587+miao-zheng at openjdk.java.net Thu May 6 07:13:57 2021 From: github.com+20216587+miao-zheng at openjdk.java.net (Miao Zheng) Date: Thu, 6 May 2021 07:13:57 GMT Subject: Integrated: 8265915: adjust state_unloading_cycle compuation order in nmethod::is_unloading In-Reply-To: References: Message-ID: <0O0G9T--llg5N4ZgV-biMouITSQRjXX_92WBGMEidbQ=.a944c154-255f-4dfd-8559-1824b2e88c83@github.com> On Sun, 25 Apr 2021 11:07:14 GMT, Miao Zheng wrote: > Trivial change of moving state_unloading_cycle computation after state_is_unloading checking. Avoiding useless state_unloading_cycle computation when state_is_unloading is true. This pull request has now been integrated. Changeset: 7835cdbe Author: miao zheng Committer: John Jiang URL: https://git.openjdk.java.net/jdk/commit/7835cdbef4992bca3227a001bc58aa56dd72c3a5 Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod 8265915: adjust state_unloading_cycle compuation order in nmethod::is_unloading Reviewed-by: thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/3676 From thartmann at openjdk.java.net Thu May 6 07:57:27 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 6 May 2021 07:57:27 GMT Subject: RFR: 8266618: Remove broken -XX:-OptoRemoveUseless Message-ID: Similar to [JDK-8266542](https://bugs.openjdk.java.net/browse/JDK-8266542), simply running `java -XX:-OptoRemoveUseless` already crashes the VM. I propose to remove this debug flag.
Thanks, Tobias ------------- Commit messages: - 8266618: Remove broken -XX:-OptoRemoveUseless Changes: https://git.openjdk.java.net/jdk/pull/3896/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3896&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8266618 Stats: 8 lines in 3 files changed: 0 ins; 6 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/3896.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3896/head:pull/3896 PR: https://git.openjdk.java.net/jdk/pull/3896 From yyang at openjdk.java.net Thu May 6 10:14:16 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Thu, 6 May 2021 10:14:16 GMT Subject: RFR: 8265518: C1: Intrinsic support for Preconditions.checkIndex [v7] In-Reply-To: References: Message-ID: > The JDK codebase re-created many variants of checkIndex(`grep -I -r 'cehckIndex' jdk/`). A notable variant is java.nio.Buffer.checkIndex, which annotated with @IntrinsicCandidate and it only has a corresponding C1 intrinsic version. > > In fact, there is an utility method `jdk.internal.util.Preconditions.checkIndex`(wrapped by java.lang.Objects.checkIndex) that behaves the same as these variants of checkIndex, we can replace these re-created variants of checkIndex by Objects.checkIndex, it would significantly reduce duplicated code and enjoys performance improvement because Preconditions.checkIndex is @IntrinsicCandidate and it has a corresponding intrinsic method in HotSpot. > > But, the problem is currently HotSpot only implements the C2 version of Preconditions.checkIndex. To reuse it global-widely in JDK code, I think we can firstly implement its C1 counterpart. There are also a few kinds of stuff we can do later: > > 1. Replace all variants of checkIndex by Objects.checkIndex in the whole JDK codebase. > 2. Remove Buffer.checkIndex and obsolete/deprecate InlineNIOCheckIndex flag > > Testing: cds, compiler and jdk Yi Yang has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains eight commits: - cmp clobbers its left argument on x86_32 - Merge branch 'master' into consolidate_checkindex - better check1-4 - AssertionError when expected exception was not thrown - remove extra newline - remove InlineNIOCheckIndex flag - remove java_nio_Buffer in javaClasses.hpp - consolidate ------------- Changes: https://git.openjdk.java.net/jdk/pull/3615/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3615&range=06 Stats: 331 lines in 11 files changed: 235 ins; 78 del; 18 mod Patch: https://git.openjdk.java.net/jdk/pull/3615.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3615/head:pull/3615 PR: https://git.openjdk.java.net/jdk/pull/3615 From yyang at openjdk.java.net Thu May 6 11:17:18 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Thu, 6 May 2021 11:17:18 GMT Subject: RFR: 8265518: C1: Intrinsic support for Preconditions.checkIndex [v8] In-Reply-To: References: Message-ID: > The JDK codebase contains many re-created variants of checkIndex (`grep -I -r 'checkIndex' jdk/`). A notable variant is java.nio.Buffer.checkIndex, which is annotated with @IntrinsicCandidate and has only a corresponding C1 intrinsic. > > In fact, there is a utility method, `jdk.internal.util.Preconditions.checkIndex` (wrapped by java.lang.Objects.checkIndex), that behaves the same as these variants. We can replace the re-created variants with Objects.checkIndex, which would significantly reduce duplicated code and improve performance, because Preconditions.checkIndex is an @IntrinsicCandidate with a corresponding intrinsic in HotSpot. > > The problem is that HotSpot currently implements only the C2 version of the Preconditions.checkIndex intrinsic. To reuse it throughout the JDK code, I think we should first implement its C1 counterpart. There are also a few things we can do later: > > 1. Replace all variants of checkIndex with Objects.checkIndex in the whole JDK codebase. > 2.
Remove Buffer.checkIndex and obsolete/deprecate InlineNIOCheckIndex flag > > Testing: cds, compiler and jdk Yi Yang has updated the pull request incrementally with one additional commit since the last revision: build failed ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/3615/files - new: https://git.openjdk.java.net/jdk/pull/3615/files/e4959148..f996c99f Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3615&range=07 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3615&range=06-07 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/3615.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3615/head:pull/3615 PR: https://git.openjdk.java.net/jdk/pull/3615 From thartmann at openjdk.java.net Thu May 6 13:00:54 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 6 May 2021 13:00:54 GMT Subject: RFR: 8266561: Remove Compile::_save_argument_registers In-Reply-To: <32Comu0GA92wndESU2ocZ5sLCn-TlUSNulRINICLOWM=.e10d71b5-81e9-4c4d-84b5-5b4f77cdc479@github.com> References: <32Comu0GA92wndESU2ocZ5sLCn-TlUSNulRINICLOWM=.e10d71b5-81e9-4c4d-84b5-5b4f77cdc479@github.com> Message-ID: <19yA7ODLruxTcwlBkY6aP4k-EPLg8XnFM5bx5RbPtFo=.dc1d123e-fcb9-4d9a-8e38-ea593bdfd0f6@github.com> On Wed, 5 May 2021 14:37:18 GMT, Claes Redestad wrote: > Compile::_save_argument_registers is always false, so I suggest removing it. > > It was used in the past for certain stubs, but the last use appears to have been removed with [JDK-8136406](https://bugs.openjdk.java.net/browse/JDK-8136406) Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/3884 From redestad at openjdk.java.net Thu May 6 13:00:55 2021 From: redestad at openjdk.java.net (Claes Redestad) Date: Thu, 6 May 2021 13:00:55 GMT Subject: RFR: 8266561: Remove Compile::_save_argument_registers In-Reply-To: <-56lQNTycCCV1ifACwF4LSVUwpIFyC4i_abxfvXPHwo=.4f278233-4370-47e5-866c-314dba383049@github.com> References: <32Comu0GA92wndESU2ocZ5sLCn-TlUSNulRINICLOWM=.e10d71b5-81e9-4c4d-84b5-5b4f77cdc479@github.com> <-56lQNTycCCV1ifACwF4LSVUwpIFyC4i_abxfvXPHwo=.4f278233-4370-47e5-866c-314dba383049@github.com> Message-ID: On Wed, 5 May 2021 18:40:54 GMT, Vladimir Kozlov wrote: >> Compile::_save_argument_registers is always false, so I suggest removing it. >> >> It was used in the past for certain stubs, but the last use appears to have been removed with [JDK-8136406](https://bugs.openjdk.java.net/browse/JDK-8136406) > > These are C2 changes and they look fine. > > I thought we may do more clean up but C1 may pass must_gc_arguments to `RuntimeStub::new_runtime_stub()`. @vnkozlov @TobiHartmann - thanks for reviewing! ------------- PR: https://git.openjdk.java.net/jdk/pull/3884 From redestad at openjdk.java.net Thu May 6 13:00:55 2021 From: redestad at openjdk.java.net (Claes Redestad) Date: Thu, 6 May 2021 13:00:55 GMT Subject: Integrated: 8266561: Remove Compile::_save_argument_registers In-Reply-To: <32Comu0GA92wndESU2ocZ5sLCn-TlUSNulRINICLOWM=.e10d71b5-81e9-4c4d-84b5-5b4f77cdc479@github.com> References: <32Comu0GA92wndESU2ocZ5sLCn-TlUSNulRINICLOWM=.e10d71b5-81e9-4c4d-84b5-5b4f77cdc479@github.com> Message-ID: On Wed, 5 May 2021 14:37:18 GMT, Claes Redestad wrote: > Compile::_save_argument_registers is always false, so I suggest removing it. > > It was used in the past for certain stubs, but the last use appears to have been removed with [JDK-8136406](https://bugs.openjdk.java.net/browse/JDK-8136406) This pull request has now been integrated. 
Changeset: c665dba5 Author: Claes Redestad URL: https://git.openjdk.java.net/jdk/commit/c665dba591ae5c15c9ca49e14d1aaa4eea38e7ae Stats: 65 lines in 8 files changed: 0 ins; 33 del; 32 mod 8266561: Remove Compile::_save_argument_registers Reviewed-by: kvn, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/3884 From shade at openjdk.java.net Thu May 6 16:44:56 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 6 May 2021 16:44:56 GMT Subject: Integrated: JDK-8266573: Make sure blackholes are tagged for all JVMCI paths In-Reply-To: References: Message-ID: <4VfMQTRMb0FxzkZehlz3Da3Uopc_lgjnaffwm4n_bTU=.368e7ec6-eafd-4899-ab57-429f234625d0@github.com> On Wed, 5 May 2021 16:44:08 GMT, Aleksey Shipilev wrote: > In post-integration https://github.com/openjdk/jdk/pull/2024#issuecomment-832834830, Tom says: > "In this code, 6018336#diff-fa2433a762244542fec57f9d58dd3092bae74f354acf0ef33603a5f8306fd7daR995, the call to `CompilerOracle::tag_blackhole_if_possible` should be outside of the if. As written, methods won't be properly tagged for libgraal, only when used with the pure Java graal." > > I am doing this blindly, because Graal is already removed from the codebase with JEP 410. This pull request has now been integrated. Changeset: a90b33a9 Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/a90b33a95510a040fbb9a093ef5f3b6d4675dc9e Stats: 3 lines in 1 file changed: 2 ins; 1 del; 0 mod 8266573: Make sure blackholes are tagged for all JVMCI paths Reviewed-by: never ------------- PR: https://git.openjdk.java.net/jdk/pull/3887 From github.com+58006833+xbzhang99 at openjdk.java.net Thu May 6 17:12:37 2021 From: github.com+58006833+xbzhang99 at openjdk.java.net (Xubo Zhang) Date: Thu, 6 May 2021 17:12:37 GMT Subject: RFR: 8266332: Adler32 intrinsic for x86 64-bit platforms [v6] In-Reply-To: References: Message-ID: > Implement Adler32 intrinsic for x86 64-bit platform using vector instructions. 
> > For the following benchmark: > http://cr.openjdk.java.net/~pli/rfr/8216259/TestAdler32.java > > The optimization shows ~5x improvement. > > Base: > Benchmark (count) Mode Cnt Score Error Units > TestAdler32Perf.testAdler32Update 64 avgt 25 0.084 ± 0.001 us/op > TestAdler32Perf.testAdler32Update 128 avgt 25 0.104 ± 0.001 us/op > TestAdler32Perf.testAdler32Update 256 avgt 25 0.146 ± 0.002 us/op > TestAdler32Perf.testAdler32Update 512 avgt 25 0.226 ± 0.002 us/op > TestAdler32Perf.testAdler32Update 1024 avgt 25 0.390 ± 0.005 us/op > TestAdler32Perf.testAdler32Update 2048 avgt 25 0.714 ± 0.007 us/op > TestAdler32Perf.testAdler32Update 4096 avgt 25 1.359 ± 0.014 us/op > TestAdler32Perf.testAdler32Update 8192 avgt 25 2.751 ± 0.023 us/op > TestAdler32Perf.testAdler32Update 16384 avgt 25 5.494 ± 0.077 us/op > TestAdler32Perf.testAdler32Update 32768 avgt 25 11.058 ± 0.160 us/op > TestAdler32Perf.testAdler32Update 65536 avgt 25 22.198 ± 0.319 us/op > > > With patch: > Benchmark (count) Mode Cnt Score Error Units > TestAdler32Perf.testAdler32Update 64 avgt 25 0.020 ± 0.001 us/op > TestAdler32Perf.testAdler32Update 128 avgt 25 0.025 ± 0.001 us/op > TestAdler32Perf.testAdler32Update 256 avgt 25 0.031 ± 0.001 us/op > TestAdler32Perf.testAdler32Update 512 avgt 25 0.048 ± 0.001 us/op > TestAdler32Perf.testAdler32Update 1024 avgt 25 0.078 ± 0.001 us/op > TestAdler32Perf.testAdler32Update 2048 avgt 25 0.139 ± 0.002 us/op > TestAdler32Perf.testAdler32Update 4096 avgt 25 0.262 ± 0.004 us/op > TestAdler32Perf.testAdler32Update 8192 avgt 25 0.524 ± 0.010 us/op > TestAdler32Perf.testAdler32Update 16384 avgt 25 1.017 ± 0.022 us/op > TestAdler32Perf.testAdler32Update 32768 avgt 25 2.058 ± 0.052 us/op > TestAdler32Perf.testAdler32Update 65536 avgt 25 3.994 ±
0.013 us/op Xubo Zhang has updated the pull request incrementally with one additional commit since the last revision: remove checking AVX_512 in vbroadcastf128 ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/3806/files - new: https://git.openjdk.java.net/jdk/pull/3806/files/172d7c63..72a0d3f3 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3806&range=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3806&range=04-05 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/3806.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3806/head:pull/3806 PR: https://git.openjdk.java.net/jdk/pull/3806 From sviswanathan at openjdk.java.net Thu May 6 17:36:02 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Thu, 6 May 2021 17:36:02 GMT Subject: RFR: 8266332: Adler32 intrinsic for x86 64-bit platforms [v6] In-Reply-To: References: Message-ID: <6qI3wSgRGjKl9kq1i-ryg4DkAo7Zl5JGlNCGsg4nsGk=.e4d141d0-42d2-4690-9555-ac07d8702029@github.com> On Thu, 6 May 2021 17:12:37 GMT, Xubo Zhang wrote: >> Implement Adler32 intrinsic for x86 64-bit platform using vector instructions. >> >> For the following benchmark: >> http://cr.openjdk.java.net/~pli/rfr/8216259/TestAdler32.java >> >> The optimization shows ~5x improvement. >> >> Base: >> Benchmark (count) Mode Cnt Score Error Units >> TestAdler32Perf.testAdler32Update 64 avgt 25 0.084 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 128 avgt 25 0.104 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 256 avgt 25 0.146 ± 0.002 us/op >> TestAdler32Perf.testAdler32Update 512 avgt 25 0.226 ± 0.002 us/op >> TestAdler32Perf.testAdler32Update 1024 avgt 25 0.390 ± 0.005 us/op >> TestAdler32Perf.testAdler32Update 2048 avgt 25 0.714 ± 0.007 us/op >> TestAdler32Perf.testAdler32Update 4096 avgt 25 1.359 ± 0.014 us/op >> TestAdler32Perf.testAdler32Update 8192 avgt 25 2.751 ±
0.023 us/op >> TestAdler32Perf.testAdler32Update 16384 avgt 25 5.494 ± 0.077 us/op >> TestAdler32Perf.testAdler32Update 32768 avgt 25 11.058 ± 0.160 us/op >> TestAdler32Perf.testAdler32Update 65536 avgt 25 22.198 ± 0.319 us/op >> >> >> With patch: >> Benchmark (count) Mode Cnt Score Error Units >> TestAdler32Perf.testAdler32Update 64 avgt 25 0.020 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 128 avgt 25 0.025 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 256 avgt 25 0.031 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 512 avgt 25 0.048 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 1024 avgt 25 0.078 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 2048 avgt 25 0.139 ± 0.002 us/op >> TestAdler32Perf.testAdler32Update 4096 avgt 25 0.262 ± 0.004 us/op >> TestAdler32Perf.testAdler32Update 8192 avgt 25 0.524 ± 0.010 us/op >> TestAdler32Perf.testAdler32Update 16384 avgt 25 1.017 ± 0.022 us/op >> TestAdler32Perf.testAdler32Update 32768 avgt 25 2.058 ± 0.052 us/op >> TestAdler32Perf.testAdler32Update 65536 avgt 25 3.994 ± 0.013 us/op > > Xubo Zhang has updated the pull request incrementally with one additional commit since the last revision: > > remove checking AVX_512 in vbroadcastf128 Marked as reviewed by sviswanathan (Committer). ------------- PR: https://git.openjdk.java.net/jdk/pull/3806 From kvn at openjdk.java.net Thu May 6 17:42:56 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 6 May 2021 17:42:56 GMT Subject: RFR: 8266618: Remove broken -XX:-OptoRemoveUseless In-Reply-To: References: Message-ID: On Thu, 6 May 2021 07:49:01 GMT, Tobias Hartmann wrote: > Similar to [JDK-8266542](https://bugs.openjdk.java.net/browse/JDK-8266542), simply running `java -XX:-OptoRemoveUseless` already crashes the VM. I propose to remove this debug flag. > > Thanks, > Tobias Good. ------------- Marked as reviewed by kvn (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/3896 From jbhateja at openjdk.java.net Fri May 7 05:39:55 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Fri, 7 May 2021 05:39:55 GMT Subject: RFR: 8266332: Adler32 intrinsic for x86 64-bit platforms [v6] In-Reply-To: References: Message-ID: On Thu, 6 May 2021 17:12:37 GMT, Xubo Zhang wrote: >> Implement Adler32 intrinsic for x86 64-bit platform using vector instructions. >> >> For the following benchmark: >> http://cr.openjdk.java.net/~pli/rfr/8216259/TestAdler32.java >> >> The optimization shows ~5x improvement. >> >> Base: >> Benchmark (count) Mode Cnt Score Error Units >> TestAdler32Perf.testAdler32Update 64 avgt 25 0.084 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 128 avgt 25 0.104 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 256 avgt 25 0.146 ± 0.002 us/op >> TestAdler32Perf.testAdler32Update 512 avgt 25 0.226 ± 0.002 us/op >> TestAdler32Perf.testAdler32Update 1024 avgt 25 0.390 ± 0.005 us/op >> TestAdler32Perf.testAdler32Update 2048 avgt 25 0.714 ± 0.007 us/op >> TestAdler32Perf.testAdler32Update 4096 avgt 25 1.359 ± 0.014 us/op >> TestAdler32Perf.testAdler32Update 8192 avgt 25 2.751 ± 0.023 us/op >> TestAdler32Perf.testAdler32Update 16384 avgt 25 5.494 ± 0.077 us/op >> TestAdler32Perf.testAdler32Update 32768 avgt 25 11.058 ± 0.160 us/op >> TestAdler32Perf.testAdler32Update 65536 avgt 25 22.198 ± 0.319 us/op >> >> >> With patch: >> Benchmark (count) Mode Cnt Score Error Units >> TestAdler32Perf.testAdler32Update 64 avgt 25 0.020 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 128 avgt 25 0.025 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 256 avgt 25 0.031 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 512 avgt 25 0.048 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 1024 avgt 25 0.078 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 2048 avgt 25 0.139 ± 0.002 us/op >> TestAdler32Perf.testAdler32Update 4096 avgt 25 0.262 ±
0.004 us/op >> TestAdler32Perf.testAdler32Update 8192 avgt 25 0.524 ± 0.010 us/op >> TestAdler32Perf.testAdler32Update 16384 avgt 25 1.017 ± 0.022 us/op >> TestAdler32Perf.testAdler32Update 32768 avgt 25 2.058 ± 0.052 us/op >> TestAdler32Perf.testAdler32Update 65536 avgt 25 3.994 ± 0.013 us/op > > Xubo Zhang has updated the pull request incrementally with one additional commit since the last revision: > > remove checking AVX_512 in vbroadcastf128 Marked as reviewed by jbhateja (Committer). ------------- PR: https://git.openjdk.java.net/jdk/pull/3806 From thartmann at openjdk.java.net Fri May 7 06:30:51 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Fri, 7 May 2021 06:30:51 GMT Subject: RFR: 8266618: Remove broken -XX:-OptoRemoveUseless In-Reply-To: References: Message-ID: On Thu, 6 May 2021 07:49:01 GMT, Tobias Hartmann wrote: > Similar to [JDK-8266542](https://bugs.openjdk.java.net/browse/JDK-8266542), simply running `java -XX:-OptoRemoveUseless` already crashes the VM. I propose to remove this debug flag. > > Thanks, > Tobias Thanks for the review, Vladimir! ------------- PR: https://git.openjdk.java.net/jdk/pull/3896 From whuang at openjdk.java.net Fri May 7 09:03:14 2021 From: whuang at openjdk.java.net (Wang Huang) Date: Fri, 7 May 2021 09:03:14 GMT Subject: RFR: 8265956: JVM crashes when matching LShiftVB Node [v3] In-Reply-To: <0lX7U-sIV4c6Wi1Cyw3caRD6ZTVCIYzbiiji3uPXf4M=.c0494113-32c5-40c4-bdd7-93b226ffe236@github.com> References: <0lX7U-sIV4c6Wi1Cyw3caRD6ZTVCIYzbiiji3uPXf4M=.c0494113-32c5-40c4-bdd7-93b226ffe236@github.com> Message-ID: > It was found that the rule `match(Set dst (LShiftVB src shift))` is missing on many CPUs, such as `aarch64` and `x86`. This is why the JVM crashes on `JDK-8265956`'s test case. In this commit, I: > * show the crash case `TestVectorShuffleIotaShort` > * solve the issue on `aarch64` and `x86` by adding the rule. > * test after fixing on tier1~3 > > Thank you for your review.
Any suggestion is welcome. > Wang Huang Wang Huang has updated the pull request incrementally with one additional commit since the last revision: fix bugs ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/3747/files - new: https://git.openjdk.java.net/jdk/pull/3747/files/3d11bb1a..d49029be Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3747&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3747&range=01-02 Stats: 183 lines in 6 files changed: 104 ins; 62 del; 17 mod Patch: https://git.openjdk.java.net/jdk/pull/3747.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3747/head:pull/3747 PR: https://git.openjdk.java.net/jdk/pull/3747 From neliasso at openjdk.java.net Fri May 7 09:17:50 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Fri, 7 May 2021 09:17:50 GMT Subject: RFR: 8266618: Remove broken -XX:-OptoRemoveUseless In-Reply-To: References: Message-ID: On Thu, 6 May 2021 07:49:01 GMT, Tobias Hartmann wrote: > Similar to [JDK-8266542](https://bugs.openjdk.java.net/browse/JDK-8266542), simply running `java -XX:-OptoRemoveUseless` already crashes the VM. I propose to remove this debug flag. > > Thanks, > Tobias Looks good. ------------- Marked as reviewed by neliasso (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/3896 From neliasso at openjdk.java.net Fri May 7 09:29:51 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Fri, 7 May 2021 09:29:51 GMT Subject: RFR: 8266332: Adler32 intrinsic for x86 64-bit platforms [v6] In-Reply-To: References: Message-ID: On Thu, 6 May 2021 17:12:37 GMT, Xubo Zhang wrote: >> Implement Adler32 intrinsic for x86 64-bit platform using vector instructions. >> >> For the following benchmark: >> http://cr.openjdk.java.net/~pli/rfr/8216259/TestAdler32.java >> >> The optimization shows ~5x improvement. >> >> Base: >> Benchmark (count) Mode Cnt Score Error Units >> TestAdler32Perf.testAdler32Update 64 avgt 25 0.084 ±
0.001 us/op >> TestAdler32Perf.testAdler32Update 128 avgt 25 0.104 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 256 avgt 25 0.146 ± 0.002 us/op >> TestAdler32Perf.testAdler32Update 512 avgt 25 0.226 ± 0.002 us/op >> TestAdler32Perf.testAdler32Update 1024 avgt 25 0.390 ± 0.005 us/op >> TestAdler32Perf.testAdler32Update 2048 avgt 25 0.714 ± 0.007 us/op >> TestAdler32Perf.testAdler32Update 4096 avgt 25 1.359 ± 0.014 us/op >> TestAdler32Perf.testAdler32Update 8192 avgt 25 2.751 ± 0.023 us/op >> TestAdler32Perf.testAdler32Update 16384 avgt 25 5.494 ± 0.077 us/op >> TestAdler32Perf.testAdler32Update 32768 avgt 25 11.058 ± 0.160 us/op >> TestAdler32Perf.testAdler32Update 65536 avgt 25 22.198 ± 0.319 us/op >> >> >> With patch: >> Benchmark (count) Mode Cnt Score Error Units >> TestAdler32Perf.testAdler32Update 64 avgt 25 0.020 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 128 avgt 25 0.025 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 256 avgt 25 0.031 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 512 avgt 25 0.048 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 1024 avgt 25 0.078 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 2048 avgt 25 0.139 ± 0.002 us/op >> TestAdler32Perf.testAdler32Update 4096 avgt 25 0.262 ± 0.004 us/op >> TestAdler32Perf.testAdler32Update 8192 avgt 25 0.524 ± 0.010 us/op >> TestAdler32Perf.testAdler32Update 16384 avgt 25 1.017 ± 0.022 us/op >> TestAdler32Perf.testAdler32Update 32768 avgt 25 2.058 ± 0.052 us/op >> TestAdler32Perf.testAdler32Update 65536 avgt 25 3.994 ± 0.013 us/op > > Xubo Zhang has updated the pull request incrementally with one additional commit since the last revision: > > remove checking AVX_512 in vbroadcastf128 I find this test for Adler32: test/hotspot/jtreg/compiler/intrinsics/zip/TestAdler32.java Will there be a benchmark added to micros?
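For readers following the Adler32 thread, here is a minimal scalar sketch of the checksum that the intrinsic accelerates, cross-checked against `java.util.zip.Adler32`. The class `Adler32Sketch` and its method names are hypothetical; this is neither the benchmark from the PR nor the vectorized stub, only the textbook definition (two running sums modulo 65521) written out so the computation under discussion is concrete.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Adler32;

final class Adler32Sketch {
    // Largest prime below 2^16, as used by the Adler-32 definition.
    private static final int MOD_ADLER = 65521;

    // Straightforward byte-at-a-time computation (no vectorization).
    static long scalarAdler32(byte[] data) {
        int a = 1;
        int b = 0;
        for (byte value : data) {
            a = (a + (value & 0xff)) % MOD_ADLER;
            b = (b + a) % MOD_ADLER;
        }
        return ((long) b << 16) | a;
    }

    // The JDK implementation; on supporting platforms HotSpot may
    // replace the update loop with an intrinsic.
    static long jdkAdler32(byte[] data) {
        Adler32 adler = new Adler32();
        adler.update(data, 0, data.length);
        return adler.getValue();
    }

    static byte[] ascii(String text) {
        return text.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] input = ascii("Wikipedia");
        System.out.printf("scalar=%08x jdk=%08x%n",
                scalarAdler32(input), jdkAdler32(input));
    }
}
```

A micro benchmark along the lines asked about above would time `jdkAdler32` over buffers of varying size; the scalar method is only useful as a correctness reference.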
------------- PR: https://git.openjdk.java.net/jdk/pull/3806 From thartmann at openjdk.java.net Fri May 7 09:58:51 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Fri, 7 May 2021 09:58:51 GMT Subject: RFR: 8266618: Remove broken -XX:-OptoRemoveUseless In-Reply-To: References: Message-ID: On Thu, 6 May 2021 07:49:01 GMT, Tobias Hartmann wrote: > Similar to [JDK-8266542](https://bugs.openjdk.java.net/browse/JDK-8266542), simply running `java -XX:-OptoRemoveUseless` already crashes the VM. I propose to remove this debug flag. > > Thanks, > Tobias Thanks, Nils! ------------- PR: https://git.openjdk.java.net/jdk/pull/3896 From thartmann at openjdk.java.net Fri May 7 09:58:52 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Fri, 7 May 2021 09:58:52 GMT Subject: Integrated: 8266618: Remove broken -XX:-OptoRemoveUseless In-Reply-To: References: Message-ID: <0eeIhq9h-ttxxrKQOy3U56AfydJSseitinSkrc0kPSs=.84d4627a-f1ed-4d7c-9d2a-6d963a44b2e2@github.com> On Thu, 6 May 2021 07:49:01 GMT, Tobias Hartmann wrote: > Similar to [JDK-8266542](https://bugs.openjdk.java.net/browse/JDK-8266542), simply running `java -XX:-OptoRemoveUseless` already crashes the VM. I propose to remove this debug flag. > > Thanks, > Tobias This pull request has now been integrated. 
Changeset: a65021e3 Author: Tobias Hartmann URL: https://git.openjdk.java.net/jdk/commit/a65021e38c8f2be67be08475da67956a5a47e408 Stats: 8 lines in 3 files changed: 0 ins; 6 del; 2 mod 8266618: Remove broken -XX:-OptoRemoveUseless Reviewed-by: kvn, neliasso ------------- PR: https://git.openjdk.java.net/jdk/pull/3896 From jbhateja at openjdk.java.net Fri May 7 14:38:39 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Fri, 7 May 2021 14:38:39 GMT Subject: RFR: 8256973: Intrinsic creation for VectorMask query (lastTrue, firstTrue, trueCount) APIs Message-ID: <73lFD51hzmiF_KrQyPyE5c7lbf-Bp6V5vptzGo7JioY=.f34509d0-04c1-4c6d-878f-baa433b315a7@github.com> This patch intrinsifies the following mask query APIs using optimal instruction sequences for the x86 target. 1) VectorMask.firstTrue. 2) VectorMask.lastTrue. 3) VectorMask.trueCount. The current implementations of the above APIs iterate over the underlying boolean array encapsulated in a mask instance to ascertain the count/position index of true bits. x86 AVX2 and AVX512 targets offer direct instructions to move the mask held in a byte vector into a GP or an opmask register, thereby accelerating further querying. Intrinsification is not performed for vector species containing fewer than two vector lanes.
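To make the three queries concrete, here is a small editorial sketch contrasting a loop over a boolean array (the style of implementation the message above says is being replaced) with the same queries answered from a packed bitmask. The class `MaskQuerySketch` and all method names are hypothetical, the sketch is limited to 64 lanes, and it uses plain `java.lang.Long` bit operations rather than the incubating Vector API; it is an illustration of the idea, not the code in the patch. The return conventions (lane count when no lane is set for firstTrue, -1 for lastTrue) follow the `VectorMask` documentation.

```java
final class MaskQuerySketch {
    // Loop-based queries over the boolean lanes.
    static int firstTrueLoop(boolean[] lanes) {
        for (int i = 0; i < lanes.length; i++) {
            if (lanes[i]) return i;
        }
        return lanes.length; // no lane set
    }

    static int lastTrueLoop(boolean[] lanes) {
        for (int i = lanes.length - 1; i >= 0; i--) {
            if (lanes[i]) return i;
        }
        return -1; // no lane set
    }

    static int trueCountLoop(boolean[] lanes) {
        int count = 0;
        for (boolean lane : lanes) {
            if (lane) count++;
        }
        return count;
    }

    // Pack the lanes into a long, lane i -> bit i (assumes at most 64 lanes).
    static long toBitmask(boolean[] lanes) {
        long mask = 0L;
        for (int i = 0; i < lanes.length; i++) {
            if (lanes[i]) mask |= 1L << i;
        }
        return mask;
    }

    // Bitmask-based queries: each is a single bit-counting operation.
    static int firstTrueBits(long mask, int laneCount) {
        return mask == 0 ? laneCount : Long.numberOfTrailingZeros(mask);
    }

    static int lastTrueBits(long mask) {
        return 63 - Long.numberOfLeadingZeros(mask); // yields -1 when mask == 0
    }

    static int trueCountBits(long mask) {
        return Long.bitCount(mask);
    }

    public static void main(String[] args) {
        boolean[] lanes = {false, false, true, false, true, false, false, false};
        long mask = toBitmask(lanes);
        System.out.println(firstTrueLoop(lanes) + " " + firstTrueBits(mask, lanes.length));
        System.out.println(lastTrueLoop(lanes) + " " + lastTrueBits(mask));
        System.out.println(trueCountLoop(lanes) + " " + trueCountBits(mask));
    }
}
```

Once the mask sits in a general-purpose register as a bit pattern, each query no longer depends on the number of lanes, which is consistent with the flat "WITH OPT" scores across ALGO cases in the numbers reported in this message.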
Please find below the performance number for benchmark included in the patch: Machine: Cascade Lake server (Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz 28C) VectorMask.trueCount | VECTOR SIZE | ALGO | BASELINE AVX3 | WITH OPT AVX3 | GAIN -- | -- | -- | -- | -- | -- MaskQueryOperationsBenchmark.testFirstTrueByte | 128 | 1 | 338396.436 | 362711.622 | 1.071854143 MaskQueryOperationsBenchmark.testFirstTrueByte | 128 | 2 | 205477.472 | 362668.035 | 1.765001445 MaskQueryOperationsBenchmark.testFirstTrueByte | 128 | 3 | 185613.377 | 362518.206 | 1.953082326 MaskQueryOperationsBenchmark.testFirstTrueByte | 256 | 1 | 338522.114 | 328751.231 | 0.971136648 MaskQueryOperationsBenchmark.testFirstTrueByte | 256 | 2 | 148825.341 | 328783.35 | 2.209189294 MaskQueryOperationsBenchmark.testFirstTrueByte | 256 | 3 | 200854.856 | 328784.24 | 1.636924526 MaskQueryOperationsBenchmark.testFirstTrueByte | 512 | 1 | 338551.089 | 319908.361 | 0.944933782 MaskQueryOperationsBenchmark.testFirstTrueByte | 512 | 2 | 116338.756 | 320026.839 | 2.750818816 MaskQueryOperationsBenchmark.testFirstTrueByte | 512 | 3 | 200871.692 | 320008.208 | 1.593097588 MaskQueryOperationsBenchmark.testFirstTrueInt | 128 | 1 | 338489.157 | 190221.57 | 0.561972418 MaskQueryOperationsBenchmark.testFirstTrueInt | 128 | 2 | 205140.903 | 362387.766 | 1.766531007 MaskQueryOperationsBenchmark.testFirstTrueInt | 128 | 3 | 185508.994 | 362566.265 | 1.95444036 MaskQueryOperationsBenchmark.testFirstTrueInt | 256 | 1 | 338403.999 | 328829.751 | 0.971707639 MaskQueryOperationsBenchmark.testFirstTrueInt | 256 | 2 | 148988.857 | 328835.479 | 2.207114583 MaskQueryOperationsBenchmark.testFirstTrueInt | 256 | 3 | 200815.907 | 328778.266 | 1.637212265 MaskQueryOperationsBenchmark.testFirstTrueInt | 512 | 1 | 338462.403 | 328796.84 | 0.971442728 MaskQueryOperationsBenchmark.testFirstTrueInt | 512 | 2 | 116355.623 | 328811.386 | 2.825917455 MaskQueryOperationsBenchmark.testFirstTrueInt | 512 | 3 | 200856.08 | 328773.859 | 1.636862867 
MaskQueryOperationsBenchmark.testFirstTrueLong | 128 | 1 | 338451.783 | 204432.394 | 0.60402221 MaskQueryOperationsBenchmark.testFirstTrueLong | 128 | 2 | 204443.049 | 155670.633 | 0.761437641 MaskQueryOperationsBenchmark.testFirstTrueLong | 128 | 3 | 207254.769 | 155672.842 | 0.751118263 MaskQueryOperationsBenchmark.testFirstTrueLong | 256 | 1 | 338520.255 | 328789.176 | 0.971254072 MaskQueryOperationsBenchmark.testFirstTrueLong | 256 | 2 | 205883.123 | 328742.103 | 1.596741385 MaskQueryOperationsBenchmark.testFirstTrueLong | 256 | 3 | 185519.176 | 328733.537 | 1.771965271 MaskQueryOperationsBenchmark.testFirstTrueLong | 512 | 1 | 338605.11 | 328694.935 | 0.970732353 MaskQueryOperationsBenchmark.testFirstTrueLong | 512 | 2 | 148444.7 | 328352.346 | 2.211950619 MaskQueryOperationsBenchmark.testFirstTrueLong | 512 | 3 | 200884.874 | 328814.376 | 1.636829939 MaskQueryOperationsBenchmark.testFirstTrueShort | 128 | 1 | 338529.326 | 362293.877 | 1.070199387 MaskQueryOperationsBenchmark.testFirstTrueShort | 128 | 2 | 204676.583 | 362428.992 | 1.770739899 MaskQueryOperationsBenchmark.testFirstTrueShort | 128 | 3 | 185495.663 | 362422.835 | 1.953807594 MaskQueryOperationsBenchmark.testFirstTrueShort | 256 | 1 | 338533.82 | 328635.479 | 0.970761146 MaskQueryOperationsBenchmark.testFirstTrueShort | 256 | 2 | 148822.446 | 328803.55 | 2.209368001 MaskQueryOperationsBenchmark.testFirstTrueShort | 256 | 3 | 200752.028 | 328805.974 | 1.637871245 MaskQueryOperationsBenchmark.testFirstTrueShort | 512 | 1 | 338464.548 | 320054.91 | 0.945608371 MaskQueryOperationsBenchmark.testFirstTrueShort | 512 | 2 | 116329.063 | 328763.508 | 2.826151088 MaskQueryOperationsBenchmark.testFirstTrueShort | 512 | 3 | 199971.049 | 328819.066 | 1.644333355 MaskQueryOperationsBenchmark.testLastTrueByte | 128 | 1 | 325618.244 | 337629.441 | 1.036887359 MaskQueryOperationsBenchmark.testLastTrueByte | 128 | 2 | 197655.729 | 337544.012 | 1.707737052 MaskQueryOperationsBenchmark.testLastTrueByte | 128 | 3 | 
325600.645 | 337256.796 | 1.035798919 MaskQueryOperationsBenchmark.testLastTrueByte | 256 | 1 | 325677.144 | 308312.588 | 0.946681687 MaskQueryOperationsBenchmark.testLastTrueByte | 256 | 2 | 138177.514 | 308293.997 | 2.231144476 MaskQueryOperationsBenchmark.testLastTrueByte | 256 | 3 | 201281.142 | 308353.239 | 1.531952949 MaskQueryOperationsBenchmark.testLastTrueByte | 512 | 1 | 325499.635 | 305103.491 | 0.937338965 MaskQueryOperationsBenchmark.testLastTrueByte | 512 | 2 | 98267.327 | 304803.64 | 3.101780106 MaskQueryOperationsBenchmark.testLastTrueByte | 512 | 3 | 201072.661 | 304969.972 | 1.516715253 MaskQueryOperationsBenchmark.testLastTrueInt | 128 | 1 | 325286.171 | 337337.209 | 1.037047496 MaskQueryOperationsBenchmark.testLastTrueInt | 128 | 2 | 197351.915 | 331432.723 | 1.679399579 MaskQueryOperationsBenchmark.testLastTrueInt | 128 | 3 | 325173.097 | 337518.586 | 1.037965899 MaskQueryOperationsBenchmark.testLastTrueInt | 256 | 1 | 325199.786 | 308436.805 | 0.948453284 MaskQueryOperationsBenchmark.testLastTrueInt | 256 | 2 | 138200.527 | 308405.442 | 2.231579348 MaskQueryOperationsBenchmark.testLastTrueInt | 256 | 3 | 201240.625 | 308234.527 | 1.531671485 MaskQueryOperationsBenchmark.testLastTrueInt | 512 | 1 | 325590.639 | 308381.757 | 0.947145649 MaskQueryOperationsBenchmark.testLastTrueInt | 512 | 2 | 98334.197 | 308440.373 | 3.13665421 MaskQueryOperationsBenchmark.testLastTrueInt | 512 | 3 | 200832.953 | 308431.355 | 1.535760693 MaskQueryOperationsBenchmark.testLastTrueLong | 128 | 1 | 325564.887 | 193981.861 | 0.595831641 MaskQueryOperationsBenchmark.testLastTrueLong | 128 | 2 | 214005.351 | 153667.869 | 0.718056199 MaskQueryOperationsBenchmark.testLastTrueLong | 128 | 3 | 214061.493 | 156337.24 | 0.730337988 MaskQueryOperationsBenchmark.testLastTrueLong | 256 | 1 | 325601.502 | 308291.032 | 0.946835411 MaskQueryOperationsBenchmark.testLastTrueLong | 256 | 2 | 197911.182 | 308292.149 | 1.557729815 MaskQueryOperationsBenchmark.testLastTrueLong | 256 | 3 
| 325608.187 | 308405.393 | 0.947167195 MaskQueryOperationsBenchmark.testLastTrueLong | 512 | 1 | 325734.897 | 308321.619 | 0.946541564 MaskQueryOperationsBenchmark.testLastTrueLong | 512 | 2 | 137974.465 | 308131.475 | 2.233250008 MaskQueryOperationsBenchmark.testLastTrueLong | 512 | 3 | 205479.182 | 308311.636 | 1.500451934 MaskQueryOperationsBenchmark.testLastTrueShort | 128 | 1 | 325681.411 | 337663.377 | 1.036790451 MaskQueryOperationsBenchmark.testLastTrueShort | 128 | 2 | 198127.51 | 337287.453 | 1.702375672 MaskQueryOperationsBenchmark.testLastTrueShort | 128 | 3 | 325519.01 | 337453.387 | 1.036662612 MaskQueryOperationsBenchmark.testLastTrueShort | 256 | 1 | 325647.378 | 308266.5 | 0.946626691 MaskQueryOperationsBenchmark.testLastTrueShort | 256 | 2 | 138287.837 | 308402.656 | 2.230150263 MaskQueryOperationsBenchmark.testLastTrueShort | 256 | 3 | 205375.864 | 308418.101 | 1.501725154 MaskQueryOperationsBenchmark.testLastTrueShort | 512 | 1 | 325548.631 | 308137.064 | 0.946516233 MaskQueryOperationsBenchmark.testLastTrueShort | 512 | 2 | 98424.074 | 308145.17 | 3.130790644 MaskQueryOperationsBenchmark.testLastTrueShort | 512 | 3 | 205381.622 | 308345.763 | 1.50133084 MaskQueryOperationsBenchmark.testTrueCountByte | 128 | 1 | 197488.249 | 340490.471 | 1.724104967 MaskQueryOperationsBenchmark.testTrueCountByte | 128 | 2 | 191307.785 | 354400.26 | 1.852513529 MaskQueryOperationsBenchmark.testTrueCountByte | 128 | 3 | 181206.7 | 354512.75 | 1.956399791 MaskQueryOperationsBenchmark.testTrueCountByte | 256 | 1 | 144485.784 | 328347.7 | 2.272525995 MaskQueryOperationsBenchmark.testTrueCountByte | 256 | 2 | 136709.938 | 328318.229 | 2.401568122 MaskQueryOperationsBenchmark.testTrueCountByte | 256 | 3 | 141501.903 | 328274.337 | 2.319928779 MaskQueryOperationsBenchmark.testTrueCountByte | 512 | 1 | 108395.25 | 318599.11 | 2.939234976 MaskQueryOperationsBenchmark.testTrueCountByte | 512 | 2 | 98731.287 | 318651.791 | 3.22746518 
MaskQueryOperationsBenchmark.testTrueCountByte | 512 | 3 | 106344.335 | 318657.098 | 2.99646519 MaskQueryOperationsBenchmark.testTrueCountInt | 128 | 1 | 124691.716 | 354457.62 | 2.842671762 MaskQueryOperationsBenchmark.testTrueCountInt | 128 | 2 | 191325.138 | 354360.523 | 1.852137815 MaskQueryOperationsBenchmark.testTrueCountInt | 128 | 3 | 181480.334 | 353746.697 | 1.949228818 MaskQueryOperationsBenchmark.testTrueCountInt | 256 | 1 | 144513.076 | 328404.916 | 2.27249274 MaskQueryOperationsBenchmark.testTrueCountInt | 256 | 2 | 136710.717 | 328516.92 | 2.403007805 MaskQueryOperationsBenchmark.testTrueCountInt | 256 | 3 | 141631.832 | 328432.841 | 2.318919669 MaskQueryOperationsBenchmark.testTrueCountInt | 512 | 1 | 108479.473 | 328405.877 | 3.027355019 MaskQueryOperationsBenchmark.testTrueCountInt | 512 | 2 | 98747.682 | 328300.378 | 3.324638831 MaskQueryOperationsBenchmark.testTrueCountInt | 512 | 3 | 106378.04 | 328384.537 | 3.086957957 MaskQueryOperationsBenchmark.testTrueCountLong | 128 | 1 | 213646.579 | 159098.437 | 0.74468048 MaskQueryOperationsBenchmark.testTrueCountLong | 128 | 2 | 212671.379 | 162528.924 | 0.764225655 MaskQueryOperationsBenchmark.testTrueCountLong | 128 | 3 | 212649.052 | 162530.898 | 0.764315178 MaskQueryOperationsBenchmark.testTrueCountLong | 256 | 1 | 197350.819 | 328365.924 | 1.663869072 MaskQueryOperationsBenchmark.testTrueCountLong | 256 | 2 | 191473.127 | 328501.883 | 1.715655289 MaskQueryOperationsBenchmark.testTrueCountLong | 256 | 3 | 185529.513 | 328428.64 | 1.770223156 MaskQueryOperationsBenchmark.testTrueCountLong | 512 | 1 | 144516.188 | 328334.76 | 2.27195835 MaskQueryOperationsBenchmark.testTrueCountLong | 512 | 2 | 136752.367 | 328505.571 | 2.402192943 MaskQueryOperationsBenchmark.testTrueCountLong | 512 | 3 | 141445.742 | 328392.887 | 2.321688036 MaskQueryOperationsBenchmark.testTrueCountShort | 128 | 1 | 197863.202 | 354533.342 | 1.791810394 MaskQueryOperationsBenchmark.testTrueCountShort | 128 | 2 | 191802.914 | 
354377.939 | 1.84761499
MaskQueryOperationsBenchmark.testTrueCountShort | 128 | 3 | 181773.298 | 354374.525 | 1.949541153
MaskQueryOperationsBenchmark.testTrueCountShort | 256 | 1 | 144414.679 | 328435.088 | 2.27425003
MaskQueryOperationsBenchmark.testTrueCountShort | 256 | 2 | 136923.991 | 328267.898 | 2.397446171
MaskQueryOperationsBenchmark.testTrueCountShort | 256 | 3 | 141545.957 | 328308.681 | 2.319449371
MaskQueryOperationsBenchmark.testTrueCountShort | 512 | 1 | 108420.143 | 328282.998 | 3.027878297
MaskQueryOperationsBenchmark.testTrueCountShort | 512 | 2 | 98736.441 | 328420.616 | 3.326235103
MaskQueryOperationsBenchmark.testTrueCountShort | 512 | 3 | 106432.386 | 328245.585 | 3.084076166

ALGO (1=bestcase, 2=worstcase, 3=avgcase)

-------------

Commit messages:
 - 8256973: Removing white spaces to satisfy jcheck.
 - 8256973: Intrinsic creation for VectorMask query (lastTrue,firstTrue,trueCount) APIs

Changes: https://git.openjdk.java.net/jdk/pull/3916/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3916&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8256973
  Stats: 1279 lines in 49 files changed: 1246 ins; 30 del; 3 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3916.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3916/head:pull/3916

PR: https://git.openjdk.java.net/jdk/pull/3916

From yyang at openjdk.java.net Fri May 7 14:50:19 2021
From: yyang at openjdk.java.net (Yi Yang)
Date: Fri, 7 May 2021 14:50:19 GMT
Subject: RFR: 8266746: C1: Replace UnsafeGetRaw with UnsafeGetObject when setting up OSR entry block
Message-ID:

After JDK-8150921, most Unsafe{Get,Put}Raw intrinsic methods can be replaced by Unsafe{Get,Put}Object. The only place where C1 still refers to UnsafeGetRaw is GraphBuilder::setup_osr_entry_block():
https://github.com/openjdk/jdk/blob/74fecc070a6462e6a2d061525b53a63de15339f9/src/hotspot/share/c1/c1_GraphBuilder.cpp#L3143-L3157

We can replace UnsafeGetRaw with UnsafeGetObject when setting up the OSR entry block. After that, Unsafe{Get,Put}Raw can be removed completely because nothing refers to them.

(This patch actually does two things:
1. `Replace UnsafeGetRaw with UnsafeGetObject when setting up OSR entry block`. This is the only place where C1 refers to UnsafeGetRaw.
2. `Cleanup unused Unsafe{Get,Put}Raw code`.
They are related, so I put them together, but I would still like to hear your suggestions: should they be separated into two patches, or is one patch enough?)

Thanks!
Yang

-------------

Commit messages:
 - cleanup

Changes: https://git.openjdk.java.net/jdk/pull/3917/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3917&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8266746
  Stats: 543 lines in 14 files changed: 23 ins; 514 del; 6 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3917.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3917/head:pull/3917

PR: https://git.openjdk.java.net/jdk/pull/3917

From psandoz at openjdk.java.net Fri May 7 16:05:14 2021
From: psandoz at openjdk.java.net (Paul Sandoz)
Date: Fri, 7 May 2021 16:05:14 GMT
Subject: RFR: 8256973: Intrinsic creation for VectorMask query (lastTrue, firstTrue, trueCount) APIs
In-Reply-To: <73lFD51hzmiF_KrQyPyE5c7lbf-Bp6V5vptzGo7JioY=.f34509d0-04c1-4c6d-878f-baa433b315a7@github.com>
References: <73lFD51hzmiF_KrQyPyE5c7lbf-Bp6V5vptzGo7JioY=.f34509d0-04c1-4c6d-878f-baa433b315a7@github.com>
Message-ID:

On Fri, 7 May 2021 14:23:38 GMT, Jatin Bhateja wrote:

> This patch intrinsifies the following mask query APIs using an optimal instruction sequence for the X86 target.
> 1) VectorMask.firstTrue.
> 2) VectorMask.lastTrue.
> 3) VectorMask.trueCount.
>
> The current implementations of the above APIs iterate over the underlying boolean array encapsulated in a mask instance to ascertain the count/position index of true bits.
> X86 AVX2 and AVX512 targets offer direct instructions to transfer the mask held in a byte vector to a GP or an opmask register, thereby accelerating further querying.
>
> Intrinsification is not performed for vector species containing fewer than two vector lanes.
>
> Please find below the performance numbers for the benchmark included in the patch:
> Machine: Cascade Lake server (Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz 28C)
>
>
> VectorMask.trueCount | VECTOR SIZE | ALGO | BASELINE AVX3 | WITH OPT AVX3 | GAIN
> -- | -- | -- | -- | -- | --
> MaskQueryOperationsBenchmark.testFirstTrueByte | 128 | 1 | 338396.436 | 362711.622 | 1.071854143
> MaskQueryOperationsBenchmark.testFirstTrueByte | 128 | 2 | 205477.472 | 362668.035 | 1.765001445
> MaskQueryOperationsBenchmark.testFirstTrueByte | 128 | 3 | 185613.377 | 362518.206 | 1.953082326
> MaskQueryOperationsBenchmark.testFirstTrueByte | 256 | 1 | 338522.114 | 328751.231 | 0.971136648
> MaskQueryOperationsBenchmark.testFirstTrueByte | 256 | 2 | 148825.341 | 328783.35 | 2.209189294
> MaskQueryOperationsBenchmark.testFirstTrueByte | 256 | 3 | 200854.856 | 328784.24 | 1.636924526
> MaskQueryOperationsBenchmark.testFirstTrueByte | 512 | 1 | 338551.089 | 319908.361 | 0.944933782
> MaskQueryOperationsBenchmark.testFirstTrueByte | 512 | 2 | 116338.756 | 320026.839 | 2.750818816
> MaskQueryOperationsBenchmark.testFirstTrueByte | 512 | 3 | 200871.692 | 320008.208 | 1.593097588
> MaskQueryOperationsBenchmark.testFirstTrueInt | 128 | 1 | 338489.157 | 190221.57 | 0.561972418
> MaskQueryOperationsBenchmark.testFirstTrueInt | 128 | 2 | 205140.903 | 362387.766 | 1.766531007
> MaskQueryOperationsBenchmark.testFirstTrueInt | 128 | 3 | 185508.994 | 362566.265 | 1.95444036
> MaskQueryOperationsBenchmark.testFirstTrueInt | 256 | 1 | 338403.999 | 328829.751 | 0.971707639 >
MaskQueryOperationsBenchmark.testFirstTrueInt | 256 | 2 | 148988.857 | 328835.479 | 2.207114583 > MaskQueryOperationsBenchmark.testFirstTrueInt | 256 | 3 | 200815.907 | 328778.266 | 1.637212265 > MaskQueryOperationsBenchmark.testFirstTrueInt | 512 | 1 | 338462.403 | 328796.84 | 0.971442728 > MaskQueryOperationsBenchmark.testFirstTrueInt | 512 | 2 | 116355.623 | 328811.386 | 2.825917455 > MaskQueryOperationsBenchmark.testFirstTrueInt | 512 | 3 | 200856.08 | 328773.859 | 1.636862867 > MaskQueryOperationsBenchmark.testFirstTrueLong | 128 | 1 | 338451.783 | 204432.394 | 0.60402221 > MaskQueryOperationsBenchmark.testFirstTrueLong | 128 | 2 | 204443.049 | 155670.633 | 0.761437641 > MaskQueryOperationsBenchmark.testFirstTrueLong | 128 | 3 | 207254.769 | 155672.842 | 0.751118263 > MaskQueryOperationsBenchmark.testFirstTrueLong | 256 | 1 | 338520.255 | 328789.176 | 0.971254072 > MaskQueryOperationsBenchmark.testFirstTrueLong | 256 | 2 | 205883.123 | 328742.103 | 1.596741385 > MaskQueryOperationsBenchmark.testFirstTrueLong | 256 | 3 | 185519.176 | 328733.537 | 1.771965271 > MaskQueryOperationsBenchmark.testFirstTrueLong | 512 | 1 | 338605.11 | 328694.935 | 0.970732353 > MaskQueryOperationsBenchmark.testFirstTrueLong | 512 | 2 | 148444.7 | 328352.346 | 2.211950619 > MaskQueryOperationsBenchmark.testFirstTrueLong | 512 | 3 | 200884.874 | 328814.376 | 1.636829939 > MaskQueryOperationsBenchmark.testFirstTrueShort | 128 | 1 | 338529.326 | 362293.877 | 1.070199387 > MaskQueryOperationsBenchmark.testFirstTrueShort | 128 | 2 | 204676.583 | 362428.992 | 1.770739899 > MaskQueryOperationsBenchmark.testFirstTrueShort | 128 | 3 | 185495.663 | 362422.835 | 1.953807594 > MaskQueryOperationsBenchmark.testFirstTrueShort | 256 | 1 | 338533.82 | 328635.479 | 0.970761146 > MaskQueryOperationsBenchmark.testFirstTrueShort | 256 | 2 | 148822.446 | 328803.55 | 2.209368001 > MaskQueryOperationsBenchmark.testFirstTrueShort | 256 | 3 | 200752.028 | 328805.974 | 1.637871245 > 
MaskQueryOperationsBenchmark.testFirstTrueShort | 512 | 1 | 338464.548 | 320054.91 | 0.945608371 > MaskQueryOperationsBenchmark.testFirstTrueShort | 512 | 2 | 116329.063 | 328763.508 | 2.826151088 > MaskQueryOperationsBenchmark.testFirstTrueShort | 512 | 3 | 199971.049 | 328819.066 | 1.644333355 > MaskQueryOperationsBenchmark.testLastTrueByte | 128 | 1 | 325618.244 | 337629.441 | 1.036887359 > MaskQueryOperationsBenchmark.testLastTrueByte | 128 | 2 | 197655.729 | 337544.012 | 1.707737052 > MaskQueryOperationsBenchmark.testLastTrueByte | 128 | 3 | 325600.645 | 337256.796 | 1.035798919 > MaskQueryOperationsBenchmark.testLastTrueByte | 256 | 1 | 325677.144 | 308312.588 | 0.946681687 > MaskQueryOperationsBenchmark.testLastTrueByte | 256 | 2 | 138177.514 | 308293.997 | 2.231144476 > MaskQueryOperationsBenchmark.testLastTrueByte | 256 | 3 | 201281.142 | 308353.239 | 1.531952949 > MaskQueryOperationsBenchmark.testLastTrueByte | 512 | 1 | 325499.635 | 305103.491 | 0.937338965 > MaskQueryOperationsBenchmark.testLastTrueByte | 512 | 2 | 98267.327 | 304803.64 | 3.101780106 > MaskQueryOperationsBenchmark.testLastTrueByte | 512 | 3 | 201072.661 | 304969.972 | 1.516715253 > MaskQueryOperationsBenchmark.testLastTrueInt | 128 | 1 | 325286.171 | 337337.209 | 1.037047496 > MaskQueryOperationsBenchmark.testLastTrueInt | 128 | 2 | 197351.915 | 331432.723 | 1.679399579 > MaskQueryOperationsBenchmark.testLastTrueInt | 128 | 3 | 325173.097 | 337518.586 | 1.037965899 > MaskQueryOperationsBenchmark.testLastTrueInt | 256 | 1 | 325199.786 | 308436.805 | 0.948453284 > MaskQueryOperationsBenchmark.testLastTrueInt | 256 | 2 | 138200.527 | 308405.442 | 2.231579348 > MaskQueryOperationsBenchmark.testLastTrueInt | 256 | 3 | 201240.625 | 308234.527 | 1.531671485 > MaskQueryOperationsBenchmark.testLastTrueInt | 512 | 1 | 325590.639 | 308381.757 | 0.947145649 > MaskQueryOperationsBenchmark.testLastTrueInt | 512 | 2 | 98334.197 | 308440.373 | 3.13665421 > MaskQueryOperationsBenchmark.testLastTrueInt | 
512 | 3 | 200832.953 | 308431.355 | 1.535760693 > MaskQueryOperationsBenchmark.testLastTrueLong | 128 | 1 | 325564.887 | 193981.861 | 0.595831641 > MaskQueryOperationsBenchmark.testLastTrueLong | 128 | 2 | 214005.351 | 153667.869 | 0.718056199 > MaskQueryOperationsBenchmark.testLastTrueLong | 128 | 3 | 214061.493 | 156337.24 | 0.730337988 > MaskQueryOperationsBenchmark.testLastTrueLong | 256 | 1 | 325601.502 | 308291.032 | 0.946835411 > MaskQueryOperationsBenchmark.testLastTrueLong | 256 | 2 | 197911.182 | 308292.149 | 1.557729815 > MaskQueryOperationsBenchmark.testLastTrueLong | 256 | 3 | 325608.187 | 308405.393 | 0.947167195 > MaskQueryOperationsBenchmark.testLastTrueLong | 512 | 1 | 325734.897 | 308321.619 | 0.946541564 > MaskQueryOperationsBenchmark.testLastTrueLong | 512 | 2 | 137974.465 | 308131.475 | 2.233250008 > MaskQueryOperationsBenchmark.testLastTrueLong | 512 | 3 | 205479.182 | 308311.636 | 1.500451934 > MaskQueryOperationsBenchmark.testLastTrueShort | 128 | 1 | 325681.411 | 337663.377 | 1.036790451 > MaskQueryOperationsBenchmark.testLastTrueShort | 128 | 2 | 198127.51 | 337287.453 | 1.702375672 > MaskQueryOperationsBenchmark.testLastTrueShort | 128 | 3 | 325519.01 | 337453.387 | 1.036662612 > MaskQueryOperationsBenchmark.testLastTrueShort | 256 | 1 | 325647.378 | 308266.5 | 0.946626691 > MaskQueryOperationsBenchmark.testLastTrueShort | 256 | 2 | 138287.837 | 308402.656 | 2.230150263 > MaskQueryOperationsBenchmark.testLastTrueShort | 256 | 3 | 205375.864 | 308418.101 | 1.501725154 > MaskQueryOperationsBenchmark.testLastTrueShort | 512 | 1 | 325548.631 | 308137.064 | 0.946516233 > MaskQueryOperationsBenchmark.testLastTrueShort | 512 | 2 | 98424.074 | 308145.17 | 3.130790644 > MaskQueryOperationsBenchmark.testLastTrueShort | 512 | 3 | 205381.622 | 308345.763 | 1.50133084 > MaskQueryOperationsBenchmark.testTrueCountByte | 128 | 1 | 197488.249 | 340490.471 | 1.724104967 > MaskQueryOperationsBenchmark.testTrueCountByte | 128 | 2 | 191307.785 | 354400.26 | 
1.852513529 > MaskQueryOperationsBenchmark.testTrueCountByte | 128 | 3 | 181206.7 | 354512.75 | 1.956399791 > MaskQueryOperationsBenchmark.testTrueCountByte | 256 | 1 | 144485.784 | 328347.7 | 2.272525995 > MaskQueryOperationsBenchmark.testTrueCountByte | 256 | 2 | 136709.938 | 328318.229 | 2.401568122 > MaskQueryOperationsBenchmark.testTrueCountByte | 256 | 3 | 141501.903 | 328274.337 | 2.319928779 > MaskQueryOperationsBenchmark.testTrueCountByte | 512 | 1 | 108395.25 | 318599.11 | 2.939234976 > MaskQueryOperationsBenchmark.testTrueCountByte | 512 | 2 | 98731.287 | 318651.791 | 3.22746518 > MaskQueryOperationsBenchmark.testTrueCountByte | 512 | 3 | 106344.335 | 318657.098 | 2.99646519 > MaskQueryOperationsBenchmark.testTrueCountInt | 128 | 1 | 124691.716 | 354457.62 | 2.842671762 > MaskQueryOperationsBenchmark.testTrueCountInt | 128 | 2 | 191325.138 | 354360.523 | 1.852137815 > MaskQueryOperationsBenchmark.testTrueCountInt | 128 | 3 | 181480.334 | 353746.697 | 1.949228818 > MaskQueryOperationsBenchmark.testTrueCountInt | 256 | 1 | 144513.076 | 328404.916 | 2.27249274 > MaskQueryOperationsBenchmark.testTrueCountInt | 256 | 2 | 136710.717 | 328516.92 | 2.403007805 > MaskQueryOperationsBenchmark.testTrueCountInt | 256 | 3 | 141631.832 | 328432.841 | 2.318919669 > MaskQueryOperationsBenchmark.testTrueCountInt | 512 | 1 | 108479.473 | 328405.877 | 3.027355019 > MaskQueryOperationsBenchmark.testTrueCountInt | 512 | 2 | 98747.682 | 328300.378 | 3.324638831 > MaskQueryOperationsBenchmark.testTrueCountInt | 512 | 3 | 106378.04 | 328384.537 | 3.086957957 > MaskQueryOperationsBenchmark.testTrueCountLong | 128 | 1 | 213646.579 | 159098.437 | 0.74468048 > MaskQueryOperationsBenchmark.testTrueCountLong | 128 | 2 | 212671.379 | 162528.924 | 0.764225655 > MaskQueryOperationsBenchmark.testTrueCountLong | 128 | 3 | 212649.052 | 162530.898 | 0.764315178 > MaskQueryOperationsBenchmark.testTrueCountLong | 256 | 1 | 197350.819 | 328365.924 | 1.663869072 > 
MaskQueryOperationsBenchmark.testTrueCountLong | 256 | 2 | 191473.127 | 328501.883 | 1.715655289
> MaskQueryOperationsBenchmark.testTrueCountLong | 256 | 3 | 185529.513 | 328428.64 | 1.770223156
> MaskQueryOperationsBenchmark.testTrueCountLong | 512 | 1 | 144516.188 | 328334.76 | 2.27195835
> MaskQueryOperationsBenchmark.testTrueCountLong | 512 | 2 | 136752.367 | 328505.571 | 2.402192943
> MaskQueryOperationsBenchmark.testTrueCountLong | 512 | 3 | 141445.742 | 328392.887 | 2.321688036
> MaskQueryOperationsBenchmark.testTrueCountShort | 128 | 1 | 197863.202 | 354533.342 | 1.791810394
> MaskQueryOperationsBenchmark.testTrueCountShort | 128 | 2 | 191802.914 | 354377.939 | 1.84761499
> MaskQueryOperationsBenchmark.testTrueCountShort | 128 | 3 | 181773.298 | 354374.525 | 1.949541153
> MaskQueryOperationsBenchmark.testTrueCountShort | 256 | 1 | 144414.679 | 328435.088 | 2.27425003
> MaskQueryOperationsBenchmark.testTrueCountShort | 256 | 2 | 136923.991 | 328267.898 | 2.397446171
> MaskQueryOperationsBenchmark.testTrueCountShort | 256 | 3 | 141545.957 | 328308.681 | 2.319449371
> MaskQueryOperationsBenchmark.testTrueCountShort | 512 | 1 | 108420.143 | 328282.998 | 3.027878297
> MaskQueryOperationsBenchmark.testTrueCountShort | 512 | 2 | 98736.441 | 328420.616 | 3.326235103
> MaskQueryOperationsBenchmark.testTrueCountShort | 512 | 3 | 106432.386 | 328245.585 | 3.084076166
>
> ALGO (1=bestcase, 2=worstcase, 3=avgcase)

These mask operations can be considered a form of reduction.

Do you think it makes sense to reuse `VectorSupport.reductionCoerced` instead of adding a new intrinsic? (Note that we reuse `VectorSupport.binaryOp` for mask logical binary operations).

Perhaps that allows for further reuse later if/when we add operations to integral vectors to count bits like we already have with scalars, such as `Integer.bitCount`, `Integer.numberOfLeadingZeros` etc.?
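For illustration, a minimal scalar sketch of how the three mask queries relate to the existing scalar bit-counting methods named above. It assumes a mask of at most 64 lanes packed into a `long` with lane i stored in bit i. `MaskQueries` and its method names are hypothetical and are not part of the Vector API or of this patch; the results chosen for an empty mask (the lane count for firstTrue, -1 for lastTrue) are assumptions intended to mirror the `VectorMask` documentation.

```java
// Illustrative sketch: MaskQueries and its methods are hypothetical names,
// not part of jdk.incubator.vector or of the patch discussed here.
final class MaskQueries {

    // Scalar baseline: count set lanes by walking the boolean array.
    static int trueCountScalar(boolean[] bits) {
        int count = 0;
        for (boolean b : bits) {
            if (b) {
                count++;
            }
        }
        return count;
    }

    // Pack up to 64 lanes into a long, lane i -> bit i (assumed layout).
    static long pack(boolean[] bits) {
        if (bits.length > Long.SIZE) {
            throw new IllegalArgumentException("at most 64 lanes supported");
        }
        long mask = 0L;
        for (int i = 0; i < bits.length; i++) {
            if (bits[i]) {
                mask |= 1L << i;
            }
        }
        return mask;
    }

    // Number of set lanes.
    static int trueCount(long mask) {
        return Long.bitCount(mask);
    }

    // Index of the lowest set lane, or `lanes` when no lane is set.
    static int firstTrue(long mask, int lanes) {
        return Math.min(Long.numberOfTrailingZeros(mask), lanes);
    }

    // Index of the highest set lane, or -1 when no lane is set.
    static int lastTrue(long mask) {
        return (Long.SIZE - 1) - Long.numberOfLeadingZeros(mask);
    }

    public static void main(String[] args) {
        boolean[] bits = {false, true, false, true, true, false, false, false};
        long mask = pack(bits);
        System.out.println(trueCountScalar(bits) + " " + trueCount(mask));
        System.out.println(firstTrue(mask, bits.length) + " " + lastTrue(mask));
    }
}
```

Once the mask is available as an integer, each query becomes a single bit-counting call instead of a loop over lanes, which is the property the scalar methods mentioned above would offer.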
src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractMask.java line 147:

> 145:
> 146: /*package-private*/
> 147: static int trueCountHelper(boolean[] bits) {

Naming-wise I think you can drop `Helper` from such methods.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3916

From github.com+58006833+xbzhang99 at openjdk.java.net Fri May 7 16:42:30 2021
From: github.com+58006833+xbzhang99 at openjdk.java.net (Xubo Zhang)
Date: Fri, 7 May 2021 16:42:30 GMT
Subject: RFR: 8266332: Adler32 intrinsic for x86 64-bit platforms [v7]
In-Reply-To:
References:
Message-ID:

> Implement Adler32 intrinsic for x86 64-bit platform using vector instructions.
>
> For the following benchmark:
> http://cr.openjdk.java.net/~pli/rfr/8216259/TestAdler32.java
>
> The optimization shows ~5x improvement.
>
> Base:
> Benchmark (count) Mode Cnt Score Error Units
> TestAdler32Perf.testAdler32Update 64 avgt 25 0.084 ± 0.001 us/op
> TestAdler32Perf.testAdler32Update 128 avgt 25 0.104 ± 0.001 us/op
> TestAdler32Perf.testAdler32Update 256 avgt 25 0.146 ± 0.002 us/op
> TestAdler32Perf.testAdler32Update 512 avgt 25 0.226 ± 0.002 us/op
> TestAdler32Perf.testAdler32Update 1024 avgt 25 0.390 ± 0.005 us/op
> TestAdler32Perf.testAdler32Update 2048 avgt 25 0.714 ± 0.007 us/op
> TestAdler32Perf.testAdler32Update 4096 avgt 25 1.359 ± 0.014 us/op
> TestAdler32Perf.testAdler32Update 8192 avgt 25 2.751 ± 0.023 us/op
> TestAdler32Perf.testAdler32Update 16384 avgt 25 5.494 ± 0.077 us/op
> TestAdler32Perf.testAdler32Update 32768 avgt 25 11.058 ± 0.160 us/op
> TestAdler32Perf.testAdler32Update 65536 avgt 25 22.198 ± 0.319 us/op
>
>
> With patch:
> Benchmark (count) Mode Cnt Score Error Units
> TestAdler32Perf.testAdler32Update 64 avgt 25 0.020 ± 0.001 us/op
> TestAdler32Perf.testAdler32Update 128 avgt 25 0.025 ± 0.001 us/op
> TestAdler32Perf.testAdler32Update 256 avgt 25 0.031 ± 0.001 us/op
> TestAdler32Perf.testAdler32Update 512 avgt 25 0.048 ± 0.001 us/op
> TestAdler32Perf.testAdler32Update 1024 avgt 25 0.078 ± 0.001 us/op
> TestAdler32Perf.testAdler32Update 2048 avgt 25 0.139 ± 0.002 us/op
> TestAdler32Perf.testAdler32Update 4096 avgt 25 0.262 ± 0.004 us/op
> TestAdler32Perf.testAdler32Update 8192 avgt 25 0.524 ± 0.010 us/op
> TestAdler32Perf.testAdler32Update 16384 avgt 25 1.017 ± 0.022 us/op
> TestAdler32Perf.testAdler32Update 32768 avgt 25 2.058 ± 0.052 us/op
> TestAdler32Perf.testAdler32Update 65536 avgt 25 3.994 ± 0.013 us/op

Xubo Zhang has updated the pull request incrementally with one additional commit since the last revision:

  Add @run case

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3806/files
  - new: https://git.openjdk.java.net/jdk/pull/3806/files/72a0d3f3..3851c602

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3806&range=06
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3806&range=05-06

  Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3806.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3806/head:pull/3806

PR: https://git.openjdk.java.net/jdk/pull/3806

From jbhateja at openjdk.java.net Fri May 7 18:15:45 2021
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Fri, 7 May 2021 18:15:45 GMT
Subject: RFR: 8256973: Intrinsic creation for VectorMask query (lastTrue, firstTrue, trueCount) APIs
In-Reply-To:
References: <73lFD51hzmiF_KrQyPyE5c7lbf-Bp6V5vptzGo7JioY=.f34509d0-04c1-4c6d-878f-baa433b315a7@github.com>
Message-ID:

On Fri, 7 May 2021 16:02:05 GMT, Paul Sandoz wrote:

> These mask operations can be considered a form of reduction.
>
> Do you think it makes sense to reuse `VectorSupport.reductionCoerced` instead of adding a new intrinsic? (Note that we reuse `VectorSupport.binaryOp` for mask logical binary operations).
>
> Perhaps that allows for further reuse later if/when we add operations to integral vectors to count bits like we already have with scalars, such as `Integer.bitCount`, `Integer.numberOfLeadingZeros` etc?

Hi @PaulSandoz, that's a nice suggestion. I think that instead of a reduction, which may emit a bulky sequence, VectorMask.toLong() + Long.bitCount() could have been used for trueCount. But since toLong may not work for ARM SVE, intrinsifying at the level of the API looked reasonable in the meantime.

> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractMask.java line 147:
>
>> 145:
>> 146: /*package-private*/
>> 147: static int trueCountHelper(boolean[] bits) {
>
> Naming-wise I think you can drop `Helper` from such methods.

This is indeed a helper routine called from the lambda expression.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3916

From github.com+58006833+xbzhang99 at openjdk.java.net Fri May 7 18:21:25 2021
From: github.com+58006833+xbzhang99 at openjdk.java.net (Xubo Zhang)
Date: Fri, 7 May 2021 18:21:25 GMT
Subject: RFR: 8266332: Adler32 intrinsic for x86 64-bit platforms [v7]
In-Reply-To:
References:
Message-ID:

On Fri, 7 May 2021 16:42:30 GMT, Xubo Zhang wrote:

>> Implement Adler32 intrinsic for x86 64-bit platform using vector instructions.
>>
>> For the following benchmark:
>> http://cr.openjdk.java.net/~pli/rfr/8216259/TestAdler32.java
>>
>> The optimization shows ~5x improvement.
>>
>> Base:
>> Benchmark (count) Mode Cnt Score Error Units
>> TestAdler32Perf.testAdler32Update 64 avgt 25 0.084 ± 0.001 us/op
>> TestAdler32Perf.testAdler32Update 128 avgt 25 0.104 ± 0.001 us/op
>> TestAdler32Perf.testAdler32Update 256 avgt 25 0.146 ± 0.002 us/op
>> TestAdler32Perf.testAdler32Update 512 avgt 25 0.226 ± 0.002 us/op
>> TestAdler32Perf.testAdler32Update 1024 avgt 25 0.390 ± 0.005 us/op
>> TestAdler32Perf.testAdler32Update 2048 avgt 25 0.714 ± 0.007 us/op
>> TestAdler32Perf.testAdler32Update 4096 avgt 25 1.359 ± 0.014 us/op
>> TestAdler32Perf.testAdler32Update 8192 avgt 25 2.751 ± 0.023 us/op
>> TestAdler32Perf.testAdler32Update 16384 avgt 25 5.494 ± 0.077 us/op
>> TestAdler32Perf.testAdler32Update 32768 avgt 25 11.058 ± 0.160 us/op
>> TestAdler32Perf.testAdler32Update 65536 avgt 25 22.198 ± 0.319 us/op
>>
>>
>> With patch:
>> Benchmark (count) Mode Cnt Score Error Units
>> TestAdler32Perf.testAdler32Update 64 avgt 25 0.020 ± 0.001 us/op
>> TestAdler32Perf.testAdler32Update 128 avgt 25 0.025 ± 0.001 us/op
>> TestAdler32Perf.testAdler32Update 256 avgt 25 0.031 ± 0.001 us/op
>> TestAdler32Perf.testAdler32Update 512 avgt 25 0.048 ± 0.001 us/op
>> TestAdler32Perf.testAdler32Update 1024 avgt 25 0.078 ± 0.001 us/op
>> TestAdler32Perf.testAdler32Update 2048 avgt 25 0.139 ± 0.002 us/op
>> TestAdler32Perf.testAdler32Update 4096 avgt 25 0.262 ± 0.004 us/op
>> TestAdler32Perf.testAdler32Update 8192 avgt 25 0.524 ± 0.010 us/op
>> TestAdler32Perf.testAdler32Update 16384 avgt 25 1.017 ± 0.022 us/op
>> TestAdler32Perf.testAdler32Update 32768 avgt 25 2.058 ± 0.052 us/op
>> TestAdler32Perf.testAdler32Update 65536 avgt 25 3.994 ± 0.013 us/op
>
> Xubo Zhang has updated the pull request incrementally with one additional commit since the last revision:
>
> Add @run case

Hi Nils, I modified TestAdler32.java by adding a @run case using -XX:+UseAdler32Intrinsics, and saw the run time shortened by 4.76x on my machine.
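A minimal sketch of the kind of correctness check that goes with such an intrinsic: a straightforward scalar Adler-32 (modulus 65521) compared against `java.util.zip.Adler32` over several buffer sizes. The class name `Adler32Reference`, the fill pattern, and the list of sizes are illustrative assumptions and are not the contents of TestAdler32.java. Whether the library call is actually intrinsified depends on the JVM and its flags; the two results should agree either way.

```java
import java.util.zip.Adler32;

// Illustrative sketch: Adler32Reference is a hypothetical name, not the jtreg test.
final class Adler32Reference {

    private static final int BASE = 65521; // largest prime below 2^16

    // Plain scalar Adler-32: a is the running byte sum, b is the sum of the a values.
    static long scalar(byte[] buf) {
        int a = 1;
        int b = 0;
        for (byte x : buf) {
            a = (a + (x & 0xff)) % BASE;
            b = (b + a) % BASE;
        }
        return ((long) b << 16) | a;
    }

    // Library implementation, which the JVM may replace with an intrinsic.
    static long library(byte[] buf) {
        Adler32 adler = new Adler32();
        adler.update(buf, 0, buf.length);
        return adler.getValue();
    }

    public static void main(String[] args) {
        int[] sizes = {0, 1, 64, 512, 1024, 4096, 65536};
        for (int size : sizes) {
            byte[] buf = new byte[size];
            for (int i = 0; i < size; i++) {
                buf[i] = (byte) (i * 31 + 7); // arbitrary deterministic fill
            }
            if (scalar(buf) != library(buf)) {
                throw new AssertionError("mismatch at size " + size);
            }
        }
        System.out.println("scalar and java.util.zip.Adler32 agree");
    }
}
```

Running the same class with and without `-XX:+UseAdler32Intrinsics` (the flag named above) would exercise both code paths against the same scalar reference.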
-------------

PR: https://git.openjdk.java.net/jdk/pull/3806

From sviswanathan at openjdk.java.net Fri May 7 18:21:26 2021
From: sviswanathan at openjdk.java.net (Sandhya Viswanathan)
Date: Fri, 7 May 2021 18:21:26 GMT
Subject: RFR: 8266332: Adler32 intrinsic for x86 64-bit platforms [v6]
In-Reply-To:
References:
Message-ID:

On Fri, 7 May 2021 09:26:30 GMT, Nils Eliasson wrote:

>> Xubo Zhang has updated the pull request incrementally with one additional commit since the last revision:
>>
>> remove checking AVX_512 in vbroadcastf128
>
> I find this test for Adler32: test/hotspot/jtreg/compiler/intrinsics/zip/TestAdler32.java
>
> Will there be a benchmark added to micros?

@neliasso Could you please help review this patch. Jatin and I have looked at it and it looks good to us but we are not official reviewers.
The algorithm is from https://github.com/intel/isa-l/blob/master/igzip/adler32_avx2_4.asm.
Also the jmh micro http://cr.openjdk.java.net/~pli/rfr/8216259/TestAdler32.java is from Pengfei Li @pfustc, shared as part of JDK-8216259 aarch64 optimization. Maybe Pengfei can contribute that to the OpenJDK.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3806

From github.com+58006833+xbzhang99 at openjdk.java.net Fri May 7 18:26:12 2021
From: github.com+58006833+xbzhang99 at openjdk.java.net (Xubo Zhang)
Date: Fri, 7 May 2021 18:26:12 GMT
Subject: RFR: 8266332: Adler32 intrinsic for x86 64-bit platforms [v7]
In-Reply-To:
References:
Message-ID: <1JnKSpxAwilBL-ByTVm1i6pOkec0b_A4xb-s5LtTd8o=.2808324d-5362-4e75-b40d-35b8dfacc42e@github.com>

On Fri, 7 May 2021 16:42:30 GMT, Xubo Zhang wrote:

>> Implement Adler32 intrinsic for x86 64-bit platform using vector instructions.
>>
>> For the following benchmark:
>> http://cr.openjdk.java.net/~pli/rfr/8216259/TestAdler32.java
>>
>> The optimization shows ~5x improvement.
>>
>> Base:
>> Benchmark (count) Mode Cnt Score Error Units
>> TestAdler32Perf.testAdler32Update 64 avgt 25 0.084 ± 0.001 us/op
>> TestAdler32Perf.testAdler32Update 128 avgt 25 0.104 ± 0.001 us/op
>> TestAdler32Perf.testAdler32Update 256 avgt 25 0.146 ± 0.002 us/op
>> TestAdler32Perf.testAdler32Update 512 avgt 25 0.226 ± 0.002 us/op
>> TestAdler32Perf.testAdler32Update 1024 avgt 25 0.390 ± 0.005 us/op
>> TestAdler32Perf.testAdler32Update 2048 avgt 25 0.714 ± 0.007 us/op
>> TestAdler32Perf.testAdler32Update 4096 avgt 25 1.359 ± 0.014 us/op
>> TestAdler32Perf.testAdler32Update 8192 avgt 25 2.751 ± 0.023 us/op
>> TestAdler32Perf.testAdler32Update 16384 avgt 25 5.494 ± 0.077 us/op
>> TestAdler32Perf.testAdler32Update 32768 avgt 25 11.058 ± 0.160 us/op
>> TestAdler32Perf.testAdler32Update 65536 avgt 25 22.198 ± 0.319 us/op
>>
>>
>> With patch:
>> Benchmark (count) Mode Cnt Score Error Units
>> TestAdler32Perf.testAdler32Update 64 avgt 25 0.020 ± 0.001 us/op
>> TestAdler32Perf.testAdler32Update 128 avgt 25 0.025 ± 0.001 us/op
>> TestAdler32Perf.testAdler32Update 256 avgt 25 0.031 ± 0.001 us/op
>> TestAdler32Perf.testAdler32Update 512 avgt 25 0.048 ± 0.001 us/op
>> TestAdler32Perf.testAdler32Update 1024 avgt 25 0.078 ± 0.001 us/op
>> TestAdler32Perf.testAdler32Update 2048 avgt 25 0.139 ± 0.002 us/op
>> TestAdler32Perf.testAdler32Update 4096 avgt 25 0.262 ± 0.004 us/op
>> TestAdler32Perf.testAdler32Update 8192 avgt 25 0.524 ± 0.010 us/op
>> TestAdler32Perf.testAdler32Update 16384 avgt 25 1.017 ± 0.022 us/op
>> TestAdler32Perf.testAdler32Update 32768 avgt 25 2.058 ± 0.052 us/op
>> TestAdler32Perf.testAdler32Update 65536 avgt 25 3.994 ± 0.013 us/op
>
> Xubo Zhang has updated the pull request incrementally with one additional commit since the last revision:
>
> Add @run case

The following tests have been run on both Linux and Windows:
unit test of various buf sizes, 512, 1024, 2048, 4096, 8192, ... etc to compare the intrinsic vs base c++ impl.
jmh - http://cr.openjdk.java.net/~pli/rfr/8216259/TestAdler32.java
jtreg - test-tier1

-------------

PR: https://git.openjdk.java.net/jdk/pull/3806

From jbhateja at openjdk.java.net Fri May 7 18:31:15 2021
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Fri, 7 May 2021 18:31:15 GMT
Subject: RFR: 8266054: VectorAPI rotate operation optimization [v4]
In-Reply-To:
References:
Message-ID:

> The current VectorAPI Java-side implementation expresses the rotateLeft and rotateRight operations using the following operations:
>
> vec1 = lanewise(VectorOperators.LSHL, n)
> vec2 = lanewise(VectorOperators.LSHR, n)
> res = lanewise(VectorOperators.OR, vec1, vec2)
>
> This patch moves the above handling from the Java side to the C2 compiler, which facilitates dismantling the rotate operation if the target ISA does not support a direct rotate instruction.
>
> AVX512 added vector rotate instructions vpro[rl][v][dq], which operate over long and integer type vectors. For other cases (i.e. sub-word type vectors, or targets which do not support direct rotate operations), an instruction sequence comprising vector SHIFT (LEFT/RIGHT) and vector OR is emitted.
>
> Please find below the performance data for the included JMH benchmark.
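For reference, a minimal scalar sketch of the shift-and-OR formulation described above, checked against `Integer.rotateLeft` and `Integer.rotateRight`. The schematic listing in the description does not spell out the shift counts; the sketch assumes the usual complementary count (lane size minus n) for the second shift. `RotateViaShifts` is a hypothetical name and is not code from the patch, and the sketch covers only 32-bit lanes.

```java
// Illustrative sketch: RotateViaShifts is a hypothetical name, not code from the patch.
final class RotateViaShifts {

    // Rotate left as two shifts and an OR. Java masks int shift counts to
    // their low 5 bits, so (Integer.SIZE - n) also behaves correctly for n == 0.
    static int rotateLeft(int x, int n) {
        return (x << n) | (x >>> (Integer.SIZE - n));
    }

    // Rotate right: the same idea with the shift directions swapped.
    static int rotateRight(int x, int n) {
        return (x >>> n) | (x << (Integer.SIZE - n));
    }

    public static void main(String[] args) {
        int[] values = {0, 1, -1, 0x80000001, 0x12345678};
        for (int x : values) {
            for (int n = 0; n < Integer.SIZE; n++) {
                if (rotateLeft(x, n) != Integer.rotateLeft(x, n)
                        || rotateRight(x, n) != Integer.rotateRight(x, n)) {
                    throw new AssertionError("mismatch for x=" + x + ", n=" + n);
                }
            }
        }
        System.out.println("shift/OR rotation matches Integer.rotateLeft/rotateRight");
    }
}
```

On hardware with a direct rotate instruction the three operations collapse into one, which is the case the description says the compiler can now recognize.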
> Machine: Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz (Cascade lake Server) > > `` > > Benchmark | (TESTSIZE) | Shift | Baseline (ops/ms) | Withopt (ops/ms) | Gain % > -- | -- | -- | -- | -- | -- > RotateBenchmark.testRotateLeftI | (size) | (shift) | 7384.747 | 7706.652 | 4.36 > RotateBenchmark.testRotateLeftI | 64 | 11 | 3723.305 | 3816.968 | 2.52 > RotateBenchmark.testRotateLeftI | 128 | 11 | 1811.521 | 1966.05 | 8.53 > RotateBenchmark.testRotateLeftI | 256 | 11 | 7133.296 | 7715.047 | 8.16 > RotateBenchmark.testRotateLeftI | 64 | 21 | 3612.144 | 3886.225 | 7.59 > RotateBenchmark.testRotateLeftI | 128 | 21 | 1815.422 | 1962.753 | 8.12 > RotateBenchmark.testRotateLeftI | 256 | 21 | 7216.353 | 7677.165 | 6.39 > RotateBenchmark.testRotateLeftI | 64 | 31 | 3602.008 | 3892.297 | 8.06 > RotateBenchmark.testRotateLeftI | 128 | 31 | 1882.163 | 1958.887 | 4.08 > RotateBenchmark.testRotateLeftI | 256 | 31 | 11819.443 | 11912.864 | 0.79 > RotateBenchmark.testRotateLeftI | 64 | 11 | 5978.475 | 6060.189 | 1.37 > RotateBenchmark.testRotateLeftI | 128 | 11 | 2965.179 | 3060.969 | 3.23 > RotateBenchmark.testRotateLeftI | 256 | 11 | 11479.579 | 11684.148 | 1.78 > RotateBenchmark.testRotateLeftI | 64 | 21 | 5904.903 | 6094.409 | 3.21 > RotateBenchmark.testRotateLeftI | 128 | 21 | 2969.879 | 3074.1 | 3.51 > RotateBenchmark.testRotateLeftI | 256 | 21 | 11531.654 | 12155.954 | 5.41 > RotateBenchmark.testRotateLeftI | 64 | 31 | 5730.918 | 6112.514 | 6.66 > RotateBenchmark.testRotateLeftI | 128 | 31 | 2937.19 | 2976.297 | 1.33 > RotateBenchmark.testRotateLeftI | 256 | 31 | 16159.184 | 16459.462 | 1.86 > RotateBenchmark.testRotateLeftI | 64 | 11 | 8154.982 | 8396.089 | 2.96 > RotateBenchmark.testRotateLeftI | 128 | 11 | 4142.224 | 4292.049 | 3.62 > RotateBenchmark.testRotateLeftI | 256 | 11 | 15958.154 | 16163.518 | 1.29 > RotateBenchmark.testRotateLeftI | 64 | 21 | 8098.805 | 8504.279 | 5.01 > RotateBenchmark.testRotateLeftI | 128 | 21 | 4137.598 | 4314.868 | 4.28 > 
RotateBenchmark.testRotateLeftI | 256 | 21 | 16201.666 | 15992.958 | -1.29 > RotateBenchmark.testRotateLeftI | 64 | 31 | 8027.169 | 8484.379 | 5.70 > RotateBenchmark.testRotateLeftI | 128 | 31 | 4146.29 | 4039.681 | -2.57 > RotateBenchmark.testRotateLeftL | 256 | 31 | 3566.176 | 3805.248 | 6.70 > RotateBenchmark.testRotateLeftL | 64 | 11 | 1820.219 | 1962.866 | 7.84 > RotateBenchmark.testRotateLeftL | 128 | 11 | 917.085 | 1007.334 | 9.84 > RotateBenchmark.testRotateLeftL | 256 | 11 | 3592.139 | 3973.698 | 10.62 > RotateBenchmark.testRotateLeftL | 64 | 21 | 1827.63 | 1999.711 | 9.42 > RotateBenchmark.testRotateLeftL | 128 | 21 | 907.104 | 1002.997 | 10.57 > RotateBenchmark.testRotateLeftL | 256 | 21 | 3780.962 | 3873.489 | 2.45 > RotateBenchmark.testRotateLeftL | 64 | 31 | 1830.121 | 1955.63 | 6.86 > RotateBenchmark.testRotateLeftL | 128 | 31 | 891.411 | 982.138 | 10.18 > RotateBenchmark.testRotateLeftL | 256 | 31 | 5890.544 | 6100.594 | 3.57 > RotateBenchmark.testRotateLeftL | 64 | 11 | 2984.329 | 3021.971 | 1.26 > RotateBenchmark.testRotateLeftL | 128 | 11 | 1485.109 | 1527.689 | 2.87 > RotateBenchmark.testRotateLeftL | 256 | 11 | 5903.411 | 6083.775 | 3.06 > RotateBenchmark.testRotateLeftL | 64 | 21 | 2925.37 | 3050.958 | 4.29 > RotateBenchmark.testRotateLeftL | 128 | 21 | 1486.432 | 1537.155 | 3.41 > RotateBenchmark.testRotateLeftL | 256 | 21 | 5853.721 | 6000.682 | 2.51 > RotateBenchmark.testRotateLeftL | 64 | 31 | 2896.116 | 3072.783 | 6.10 > RotateBenchmark.testRotateLeftL | 128 | 31 | 1483.132 | 1546.588 | 4.28 > RotateBenchmark.testRotateLeftL | 256 | 31 | 8059.206 | 8218.047 | 1.97 > RotateBenchmark.testRotateLeftL | 64 | 11 | 4022.416 | 4195.52 | 4.30 > RotateBenchmark.testRotateLeftL | 128 | 11 | 2084.296 | 2068.238 | -0.77 > RotateBenchmark.testRotateLeftL | 256 | 11 | 7971.832 | 8172.819 | 2.52 > RotateBenchmark.testRotateLeftL | 64 | 21 | 4032.036 | 4344.469 | 7.75 > RotateBenchmark.testRotateLeftL | 128 | 21 | 2068.957 | 2138.685 | 3.37 > 
RotateBenchmark.testRotateLeftL | 256 | 21 | 8140.63 | 8003.283 | -1.69 > RotateBenchmark.testRotateLeftL | 64 | 31 | 4088.621 | 4296.091 | 5.07 > RotateBenchmark.testRotateLeftL | 128 | 31 | 2007.753 | 2088.455 | 4.02 > RotateBenchmark.testRotateRightI | 256 | 31 | 7358.793 | 7548.976 | 2.58 > RotateBenchmark.testRotateRightI | 64 | 11 | 3648.868 | 3897.47 | 6.81 > RotateBenchmark.testRotateRightI | 128 | 11 | 1862.73 | 1969.964 | 5.76 > RotateBenchmark.testRotateRightI | 256 | 11 | 7268.806 | 7790.588 | 7.18 > RotateBenchmark.testRotateRightI | 64 | 21 | 3577.79 | 3979.675 | 11.23 > RotateBenchmark.testRotateRightI | 128 | 21 | 1773.243 | 1921.088 | 8.34 > RotateBenchmark.testRotateRightI | 256 | 21 | 7084.974 | 7609.912 | 7.41 > RotateBenchmark.testRotateRightI | 64 | 31 | 3688.781 | 3909.65 | 5.99 > RotateBenchmark.testRotateRightI | 128 | 31 | 1845.978 | 1928.316 | 4.46 > RotateBenchmark.testRotateRightI | 256 | 31 | 11463.228 | 12179.833 | 6.25 > RotateBenchmark.testRotateRightI | 64 | 11 | 5678.052 | 6028.573 | 6.17 > RotateBenchmark.testRotateRightI | 128 | 11 | 2990.419 | 3070.409 | 2.67 > RotateBenchmark.testRotateRightI | 256 | 11 | 11780.283 | 12105.261 | 2.76 > RotateBenchmark.testRotateRightI | 64 | 21 | 5827.8 | 6020.208 | 3.30 > RotateBenchmark.testRotateRightI | 128 | 21 | 2904.852 | 3047.154 | 4.90 > RotateBenchmark.testRotateRightI | 256 | 21 | 11359.146 | 12060.401 | 6.17 > RotateBenchmark.testRotateRightI | 64 | 31 | 5823.207 | 6079.82 | 4.41 > RotateBenchmark.testRotateRightI | 128 | 31 | 2984.484 | 3045.719 | 2.05 > RotateBenchmark.testRotateRightI | 256 | 31 | 16200.504 | 16376.475 | 1.09 > RotateBenchmark.testRotateRightI | 64 | 11 | 8118.399 | 8315.407 | 2.43 > RotateBenchmark.testRotateRightI | 128 | 11 | 4130.745 | 4092.588 | -0.92 > RotateBenchmark.testRotateRightI | 256 | 11 | 15842.168 | 16469.119 | 3.96 > RotateBenchmark.testRotateRightI | 64 | 21 | 7855.164 | 8188.913 | 4.25 > RotateBenchmark.testRotateRightI | 128 | 21 | 4114.378 | 
4035.56 | -1.92 > RotateBenchmark.testRotateRightI | 256 | 21 | 15636.117 | 16289.632 | 4.18 > RotateBenchmark.testRotateRightI | 64 | 31 | 8108.067 | 7996.517 | -1.38 > RotateBenchmark.testRotateRightI | 128 | 31 | 3997.547 | 4153.58 | 3.90 > RotateBenchmark.testRotateRightL | 256 | 31 | 3685.99 | 3814.384 | 3.48 > RotateBenchmark.testRotateRightL | 64 | 11 | 1787.875 | 1916.541 | 7.20 > RotateBenchmark.testRotateRightL | 128 | 11 | 940.141 | 990.383 | 5.34 > RotateBenchmark.testRotateRightL | 256 | 11 | 3745.968 | 3920.667 | 4.66 > RotateBenchmark.testRotateRightL | 64 | 21 | 1877.94 | 1998.072 | 6.40 > RotateBenchmark.testRotateRightL | 128 | 21 | 933.536 | 1004.61 | 7.61 > RotateBenchmark.testRotateRightL | 256 | 21 | 3744.763 | 3947.427 | 5.41 > RotateBenchmark.testRotateRightL | 64 | 31 | 1864.818 | 1978.277 | 6.08 > RotateBenchmark.testRotateRightL | 128 | 31 | 906.965 | 998.692 | 10.11 > RotateBenchmark.testRotateRightL | 256 | 31 | 5910.469 | 6062.429 | 2.57 > RotateBenchmark.testRotateRightL | 64 | 11 | 2914.64 | 3033.127 | 4.07 > RotateBenchmark.testRotateRightL | 128 | 11 | 1491.344 | 1543.936 | 3.53 > RotateBenchmark.testRotateRightL | 256 | 11 | 5801.818 | 6098.892 | 5.12 > RotateBenchmark.testRotateRightL | 64 | 21 | 2881.328 | 3089.547 | 7.23 > RotateBenchmark.testRotateRightL | 128 | 21 | 1485.969 | 1526.231 | 2.71 > RotateBenchmark.testRotateRightL | 256 | 21 | 5783.495 | 5957.649 | 3.01 > RotateBenchmark.testRotateRightL | 64 | 31 | 3008.182 | 3026.323 | 0.60 > RotateBenchmark.testRotateRightL | 128 | 31 | 1464.566 | 1546.825 | 5.62 > RotateBenchmark.testRotateRightL | 256 | 31 | 8208.124 | 8361.437 | 1.87 > RotateBenchmark.testRotateRightL | 64 | 11 | 4062.465 | 4319.412 | 6.32 > RotateBenchmark.testRotateRightL | 128 | 11 | 2029.995 | 2086.497 | 2.78 > RotateBenchmark.testRotateRightL | 256 | 11 | 8183.789 | 8193.087 | 0.11 > RotateBenchmark.testRotateRightL | 64 | 21 | 4092.686 | 4193.712 | 2.47 > RotateBenchmark.testRotateRightL | 128 | 21 | 
2036.854 | 2038.927 | 0.10 > RotateBenchmark.testRotateRightL | 256 | 21 | 8155.015 | 8175.792 | 0.25 > RotateBenchmark.testRotateRightL | 64 | 31 | 3960.629 | 4263.922 | 7.66 > RotateBenchmark.testRotateRightL | 128 | 31 | 1996.862 | 2055.486 | 2.94 > > `` Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: 8266054: Review comments resolution. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/3720/files - new: https://git.openjdk.java.net/jdk/pull/3720/files/f7945bff..8042aa23 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3720&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3720&range=02-03 Stats: 2651 lines in 39 files changed: 2328 ins; 10 del; 313 mod Patch: https://git.openjdk.java.net/jdk/pull/3720.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3720/head:pull/3720 PR: https://git.openjdk.java.net/jdk/pull/3720 From psandoz at openjdk.java.net Fri May 7 19:37:47 2021 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Fri, 7 May 2021 19:37:47 GMT Subject: RFR: 8256973: Intrinsic creation for VectorMask query (lastTrue, firstTrue, trueCount) APIs In-Reply-To: References: <73lFD51hzmiF_KrQyPyE5c7lbf-Bp6V5vptzGo7JioY=.f34509d0-04c1-4c6d-878f-baa433b315a7@github.com> Message-ID: On Fri, 7 May 2021 17:45:44 GMT, Jatin Bhateja wrote: > Hi @PaulSandoz , that's a nice suggestion, I think instead of reduction which may emit bulky sequence, VectorMask.toLong() + Long.bitCount() could have been used for trueCount. But since toLong may not work for ARM SVE, so in the mean time intrinsifying at the level of API looked reasonable. Do you mean that reusing `VectorSupport.reductionCoerced` as the intrinsic entry point may emit bulky sequence? Note that i was not suggesting to reuse `Long.bitCount()` etc. 
i was just using that as a example that the bit-wise reduction operations on masks can also apply to integral vectors, suggesting there might be some sharing in C2 just like is done for binary-wise operations, such as logical AND. For example: @Override @ForceInline public Int256Mask and(VectorMask mask) { Objects.requireNonNull(mask); Int256Mask m = (Int256Mask)mask; return VectorSupport.binaryOp(VECTOR_OP_AND, Int256Mask.class, int.class, VLENGTH, this, m, (m1, m2) -> m1.bOp(m2, (i, a, b) -> a & b)); } And notice that `VECTOR_OP_AND` is reused for vector lane-wise binary and reduction operations on `IntVector` etc. Can we do the same for other bitwise reduction-like operations, first implementing optimal support for masks, then later expanding for integral vectors? So rather than introducing specific constants, such as `VECTOR_OP_MASK_TRUECOUNT` etc, we can generalize to `VECTOR_OP_BITCOUNT` etc that can apply to both masks and integral vectors, where for masks we interpret `BIT` appropriately to mean `boolean` true value. ------------- PR: https://git.openjdk.java.net/jdk/pull/3916 From dlong at openjdk.java.net Fri May 7 21:29:39 2021 From: dlong at openjdk.java.net (Dean Long) Date: Fri, 7 May 2021 21:29:39 GMT Subject: RFR: 8266746: C1: Replace UnsafeGetRaw with UnsafeGetObject when setting up OSR entry block In-Reply-To: References: Message-ID: On Fri, 7 May 2021 14:41:43 GMT, Yi Yang wrote: > After JDK-8150921, most Unsafe{Get,Put}Raw intrinsic methods can be replaced by Unsafe{Get,Put}Object. > > There is the only one occurrence where c1 refers UnsafeGetRaw among GraphBuilder::setup_osr_entry_block() > > https://github.com/openjdk/jdk/blob/74fecc070a6462e6a2d061525b53a63de15339f9/src/hotspot/share/c1/c1_GraphBuilder.cpp#L3143-L3157 > > We can replace UnsafeGetRaw with UnsafeGetObject when setting up OSR entry block. After that, Unsafe{Get,Put}Raw can be completely removed because no one refers to them. > > (This patch actually does two things: > 1. 
`Replace UnsafeGetRaw with UnsafeGetObject when setting up OSR entry block` This is the only occurrence where c1 refers UnsafeGetRaw > 2. `Cleanup unused Unsafe{Get,Put}Raw code` > They are related so I put it together, but I still want to hear your suggestions, I will separate them into two patches if you think it is more reasonable) > > Thanks! > Yang Hi Yang. If the OptimizeUnsafes feature was ever useful for UnsafeGetRaw, then maybe we should apply that optimization to the UnsafeGetObject-with-null-object case rather than removing the feature. Could you try it? Also, I can't find where the UnsafeGetObject implementation handles the unaligned/wide flags of UnsafeGetRaw. It would be good to get this change reviewed by someone from GC, since UnsafeGetRaw uses the GC barrier interface to do its work. ------------- PR: https://git.openjdk.java.net/jdk/pull/3917 From psandoz at openjdk.java.net Fri May 7 21:33:53 2021 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Fri, 7 May 2021 21:33:53 GMT Subject: RFR: 8266054: VectorAPI rotate operation optimization [v4] In-Reply-To: References: Message-ID: On Fri, 7 May 2021 18:31:15 GMT, Jatin Bhateja wrote: >> Current VectorAPI Java side implementation expresses rotateLeft and rotateRight operation using following operations:- >> >> vec1 = lanewise(VectorOperators.LSHL, n) >> vec2 = lanewise(VectorOperators.LSHR, n) >> res = lanewise(VectorOperations.OR, vec1 , vec2) >> >> This patch moves above handling from Java side to C2 compiler which facilitates dismantling the rotate operation if target ISA does not support a direct rotate instruction. >> >> AVX512 added vector rotate instructions vpro[rl][v][dq] which operate over long and integer type vectors. For other cases (i.e. sub-word type vectors or for targets which do not support direct rotate operations ) instruction sequence comprising of vector SHIFT (LEFT/RIGHT) and vector OR is emitted. >> >> Please find below the performance data for included JMH benchmark. 
>> Machine: Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz (Cascade lake Server) >> >> `` >> >> Benchmark | (TESTSIZE) | Shift | Baseline (ops/ms) | Withopt (ops/ms) | Gain % >> -- | -- | -- | -- | -- | -- >> RotateBenchmark.testRotateLeftI | (size) | (shift) | 7384.747 | 7706.652 | 4.36 >> RotateBenchmark.testRotateLeftI | 64 | 11 | 3723.305 | 3816.968 | 2.52 >> RotateBenchmark.testRotateLeftI | 128 | 11 | 1811.521 | 1966.05 | 8.53 >> RotateBenchmark.testRotateLeftI | 256 | 11 | 7133.296 | 7715.047 | 8.16 >> RotateBenchmark.testRotateLeftI | 64 | 21 | 3612.144 | 3886.225 | 7.59 >> RotateBenchmark.testRotateLeftI | 128 | 21 | 1815.422 | 1962.753 | 8.12 >> RotateBenchmark.testRotateLeftI | 256 | 21 | 7216.353 | 7677.165 | 6.39 >> RotateBenchmark.testRotateLeftI | 64 | 31 | 3602.008 | 3892.297 | 8.06 >> RotateBenchmark.testRotateLeftI | 128 | 31 | 1882.163 | 1958.887 | 4.08 >> RotateBenchmark.testRotateLeftI | 256 | 31 | 11819.443 | 11912.864 | 0.79 >> RotateBenchmark.testRotateLeftI | 64 | 11 | 5978.475 | 6060.189 | 1.37 >> RotateBenchmark.testRotateLeftI | 128 | 11 | 2965.179 | 3060.969 | 3.23 >> RotateBenchmark.testRotateLeftI | 256 | 11 | 11479.579 | 11684.148 | 1.78 >> RotateBenchmark.testRotateLeftI | 64 | 21 | 5904.903 | 6094.409 | 3.21 >> RotateBenchmark.testRotateLeftI | 128 | 21 | 2969.879 | 3074.1 | 3.51 >> RotateBenchmark.testRotateLeftI | 256 | 21 | 11531.654 | 12155.954 | 5.41 >> RotateBenchmark.testRotateLeftI | 64 | 31 | 5730.918 | 6112.514 | 6.66 >> RotateBenchmark.testRotateLeftI | 128 | 31 | 2937.19 | 2976.297 | 1.33 >> RotateBenchmark.testRotateLeftI | 256 | 31 | 16159.184 | 16459.462 | 1.86 >> RotateBenchmark.testRotateLeftI | 64 | 11 | 8154.982 | 8396.089 | 2.96 >> RotateBenchmark.testRotateLeftI | 128 | 11 | 4142.224 | 4292.049 | 3.62 >> RotateBenchmark.testRotateLeftI | 256 | 11 | 15958.154 | 16163.518 | 1.29 >> RotateBenchmark.testRotateLeftI | 64 | 21 | 8098.805 | 8504.279 | 5.01 >> RotateBenchmark.testRotateLeftI | 128 | 21 | 4137.598 | 
4314.868 | 4.28 >> RotateBenchmark.testRotateLeftI | 256 | 21 | 16201.666 | 15992.958 | -1.29 >> RotateBenchmark.testRotateLeftI | 64 | 31 | 8027.169 | 8484.379 | 5.70 >> RotateBenchmark.testRotateLeftI | 128 | 31 | 4146.29 | 4039.681 | -2.57 >> RotateBenchmark.testRotateLeftL | 256 | 31 | 3566.176 | 3805.248 | 6.70 >> RotateBenchmark.testRotateLeftL | 64 | 11 | 1820.219 | 1962.866 | 7.84 >> RotateBenchmark.testRotateLeftL | 128 | 11 | 917.085 | 1007.334 | 9.84 >> RotateBenchmark.testRotateLeftL | 256 | 11 | 3592.139 | 3973.698 | 10.62 >> RotateBenchmark.testRotateLeftL | 64 | 21 | 1827.63 | 1999.711 | 9.42 >> RotateBenchmark.testRotateLeftL | 128 | 21 | 907.104 | 1002.997 | 10.57 >> RotateBenchmark.testRotateLeftL | 256 | 21 | 3780.962 | 3873.489 | 2.45 >> RotateBenchmark.testRotateLeftL | 64 | 31 | 1830.121 | 1955.63 | 6.86 >> RotateBenchmark.testRotateLeftL | 128 | 31 | 891.411 | 982.138 | 10.18 >> RotateBenchmark.testRotateLeftL | 256 | 31 | 5890.544 | 6100.594 | 3.57 >> RotateBenchmark.testRotateLeftL | 64 | 11 | 2984.329 | 3021.971 | 1.26 >> RotateBenchmark.testRotateLeftL | 128 | 11 | 1485.109 | 1527.689 | 2.87 >> RotateBenchmark.testRotateLeftL | 256 | 11 | 5903.411 | 6083.775 | 3.06 >> RotateBenchmark.testRotateLeftL | 64 | 21 | 2925.37 | 3050.958 | 4.29 >> RotateBenchmark.testRotateLeftL | 128 | 21 | 1486.432 | 1537.155 | 3.41 >> RotateBenchmark.testRotateLeftL | 256 | 21 | 5853.721 | 6000.682 | 2.51 >> RotateBenchmark.testRotateLeftL | 64 | 31 | 2896.116 | 3072.783 | 6.10 >> RotateBenchmark.testRotateLeftL | 128 | 31 | 1483.132 | 1546.588 | 4.28 >> RotateBenchmark.testRotateLeftL | 256 | 31 | 8059.206 | 8218.047 | 1.97 >> RotateBenchmark.testRotateLeftL | 64 | 11 | 4022.416 | 4195.52 | 4.30 >> RotateBenchmark.testRotateLeftL | 128 | 11 | 2084.296 | 2068.238 | -0.77 >> RotateBenchmark.testRotateLeftL | 256 | 11 | 7971.832 | 8172.819 | 2.52 >> RotateBenchmark.testRotateLeftL | 64 | 21 | 4032.036 | 4344.469 | 7.75 >> RotateBenchmark.testRotateLeftL | 128 | 
21 | 2068.957 | 2138.685 | 3.37 >> RotateBenchmark.testRotateLeftL | 256 | 21 | 8140.63 | 8003.283 | -1.69 >> RotateBenchmark.testRotateLeftL | 64 | 31 | 4088.621 | 4296.091 | 5.07 >> RotateBenchmark.testRotateLeftL | 128 | 31 | 2007.753 | 2088.455 | 4.02 >> RotateBenchmark.testRotateRightI | 256 | 31 | 7358.793 | 7548.976 | 2.58 >> RotateBenchmark.testRotateRightI | 64 | 11 | 3648.868 | 3897.47 | 6.81 >> RotateBenchmark.testRotateRightI | 128 | 11 | 1862.73 | 1969.964 | 5.76 >> RotateBenchmark.testRotateRightI | 256 | 11 | 7268.806 | 7790.588 | 7.18 >> RotateBenchmark.testRotateRightI | 64 | 21 | 3577.79 | 3979.675 | 11.23 >> RotateBenchmark.testRotateRightI | 128 | 21 | 1773.243 | 1921.088 | 8.34 >> RotateBenchmark.testRotateRightI | 256 | 21 | 7084.974 | 7609.912 | 7.41 >> RotateBenchmark.testRotateRightI | 64 | 31 | 3688.781 | 3909.65 | 5.99 >> RotateBenchmark.testRotateRightI | 128 | 31 | 1845.978 | 1928.316 | 4.46 >> RotateBenchmark.testRotateRightI | 256 | 31 | 11463.228 | 12179.833 | 6.25 >> RotateBenchmark.testRotateRightI | 64 | 11 | 5678.052 | 6028.573 | 6.17 >> RotateBenchmark.testRotateRightI | 128 | 11 | 2990.419 | 3070.409 | 2.67 >> RotateBenchmark.testRotateRightI | 256 | 11 | 11780.283 | 12105.261 | 2.76 >> RotateBenchmark.testRotateRightI | 64 | 21 | 5827.8 | 6020.208 | 3.30 >> RotateBenchmark.testRotateRightI | 128 | 21 | 2904.852 | 3047.154 | 4.90 >> RotateBenchmark.testRotateRightI | 256 | 21 | 11359.146 | 12060.401 | 6.17 >> RotateBenchmark.testRotateRightI | 64 | 31 | 5823.207 | 6079.82 | 4.41 >> RotateBenchmark.testRotateRightI | 128 | 31 | 2984.484 | 3045.719 | 2.05 >> RotateBenchmark.testRotateRightI | 256 | 31 | 16200.504 | 16376.475 | 1.09 >> RotateBenchmark.testRotateRightI | 64 | 11 | 8118.399 | 8315.407 | 2.43 >> RotateBenchmark.testRotateRightI | 128 | 11 | 4130.745 | 4092.588 | -0.92 >> RotateBenchmark.testRotateRightI | 256 | 11 | 15842.168 | 16469.119 | 3.96 >> RotateBenchmark.testRotateRightI | 64 | 21 | 7855.164 | 8188.913 | 
4.25 >> RotateBenchmark.testRotateRightI | 128 | 21 | 4114.378 | 4035.56 | -1.92 >> RotateBenchmark.testRotateRightI | 256 | 21 | 15636.117 | 16289.632 | 4.18 >> RotateBenchmark.testRotateRightI | 64 | 31 | 8108.067 | 7996.517 | -1.38 >> RotateBenchmark.testRotateRightI | 128 | 31 | 3997.547 | 4153.58 | 3.90 >> RotateBenchmark.testRotateRightL | 256 | 31 | 3685.99 | 3814.384 | 3.48 >> RotateBenchmark.testRotateRightL | 64 | 11 | 1787.875 | 1916.541 | 7.20 >> RotateBenchmark.testRotateRightL | 128 | 11 | 940.141 | 990.383 | 5.34 >> RotateBenchmark.testRotateRightL | 256 | 11 | 3745.968 | 3920.667 | 4.66 >> RotateBenchmark.testRotateRightL | 64 | 21 | 1877.94 | 1998.072 | 6.40 >> RotateBenchmark.testRotateRightL | 128 | 21 | 933.536 | 1004.61 | 7.61 >> RotateBenchmark.testRotateRightL | 256 | 21 | 3744.763 | 3947.427 | 5.41 >> RotateBenchmark.testRotateRightL | 64 | 31 | 1864.818 | 1978.277 | 6.08 >> RotateBenchmark.testRotateRightL | 128 | 31 | 906.965 | 998.692 | 10.11 >> RotateBenchmark.testRotateRightL | 256 | 31 | 5910.469 | 6062.429 | 2.57 >> RotateBenchmark.testRotateRightL | 64 | 11 | 2914.64 | 3033.127 | 4.07 >> RotateBenchmark.testRotateRightL | 128 | 11 | 1491.344 | 1543.936 | 3.53 >> RotateBenchmark.testRotateRightL | 256 | 11 | 5801.818 | 6098.892 | 5.12 >> RotateBenchmark.testRotateRightL | 64 | 21 | 2881.328 | 3089.547 | 7.23 >> RotateBenchmark.testRotateRightL | 128 | 21 | 1485.969 | 1526.231 | 2.71 >> RotateBenchmark.testRotateRightL | 256 | 21 | 5783.495 | 5957.649 | 3.01 >> RotateBenchmark.testRotateRightL | 64 | 31 | 3008.182 | 3026.323 | 0.60 >> RotateBenchmark.testRotateRightL | 128 | 31 | 1464.566 | 1546.825 | 5.62 >> RotateBenchmark.testRotateRightL | 256 | 31 | 8208.124 | 8361.437 | 1.87 >> RotateBenchmark.testRotateRightL | 64 | 11 | 4062.465 | 4319.412 | 6.32 >> RotateBenchmark.testRotateRightL | 128 | 11 | 2029.995 | 2086.497 | 2.78 >> RotateBenchmark.testRotateRightL | 256 | 11 | 8183.789 | 8193.087 | 0.11 >> 
RotateBenchmark.testRotateRightL | 64 | 21 | 4092.686 | 4193.712 | 2.47 >> RotateBenchmark.testRotateRightL | 128 | 21 | 2036.854 | 2038.927 | 0.10 >> RotateBenchmark.testRotateRightL | 256 | 21 | 8155.015 | 8175.792 | 0.25 >> RotateBenchmark.testRotateRightL | 64 | 31 | 3960.629 | 4263.922 | 7.66 >> RotateBenchmark.testRotateRightL | 128 | 31 | 1996.862 | 2055.486 | 2.94 >> >> `` > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > 8266054: Review comments resolution. Java code updates look good. I believe you can now remove the four "*-Rotate_*.template" files, now that you leverage existing templates? Also, I believe ancillary changes to `gen-template.sh` are no longer strictly required, now that we defer to method calls for ROL/ROR? ------------- PR: https://git.openjdk.java.net/jdk/pull/3720 From whuang at openjdk.java.net Sat May 8 02:43:23 2021 From: whuang at openjdk.java.net (Wang Huang) Date: Sat, 8 May 2021 02:43:23 GMT Subject: RFR: 8266720: Wrong implementation in LibraryCallKit::inline_vector_shuffle_iota Message-ID: Dear All, Here is the patch for JDK-8266720. Could you please review it? * Reproduce: * cherry-pick JDK-8265956 * run the patch's `TestVectorShuffleIotaByteWrongImpl.java` * However, the error in this code is obvious. * Reason: 1. In the interpreter: static int partiallyWrapIndex(int index, int laneCount) { return checkIndex0(index, laneCount, (byte)-1); } @ForceInline static int checkIndex0(int index, int laneCount, byte mode) { int wrapped = VectorIntrinsics.wrapToRange(index, laneCount); if (mode == 0 || wrapped == index) { // NOTE here return wrapped; } if (mode < 0) { return wrapped - laneCount; // special mode for internal storage } throw checkIndexFailed(index, laneCount); } @ForceInline static int wrapToRange(int index, int size) { if ((size & (size - 1)) == 0) { // Size is zero or a power of two, so we got this.
return index & (size - 1); } else { return wrapToRangeNPOT(index, size); } } 2. However, we have this intrinsics in src/hotspot/share/opto/vectorIntrinsics.cpp [jdk/jdk] ```c++ 386 } else { 387 ConINode* pred_node = (ConINode*)gvn().makecon(TypeInt::make(1)); // BoolTest::gt here 388 Node * lane_cnt = gvn().makecon(TypeInt::make(num_elem)); 389 Node * bcast_lane_cnt = gvn().transform(VectorNode::scalar2vector(lane_cnt, num_elem, type_bt)); // here BoolTest::ge != 1 (which means BoolTest::gt) 390 Node* mask = gvn().transform(new VectorMaskCmpNode(BoolTest::ge, bcast_lane_cnt, res, pred_node, vt)); 3. In `aarch64` neon backend, we use `BoolTest::ge` for generated code: ```c++ // cond is useless here instruct vcmge8B(vecD dst, vecD src1, vecD src2, immI cond) %{ predicate(n->as_Vector()->length() == 8 && n->as_VectorMaskCmp()->get_predicate() == BoolTest::ge && n->in(1)->in(1)->bottom_type()->is_vect()->element_basic_type() == T_BYTE); match(Set dst (VectorMaskCmp (Binary src1 src2) cond)); format %{ "cmge $dst, T8B, $src1, $src2\t# vector cmp (8B)" %} ins_cost(INSN_COST); ins_encode %{ __ cmge(as_FloatRegister($dst$$reg), __ T8B, as_FloatRegister($src1$$reg), as_FloatRegister($src2$$reg)); %} ins_pipe(vdop64); %} However, we use cond (=1 or BoolTest::gt). So X86 is **right** on jdk/jdk ```c++ instruct vcmp(legVec dst, legVec src1, legVec src2, immI8 cond, rRegP scratch) %{ predicate(vector_length_in_bytes(n->in(1)->in(1)) >= 8 && // src1 vector_length_in_bytes(n->in(1)->in(1)) <= 32 && // src1 is_integral_type(vector_element_basic_type(n->in(1)->in(1)))); // src1 match(Set dst (VectorMaskCmp (Binary src1 src2) cond)); effect(TEMP scratch); format %{ "vector_compare $dst,$src1,$src2,$cond\t! 
using $scratch as TEMP" %} ins_encode %{ int vlen_enc = vector_length_encoding(this, $src1); Assembler::ComparisonPredicate cmp = booltest_pred_to_comparison_pred($cond$$constant); Assembler::Width ww = widthForType(vector_element_basic_type(this, $src1)); __ vpcmpCCW($dst$$XMMRegister, $src1$$XMMRegister, $src2$$XMMRegister, cmp, ww, vlen_enc, $scratch$$Register); %} ins_pipe( pipe_slow ); %} 4. In repo `panama-vector`, both of them are wrong, because the IR is fixed: ```c++ 455 } else { 456 ConINode* pred_node = (ConINode*)gvn().makecon(TypeInt::make(BoolTest::ge));// WRONG here 457 Node * lane_cnt = gvn().makecon(TypeInt::make(num_elem)); 458 Node * bcast_lane_cnt = gvn().transform(VectorNode::scalar2vector(lane_cnt, num_elem, type_bt)); 459 Node* mask = gvn().transform(new VectorMaskCmpNode(BoolTest::ge, bcast_lane_cnt, res, pred_node, vt)); Yours, Wang Huang ------------- Commit messages: - 8266720: Wrong implementation in LibraryCallKit::inline_vector_shuffle_iota Changes: https://git.openjdk.java.net/jdk/pull/3933/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3933&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8266720 Stats: 58 lines in 2 files changed: 56 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/3933.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3933/head:pull/3933 PR: https://git.openjdk.java.net/jdk/pull/3933 From yyang at openjdk.java.net Sat May 8 02:47:54 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Sat, 8 May 2021 02:47:54 GMT Subject: RFR: 8266746: C1: Replace UnsafeGetRaw with UnsafeGetObject when setting up OSR entry block In-Reply-To: References: Message-ID: On Fri, 7 May 2021 14:41:43 GMT, Yi Yang wrote: > After JDK-8150921, most Unsafe{Get,Put}Raw intrinsic methods can be replaced by Unsafe{Get,Put}Object. 
> > There is the only one occurrence where c1 refers UnsafeGetRaw among GraphBuilder::setup_osr_entry_block() > > https://github.com/openjdk/jdk/blob/74fecc070a6462e6a2d061525b53a63de15339f9/src/hotspot/share/c1/c1_GraphBuilder.cpp#L3143-L3157 > > We can replace UnsafeGetRaw with UnsafeGetObject when setting up OSR entry block. After that, Unsafe{Get,Put}Raw can be completely removed because no one refers to them. > > (This patch actually does two things: > 1. `Replace UnsafeGetRaw with UnsafeGetObject when setting up OSR entry block` This is the only occurrence where c1 refers UnsafeGetRaw > 2. `Cleanup unused Unsafe{Get,Put}Raw code` > They are related so I put it together, but I still want to hear your suggestions, I will separate them into two patches if you think it is more reasonable) > > Thanks! > Yang Hi Dean, Thanks for noticing this PR. > If the OptimizeUnsafes feature was ever useful for UnsafeGetRaw, then maybe we should apply that optimization to the UnsafeGetObject-with-null-object case rather than removing the feature. Could you try it? Yes, I will investigate this and file an issue if it's possible to canonicalize UnsafeGetObject as well. > Also, I can't find where the UnsafeGetObject implementation handles the unaligned/wide flags of UnsafeGetRaw. I think access_load already handles the wide flag https://github.com/openjdk/jdk/blob/74fecc070a6462e6a2d061525b53a63de15339f9/src/hotspot/share/gc/shared/c1/barrierSetC1.cpp#L182 Internally, move_wide checks the type and decides whether to apply the "wide" semantics accordingly, so using access_load for all types is OK. As for the unaligned flag, I found that it only affects PPC and s390 (which is why the pre-submit tests passed), and access_load is not aware of this flag, so I think it really does need to be handled. > It would be good to get this change reviewed by someone from GC, since UnsafeGetRaw uses the GC barrier interface to do its work.
Yes, I'm looking forward to hearing suggestions from GC folks! ------------- PR: https://git.openjdk.java.net/jdk/pull/3917 From yyang at openjdk.java.net Sat May 8 03:00:14 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Sat, 8 May 2021 03:00:14 GMT Subject: RFR: 8265518: C1: Intrinsic support for Preconditions.checkIndex [v9] In-Reply-To: References: Message-ID: > The JDK codebase has re-created many variants of checkIndex (`grep -I -r 'checkIndex' jdk/`). A notable variant is java.nio.Buffer.checkIndex, which is annotated with @IntrinsicCandidate and only has a corresponding C1 intrinsic version. > > In fact, there is a utility method `jdk.internal.util.Preconditions.checkIndex` (wrapped by java.lang.Objects.checkIndex) that behaves the same as these variants of checkIndex. We can replace these re-created variants of checkIndex with Objects.checkIndex; this would significantly reduce duplicated code and improve performance, because Preconditions.checkIndex is @IntrinsicCandidate and has a corresponding intrinsic method in HotSpot. > > However, the problem is that HotSpot currently only implements the C2 version of Preconditions.checkIndex. To reuse it widely in JDK code, I think we can first implement its C1 counterpart. There are also a few things we can do later: > > 1. Replace all variants of checkIndex with Objects.checkIndex in the whole JDK codebase. > 2.
Remove Buffer.checkIndex and obsolete/deprecate InlineNIOCheckIndex flag > > Testing: cds, compiler and jdk Yi Yang has updated the pull request incrementally with one additional commit since the last revision: x86_32 fails ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/3615/files - new: https://git.openjdk.java.net/jdk/pull/3615/files/f996c99f..307d7a10 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3615&range=08 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3615&range=07-08 Stats: 9 lines in 1 file changed: 7 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/3615.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3615/head:pull/3615 PR: https://git.openjdk.java.net/jdk/pull/3615 From whuang at openjdk.java.net Sat May 8 03:30:06 2021 From: whuang at openjdk.java.net (Wang Huang) Date: Sat, 8 May 2021 03:30:06 GMT Subject: RFR: 8266720: Wrong implementation in LibraryCallKit::inline_vector_shuffle_iota [v2] In-Reply-To: References: Message-ID: > Dear All, > Here is the patch of JDK-8266720. Could you do me a favor to review this? > * Reproduce: > * cherry-pick JDK-8265956 > * run patch's `TestVectorShuffleIotaByteWrongImpl.java` > * However, this wrong of this code is obvious. > * Reason : > 1. In interpreter: > > static int partiallyWrapIndex(int index, int laneCount) { > return checkIndex0(index, laneCount, (byte)-1); > } > > @ForceInline > static int checkIndex0(int index, int laneCount, byte mode) { > int wrapped = VectorIntrinsics.wrapToRange(index, laneCount); > if (mode == 0 || wrapped == index) { // NOTE here > return wrapped; > } > if (mode < 0) { > return wrapped - laneCount; // special mode for internal storage > } > throw checkIndexFailed(index, laneCount); > } > > @ForceInline > static int wrapToRange(int index, int size) { > if ((size & (size - 1)) == 0) { > // Size is zero or a power of two, so we got this. 
> return index & (size - 1); > } else { > return wrapToRangeNPOT(index, size); > } > } > > 2. However, we have this intrinsics in > src/hotspot/share/opto/vectorIntrinsics.cpp [jdk/jdk] > ```c++ > 386 } else { > 387 ConINode* pred_node = (ConINode*)gvn().makecon(TypeInt::make(1)); // BoolTest::gt here > 388 Node * lane_cnt = gvn().makecon(TypeInt::make(num_elem)); > 389 Node * bcast_lane_cnt = gvn().transform(VectorNode::scalar2vector(lane_cnt, num_elem, type_bt)); > // here BoolTest::ge != 1 (which means BoolTest::gt) > 390 Node* mask = gvn().transform(new VectorMaskCmpNode(BoolTest::ge, bcast_lane_cnt, res, pred_node, vt)); > > 3. In `aarch64` neon backend, we use `BoolTest::ge` for generated code: > ```c++ > // cond is useless here > instruct vcmge8B(vecD dst, vecD src1, vecD src2, immI cond) > %{ > predicate(n->as_Vector()->length() == 8 && > n->as_VectorMaskCmp()->get_predicate() == BoolTest::ge && > n->in(1)->in(1)->bottom_type()->is_vect()->element_basic_type() == T_BYTE); > match(Set dst (VectorMaskCmp (Binary src1 src2) cond)); > format %{ "cmge $dst, T8B, $src1, $src2\t# vector cmp (8B)" %} > ins_cost(INSN_COST); > ins_encode %{ > __ cmge(as_FloatRegister($dst$$reg), __ T8B, > as_FloatRegister($src1$$reg), as_FloatRegister($src2$$reg)); > %} > ins_pipe(vdop64); > %} > > > However, we use cond (=1 or BoolTest::gt). So X86 is **right** on jdk/jdk > ```c++ > instruct vcmp(legVec dst, legVec src1, legVec src2, immI8 cond, rRegP scratch) %{ > predicate(vector_length_in_bytes(n->in(1)->in(1)) >= 8 && // src1 > vector_length_in_bytes(n->in(1)->in(1)) <= 32 && // src1 > is_integral_type(vector_element_basic_type(n->in(1)->in(1)))); // src1 > match(Set dst (VectorMaskCmp (Binary src1 src2) cond)); > effect(TEMP scratch); > format %{ "vector_compare $dst,$src1,$src2,$cond\t! 
using $scratch as TEMP" %} > ins_encode %{ > int vlen_enc = vector_length_encoding(this, $src1); > Assembler::ComparisonPredicate cmp = booltest_pred_to_comparison_pred($cond$$constant); > Assembler::Width ww = widthForType(vector_element_basic_type(this, $src1)); > __ vpcmpCCW($dst$$XMMRegister, $src1$$XMMRegister, $src2$$XMMRegister, cmp, ww, vlen_enc, $scratch$$Register); > %} > ins_pipe( pipe_slow ); > %} > > 4. In repo `panama-vector`, both of them are wrong, because the IR is fixed: > ```c++ > 455 } else { > 456 ConINode* pred_node = (ConINode*)gvn().makecon(TypeInt::make(BoolTest::ge));// WRONG here > 457 Node * lane_cnt = gvn().makecon(TypeInt::make(num_elem)); > 458 Node * bcast_lane_cnt = gvn().transform(VectorNode::scalar2vector(lane_cnt, num_elem, type_bt)); > 459 Node* mask = gvn().transform(new VectorMaskCmpNode(BoolTest::ge, bcast_lane_cnt, res, pred_node, vt)); > > Yours, > Wang Huang Wang Huang has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. 
The pull request contains one new commit since the last revision: 8266720: Wrong implementation in LibraryCallKit::inline_vector_shuffle_iota ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/3933/files - new: https://git.openjdk.java.net/jdk/pull/3933/files/65e20a50..9f5577f0 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3933&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3933&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/3933.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3933/head:pull/3933 PR: https://git.openjdk.java.net/jdk/pull/3933 From xgong at openjdk.java.net Sat May 8 03:56:03 2021 From: xgong at openjdk.java.net (Xiaohong Gong) Date: Sat, 8 May 2021 03:56:03 GMT Subject: RFR: 8265956: JVM crashes when matching LShiftVB Node [v3] In-Reply-To: References: <0lX7U-sIV4c6Wi1Cyw3caRD6ZTVCIYzbiiji3uPXf4M=.c0494113-32c5-40c4-bdd7-93b226ffe236@github.com> Message-ID: On Fri, 7 May 2021 09:03:14 GMT, Wang Huang wrote: >> It is found that the rule `match(Set dst (LShiftVB src shift))` is missing on many CPUs, such as `aarch64` and `x86`. This is why the JVM crashes under `JDK-8265956`'s test case. In this commit, I: >> * show the crash case `TestVectorShuffleIotaShort` >> * solve the issue on `aarch64` and `x86` by adding the rule. >> * test after fixing on tier1~3 >> >> Thank you for your review. Any suggestion is welcome. >> Wang Huang > > Wang Huang has updated the pull request incrementally with one additional commit since the last revision: > > fix bugs src/hotspot/share/opto/vectorIntrinsics.cpp line 372: > 370: } else if (step_val->get_con() > 1) { > 371: Node* cnt = gvn().makecon(TypeInt::make(log2i_exact(step_val->get_con()))); > 372: Node* shift_cnt = gvn().transform(new LShiftCntVNode(cnt, vt)); Use "vector_shift_count"? It will mask the shift count before generating the `LShiftCntVNode`.
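To illustrate what masking the shift count means, here is a minimal Java sketch. It is not the HotSpot code: it assumes that masking keeps only the low bits of the count, that is, the count ANDed with the lane width in bits minus one, and the class and method names (`ShiftCountMask`, `maskShiftCount`) are hypothetical.

```java
public class ShiftCountMask {
    // Hypothetical helper: keep only the bits of the shift count that are
    // meaningful for a lane of the given width (a power of two, in bits).
    static int maskShiftCount(int count, int laneBits) {
        return count & (laneBits - 1);
    }

    public static void main(String[] args) {
        // Byte lanes are 8 bits wide, so a count of 9 is reduced to 1.
        System.out.println(maskShiftCount(9, Byte.SIZE));
        // Int lanes are 32 bits wide, so a count of 33 is reduced to 1.
        System.out.println(maskShiftCount(33, Integer.SIZE));
        // A count already in range is left unchanged.
        System.out.println(maskShiftCount(3, Short.SIZE));
    }
}
```

Under this assumption, an out-of-range count never reaches the shift node unmasked, which appears to be the point of the suggestion.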
------------- PR: https://git.openjdk.java.net/jdk/pull/3747 From xgong at openjdk.java.net Sat May 8 04:51:23 2021 From: xgong at openjdk.java.net (Xiaohong Gong) Date: Sat, 8 May 2021 04:51:23 GMT Subject: RFR: 8265956: JVM crashes when matching LShiftVB Node [v3] In-Reply-To: References: <0lX7U-sIV4c6Wi1Cyw3caRD6ZTVCIYzbiiji3uPXf4M=.c0494113-32c5-40c4-bdd7-93b226ffe236@github.com> Message-ID: On Fri, 7 May 2021 09:03:14 GMT, Wang Huang wrote: >> It is found that the rule `match(Set dst (LShiftVB src shift))` is missing on many CPUs, such as `aarch64` and `x86`. This is why the JVM crashes under `JDK-8265956`'s test case. In this commit, I: >> * show the crash case `TestVectorShuffleIotaShort` >> * solve the issue on `aarch64` and `x86` by adding the rule. >> * test after fixing on tier1~3 >> >> Thank you for your review. Any suggestion is welcome. >> Wang Huang > > Wang Huang has updated the pull request incrementally with one additional commit since the last revision: > > fix bugs Please update the copyright year of `"vectorIntrinsics.cpp"` to 2021. Thanks! src/hotspot/cpu/x86/x86.ad line 5864: > 5862: match(Set dst (LShiftVB src (LShiftCntV shift))); > 5863: match(Set dst (RShiftVB src (RShiftCntV shift))); > 5864: match(Set dst (URShiftVB src (RShiftCntV shift))); Regarding this issue, there is no need to add these rules here. The vector shift rules already exist.
------------- PR: https://git.openjdk.java.net/jdk/pull/3747 From xgong at openjdk.java.net Sat May 8 04:51:24 2021 From: xgong at openjdk.java.net (Xiaohong Gong) Date: Sat, 8 May 2021 04:51:24 GMT Subject: RFR: 8265956: JVM crashes when matching LShiftVB Node [v3] In-Reply-To: References: <0lX7U-sIV4c6Wi1Cyw3caRD6ZTVCIYzbiiji3uPXf4M=.c0494113-32c5-40c4-bdd7-93b226ffe236@github.com> Message-ID: On Sat, 8 May 2021 04:47:09 GMT, Xiaohong Gong wrote: >> Wang Huang has updated the pull request incrementally with one additional commit since the last revision: >> >> fix bugs > > src/hotspot/cpu/x86/x86.ad line 5864: > >> 5862: match(Set dst (LShiftVB src (LShiftCntV shift))); >> 5863: match(Set dst (RShiftVB src (RShiftCntV shift))); >> 5864: match(Set dst (URShiftVB src (RShiftCntV shift))); > > Regarding this issue, there is no need to add these rules here. There are vector shift rules already. The same applies to all other rules added in this file. ------------- PR: https://git.openjdk.java.net/jdk/pull/3747 From yyang at openjdk.java.net Sat May 8 05:36:17 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Sat, 8 May 2021 05:36:17 GMT Subject: RFR: 8265518: C1: Intrinsic support for Preconditions.checkIndex [v6] In-Reply-To: References: Message-ID: On Fri, 30 Apr 2021 19:20:54 GMT, Igor Veresov wrote: > Looks like now the test fails in the pre-submit tests? Hi Igor, Can you take a look at the latest version? Now it passes all pre-submit tests.
Best Regards, Yang ------------- PR: https://git.openjdk.java.net/jdk/pull/3615 From yyang at openjdk.java.net Sat May 8 05:36:17 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Sat, 8 May 2021 05:36:17 GMT Subject: RFR: 8265518: C1: Intrinsic support for Preconditions.checkIndex [v6] In-Reply-To: <9Z_DkUjmqefCjf9mvecHUtoLHhw1qGNWJPxufuwvXI0=.36498a86-d09f-4eea-ab89-74844dd862cf@github.com> References: <9Z_DkUjmqefCjf9mvecHUtoLHhw1qGNWJPxufuwvXI0=.36498a86-d09f-4eea-ab89-74844dd862cf@github.com> Message-ID: <4IY0_Zr94l_aZTe-fYIva28aZw8uYJ5k6d48uByI70E=.19f2b9e5-4958-4bb3-b016-d9f809fe3347@github.com> On Fri, 30 Apr 2021 17:30:33 GMT, Paul Sandoz wrote: > It was my hope this would eventually happen when we added `Objects.checkIndex` and the underlying intrinsic. Very good! Hi Paul, Thank you for noticing this PR. > It was my hope this would eventually happen when we added `Objects.checkIndex` and the underlying intrinsic. Yes, this patch adds C1 intrinsic support for checkIndex. I will replace all variants of checkIndex with Objects.checkIndex in follow-up PRs. It seems that Objects.checkIndex cannot meet our needs because it implicitly passes null to Preconditions.checkIndex, but we want to customize exception messages, so we might add extra APIs in Objects while doing the replacement. > Very good!
Thank you Paul~ Best Regards, Yang ------------- PR: https://git.openjdk.java.net/jdk/pull/3615 From jbhateja at openjdk.java.net Sat May 8 05:54:41 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Sat, 8 May 2021 05:54:41 GMT Subject: RFR: 8266054: VectorAPI rotate operation optimization [v4] In-Reply-To: References: Message-ID: On Fri, 7 May 2021 18:31:15 GMT, Jatin Bhateja wrote: >> The current VectorAPI Java-side implementation expresses the rotateLeft and rotateRight operations using the following operations: >> >> vec1 = lanewise(VectorOperators.LSHL, n) >> vec2 = lanewise(VectorOperators.LSHR, n) >> res = lanewise(VectorOperators.OR, vec1 , vec2) >> >> This patch moves the above handling from the Java side to the C2 compiler, which facilitates dismantling the rotate operation if the target ISA does not support a direct rotate instruction. >> >> AVX512 added vector rotate instructions vpro[rl][v][dq] which operate over long and integer type vectors. For other cases (i.e. sub-word type vectors or targets which do not support direct rotate operations), an instruction sequence comprising vector SHIFT (LEFT/RIGHT) and vector OR is emitted. >> >> Please find below the performance data for the included JMH benchmark. >> Machine: Cascade Lake Server (Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz) >> >> >> Benchmark | (TESTSIZE) | Shift | Baseline AVX3 (ops/ms) | Withopt AVX3 (ops/ms) | Gain % | Baseline AVX2 (ops/ms) | Withopt AVX2 (ops/ms) | Gain % >> -- | -- | -- | -- | -- | -- | -- | -- | -- 
>> RotateBenchmark.testRotateLeftB | 128.00 | 7.00 | 17223.35 | 17094.69 | -0.75 | 17008.32 | 17488.06 | 2.82 >> RotateBenchmark.testRotateLeftB | 128.00 | 7.00 | 8944.98 | 8811.34 | -1.49 | 8878.17 | 9218.68 | 3.84 >> RotateBenchmark.testRotateLeftB | 128.00 | 15.00 | 17195.75 | 17137.32 | -0.34 | 16789.01 | 17780.34 | 5.90 >> RotateBenchmark.testRotateLeftB | 128.00 | 15.00 | 9052.67 | 8838.60 | -2.36 | 8814.62 | 9206.01 | 4.44 >> RotateBenchmark.testRotateLeftB | 128.00 | 31.00 | 17100.19 | 16950.64 | -0.87 | 16827.73 | 17720.37 | 5.30 >> RotateBenchmark.testRotateLeftB | 128.00 | 31.00 | 9079.95 | 8471.26 | -6.70 | 8888.44 | 9167.68 | 3.14 >> RotateBenchmark.testRotateLeftB | 256.00 | 7.00 | 21231.33 | 21513.08 | 1.33 | 21824.51 | 21479.48 | -1.58 >> RotateBenchmark.testRotateLeftB | 256.00 | 7.00 | 11103.62 | 11180.16 | 0.69 | 11173.67 | 11529.22 | 3.18 >> RotateBenchmark.testRotateLeftB | 256.00 | 15.00 | 21119.14 | 21552.04 | 2.05 | 21693.05 | 21915.37 | 1.02 >> RotateBenchmark.testRotateLeftB | 256.00 | 15.00 | 11048.68 | 11094.20 | 0.41 | 11049.90 | 11439.07 | 3.52 >> RotateBenchmark.testRotateLeftB | 256.00 | 31.00 | 21506.31 | 21391.41 | -0.53 | 21263.18 | 21986.29 | 3.40 >> RotateBenchmark.testRotateLeftB | 256.00 | 31.00 | 11056.12 | 11232.78 | 1.60 | 10941.59 | 11397.09 | 4.16 >> RotateBenchmark.testRotateLeftB | 512.00 | 7.00 | 17976.56 | 18180.85 | 1.14 | 1212.26 | 2533.34 | 108.98 >> RotateBenchmark.testRotateLeftB | 512.00 | 15.00 | 17553.70 | 18219.07 | 3.79 | 1256.73 | 2537.41 | 101.91 >> RotateBenchmark.testRotateLeftB | 512.00 | 31.00 | 17618.03 | 17738.15 | 0.68 | 1214.69 | 2533.83 | 108.60 >> RotateBenchmark.testRotateLeftI | 128.00 | 7.00 | 7258.87 | 7468.88 | 2.89 | 7115.12 | 7117.26 | 0.03 >> RotateBenchmark.testRotateLeftI | 128.00 | 7.00 | 3586.65 | 3950.85 | 10.15 | 3532.17 | 3595.80 | 1.80 >> RotateBenchmark.testRotateLeftI | 128.00 | 7.00 | 1835.07 | 1999.68 | 8.97 | 1789.90 | 1819.93 | 1.68 >> RotateBenchmark.testRotateLeftI | 
128.00 | 15.00 | 7273.36 | 7410.91 | 1.89 | 7198.60 | 6994.79 | -2.83 >> RotateBenchmark.testRotateLeftI | 128.00 | 15.00 | 3674.98 | 3926.27 | 6.84 | 3549.90 | 3755.09 | 5.78 >> RotateBenchmark.testRotateLeftI | 128.00 | 15.00 | 1840.94 | 1882.25 | 2.24 | 1801.56 | 1872.89 | 3.96 >> RotateBenchmark.testRotateLeftI | 128.00 | 31.00 | 7457.11 | 7361.48 | -1.28 | 6975.33 | 7385.94 | 5.89 >> RotateBenchmark.testRotateLeftI | 128.00 | 31.00 | 3570.74 | 3929.30 | 10.04 | 3635.37 | 3736.67 | 2.79 >> RotateBenchmark.testRotateLeftI | 128.00 | 31.00 | 1902.32 | 1960.46 | 3.06 | 1812.32 | 1813.88 | 0.09 >> RotateBenchmark.testRotateLeftI | 256.00 | 7.00 | 11174.24 | 12044.52 | 7.79 | 11509.87 | 11273.44 | -2.05 >> RotateBenchmark.testRotateLeftI | 256.00 | 7.00 | 5981.47 | 6073.70 | 1.54 | 5593.66 | 5661.93 | 1.22 >> RotateBenchmark.testRotateLeftI | 256.00 | 7.00 | 2932.49 | 3069.54 | 4.67 | 2950.86 | 2892.42 | -1.98 >> RotateBenchmark.testRotateLeftI | 256.00 | 15.00 | 11764.11 | 12098.63 | 2.84 | 11069.52 | 11476.93 | 3.68 >> RotateBenchmark.testRotateLeftI | 256.00 | 15.00 | 5855.20 | 6080.40 | 3.85 | 5919.11 | 5607.04 | -5.27 >> RotateBenchmark.testRotateLeftI | 256.00 | 15.00 | 2989.05 | 3048.56 | 1.99 | 2902.63 | 2821.83 | -2.78 >> RotateBenchmark.testRotateLeftI | 256.00 | 31.00 | 11652.84 | 11965.40 | 2.68 | 11525.62 | 11459.83 | -0.57 >> RotateBenchmark.testRotateLeftI | 256.00 | 31.00 | 5851.82 | 6164.94 | 5.35 | 5882.60 | 5842.30 | -0.69 >> RotateBenchmark.testRotateLeftI | 256.00 | 31.00 | 3015.99 | 3043.79 | 0.92 | 2963.71 | 2947.97 | -0.53 >> RotateBenchmark.testRotateLeftI | 512.00 | 7.00 | 16029.15 | 16189.79 | 1.00 | 860.43 | 2339.32 | 171.88 >> RotateBenchmark.testRotateLeftI | 512.00 | 7.00 | 8078.25 | 8081.84 | 0.04 | 427.39 | 1147.92 | 168.59 >> RotateBenchmark.testRotateLeftI | 512.00 | 7.00 | 4021.49 | 4294.03 | 6.78 | 209.25 | 582.28 | 178.27 >> RotateBenchmark.testRotateLeftI | 512.00 | 15.00 | 15912.98 | 16329.03 | 2.61 | 848.23 | 2296.78 | 170.77 
>> RotateBenchmark.testRotateLeftI | 512.00 | 15.00 | 8054.10 | 8306.37 | 3.13 | 429.93 | 1146.90 | 166.77 >> RotateBenchmark.testRotateLeftI | 512.00 | 15.00 | 4102.58 | 4071.08 | -0.77 | 217.86 | 582.20 | 167.24 >> RotateBenchmark.testRotateLeftI | 512.00 | 31.00 | 16177.79 | 16287.85 | 0.68 | 857.84 | 2243.15 | 161.49 >> RotateBenchmark.testRotateLeftI | 512.00 | 31.00 | 8187.47 | 8410.48 | 2.72 | 434.60 | 1128.20 | 159.60 >> RotateBenchmark.testRotateLeftI | 512.00 | 31.00 | 4109.15 | 4233.80 | 3.03 | 208.71 | 572.43 | 174.27 >> RotateBenchmark.testRotateLeftL | 128.00 | 7.00 | 3755.09 | 3930.29 | 4.67 | 3604.19 | 3598.47 | -0.16 >> RotateBenchmark.testRotateLeftL | 128.00 | 7.00 | 1829.03 | 1957.39 | 7.02 | 1833.95 | 1808.38 | -1.39 >> RotateBenchmark.testRotateLeftL | 128.00 | 7.00 | 915.35 | 970.55 | 6.03 | 916.25 | 899.08 | -1.87 >> RotateBenchmark.testRotateLeftL | 128.00 | 15.00 | 3664.85 | 3812.26 | 4.02 | 3629.37 | 3579.23 | -1.38 >> RotateBenchmark.testRotateLeftL | 128.00 | 15.00 | 1829.51 | 1877.76 | 2.64 | 1781.05 | 1807.57 | 1.49 >> RotateBenchmark.testRotateLeftL | 128.00 | 15.00 | 913.37 | 953.42 | 4.38 | 912.26 | 908.73 | -0.39 >> RotateBenchmark.testRotateLeftL | 128.00 | 31.00 | 3648.45 | 3899.20 | 6.87 | 3552.67 | 3581.04 | 0.80 >> RotateBenchmark.testRotateLeftL | 128.00 | 31.00 | 1816.50 | 1959.68 | 7.88 | 1820.88 | 1819.71 | -0.06 >> RotateBenchmark.testRotateLeftL | 128.00 | 31.00 | 901.05 | 955.13 | 6.00 | 913.74 | 907.90 | -0.64 >> RotateBenchmark.testRotateLeftL | 256.00 | 7.00 | 5850.99 | 6108.64 | 4.40 | 5882.65 | 5755.21 | -2.17 >> RotateBenchmark.testRotateLeftL | 256.00 | 7.00 | 2962.21 | 3060.47 | 3.32 | 2955.20 | 2909.18 | -1.56 >> RotateBenchmark.testRotateLeftL | 256.00 | 7.00 | 1480.46 | 1534.72 | 3.66 | 1467.78 | 1430.60 | -2.53 >> RotateBenchmark.testRotateLeftL | 256.00 | 15.00 | 5858.23 | 6047.51 | 3.23 | 5770.02 | 5773.19 | 0.05 >> RotateBenchmark.testRotateLeftL | 256.00 | 15.00 | 2951.49 | 3096.53 | 4.91 | 2885.21 | 
2899.31 | 0.49 >> RotateBenchmark.testRotateLeftL | 256.00 | 15.00 | 1486.26 | 1527.94 | 2.80 | 1441.93 | 1454.25 | 0.85 >> RotateBenchmark.testRotateLeftL | 256.00 | 31.00 | 5873.21 | 6089.75 | 3.69 | 5767.58 | 5664.11 | -1.79 >> RotateBenchmark.testRotateLeftL | 256.00 | 31.00 | 2969.67 | 3081.39 | 3.76 | 2878.50 | 2905.86 | 0.95 >> RotateBenchmark.testRotateLeftL | 256.00 | 31.00 | 1452.21 | 1520.03 | 4.67 | 1430.30 | 1485.63 | 3.87 >> RotateBenchmark.testRotateLeftL | 512.00 | 7.00 | 8088.65 | 8443.63 | 4.39 | 455.67 | 1226.33 | 169.13 >> RotateBenchmark.testRotateLeftL | 512.00 | 7.00 | 4011.95 | 4120.25 | 2.70 | 229.77 | 619.87 | 169.77 >> RotateBenchmark.testRotateLeftL | 512.00 | 7.00 | 2090.57 | 2109.53 | 0.91 | 115.21 | 310.36 | 169.37 >> RotateBenchmark.testRotateLeftL | 512.00 | 15.00 | 8166.84 | 8557.28 | 4.78 | 457.67 | 1242.86 | 171.56 >> RotateBenchmark.testRotateLeftL | 512.00 | 15.00 | 4137.02 | 4287.95 | 3.65 | 227.26 | 624.80 | 174.93 >> RotateBenchmark.testRotateLeftL | 512.00 | 15.00 | 2095.01 | 2102.86 | 0.37 | 114.26 | 310.83 | 172.03 >> RotateBenchmark.testRotateLeftL | 512.00 | 31.00 | 8082.68 | 8400.56 | 3.93 | 459.59 | 1230.07 | 167.64 >> RotateBenchmark.testRotateLeftL | 512.00 | 31.00 | 4047.67 | 4147.58 | 2.47 | 229.01 | 606.38 | 164.78 >> RotateBenchmark.testRotateLeftL | 512.00 | 31.00 | 2086.83 | 2126.72 | 1.91 | 111.93 | 305.66 | 173.08 >> RotateBenchmark.testRotateLeftS | 128.00 | 7.00 | 13597.19 | 13255.09 | -2.52 | 13818.39 | 13242.40 | -4.17 >> RotateBenchmark.testRotateLeftS | 128.00 | 7.00 | 7028.26 | 6826.59 | -2.87 | 6765.15 | 6907.87 | 2.11 >> RotateBenchmark.testRotateLeftS | 128.00 | 7.00 | 3570.40 | 3468.01 | -2.87 | 3449.66 | 3533.50 | 2.43 >> RotateBenchmark.testRotateLeftS | 128.00 | 15.00 | 13615.99 | 13464.40 | -1.11 | 13330.02 | 13870.57 | 4.06 >> RotateBenchmark.testRotateLeftS | 128.00 | 15.00 | 7043.31 | 6763.34 | -3.97 | 6928.88 | 7063.57 | 1.94 >> RotateBenchmark.testRotateLeftS | 128.00 | 15.00 | 3495.12 | 
3537.62 | 1.22 | 3503.41 | 3457.67 | -1.31 >> RotateBenchmark.testRotateLeftS | 128.00 | 31.00 | 13591.66 | 13665.84 | 0.55 | 13773.27 | 13126.08 | -4.70 >> RotateBenchmark.testRotateLeftS | 128.00 | 31.00 | 7027.08 | 7011.24 | -0.23 | 6974.98 | 6815.50 | -2.29 >> RotateBenchmark.testRotateLeftS | 128.00 | 31.00 | 3568.28 | 3569.62 | 0.04 | 3580.67 | 3463.58 | -3.27 >> RotateBenchmark.testRotateLeftS | 256.00 | 7.00 | 21154.03 | 21416.32 | 1.24 | 21187.01 | 21401.61 | 1.01 >> RotateBenchmark.testRotateLeftS | 256.00 | 7.00 | 11194.24 | 10865.47 | -2.94 | 11063.19 | 10977.60 | -0.77 >> RotateBenchmark.testRotateLeftS | 256.00 | 7.00 | 5797.80 | 5523.94 | -4.72 | 5654.63 | 5468.78 | -3.29 >> RotateBenchmark.testRotateLeftS | 256.00 | 15.00 | 21333.89 | 21412.74 | 0.37 | 21610.94 | 20908.96 | -3.25 >> RotateBenchmark.testRotateLeftS | 256.00 | 15.00 | 11327.07 | 11113.48 | -1.89 | 11148.25 | 10678.14 | -4.22 >> RotateBenchmark.testRotateLeftS | 256.00 | 15.00 | 5810.69 | 5569.72 | -4.15 | 5663.26 | 5618.87 | -0.78 >> RotateBenchmark.testRotateLeftS | 256.00 | 31.00 | 21753.20 | 21198.43 | -2.55 | 21567.90 | 21929.81 | 1.68 >> RotateBenchmark.testRotateLeftS | 256.00 | 31.00 | 11517.08 | 11039.64 | -4.15 | 11103.08 | 10871.59 | -2.08 >> RotateBenchmark.testRotateLeftS | 256.00 | 31.00 | 5897.16 | 5606.75 | -4.92 | 5459.87 | 5604.12 | 2.64 >> RotateBenchmark.testRotateLeftS | 512.00 | 7.00 | 29748.53 | 28883.73 | -2.91 | 1549.02 | 3928.53 | 153.61 >> RotateBenchmark.testRotateLeftS | 512.00 | 7.00 | 15197.09 | 15878.19 | 4.48 | 772.59 | 1924.35 | 149.08 >> RotateBenchmark.testRotateLeftS | 512.00 | 7.00 | 8046.30 | 8081.19 | 0.43 | 388.11 | 990.28 | 155.16 >> RotateBenchmark.testRotateLeftS | 512.00 | 15.00 | 30618.04 | 29419.19 | -3.92 | 1524.22 | 3915.97 | 156.92 >> RotateBenchmark.testRotateLeftS | 512.00 | 15.00 | 15854.43 | 15846.37 | -0.05 | 766.09 | 1953.60 | 155.01 >> RotateBenchmark.testRotateLeftS | 512.00 | 15.00 | 7814.77 | 7899.30 | 1.08 | 390.82 | 970.37 | 
148.29 >> RotateBenchmark.testRotateLeftS | 512.00 | 31.00 | 29596.82 | 28538.69 | -3.58 | 1530.45 | 3906.91 | 155.28 >> RotateBenchmark.testRotateLeftS | 512.00 | 31.00 | 15662.48 | 15849.25 | 1.19 | 778.08 | 1934.31 | 148.60 >> RotateBenchmark.testRotateLeftS | 512.00 | 31.00 | 8121.14 | 7758.59 | -4.46 | 392.78 | 959.73 | 144.34 >> RotateBenchmark.testRotateRightB | 128.00 | 7.00 | 17465.84 | 17069.34 | -2.27 | 16849.73 | 17842.08 | 5.89 >> RotateBenchmark.testRotateRightB | 128.00 | 7.00 | 9049.19 | 8864.15 | -2.04 | 8786.67 | 9105.34 | 3.63 >> RotateBenchmark.testRotateRightB | 128.00 | 15.00 | 17703.38 | 17070.98 | -3.57 | 16595.85 | 17784.68 | 7.16 >> RotateBenchmark.testRotateRightB | 128.00 | 15.00 | 9007.68 | 8817.41 | -2.11 | 8704.49 | 9185.87 | 5.53 >> RotateBenchmark.testRotateRightB | 128.00 | 31.00 | 17531.05 | 16983.40 | -3.12 | 16947.69 | 17655.40 | 4.18 >> RotateBenchmark.testRotateRightB | 128.00 | 31.00 | 8986.30 | 8794.15 | -2.14 | 8816.62 | 9225.95 | 4.64 >> RotateBenchmark.testRotateRightB | 256.00 | 7.00 | 21293.95 | 21506.74 | 1.00 | 21163.29 | 21854.03 | 3.26 >> RotateBenchmark.testRotateRightB | 256.00 | 7.00 | 11258.47 | 11072.92 | -1.65 | 11118.12 | 11338.96 | 1.99 >> RotateBenchmark.testRotateRightB | 256.00 | 15.00 | 21253.36 | 21292.37 | 0.18 | 21224.39 | 21763.88 | 2.54 >> RotateBenchmark.testRotateRightB | 256.00 | 15.00 | 11064.80 | 11198.35 | 1.21 | 10960.98 | 11294.14 | 3.04 >> RotateBenchmark.testRotateRightB | 256.00 | 31.00 | 21358.14 | 21346.21 | -0.06 | 21487.25 | 21854.42 | 1.71 >> RotateBenchmark.testRotateRightB | 256.00 | 31.00 | 11045.61 | 11208.26 | 1.47 | 10907.03 | 11415.18 | 4.66 >> RotateBenchmark.testRotateRightB | 512.00 | 7.00 | 17898.61 | 18307.54 | 2.28 | 1214.65 | 2546.64 | 109.66 >> RotateBenchmark.testRotateRightB | 512.00 | 15.00 | 17909.25 | 18242.51 | 1.86 | 1215.05 | 2563.98 | 111.02 >> RotateBenchmark.testRotateRightB | 512.00 | 31.00 | 17883.35 | 17928.44 | 0.25 | 1220.77 | 2543.30 | 108.34 >> 
RotateBenchmark.testRotateRightI | 128.00 | 7.00 | 7139.97 | 7626.72 | 6.82 | 6994.86 | 7075.65 | 1.15 >> RotateBenchmark.testRotateRightI | 128.00 | 7.00 | 3657.37 | 3898.34 | 6.59 | 3617.06 | 3576.12 | -1.13 >> RotateBenchmark.testRotateRightI | 128.00 | 7.00 | 1804.26 | 1969.19 | 9.14 | 1796.62 | 1858.84 | 3.46 >> RotateBenchmark.testRotateRightI | 128.00 | 15.00 | 7404.31 | 7760.09 | 4.80 | 7036.77 | 7401.52 | 5.18 >> RotateBenchmark.testRotateRightI | 128.00 | 15.00 | 3600.52 | 3956.35 | 9.88 | 3595.28 | 3560.36 | -0.97 >> RotateBenchmark.testRotateRightI | 128.00 | 15.00 | 1813.32 | 1966.41 | 8.44 | 1839.95 | 1852.53 | 0.68 >> RotateBenchmark.testRotateRightI | 128.00 | 31.00 | 7118.48 | 7724.81 | 8.52 | 7151.56 | 7021.09 | -1.82 >> RotateBenchmark.testRotateRightI | 128.00 | 31.00 | 3529.70 | 3881.63 | 9.97 | 3623.08 | 3601.01 | -0.61 >> RotateBenchmark.testRotateRightI | 128.00 | 31.00 | 1823.61 | 1961.34 | 7.55 | 1786.86 | 1748.85 | -2.13 >> RotateBenchmark.testRotateRightI | 256.00 | 7.00 | 11697.98 | 11835.25 | 1.17 | 11513.16 | 11184.87 | -2.85 >> RotateBenchmark.testRotateRightI | 256.00 | 7.00 | 5890.11 | 6102.57 | 3.61 | 5658.79 | 5696.08 | 0.66 >> RotateBenchmark.testRotateRightI | 256.00 | 7.00 | 2964.94 | 3070.26 | 3.55 | 2945.00 | 2962.08 | 0.58 >> RotateBenchmark.testRotateRightI | 256.00 | 15.00 | 11562.51 | 12151.29 | 5.09 | 11404.17 | 11120.28 | -2.49 >> RotateBenchmark.testRotateRightI | 256.00 | 15.00 | 5702.93 | 6130.57 | 7.50 | 5799.54 | 5779.08 | -0.35 >> RotateBenchmark.testRotateRightI | 256.00 | 15.00 | 2861.96 | 3051.44 | 6.62 | 2943.99 | 2860.65 | -2.83 >> RotateBenchmark.testRotateRightI | 256.00 | 31.00 | 11203.13 | 11710.59 | 4.53 | 11363.18 | 11112.16 | -2.21 >> RotateBenchmark.testRotateRightI | 256.00 | 31.00 | 5893.97 | 6070.71 | 3.00 | 5776.67 | 5648.84 | -2.21 >> RotateBenchmark.testRotateRightI | 256.00 | 31.00 | 2971.83 | 3046.76 | 2.52 | 2903.35 | 2833.88 | -2.39 >> RotateBenchmark.testRotateRightI | 512.00 | 7.00 | 
16064.71 | 15851.35 | -1.33 | 861.93 | 2256.88 | 161.84 >> RotateBenchmark.testRotateRightI | 512.00 | 7.00 | 7916.80 | 8462.65 | 6.89 | 430.23 | 1147.30 | 166.67 >> RotateBenchmark.testRotateRightI | 512.00 | 7.00 | 4104.64 | 4068.28 | -0.89 | 216.30 | 572.86 | 164.84 >> RotateBenchmark.testRotateRightI | 512.00 | 15.00 | 16133.09 | 16281.59 | 0.92 | 856.36 | 2229.58 | 160.35 >> RotateBenchmark.testRotateRightI | 512.00 | 15.00 | 8127.26 | 8117.59 | -0.12 | 419.16 | 1176.42 | 180.66 >> RotateBenchmark.testRotateRightI | 512.00 | 15.00 | 4080.11 | 4063.26 | -0.41 | 218.32 | 571.93 | 161.97 >> RotateBenchmark.testRotateRightI | 512.00 | 31.00 | 15834.26 | 16314.64 | 3.03 | 865.96 | 2297.74 | 165.34 >> RotateBenchmark.testRotateRightI | 512.00 | 31.00 | 7965.62 | 8270.48 | 3.83 | 428.55 | 1148.87 | 168.08 >> RotateBenchmark.testRotateRightI | 512.00 | 31.00 | 4161.69 | 4034.76 | -3.05 | 215.63 | 570.19 | 164.43 >> RotateBenchmark.testRotateRightL | 128.00 | 7.00 | 3556.70 | 3877.08 | 9.01 | 3596.46 | 3558.32 | -1.06 >> RotateBenchmark.testRotateRightL | 128.00 | 7.00 | 1772.93 | 1993.86 | 12.46 | 1856.79 | 1783.22 | -3.96 >> RotateBenchmark.testRotateRightL | 128.00 | 7.00 | 908.66 | 1000.37 | 10.09 | 944.79 | 922.91 | -2.32 >> RotateBenchmark.testRotateRightL | 128.00 | 15.00 | 3742.44 | 3748.41 | 0.16 | 3788.07 | 3570.67 | -5.74 >> RotateBenchmark.testRotateRightL | 128.00 | 15.00 | 1817.53 | 1985.69 | 9.25 | 1892.38 | 1833.16 | -3.13 >> RotateBenchmark.testRotateRightL | 128.00 | 15.00 | 941.03 | 952.68 | 1.24 | 915.79 | 910.21 | -0.61 >> RotateBenchmark.testRotateRightL | 128.00 | 31.00 | 3649.48 | 3896.56 | 6.77 | 3637.59 | 3557.53 | -2.20 >> RotateBenchmark.testRotateRightL | 128.00 | 31.00 | 1840.12 | 1997.19 | 8.54 | 1821.47 | 1799.82 | -1.19 >> RotateBenchmark.testRotateRightL | 128.00 | 31.00 | 901.33 | 995.67 | 10.47 | 909.20 | 902.73 | -0.71 >> RotateBenchmark.testRotateRightL | 256.00 | 7.00 | 5789.93 | 5960.54 | 2.95 | 5758.14 | 5736.30 | -0.38 >> 
RotateBenchmark.testRotateRightL | 256.00 | 7.00 | 2963.20 | 3063.30 | 3.38 | 2943.48 | 2833.84 | -3.72 >> RotateBenchmark.testRotateRightL | 256.00 | 7.00 | 1501.81 | 1510.23 | 0.56 | 1463.85 | 1462.26 | -0.11 >> RotateBenchmark.testRotateRightL | 256.00 | 15.00 | 5870.05 | 5951.43 | 1.39 | 5794.74 | 5604.58 | -3.28 >> RotateBenchmark.testRotateRightL | 256.00 | 15.00 | 2971.36 | 3047.00 | 2.55 | 2931.19 | 2907.30 | -0.82 >> RotateBenchmark.testRotateRightL | 256.00 | 15.00 | 1473.97 | 1530.54 | 3.84 | 1473.45 | 1442.40 | -2.11 >> RotateBenchmark.testRotateRightL | 256.00 | 31.00 | 5858.08 | 6080.49 | 3.80 | 5863.69 | 5549.85 | -5.35 >> RotateBenchmark.testRotateRightL | 256.00 | 31.00 | 2916.24 | 3045.77 | 4.44 | 2981.59 | 2815.07 | -5.58 >> RotateBenchmark.testRotateRightL | 256.00 | 31.00 | 1441.20 | 1531.56 | 6.27 | 1492.47 | 1473.25 | -1.29 >> RotateBenchmark.testRotateRightL | 512.00 | 7.00 | 8147.24 | 8310.05 | 2.00 | 469.45 | 1235.21 | 163.12 >> RotateBenchmark.testRotateRightL | 512.00 | 7.00 | 4142.95 | 4258.86 | 2.80 | 234.14 | 615.52 | 162.88 >> RotateBenchmark.testRotateRightL | 512.00 | 7.00 | 2095.48 | 2087.20 | -0.40 | 113.55 | 311.19 | 174.05 >> RotateBenchmark.testRotateRightL | 512.00 | 15.00 | 8222.94 | 8246.58 | 0.29 | 458.91 | 1244.32 | 171.15 >> RotateBenchmark.testRotateRightL | 512.00 | 15.00 | 4160.04 | 4226.46 | 1.60 | 227.78 | 625.38 | 174.56 >> RotateBenchmark.testRotateRightL | 512.00 | 15.00 | 2064.63 | 2162.44 | 4.74 | 113.27 | 314.15 | 177.36 >> RotateBenchmark.testRotateRightL | 512.00 | 31.00 | 8157.94 | 8466.90 | 3.79 | 450.26 | 1221.90 | 171.37 >> RotateBenchmark.testRotateRightL | 512.00 | 31.00 | 4039.74 | 4283.33 | 6.03 | 224.82 | 612.68 | 172.53 >> RotateBenchmark.testRotateRightL | 512.00 | 31.00 | 2066.88 | 2147.51 | 3.90 | 110.97 | 303.43 | 173.42 >> RotateBenchmark.testRotateRightS | 128.00 | 7.00 | 13548.39 | 13245.87 | -2.23 | 13490.93 | 13084.76 | -3.01 >> RotateBenchmark.testRotateRightS | 128.00 | 7.00 | 7020.16 | 
6768.85 | -3.58 | 6991.39 | 7044.32 | 0.76 >> RotateBenchmark.testRotateRightS | 128.00 | 7.00 | 3550.50 | 3505.19 | -1.28 | 3507.12 | 3612.86 | 3.01 >> RotateBenchmark.testRotateRightS | 128.00 | 15.00 | 13743.43 | 13325.44 | -3.04 | 13696.15 | 13255.80 | -3.22 >> RotateBenchmark.testRotateRightS | 128.00 | 15.00 | 6856.02 | 6969.18 | 1.65 | 6886.29 | 6834.12 | -0.76 >> RotateBenchmark.testRotateRightS | 128.00 | 15.00 | 3569.53 | 3492.76 | -2.15 | 3539.02 | 3470.02 | -1.95 >> RotateBenchmark.testRotateRightS | 128.00 | 31.00 | 13704.18 | 13495.07 | -1.53 | 13649.14 | 13583.87 | -0.48 >> RotateBenchmark.testRotateRightS | 128.00 | 31.00 | 7011.77 | 6953.93 | -0.82 | 6978.28 | 6740.30 | -3.41 >> RotateBenchmark.testRotateRightS | 128.00 | 31.00 | 3591.62 | 3620.12 | 0.79 | 3502.04 | 3510.05 | 0.23 >> RotateBenchmark.testRotateRightS | 256.00 | 7.00 | 21950.71 | 22113.60 | 0.74 | 21484.27 | 21596.64 | 0.52 >> RotateBenchmark.testRotateRightS | 256.00 | 7.00 | 11616.88 | 11099.73 | -4.45 | 11188.29 | 10737.68 | -4.03 >> RotateBenchmark.testRotateRightS | 256.00 | 7.00 | 5872.72 | 5579.12 | -5.00 | 5784.05 | 5454.57 | -5.70 >> RotateBenchmark.testRotateRightS | 256.00 | 15.00 | 22017.83 | 20817.97 | -5.45 | 21934.65 | 21356.90 | -2.63 >> RotateBenchmark.testRotateRightS | 256.00 | 15.00 | 11414.27 | 11044.86 | -3.24 | 11454.35 | 11140.34 | -2.74 >> RotateBenchmark.testRotateRightS | 256.00 | 15.00 | 5786.64 | 5634.05 | -2.64 | 5724.93 | 5639.99 | -1.48 >> RotateBenchmark.testRotateRightS | 256.00 | 31.00 | 21754.77 | 21466.01 | -1.33 | 21140.67 | 21970.03 | 3.92 >> RotateBenchmark.testRotateRightS | 256.00 | 31.00 | 11676.46 | 11358.64 | -2.72 | 11204.90 | 11213.48 | 0.08 >> RotateBenchmark.testRotateRightS | 256.00 | 31.00 | 5728.20 | 5772.49 | 0.77 | 5594.33 | 5544.25 | -0.90 >> RotateBenchmark.testRotateRightS | 512.00 | 7.00 | 30247.03 | 30179.41 | -0.22 | 1538.75 | 3975.82 | 158.38 >> RotateBenchmark.testRotateRightS | 512.00 | 7.00 | 15988.73 | 15621.42 | -2.30 
| 776.04 | 1910.91 | 146.24 >> RotateBenchmark.testRotateRightS | 512.00 | 7.00 | 8115.84 | 8025.28 | -1.12 | 389.12 | 984.46 | 152.99 >> RotateBenchmark.testRotateRightS | 512.00 | 15.00 | 30110.91 | 30200.69 | 0.30 | 1532.49 | 3983.77 | 159.95 >> RotateBenchmark.testRotateRightS | 512.00 | 15.00 | 15957.90 | 15690.73 | -1.67 | 774.90 | 1931.00 | 149.19 >> RotateBenchmark.testRotateRightS | 512.00 | 15.00 | 8113.26 | 8037.93 | -0.93 | 391.90 | 965.53 | 146.37 >> RotateBenchmark.testRotateRightS | 512.00 | 31.00 | 29816.97 | 29891.54 | 0.25 | 1538.12 | 3881.93 | 152.38 >> RotateBenchmark.testRotateRightS | 512.00 | 31.00 | 15405.95 | 15619.17 | 1.38 | 762.49 | 1871.00 | 145.38 >> RotateBenchmark.testRotateRightS | 512.00 | 31.00 | 7919.80 | 7957.35 | 0.47 | 393.63 | 972.49 | 147.06 > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > 8266054: Review comments resolution. > Java code updates look good > > I believe you can now remove the four "_-Rotate__.template" files now that you leverage existing templates? > > Also, I believe ancillary changes to `gen-template.sh` are no longer strictly required, now that we defer to method calls for ROL/ROR? Thanks Paul, redundant files (missed in the last check-in) have been removed and benchmark results have been updated with the latest code.
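For readers following the rotate discussion, the shift-and-OR composition described in the patch summary can be sketched per lane in scalar Java. This is only an illustration under stated assumptions: the class and method names are hypothetical, the complementary shift count (lane width minus n) is spelled out explicitly, and nothing here reflects the actual C2 or Vector API implementation.

```java
public class RotateSketch {
    // Rotate a 32-bit lane left by n using two shifts and an OR.
    // The count is masked so that n == 0 and n >= 32 behave sensibly.
    static int rotateLeftInt(int x, int n) {
        n &= 31;
        return (x << n) | (x >>> (-n & 31));
    }

    // Rotate an 8-bit (sub-word) lane left by n. The lane is widened to an
    // int without sign extension, rotated with shifts and OR, then narrowed.
    static byte rotateLeftByte(byte x, int n) {
        int v = x & 0xFF;
        n &= 7;
        return (byte) ((v << n) | (v >>> ((8 - n) & 7)));
    }

    public static void main(String[] args) {
        int i = 0x80000001;
        // Cross-check against the JDK's scalar rotate.
        System.out.println(rotateLeftInt(i, 1) == Integer.rotateLeft(i, 1));
        System.out.println(rotateLeftByte((byte) 0x81, 1));
    }
}
```

The sub-word case shows why a fallback sequence is needed even on targets with a native rotate for int and long lanes: a byte or short lane has to be rotated within its own width, which a 32-bit or 64-bit rotate instruction does not do directly.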
------------- PR: https://git.openjdk.java.net/jdk/pull/3720 From jbhateja at openjdk.java.net Sat May 8 05:54:38 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Sat, 8 May 2021 05:54:38 GMT Subject: RFR: 8266054: VectorAPI rotate operation optimization [v5] In-Reply-To: References: Message-ID: > The current VectorAPI Java-side implementation expresses the rotateLeft and rotateRight operations using the following operations: > > vec1 = lanewise(VectorOperators.LSHL, n) > vec2 = lanewise(VectorOperators.LSHR, n) > res = lanewise(VectorOperators.OR, vec1 , vec2) > > This patch moves the above handling from the Java side to the C2 compiler, which facilitates dismantling the rotate operation if the target ISA does not support a direct rotate instruction. > > AVX512 added vector rotate instructions vpro[rl][v][dq] which operate over long and integer type vectors. For other cases (i.e. sub-word type vectors or targets which do not support direct rotate operations), an instruction sequence comprising vector SHIFT (LEFT/RIGHT) and vector OR is emitted. > > Please find below the performance data for the included JMH benchmark. > Machine: Cascade Lake Server (Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz) > > > Benchmark | (TESTSIZE) | Shift | Baseline AVX3 (ops/ms) | Withopt AVX3 (ops/ms) | Gain % | Baseline AVX2 (ops/ms) | Withopt AVX2 (ops/ms) | Gain % > -- | -- | -- | -- | -- | -- | -- | -- | -- 
> RotateBenchmark.testRotateLeftB | 128.00 | 7.00 | 17223.35 | 17094.69 | -0.75 | 17008.32 | 17488.06 | 2.82 > RotateBenchmark.testRotateLeftB | 128.00 | 7.00 | 8944.98 | 8811.34 | -1.49 | 8878.17 | 9218.68 | 3.84 > RotateBenchmark.testRotateLeftB | 128.00 | 15.00 | 17195.75 | 17137.32 | -0.34 | 16789.01 | 17780.34 | 5.90 > RotateBenchmark.testRotateLeftB | 128.00 | 15.00 | 9052.67 | 8838.60 | -2.36 | 8814.62 | 9206.01 | 4.44 > RotateBenchmark.testRotateLeftB | 128.00 | 31.00 | 17100.19 | 16950.64 | -0.87 | 16827.73 | 17720.37 | 5.30 > RotateBenchmark.testRotateLeftB | 128.00 | 31.00 | 9079.95 | 8471.26 | -6.70 | 8888.44 | 9167.68 | 3.14 > RotateBenchmark.testRotateLeftB | 256.00 | 7.00 | 21231.33 | 21513.08 | 1.33 | 21824.51 | 21479.48 | -1.58 > RotateBenchmark.testRotateLeftB | 256.00 | 7.00 | 11103.62 | 11180.16 | 0.69 | 11173.67 | 11529.22 | 3.18 > RotateBenchmark.testRotateLeftB | 256.00 | 15.00 | 21119.14 | 21552.04 | 2.05 | 21693.05 | 21915.37 | 1.02 > RotateBenchmark.testRotateLeftB | 256.00 | 15.00 | 11048.68 | 11094.20 | 0.41 | 11049.90 | 11439.07 | 3.52 > RotateBenchmark.testRotateLeftB | 256.00 | 31.00 | 21506.31 | 21391.41 | -0.53 | 21263.18 | 21986.29 | 3.40 > RotateBenchmark.testRotateLeftB | 256.00 | 31.00 | 11056.12 | 11232.78 | 1.60 | 10941.59 | 11397.09 | 4.16 > RotateBenchmark.testRotateLeftB | 512.00 | 7.00 | 17976.56 | 18180.85 | 1.14 | 1212.26 | 2533.34 | 108.98 > RotateBenchmark.testRotateLeftB | 512.00 | 15.00 | 17553.70 | 18219.07 | 3.79 | 1256.73 | 2537.41 | 101.91 > RotateBenchmark.testRotateLeftB | 512.00 | 31.00 | 17618.03 | 17738.15 | 0.68 | 1214.69 | 2533.83 | 108.60 > RotateBenchmark.testRotateLeftI | 128.00 | 7.00 | 7258.87 | 7468.88 | 2.89 | 7115.12 | 7117.26 | 0.03 > RotateBenchmark.testRotateLeftI | 128.00 | 7.00 | 3586.65 | 3950.85 | 10.15 | 3532.17 | 3595.80 | 1.80 > RotateBenchmark.testRotateLeftI | 128.00 | 7.00 | 1835.07 | 1999.68 | 8.97 | 1789.90 | 1819.93 | 1.68 > RotateBenchmark.testRotateLeftI | 128.00 | 15.00 | 7273.36 
| 7410.91 | 1.89 | 7198.60 | 6994.79 | -2.83 > RotateBenchmark.testRotateLeftI | 128.00 | 15.00 | 3674.98 | 3926.27 | 6.84 | 3549.90 | 3755.09 | 5.78 > RotateBenchmark.testRotateLeftI | 128.00 | 15.00 | 1840.94 | 1882.25 | 2.24 | 1801.56 | 1872.89 | 3.96 > RotateBenchmark.testRotateLeftI | 128.00 | 31.00 | 7457.11 | 7361.48 | -1.28 | 6975.33 | 7385.94 | 5.89 > RotateBenchmark.testRotateLeftI | 128.00 | 31.00 | 3570.74 | 3929.30 | 10.04 | 3635.37 | 3736.67 | 2.79 > RotateBenchmark.testRotateLeftI | 128.00 | 31.00 | 1902.32 | 1960.46 | 3.06 | 1812.32 | 1813.88 | 0.09 > RotateBenchmark.testRotateLeftI | 256.00 | 7.00 | 11174.24 | 12044.52 | 7.79 | 11509.87 | 11273.44 | -2.05 > RotateBenchmark.testRotateLeftI | 256.00 | 7.00 | 5981.47 | 6073.70 | 1.54 | 5593.66 | 5661.93 | 1.22 > RotateBenchmark.testRotateLeftI | 256.00 | 7.00 | 2932.49 | 3069.54 | 4.67 | 2950.86 | 2892.42 | -1.98 > RotateBenchmark.testRotateLeftI | 256.00 | 15.00 | 11764.11 | 12098.63 | 2.84 | 11069.52 | 11476.93 | 3.68 > RotateBenchmark.testRotateLeftI | 256.00 | 15.00 | 5855.20 | 6080.40 | 3.85 | 5919.11 | 5607.04 | -5.27 > RotateBenchmark.testRotateLeftI | 256.00 | 15.00 | 2989.05 | 3048.56 | 1.99 | 2902.63 | 2821.83 | -2.78 > RotateBenchmark.testRotateLeftI | 256.00 | 31.00 | 11652.84 | 11965.40 | 2.68 | 11525.62 | 11459.83 | -0.57 > RotateBenchmark.testRotateLeftI | 256.00 | 31.00 | 5851.82 | 6164.94 | 5.35 | 5882.60 | 5842.30 | -0.69 > RotateBenchmark.testRotateLeftI | 256.00 | 31.00 | 3015.99 | 3043.79 | 0.92 | 2963.71 | 2947.97 | -0.53 > RotateBenchmark.testRotateLeftI | 512.00 | 7.00 | 16029.15 | 16189.79 | 1.00 | 860.43 | 2339.32 | 171.88 > RotateBenchmark.testRotateLeftI | 512.00 | 7.00 | 8078.25 | 8081.84 | 0.04 | 427.39 | 1147.92 | 168.59 > RotateBenchmark.testRotateLeftI | 512.00 | 7.00 | 4021.49 | 4294.03 | 6.78 | 209.25 | 582.28 | 178.27 > RotateBenchmark.testRotateLeftI | 512.00 | 15.00 | 15912.98 | 16329.03 | 2.61 | 848.23 | 2296.78 | 170.77 > RotateBenchmark.testRotateLeftI | 512.00 
| 15.00 | 8054.10 | 8306.37 | 3.13 | 429.93 | 1146.90 | 166.77 > RotateBenchmark.testRotateLeftI | 512.00 | 15.00 | 4102.58 | 4071.08 | -0.77 | 217.86 | 582.20 | 167.24 > RotateBenchmark.testRotateLeftI | 512.00 | 31.00 | 16177.79 | 16287.85 | 0.68 | 857.84 | 2243.15 | 161.49 > RotateBenchmark.testRotateLeftI | 512.00 | 31.00 | 8187.47 | 8410.48 | 2.72 | 434.60 | 1128.20 | 159.60 > RotateBenchmark.testRotateLeftI | 512.00 | 31.00 | 4109.15 | 4233.80 | 3.03 | 208.71 | 572.43 | 174.27 > RotateBenchmark.testRotateLeftL | 128.00 | 7.00 | 3755.09 | 3930.29 | 4.67 | 3604.19 | 3598.47 | -0.16 > RotateBenchmark.testRotateLeftL | 128.00 | 7.00 | 1829.03 | 1957.39 | 7.02 | 1833.95 | 1808.38 | -1.39 > RotateBenchmark.testRotateLeftL | 128.00 | 7.00 | 915.35 | 970.55 | 6.03 | 916.25 | 899.08 | -1.87 > RotateBenchmark.testRotateLeftL | 128.00 | 15.00 | 3664.85 | 3812.26 | 4.02 | 3629.37 | 3579.23 | -1.38 > RotateBenchmark.testRotateLeftL | 128.00 | 15.00 | 1829.51 | 1877.76 | 2.64 | 1781.05 | 1807.57 | 1.49 > RotateBenchmark.testRotateLeftL | 128.00 | 15.00 | 913.37 | 953.42 | 4.38 | 912.26 | 908.73 | -0.39 > RotateBenchmark.testRotateLeftL | 128.00 | 31.00 | 3648.45 | 3899.20 | 6.87 | 3552.67 | 3581.04 | 0.80 > RotateBenchmark.testRotateLeftL | 128.00 | 31.00 | 1816.50 | 1959.68 | 7.88 | 1820.88 | 1819.71 | -0.06 > RotateBenchmark.testRotateLeftL | 128.00 | 31.00 | 901.05 | 955.13 | 6.00 | 913.74 | 907.90 | -0.64 > RotateBenchmark.testRotateLeftL | 256.00 | 7.00 | 5850.99 | 6108.64 | 4.40 | 5882.65 | 5755.21 | -2.17 > RotateBenchmark.testRotateLeftL | 256.00 | 7.00 | 2962.21 | 3060.47 | 3.32 | 2955.20 | 2909.18 | -1.56 > RotateBenchmark.testRotateLeftL | 256.00 | 7.00 | 1480.46 | 1534.72 | 3.66 | 1467.78 | 1430.60 | -2.53 > RotateBenchmark.testRotateLeftL | 256.00 | 15.00 | 5858.23 | 6047.51 | 3.23 | 5770.02 | 5773.19 | 0.05 > RotateBenchmark.testRotateLeftL | 256.00 | 15.00 | 2951.49 | 3096.53 | 4.91 | 2885.21 | 2899.31 | 0.49 > RotateBenchmark.testRotateLeftL | 256.00 | 
15.00 | 1486.26 | 1527.94 | 2.80 | 1441.93 | 1454.25 | 0.85 > RotateBenchmark.testRotateLeftL | 256.00 | 31.00 | 5873.21 | 6089.75 | 3.69 | 5767.58 | 5664.11 | -1.79 > RotateBenchmark.testRotateLeftL | 256.00 | 31.00 | 2969.67 | 3081.39 | 3.76 | 2878.50 | 2905.86 | 0.95 > RotateBenchmark.testRotateLeftL | 256.00 | 31.00 | 1452.21 | 1520.03 | 4.67 | 1430.30 | 1485.63 | 3.87 > RotateBenchmark.testRotateLeftL | 512.00 | 7.00 | 8088.65 | 8443.63 | 4.39 | 455.67 | 1226.33 | 169.13 > RotateBenchmark.testRotateLeftL | 512.00 | 7.00 | 4011.95 | 4120.25 | 2.70 | 229.77 | 619.87 | 169.77 > RotateBenchmark.testRotateLeftL | 512.00 | 7.00 | 2090.57 | 2109.53 | 0.91 | 115.21 | 310.36 | 169.37 > RotateBenchmark.testRotateLeftL | 512.00 | 15.00 | 8166.84 | 8557.28 | 4.78 | 457.67 | 1242.86 | 171.56 > RotateBenchmark.testRotateLeftL | 512.00 | 15.00 | 4137.02 | 4287.95 | 3.65 | 227.26 | 624.80 | 174.93 > RotateBenchmark.testRotateLeftL | 512.00 | 15.00 | 2095.01 | 2102.86 | 0.37 | 114.26 | 310.83 | 172.03 > RotateBenchmark.testRotateLeftL | 512.00 | 31.00 | 8082.68 | 8400.56 | 3.93 | 459.59 | 1230.07 | 167.64 > RotateBenchmark.testRotateLeftL | 512.00 | 31.00 | 4047.67 | 4147.58 | 2.47 | 229.01 | 606.38 | 164.78 > RotateBenchmark.testRotateLeftL | 512.00 | 31.00 | 2086.83 | 2126.72 | 1.91 | 111.93 | 305.66 | 173.08 > RotateBenchmark.testRotateLeftS | 128.00 | 7.00 | 13597.19 | 13255.09 | -2.52 | 13818.39 | 13242.40 | -4.17 > RotateBenchmark.testRotateLeftS | 128.00 | 7.00 | 7028.26 | 6826.59 | -2.87 | 6765.15 | 6907.87 | 2.11 > RotateBenchmark.testRotateLeftS | 128.00 | 7.00 | 3570.40 | 3468.01 | -2.87 | 3449.66 | 3533.50 | 2.43 > RotateBenchmark.testRotateLeftS | 128.00 | 15.00 | 13615.99 | 13464.40 | -1.11 | 13330.02 | 13870.57 | 4.06 > RotateBenchmark.testRotateLeftS | 128.00 | 15.00 | 7043.31 | 6763.34 | -3.97 | 6928.88 | 7063.57 | 1.94 > RotateBenchmark.testRotateLeftS | 128.00 | 15.00 | 3495.12 | 3537.62 | 1.22 | 3503.41 | 3457.67 | -1.31 > RotateBenchmark.testRotateLeftS | 
128.00 | 31.00 | 13591.66 | 13665.84 | 0.55 | 13773.27 | 13126.08 | -4.70 > RotateBenchmark.testRotateLeftS | 128.00 | 31.00 | 7027.08 | 7011.24 | -0.23 | 6974.98 | 6815.50 | -2.29 > RotateBenchmark.testRotateLeftS | 128.00 | 31.00 | 3568.28 | 3569.62 | 0.04 | 3580.67 | 3463.58 | -3.27 > RotateBenchmark.testRotateLeftS | 256.00 | 7.00 | 21154.03 | 21416.32 | 1.24 | 21187.01 | 21401.61 | 1.01 > RotateBenchmark.testRotateLeftS | 256.00 | 7.00 | 11194.24 | 10865.47 | -2.94 | 11063.19 | 10977.60 | -0.77 > RotateBenchmark.testRotateLeftS | 256.00 | 7.00 | 5797.80 | 5523.94 | -4.72 | 5654.63 | 5468.78 | -3.29 > RotateBenchmark.testRotateLeftS | 256.00 | 15.00 | 21333.89 | 21412.74 | 0.37 | 21610.94 | 20908.96 | -3.25 > RotateBenchmark.testRotateLeftS | 256.00 | 15.00 | 11327.07 | 11113.48 | -1.89 | 11148.25 | 10678.14 | -4.22 > RotateBenchmark.testRotateLeftS | 256.00 | 15.00 | 5810.69 | 5569.72 | -4.15 | 5663.26 | 5618.87 | -0.78 > RotateBenchmark.testRotateLeftS | 256.00 | 31.00 | 21753.20 | 21198.43 | -2.55 | 21567.90 | 21929.81 | 1.68 > RotateBenchmark.testRotateLeftS | 256.00 | 31.00 | 11517.08 | 11039.64 | -4.15 | 11103.08 | 10871.59 | -2.08 > RotateBenchmark.testRotateLeftS | 256.00 | 31.00 | 5897.16 | 5606.75 | -4.92 | 5459.87 | 5604.12 | 2.64 > RotateBenchmark.testRotateLeftS | 512.00 | 7.00 | 29748.53 | 28883.73 | -2.91 | 1549.02 | 3928.53 | 153.61 > RotateBenchmark.testRotateLeftS | 512.00 | 7.00 | 15197.09 | 15878.19 | 4.48 | 772.59 | 1924.35 | 149.08 > RotateBenchmark.testRotateLeftS | 512.00 | 7.00 | 8046.30 | 8081.19 | 0.43 | 388.11 | 990.28 | 155.16 > RotateBenchmark.testRotateLeftS | 512.00 | 15.00 | 30618.04 | 29419.19 | -3.92 | 1524.22 | 3915.97 | 156.92 > RotateBenchmark.testRotateLeftS | 512.00 | 15.00 | 15854.43 | 15846.37 | -0.05 | 766.09 | 1953.60 | 155.01 > RotateBenchmark.testRotateLeftS | 512.00 | 15.00 | 7814.77 | 7899.30 | 1.08 | 390.82 | 970.37 | 148.29 > RotateBenchmark.testRotateLeftS | 512.00 | 31.00 | 29596.82 | 28538.69 | -3.58 | 
1530.45 | 3906.91 | 155.28 > RotateBenchmark.testRotateLeftS | 512.00 | 31.00 | 15662.48 | 15849.25 | 1.19 | 778.08 | 1934.31 | 148.60 > RotateBenchmark.testRotateLeftS | 512.00 | 31.00 | 8121.14 | 7758.59 | -4.46 | 392.78 | 959.73 | 144.34 > RotateBenchmark.testRotateRightB | 128.00 | 7.00 | 17465.84 | 17069.34 | -2.27 | 16849.73 | 17842.08 | 5.89 > RotateBenchmark.testRotateRightB | 128.00 | 7.00 | 9049.19 | 8864.15 | -2.04 | 8786.67 | 9105.34 | 3.63 > RotateBenchmark.testRotateRightB | 128.00 | 15.00 | 17703.38 | 17070.98 | -3.57 | 16595.85 | 17784.68 | 7.16 > RotateBenchmark.testRotateRightB | 128.00 | 15.00 | 9007.68 | 8817.41 | -2.11 | 8704.49 | 9185.87 | 5.53 > RotateBenchmark.testRotateRightB | 128.00 | 31.00 | 17531.05 | 16983.40 | -3.12 | 16947.69 | 17655.40 | 4.18 > RotateBenchmark.testRotateRightB | 128.00 | 31.00 | 8986.30 | 8794.15 | -2.14 | 8816.62 | 9225.95 | 4.64 > RotateBenchmark.testRotateRightB | 256.00 | 7.00 | 21293.95 | 21506.74 | 1.00 | 21163.29 | 21854.03 | 3.26 > RotateBenchmark.testRotateRightB | 256.00 | 7.00 | 11258.47 | 11072.92 | -1.65 | 11118.12 | 11338.96 | 1.99 > RotateBenchmark.testRotateRightB | 256.00 | 15.00 | 21253.36 | 21292.37 | 0.18 | 21224.39 | 21763.88 | 2.54 > RotateBenchmark.testRotateRightB | 256.00 | 15.00 | 11064.80 | 11198.35 | 1.21 | 10960.98 | 11294.14 | 3.04 > RotateBenchmark.testRotateRightB | 256.00 | 31.00 | 21358.14 | 21346.21 | -0.06 | 21487.25 | 21854.42 | 1.71 > RotateBenchmark.testRotateRightB | 256.00 | 31.00 | 11045.61 | 11208.26 | 1.47 | 10907.03 | 11415.18 | 4.66 > RotateBenchmark.testRotateRightB | 512.00 | 7.00 | 17898.61 | 18307.54 | 2.28 | 1214.65 | 2546.64 | 109.66 > RotateBenchmark.testRotateRightB | 512.00 | 15.00 | 17909.25 | 18242.51 | 1.86 | 1215.05 | 2563.98 | 111.02 > RotateBenchmark.testRotateRightB | 512.00 | 31.00 | 17883.35 | 17928.44 | 0.25 | 1220.77 | 2543.30 | 108.34 > RotateBenchmark.testRotateRightI | 128.00 | 7.00 | 7139.97 | 7626.72 | 6.82 | 6994.86 | 7075.65 | 1.15 > 
RotateBenchmark.testRotateRightI | 128.00 | 7.00 | 3657.37 | 3898.34 | 6.59 | 3617.06 | 3576.12 | -1.13 > RotateBenchmark.testRotateRightI | 128.00 | 7.00 | 1804.26 | 1969.19 | 9.14 | 1796.62 | 1858.84 | 3.46 > RotateBenchmark.testRotateRightI | 128.00 | 15.00 | 7404.31 | 7760.09 | 4.80 | 7036.77 | 7401.52 | 5.18 > RotateBenchmark.testRotateRightI | 128.00 | 15.00 | 3600.52 | 3956.35 | 9.88 | 3595.28 | 3560.36 | -0.97 > RotateBenchmark.testRotateRightI | 128.00 | 15.00 | 1813.32 | 1966.41 | 8.44 | 1839.95 | 1852.53 | 0.68 > RotateBenchmark.testRotateRightI | 128.00 | 31.00 | 7118.48 | 7724.81 | 8.52 | 7151.56 | 7021.09 | -1.82 > RotateBenchmark.testRotateRightI | 128.00 | 31.00 | 3529.70 | 3881.63 | 9.97 | 3623.08 | 3601.01 | -0.61 > RotateBenchmark.testRotateRightI | 128.00 | 31.00 | 1823.61 | 1961.34 | 7.55 | 1786.86 | 1748.85 | -2.13 > RotateBenchmark.testRotateRightI | 256.00 | 7.00 | 11697.98 | 11835.25 | 1.17 | 11513.16 | 11184.87 | -2.85 > RotateBenchmark.testRotateRightI | 256.00 | 7.00 | 5890.11 | 6102.57 | 3.61 | 5658.79 | 5696.08 | 0.66 > RotateBenchmark.testRotateRightI | 256.00 | 7.00 | 2964.94 | 3070.26 | 3.55 | 2945.00 | 2962.08 | 0.58 > RotateBenchmark.testRotateRightI | 256.00 | 15.00 | 11562.51 | 12151.29 | 5.09 | 11404.17 | 11120.28 | -2.49 > RotateBenchmark.testRotateRightI | 256.00 | 15.00 | 5702.93 | 6130.57 | 7.50 | 5799.54 | 5779.08 | -0.35 > RotateBenchmark.testRotateRightI | 256.00 | 15.00 | 2861.96 | 3051.44 | 6.62 | 2943.99 | 2860.65 | -2.83 > RotateBenchmark.testRotateRightI | 256.00 | 31.00 | 11203.13 | 11710.59 | 4.53 | 11363.18 | 11112.16 | -2.21 > RotateBenchmark.testRotateRightI | 256.00 | 31.00 | 5893.97 | 6070.71 | 3.00 | 5776.67 | 5648.84 | -2.21 > RotateBenchmark.testRotateRightI | 256.00 | 31.00 | 2971.83 | 3046.76 | 2.52 | 2903.35 | 2833.88 | -2.39 > RotateBenchmark.testRotateRightI | 512.00 | 7.00 | 16064.71 | 15851.35 | -1.33 | 861.93 | 2256.88 | 161.84 > RotateBenchmark.testRotateRightI | 512.00 | 7.00 | 7916.80 | 8462.65 
| 6.89 | 430.23 | 1147.30 | 166.67 > RotateBenchmark.testRotateRightI | 512.00 | 7.00 | 4104.64 | 4068.28 | -0.89 | 216.30 | 572.86 | 164.84 > RotateBenchmark.testRotateRightI | 512.00 | 15.00 | 16133.09 | 16281.59 | 0.92 | 856.36 | 2229.58 | 160.35 > RotateBenchmark.testRotateRightI | 512.00 | 15.00 | 8127.26 | 8117.59 | -0.12 | 419.16 | 1176.42 | 180.66 > RotateBenchmark.testRotateRightI | 512.00 | 15.00 | 4080.11 | 4063.26 | -0.41 | 218.32 | 571.93 | 161.97 > RotateBenchmark.testRotateRightI | 512.00 | 31.00 | 15834.26 | 16314.64 | 3.03 | 865.96 | 2297.74 | 165.34 > RotateBenchmark.testRotateRightI | 512.00 | 31.00 | 7965.62 | 8270.48 | 3.83 | 428.55 | 1148.87 | 168.08 > RotateBenchmark.testRotateRightI | 512.00 | 31.00 | 4161.69 | 4034.76 | -3.05 | 215.63 | 570.19 | 164.43 > RotateBenchmark.testRotateRightL | 128.00 | 7.00 | 3556.70 | 3877.08 | 9.01 | 3596.46 | 3558.32 | -1.06 > RotateBenchmark.testRotateRightL | 128.00 | 7.00 | 1772.93 | 1993.86 | 12.46 | 1856.79 | 1783.22 | -3.96 > RotateBenchmark.testRotateRightL | 128.00 | 7.00 | 908.66 | 1000.37 | 10.09 | 944.79 | 922.91 | -2.32 > RotateBenchmark.testRotateRightL | 128.00 | 15.00 | 3742.44 | 3748.41 | 0.16 | 3788.07 | 3570.67 | -5.74 > RotateBenchmark.testRotateRightL | 128.00 | 15.00 | 1817.53 | 1985.69 | 9.25 | 1892.38 | 1833.16 | -3.13 > RotateBenchmark.testRotateRightL | 128.00 | 15.00 | 941.03 | 952.68 | 1.24 | 915.79 | 910.21 | -0.61 > RotateBenchmark.testRotateRightL | 128.00 | 31.00 | 3649.48 | 3896.56 | 6.77 | 3637.59 | 3557.53 | -2.20 > RotateBenchmark.testRotateRightL | 128.00 | 31.00 | 1840.12 | 1997.19 | 8.54 | 1821.47 | 1799.82 | -1.19 > RotateBenchmark.testRotateRightL | 128.00 | 31.00 | 901.33 | 995.67 | 10.47 | 909.20 | 902.73 | -0.71 > RotateBenchmark.testRotateRightL | 256.00 | 7.00 | 5789.93 | 5960.54 | 2.95 | 5758.14 | 5736.30 | -0.38 > RotateBenchmark.testRotateRightL | 256.00 | 7.00 | 2963.20 | 3063.30 | 3.38 | 2943.48 | 2833.84 | -3.72 > RotateBenchmark.testRotateRightL | 256.00 | 
7.00 | 1501.81 | 1510.23 | 0.56 | 1463.85 | 1462.26 | -0.11 > RotateBenchmark.testRotateRightL | 256.00 | 15.00 | 5870.05 | 5951.43 | 1.39 | 5794.74 | 5604.58 | -3.28 > RotateBenchmark.testRotateRightL | 256.00 | 15.00 | 2971.36 | 3047.00 | 2.55 | 2931.19 | 2907.30 | -0.82 > RotateBenchmark.testRotateRightL | 256.00 | 15.00 | 1473.97 | 1530.54 | 3.84 | 1473.45 | 1442.40 | -2.11 > RotateBenchmark.testRotateRightL | 256.00 | 31.00 | 5858.08 | 6080.49 | 3.80 | 5863.69 | 5549.85 | -5.35 > RotateBenchmark.testRotateRightL | 256.00 | 31.00 | 2916.24 | 3045.77 | 4.44 | 2981.59 | 2815.07 | -5.58 > RotateBenchmark.testRotateRightL | 256.00 | 31.00 | 1441.20 | 1531.56 | 6.27 | 1492.47 | 1473.25 | -1.29 > RotateBenchmark.testRotateRightL | 512.00 | 7.00 | 8147.24 | 8310.05 | 2.00 | 469.45 | 1235.21 | 163.12 > RotateBenchmark.testRotateRightL | 512.00 | 7.00 | 4142.95 | 4258.86 | 2.80 | 234.14 | 615.52 | 162.88 > RotateBenchmark.testRotateRightL | 512.00 | 7.00 | 2095.48 | 2087.20 | -0.40 | 113.55 | 311.19 | 174.05 > RotateBenchmark.testRotateRightL | 512.00 | 15.00 | 8222.94 | 8246.58 | 0.29 | 458.91 | 1244.32 | 171.15 > RotateBenchmark.testRotateRightL | 512.00 | 15.00 | 4160.04 | 4226.46 | 1.60 | 227.78 | 625.38 | 174.56 > RotateBenchmark.testRotateRightL | 512.00 | 15.00 | 2064.63 | 2162.44 | 4.74 | 113.27 | 314.15 | 177.36 > RotateBenchmark.testRotateRightL | 512.00 | 31.00 | 8157.94 | 8466.90 | 3.79 | 450.26 | 1221.90 | 171.37 > RotateBenchmark.testRotateRightL | 512.00 | 31.00 | 4039.74 | 4283.33 | 6.03 | 224.82 | 612.68 | 172.53 > RotateBenchmark.testRotateRightL | 512.00 | 31.00 | 2066.88 | 2147.51 | 3.90 | 110.97 | 303.43 | 173.42 > RotateBenchmark.testRotateRightS | 128.00 | 7.00 | 13548.39 | 13245.87 | -2.23 | 13490.93 | 13084.76 | -3.01 > RotateBenchmark.testRotateRightS | 128.00 | 7.00 | 7020.16 | 6768.85 | -3.58 | 6991.39 | 7044.32 | 0.76 > RotateBenchmark.testRotateRightS | 128.00 | 7.00 | 3550.50 | 3505.19 | -1.28 | 3507.12 | 3612.86 | 3.01 > 
RotateBenchmark.testRotateRightS | 128.00 | 15.00 | 13743.43 | 13325.44 | -3.04 | 13696.15 | 13255.80 | -3.22 > RotateBenchmark.testRotateRightS | 128.00 | 15.00 | 6856.02 | 6969.18 | 1.65 | 6886.29 | 6834.12 | -0.76 > RotateBenchmark.testRotateRightS | 128.00 | 15.00 | 3569.53 | 3492.76 | -2.15 | 3539.02 | 3470.02 | -1.95 > RotateBenchmark.testRotateRightS | 128.00 | 31.00 | 13704.18 | 13495.07 | -1.53 | 13649.14 | 13583.87 | -0.48 > RotateBenchmark.testRotateRightS | 128.00 | 31.00 | 7011.77 | 6953.93 | -0.82 | 6978.28 | 6740.30 | -3.41 > RotateBenchmark.testRotateRightS | 128.00 | 31.00 | 3591.62 | 3620.12 | 0.79 | 3502.04 | 3510.05 | 0.23 > RotateBenchmark.testRotateRightS | 256.00 | 7.00 | 21950.71 | 22113.60 | 0.74 | 21484.27 | 21596.64 | 0.52 > RotateBenchmark.testRotateRightS | 256.00 | 7.00 | 11616.88 | 11099.73 | -4.45 | 11188.29 | 10737.68 | -4.03 > RotateBenchmark.testRotateRightS | 256.00 | 7.00 | 5872.72 | 5579.12 | -5.00 | 5784.05 | 5454.57 | -5.70 > RotateBenchmark.testRotateRightS | 256.00 | 15.00 | 22017.83 | 20817.97 | -5.45 | 21934.65 | 21356.90 | -2.63 > RotateBenchmark.testRotateRightS | 256.00 | 15.00 | 11414.27 | 11044.86 | -3.24 | 11454.35 | 11140.34 | -2.74 > RotateBenchmark.testRotateRightS | 256.00 | 15.00 | 5786.64 | 5634.05 | -2.64 | 5724.93 | 5639.99 | -1.48 > RotateBenchmark.testRotateRightS | 256.00 | 31.00 | 21754.77 | 21466.01 | -1.33 | 21140.67 | 21970.03 | 3.92 > RotateBenchmark.testRotateRightS | 256.00 | 31.00 | 11676.46 | 11358.64 | -2.72 | 11204.90 | 11213.48 | 0.08 > RotateBenchmark.testRotateRightS | 256.00 | 31.00 | 5728.20 | 5772.49 | 0.77 | 5594.33 | 5544.25 | -0.90 > RotateBenchmark.testRotateRightS | 512.00 | 7.00 | 30247.03 | 30179.41 | -0.22 | 1538.75 | 3975.82 | 158.38 > RotateBenchmark.testRotateRightS | 512.00 | 7.00 | 15988.73 | 15621.42 | -2.30 | 776.04 | 1910.91 | 146.24 > RotateBenchmark.testRotateRightS | 512.00 | 7.00 | 8115.84 | 8025.28 | -1.12 | 389.12 | 984.46 | 152.99 > RotateBenchmark.testRotateRightS 
| 512.00 | 15.00 | 30110.91 | 30200.69 | 0.30 | 1532.49 | 3983.77 | 159.95 > RotateBenchmark.testRotateRightS | 512.00 | 15.00 | 15957.90 | 15690.73 | -1.67 | 774.90 | 1931.00 | 149.19 > RotateBenchmark.testRotateRightS | 512.00 | 15.00 | 8113.26 | 8037.93 | -0.93 | 391.90 | 965.53 | 146.37 > RotateBenchmark.testRotateRightS | 512.00 | 31.00 | 29816.97 | 29891.54 | 0.25 | 1538.12 | 3881.93 | 152.38 > RotateBenchmark.testRotateRightS | 512.00 | 31.00 | 15405.95 | 15619.17 | 1.38 | 762.49 | 1871.00 | 145.38 > RotateBenchmark.testRotateRightS | 512.00 | 31.00 | 7919.80 | 7957.35 | 0.47 | 393.63 | 972.49 | 147.06 Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: 8266054: Removing redundant teat templates. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/3720/files - new: https://git.openjdk.java.net/jdk/pull/3720/files/8042aa23..ef46c0a8 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3720&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3720&range=03-04 Stats: 37 lines in 4 files changed: 0 ins; 37 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/3720.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3720/head:pull/3720 PR: https://git.openjdk.java.net/jdk/pull/3720 From whuang at openjdk.java.net Sat May 8 06:46:11 2021 From: whuang at openjdk.java.net (Wang Huang) Date: Sat, 8 May 2021 06:46:11 GMT Subject: RFR: 8265956: JVM crashes when matching LShiftVB Node [v3] In-Reply-To: References: <0lX7U-sIV4c6Wi1Cyw3caRD6ZTVCIYzbiiji3uPXf4M=.c0494113-32c5-40c4-bdd7-93b226ffe236@github.com> Message-ID: On Sat, 8 May 2021 04:47:42 GMT, Xiaohong Gong wrote: >> src/hotspot/cpu/x86/x86.ad line 5864: >> >>> 5862: match(Set dst (LShiftVB src (LShiftCntV shift))); >>> 5863: match(Set dst (RShiftVB src (RShiftCntV shift))); >>> 5864: match(Set dst (URShiftVB src (RShiftCntV shift))); >> >> Regarding to this issue, it's no need to add these rules here. 
There is the vector shift rules already. > > The same to all other rules added in this file. > Please update the copyright year of `"vectorIntrinsics.cpp"` to 2021. Thanks! Thank you. I will change that. ------------- PR: https://git.openjdk.java.net/jdk/pull/3747 From whuang at openjdk.java.net Sat May 8 06:46:12 2021 From: whuang at openjdk.java.net (Wang Huang) Date: Sat, 8 May 2021 06:46:12 GMT Subject: RFR: 8265956: JVM crashes when matching LShiftVB Node [v3] In-Reply-To: References: <0lX7U-sIV4c6Wi1Cyw3caRD6ZTVCIYzbiiji3uPXf4M=.c0494113-32c5-40c4-bdd7-93b226ffe236@github.com> Message-ID: On Sat, 8 May 2021 06:43:01 GMT, Wang Huang wrote: >> The same to all other rules added in this file. > >> Please update the copyright year of `"vectorIntrinsics.cpp"` to 2021. Thanks! > > Thank you. I will change that. > The same to all other rules added in this file. OK. I will remove those codes. ------------- PR: https://git.openjdk.java.net/jdk/pull/3747 From whuang at openjdk.java.net Sat May 8 06:46:13 2021 From: whuang at openjdk.java.net (Wang Huang) Date: Sat, 8 May 2021 06:46:13 GMT Subject: RFR: 8265956: JVM crashes when matching LShiftVB Node [v3] In-Reply-To: References: <0lX7U-sIV4c6Wi1Cyw3caRD6ZTVCIYzbiiji3uPXf4M=.c0494113-32c5-40c4-bdd7-93b226ffe236@github.com> Message-ID: <9soSU7reOUlNfbcIUxG7VqFYnO7I6krX49g2ldkRye4=.23fba61d-29b9-4a9c-bebd-ea8ebcc588ae@github.com> On Sat, 8 May 2021 03:52:58 GMT, Xiaohong Gong wrote: >> Wang Huang has updated the pull request incrementally with one additional commit since the last revision: >> >> fix bugs > > src/hotspot/share/opto/vectorIntrinsics.cpp line 372: > >> 370: } else if (step_val->get_con() > 1) { >> 371: Node* cnt = gvn().makecon(TypeInt::make(log2i_exact(step_val->get_con()))); >> 372: Node* shift_cnt = gvn().transform(new LShiftCntVNode(cnt, vt)); > > Use "vector_shift_count"? It will mask the shift count before generating the `LShiftCntVNode`. 
`vector_shift_count` which calls `VectorNode::shift_count` does not contain `Op_LShiftB`: ```c++ VectorNode* VectorNode::shift_count(int opc, Node* cnt, uint vlen, BasicType bt) { // Match shift count type with shift vector type. const TypeVect* vt = TypeVect::make(bt, vlen); switch (opc) { case Op_LShiftI: case Op_LShiftL: return new LShiftCntVNode(cnt, vt); case Op_RShiftI: case Op_RShiftL: case Op_URShiftB: case Op_URShiftS: case Op_URShiftI: case Op_URShiftL: return new RShiftCntVNode(cnt, vt); default: fatal("Missed vector creation for '%s'", NodeClassNames[opc]); return NULL; } } ------------- PR: https://git.openjdk.java.net/jdk/pull/3747 From ddong at openjdk.java.net Sat May 8 06:52:13 2021 From: ddong at openjdk.java.net (Denghui Dong) Date: Sat, 8 May 2021 06:52:13 GMT Subject: RFR: 8265129: Add intrinsic support for JVM.getClassId [v4] In-Reply-To: References: Message-ID: > 8265129: Add intrinsic support for JVM.getClassId Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: swap the positions of two operands in cmp operation since the det register will be modified in 32 bit ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/3470/files - new: https://git.openjdk.java.net/jdk/pull/3470/files/93ae3346..73da1108 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3470&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3470&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/3470.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3470/head:pull/3470 PR: https://git.openjdk.java.net/jdk/pull/3470 From xgong at openjdk.java.net Sat May 8 07:06:54 2021 From: xgong at openjdk.java.net (Xiaohong Gong) Date: Sat, 8 May 2021 07:06:54 GMT Subject: RFR: 8265956: JVM crashes when matching LShiftVB Node [v3] In-Reply-To: 
<9soSU7reOUlNfbcIUxG7VqFYnO7I6krX49g2ldkRye4=.23fba61d-29b9-4a9c-bebd-ea8ebcc588ae@github.com> References: <0lX7U-sIV4c6Wi1Cyw3caRD6ZTVCIYzbiiji3uPXf4M=.c0494113-32c5-40c4-bdd7-93b226ffe236@github.com> <9soSU7reOUlNfbcIUxG7VqFYnO7I6krX49g2ldkRye4=.23fba61d-29b9-4a9c-bebd-ea8ebcc588ae@github.com> Message-ID: <_5yRdgU-wrsv1b5rKZDxCCglN7Rf7qlv3awgprvq5FY=.d31066a1-a800-41d1-b4a3-606f9c99467e@github.com> On Sat, 8 May 2021 06:40:31 GMT, Wang Huang wrote: >> src/hotspot/share/opto/vectorIntrinsics.cpp line 372: >> >>> 370: } else if (step_val->get_con() > 1) { >>> 371: Node* cnt = gvn().makecon(TypeInt::make(log2i_exact(step_val->get_con()))); >>> 372: Node* shift_cnt = gvn().transform(new LShiftCntVNode(cnt, vt)); >> >> Use "vector_shift_count"? It will mask the shift count before generating the `LShiftCntVNode`. > > `vector_shift_count` which calls `VectorNode::shift_count` does not contain `Op_LShiftB`: > > ```c++ > VectorNode* VectorNode::shift_count(int opc, Node* cnt, uint vlen, BasicType bt) { > // Match shift count type with shift vector type. > const TypeVect* vt = TypeVect::make(bt, vlen); > switch (opc) { > case Op_LShiftI: > case Op_LShiftL: > return new LShiftCntVNode(cnt, vt); > case Op_RShiftI: > case Op_RShiftL: > case Op_URShiftB: > case Op_URShiftS: > case Op_URShiftI: > case Op_URShiftL: > return new RShiftCntVNode(cnt, vt); > default: > fatal("Missed vector creation for '%s'", NodeClassNames[opc]); > return NULL; > } > } Byte and short will use `"Op_LShiftI/Op_RShiftI"`. 
Please see: https://github.com/openjdk/panama-vector/blob/master/src/hotspot/share/prims/vectorSupport.cpp#L335 ------------- PR: https://git.openjdk.java.net/jdk/pull/3747 From egahlin at openjdk.java.net Sat May 8 07:11:18 2021 From: egahlin at openjdk.java.net (Erik Gahlin) Date: Sat, 8 May 2021 07:11:18 GMT Subject: RFR: 8265129: Add intrinsic support for JVM.getClassId [v3] In-Reply-To: References: Message-ID: On Sat, 1 May 2021 09:45:11 GMT, Denghui Dong wrote: > @egahlin Hi, could you help review this patch? I think this patch is best reviewed by the compiler team. ------------- PR: https://git.openjdk.java.net/jdk/pull/3470 From whuang at openjdk.java.net Sat May 8 07:53:13 2021 From: whuang at openjdk.java.net (Wang Huang) Date: Sat, 8 May 2021 07:53:13 GMT Subject: RFR: 8265956: JVM crashes when matching LShiftVB Node [v3] In-Reply-To: <_5yRdgU-wrsv1b5rKZDxCCglN7Rf7qlv3awgprvq5FY=.d31066a1-a800-41d1-b4a3-606f9c99467e@github.com> References: <0lX7U-sIV4c6Wi1Cyw3caRD6ZTVCIYzbiiji3uPXf4M=.c0494113-32c5-40c4-bdd7-93b226ffe236@github.com> <9soSU7reOUlNfbcIUxG7VqFYnO7I6krX49g2ldkRye4=.23fba61d-29b9-4a9c-bebd-ea8ebcc588ae@github.com> <_5yRdgU-wrsv1b5rKZDxCCglN7Rf7qlv3awgprvq5FY=.d31066a1-a800-41d1-b4a3-606f9c99467e@github.com> Message-ID: <8baQ_mF0_OKKpOugyTNBKnlvv7gHCVl5yOddnFJyhYc=.2f789c57-6181-41a4-8c86-6e52e30c5aae@github.com> On Sat, 8 May 2021 07:03:55 GMT, Xiaohong Gong wrote: >> `vector_shift_count` which calls `VectorNode::shift_count` does not contain `Op_LShiftB`: >> >> ```c++ >> VectorNode* VectorNode::shift_count(int opc, Node* cnt, uint vlen, BasicType bt) { >> // Match shift count type with shift vector type. 
>> const TypeVect* vt = TypeVect::make(bt, vlen); >> switch (opc) { >> case Op_LShiftI: >> case Op_LShiftL: >> return new LShiftCntVNode(cnt, vt); >> case Op_RShiftI: >> case Op_RShiftL: >> case Op_URShiftB: >> case Op_URShiftS: >> case Op_URShiftI: >> case Op_URShiftL: >> return new RShiftCntVNode(cnt, vt); >> default: >> fatal("Missed vector creation for '%s'", NodeClassNames[opc]); >> return NULL; >> } >> } > > Byte and short will use `"Op_LShiftI/Op_RShiftI"`. Please see: https://github.com/openjdk/panama-vector/blob/master/src/hotspot/share/prims/vectorSupport.cpp#L335 Thank you for your comment. I think it's a bit strange here to use `LShiftI` instead of `LShiftB`. Anyway, reusing is better than new things. ------------- PR: https://git.openjdk.java.net/jdk/pull/3747 From whuang at openjdk.java.net Sat May 8 08:16:56 2021 From: whuang at openjdk.java.net (Wang Huang) Date: Sat, 8 May 2021 08:16:56 GMT Subject: RFR: 8265956: JVM crashes when matching LShiftVB Node [v4] In-Reply-To: <0lX7U-sIV4c6Wi1Cyw3caRD6ZTVCIYzbiiji3uPXf4M=.c0494113-32c5-40c4-bdd7-93b226ffe236@github.com> References: <0lX7U-sIV4c6Wi1Cyw3caRD6ZTVCIYzbiiji3uPXf4M=.c0494113-32c5-40c4-bdd7-93b226ffe236@github.com> Message-ID: > It is found that the rule `match(Set dst (LShiftVB src shift))` is missing on many cpus, such as `aarch64` and `x86`. It is this reason that JVM will crash under `JDK-8265956`'s test case. In this commit, I: > * show the crash case `TestVectorShuffleIotaShort` > * solve the issue on `aarch64` and `x86` by adding the rule. > * test after fixing on tier1~3 > > Thank you for your review. Any suggestion is welcome.
> Wang Huang Wang Huang has updated the pull request incrementally with two additional commits since the last revision: - update copyright years - remove x86's adfile ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/3747/files - new: https://git.openjdk.java.net/jdk/pull/3747/files/d49029be..c1305b4d Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3747&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3747&range=02-03 Stats: 143 lines in 4 files changed: 0 ins; 141 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/3747.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3747/head:pull/3747 PR: https://git.openjdk.java.net/jdk/pull/3747 From xgong at openjdk.java.net Sat May 8 08:16:57 2021 From: xgong at openjdk.java.net (Xiaohong Gong) Date: Sat, 8 May 2021 08:16:57 GMT Subject: RFR: 8265956: JVM crashes when matching LShiftVB Node [v4] In-Reply-To: References: <0lX7U-sIV4c6Wi1Cyw3caRD6ZTVCIYzbiiji3uPXf4M=.c0494113-32c5-40c4-bdd7-93b226ffe236@github.com> Message-ID: On Sat, 8 May 2021 08:12:49 GMT, Wang Huang wrote: >> It is found that the rule `match(Set dst (LShiftVB src shift))` is missing on many cpus, such as `aarch64` and `x86`. It is this reason that JVM will crash under `JDK-8265956`'s test case. In this commit, I: >> * show the crash case `TestVectorShuffleIotaShort` >> * solve the issue on `aarch64` and `x86` by adding the rule. >> * test after fixing on tier1~3 >> >> Thank you for your review. Any suggestion is welcome. >> Wang Huang > > Wang Huang has updated the pull request incrementally with two additional commits since the last revision: > > - update copyright years > - remove x86's adfile src/hotspot/share/opto/vectorIntrinsics.cpp line 2: > 1: /* > 2: * Copyright (c) 2021, Oracle and/or its affiliates. All rights reserved. Should be "2020, 2021" since it is not a new file.
------------- PR: https://git.openjdk.java.net/jdk/pull/3747 From whuang at openjdk.java.net Sat May 8 08:16:58 2021 From: whuang at openjdk.java.net (Wang Huang) Date: Sat, 8 May 2021 08:16:58 GMT Subject: RFR: 8265956: JVM crashes when matching LShiftVB Node [v4] In-Reply-To: References: <0lX7U-sIV4c6Wi1Cyw3caRD6ZTVCIYzbiiji3uPXf4M=.c0494113-32c5-40c4-bdd7-93b226ffe236@github.com> Message-ID: On Sat, 8 May 2021 08:09:35 GMT, Xiaohong Gong wrote: >> Wang Huang has updated the pull request incrementally with two additional commits since the last revision: >> >> - update copyright years >> - remove x86's adfile > > src/hotspot/share/opto/vectorIntrinsics.cpp line 2: > >> 1: /* >> 2: * Copyright (c) 2021, Oracle and/or its affiliates. All rights reserved. > > Should be "2020, 2021" since it is not a new file. Thank you. I have changed that. ------------- PR: https://git.openjdk.java.net/jdk/pull/3747 From xgong at openjdk.java.net Sat May 8 08:33:57 2021 From: xgong at openjdk.java.net (Xiaohong Gong) Date: Sat, 8 May 2021 08:33:57 GMT Subject: RFR: 8265956: JVM crashes when matching LShiftVB Node [v3] In-Reply-To: References: <0lX7U-sIV4c6Wi1Cyw3caRD6ZTVCIYzbiiji3uPXf4M=.c0494113-32c5-40c4-bdd7-93b226ffe236@github.com> Message-ID: On Fri, 7 May 2021 09:03:14 GMT, Wang Huang wrote: >> It is found that the rule `match(Set dst (LShiftVB src shift))` is missing on many cpus, such as `aarch64` and `x86`. It is this reason that JVM will crash under `JDK-8265956`'s test case. In this commit, I: >> * show the crash case `TestVectorShuffleIotaShort` >> * ~~solve the issue on `aarch64` and `x86` by adding the rule.~~ solve the issue by adding `LShiftCntVNode` without adding any rule >> * test after fixing on tier1~3 >> >> Thank you for your review. Any suggestion is welcome.
>> Wang Huang > > Wang Huang has updated the pull request incrementally with one additional commit since the last revision: > > fix bugs test/hotspot/jtreg/compiler/vectorapi/TestVectorShuffleIotaByte.java line 63: > 61: static byte[] expected_256 = {1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, > 62: -31, -29, -27, -25, -23, -21, -19, -17, -15, -13, -11, -9, -7, -5, -3, -1}; > 63: static byte[] expected_512 = {1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, Looks much better to me! Just a small suggestion here: it's better to use `static final` if these arrays are not expected to be modified. ------------- PR: https://git.openjdk.java.net/jdk/pull/3747 From whuang at openjdk.java.net Sat May 8 08:33:58 2021 From: whuang at openjdk.java.net (Wang Huang) Date: Sat, 8 May 2021 08:33:58 GMT Subject: RFR: 8265956: JVM crashes when matching LShiftVB Node [v3] In-Reply-To: References: <0lX7U-sIV4c6Wi1Cyw3caRD6ZTVCIYzbiiji3uPXf4M=.c0494113-32c5-40c4-bdd7-93b226ffe236@github.com> Message-ID: On Sat, 8 May 2021 08:28:26 GMT, Xiaohong Gong wrote: >> Wang Huang has updated the pull request incrementally with one additional commit since the last revision: >> >> fix bugs > > test/hotspot/jtreg/compiler/vectorapi/TestVectorShuffleIotaByte.java line 63: > >> 61: static byte[] expected_256 = {1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, >> 62: -31, -29, -27, -25, -23, -21, -19, -17, -15, -13, -11, -9, -7, -5, -3, -1}; >> 63: static byte[] expected_512 = {1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, > > Looks much better to me! Just a small suggestion here: it's better to use `static final` if these arrays are not expected to be modified. OK. Thank you for your review. 
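A side note on the `static final` suggestion above: in Java, `final` on an array field fixes the reference, not the elements, so it documents intent rather than enforcing immutability. The sketch below is a standalone illustration only; the class name `FinalArrayDemo`, the helper `expectedCopy`, and the shortened array are made up for this example and are not taken from the test.

```java
import java.util.Arrays;

public class FinalArrayDemo {
    // Shortened, illustrative stand-in for the expected_* arrays in the test.
    static final byte[] EXPECTED = {1, 3, 5, 7};

    // Hypothetical helper: returns a defensive copy so callers
    // cannot alter the shared array by accident.
    static byte[] expectedCopy() {
        return Arrays.copyOf(EXPECTED, EXPECTED.length);
    }

    public static void main(String[] args) {
        // EXPECTED = new byte[4];  // would not compile: a final field cannot be reassigned
        byte[] copy = expectedCopy();
        copy[0] = 42;               // only the copy changes
        System.out.println(Arrays.toString(EXPECTED));
        EXPECTED[0] = 42;           // still legal: final does not freeze the elements
        System.out.println(Arrays.toString(EXPECTED));
    }
}
```

For test data that is only ever read, `static final` is usually enough; a copy is worth the cost only if some code path might write to the array.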
------------- PR: https://git.openjdk.java.net/jdk/pull/3747 From whuang at openjdk.java.net Sat May 8 08:41:32 2021 From: whuang at openjdk.java.net (Wang Huang) Date: Sat, 8 May 2021 08:41:32 GMT Subject: RFR: 8265956: JVM crashes when matching LShiftVB Node [v5] In-Reply-To: <0lX7U-sIV4c6Wi1Cyw3caRD6ZTVCIYzbiiji3uPXf4M=.c0494113-32c5-40c4-bdd7-93b226ffe236@github.com> References: <0lX7U-sIV4c6Wi1Cyw3caRD6ZTVCIYzbiiji3uPXf4M=.c0494113-32c5-40c4-bdd7-93b226ffe236@github.com> Message-ID: > It is found that the rule `match(Set dst (LShiftVB src shift))` is missing on many cpus, such as `aarch64` and `x86`. It is this reason that JVM will crash under `JDK-8265956`'s test case. In this commit, I: > * show the crash case `TestVectorShuffleIotaShort` > * ~~solve the issue on `aarch64` and `x86` by adding the rule.~~ solve the issue by adding `LShiftCntVNode` without adding any rule > * test after fixing on tier1~3 > > Thank you for your review. Any suggestion is welcome. > Wang Huang Wang Huang has updated the pull request incrementally with one additional commit since the last revision: change to static final ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/3747/files - new: https://git.openjdk.java.net/jdk/pull/3747/files/c1305b4d..bda9c29f Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3747&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3747&range=03-04 Stats: 16 lines in 1 file changed: 0 ins; 0 del; 16 mod Patch: https://git.openjdk.java.net/jdk/pull/3747.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3747/head:pull/3747 PR: https://git.openjdk.java.net/jdk/pull/3747 From whuang at openjdk.java.net Sat May 8 08:41:34 2021 From: whuang at openjdk.java.net (Wang Huang) Date: Sat, 8 May 2021 08:41:34 GMT Subject: RFR: 8265956: JVM crashes when matching LShiftVB Node [v3] In-Reply-To: References: <0lX7U-sIV4c6Wi1Cyw3caRD6ZTVCIYzbiiji3uPXf4M=.c0494113-32c5-40c4-bdd7-93b226ffe236@github.com> Message-ID: 
On Sat, 8 May 2021 08:30:41 GMT, Wang Huang wrote: >> test/hotspot/jtreg/compiler/vectorapi/TestVectorShuffleIotaByte.java line 63: >> >>> 61: static byte[] expected_256 = {1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, >>> 62: -31, -29, -27, -25, -23, -21, -19, -17, -15, -13, -11, -9, -7, -5, -3, -1}; >>> 63: static byte[] expected_512 = {1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, >> >> Looks much better to me! Just a small suggestion here: it's better to use `static final` if these arrays are not expected to be modified. > > OK. Thank you for your review. I have changed this. ------------- PR: https://git.openjdk.java.net/jdk/pull/3747 From xgong at openjdk.java.net Sat May 8 09:27:57 2021 From: xgong at openjdk.java.net (Xiaohong Gong) Date: Sat, 8 May 2021 09:27:57 GMT Subject: RFR: 8265956: JVM crashes when matching LShiftVB Node [v5] In-Reply-To: References: <0lX7U-sIV4c6Wi1Cyw3caRD6ZTVCIYzbiiji3uPXf4M=.c0494113-32c5-40c4-bdd7-93b226ffe236@github.com> Message-ID: On Sat, 8 May 2021 08:41:32 GMT, Wang Huang wrote: >> It was found that the rule `match(Set dst (LShiftVB src shift))` is missing on many CPUs, such as `aarch64` and `x86`. This is why the JVM crashes on the `JDK-8265956` test case. In this commit, I: >> * show the crash case `TestVectorShuffleIotaShort` >> * ~~solve the issue on `aarch64` and `x86` by adding the rule.~~ solve the issue by adding `LShiftCntVNode` without adding any rule >> * test the fix on tier1~3 >> >> Thank you for your review. Any suggestion is welcome. >> Wang Huang > > Wang Huang has updated the pull request incrementally with one additional commit since the last revision: > > change to static final LGTM, thanks!
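For readers following the `static final` suggestion above, here is a minimal sketch of what the modifier does and does not guarantee for an array field. The class name `StaticFinalArrayDemo`, the field `EXPECTED`, and the shortened values are hypothetical stand-ins for illustration; they are not taken from `TestVectorShuffleIotaByte.java`.

```java
import java.util.Arrays;

public class StaticFinalArrayDemo {
    // Hypothetical, shortened stand-in for a table such as expected_256.
    // 'final' fixes the reference: the field can never point at another array.
    static final byte[] EXPECTED = {1, 3, 5, 7};

    // Compares a computed result against the expected table.
    static boolean matches(byte[] actual) {
        return Arrays.equals(EXPECTED, actual);
    }

    public static void main(String[] args) {
        // EXPECTED = new byte[4];   // would not compile: the field is final
        // Note: 'final' does not freeze the elements; EXPECTED[0] = 9 would
        // still compile, so the modifier mainly documents intent here.
        System.out.println(matches(new byte[] {1, 3, 5, 7}));
    }
}
```

The modifier protects against accidental reassignment of the field; it does not make the array contents immutable.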
------------- PR: https://git.openjdk.java.net/jdk/pull/3747 From yyang at openjdk.java.net Sat May 8 09:45:19 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Sat, 8 May 2021 09:45:19 GMT Subject: RFR: 8266746: C1: Replace UnsafeGetRaw with UnsafeGetObject when setting up OSR entry block [v2] In-Reply-To: References: Message-ID: > After JDK-8150921, most Unsafe{Get,Put}Raw intrinsic methods can be replaced by Unsafe{Get,Put}Object. > > There is the only one occurrence where c1 refers UnsafeGetRaw among GraphBuilder::setup_osr_entry_block() > > https://github.com/openjdk/jdk/blob/74fecc070a6462e6a2d061525b53a63de15339f9/src/hotspot/share/c1/c1_GraphBuilder.cpp#L3143-L3157 > > We can replace UnsafeGetRaw with UnsafeGetObject when setting up OSR entry block. After that, Unsafe{Get,Put}Raw can be completely removed because no one refers to them. > > (This patch actually does two things: > 1. `Replace UnsafeGetRaw with UnsafeGetObject when setting up OSR entry block` This is the only occurrence where c1 refers UnsafeGetRaw > 2. `Cleanup unused Unsafe{Get,Put}Raw code` > They are related so I put it together, but I still want to hear your suggestions, I will separate them into two patches if you think it is more reasonable) > > Thanks! 
> Yang Yi Yang has updated the pull request incrementally with one additional commit since the last revision: unaliged_move for ppc/s390 ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/3917/files - new: https://git.openjdk.java.net/jdk/pull/3917/files/03d339d7..8c239e45 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3917&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3917&range=00-01 Stats: 8 lines in 1 file changed: 4 ins; 1 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/3917.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3917/head:pull/3917 PR: https://git.openjdk.java.net/jdk/pull/3917 From yyang at openjdk.java.net Sat May 8 11:02:40 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Sat, 8 May 2021 11:02:40 GMT Subject: RFR: 8266189: Remove C1 "IfInstanceOf" instruction Message-ID: Remove IfInstanceOf instruction, it has been there for a long while(13yrs) and not implemented yet. ------------- Commit messages: - remove c1 IfInstanceOf Changes: https://git.openjdk.java.net/jdk/pull/3935/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3935&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8266189 Stats: 93 lines in 10 files changed: 0 ins; 93 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/3935.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3935/head:pull/3935 PR: https://git.openjdk.java.net/jdk/pull/3935 From psandoz at openjdk.java.net Sat May 8 15:43:56 2021 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Sat, 8 May 2021 15:43:56 GMT Subject: RFR: 8266054: VectorAPI rotate operation optimization [v5] In-Reply-To: References: Message-ID: On Sat, 8 May 2021 05:54:38 GMT, Jatin Bhateja wrote: >> Current VectorAPI Java side implementation expresses rotateLeft and rotateRight operation using following operations:- >> >> vec1 = lanewise(VectorOperators.LSHL, n) >> vec2 = lanewise(VectorOperators.LSHR, n) >> res = lanewise(VectorOperations.OR, 
vec1 , vec2) >> >> This patch moves the above handling from the Java side to the C2 compiler, which facilitates dismantling the rotate operation if the target ISA does not support a direct rotate instruction. >> >> AVX512 added vector rotate instructions vpro[rl][v][dq] which operate over long and integer type vectors. For other cases (i.e. sub-word type vectors or targets which do not support direct rotate operations), an instruction sequence comprising vector SHIFT (LEFT/RIGHT) and vector OR is emitted. >> >> Please find below the performance data for the included JMH benchmark. >> Machine: Cascade Lake Server (Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz) >> >> >> Benchmark | (TESTSIZE) | Shift | Baseline AVX3 (ops/ms) | Withopt AVX3 (ops/ms) | Gain % | Baseline AVX2 (ops/ms) | Withopt AVX2 (ops/ms) | Gain % >> -- | -- | -- | -- | -- | -- | -- | -- | -- >> RotateBenchmark.testRotateLeftB | 128.00 | 7.00 | 17223.35 | 17094.69 | -0.75 | 17008.32 | 17488.06 | 2.82 >> RotateBenchmark.testRotateLeftB | 128.00 | 7.00 | 8944.98 | 8811.34 | -1.49 | 8878.17 | 9218.68 | 3.84 >> RotateBenchmark.testRotateLeftB | 128.00 | 15.00 | 17195.75 | 17137.32 | -0.34 | 16789.01 | 17780.34 | 5.90 >> RotateBenchmark.testRotateLeftB | 128.00 | 15.00 | 9052.67 | 8838.60 | -2.36 | 8814.62 | 9206.01 | 4.44 >> RotateBenchmark.testRotateLeftB | 128.00 | 31.00 | 17100.19 | 16950.64 | -0.87 | 16827.73 | 17720.37 | 5.30 >> RotateBenchmark.testRotateLeftB | 128.00 | 31.00 | 9079.95 | 8471.26 | -6.70 | 8888.44 | 9167.68 | 3.14 >> RotateBenchmark.testRotateLeftB | 256.00 | 7.00 | 21231.33 | 21513.08 | 1.33 | 21824.51 | 21479.48 | -1.58 >> RotateBenchmark.testRotateLeftB | 256.00 | 7.00 | 11103.62 | 11180.16 | 0.69 | 11173.67 | 11529.22 | 3.18 >> RotateBenchmark.testRotateLeftB | 256.00 | 15.00 | 21119.14 | 21552.04 | 2.05 | 21693.05 | 21915.37 | 1.02 >> RotateBenchmark.testRotateLeftB | 256.00 | 15.00 | 11048.68 | 11094.20 | 0.41 | 11049.90 | 11439.07 | 3.52 >> 
RotateBenchmark.testRotateLeftB | 256.00 | 31.00 | 21506.31 | 21391.41 | -0.53 | 21263.18 | 21986.29 | 3.40 >> RotateBenchmark.testRotateLeftB | 256.00 | 31.00 | 11056.12 | 11232.78 | 1.60 | 10941.59 | 11397.09 | 4.16 >> RotateBenchmark.testRotateLeftB | 512.00 | 7.00 | 17976.56 | 18180.85 | 1.14 | 1212.26 | 2533.34 | 108.98 >> RotateBenchmark.testRotateLeftB | 512.00 | 15.00 | 17553.70 | 18219.07 | 3.79 | 1256.73 | 2537.41 | 101.91 >> RotateBenchmark.testRotateLeftB | 512.00 | 31.00 | 17618.03 | 17738.15 | 0.68 | 1214.69 | 2533.83 | 108.60 >> RotateBenchmark.testRotateLeftI | 128.00 | 7.00 | 7258.87 | 7468.88 | 2.89 | 7115.12 | 7117.26 | 0.03 >> RotateBenchmark.testRotateLeftI | 128.00 | 7.00 | 3586.65 | 3950.85 | 10.15 | 3532.17 | 3595.80 | 1.80 >> RotateBenchmark.testRotateLeftI | 128.00 | 7.00 | 1835.07 | 1999.68 | 8.97 | 1789.90 | 1819.93 | 1.68 >> RotateBenchmark.testRotateLeftI | 128.00 | 15.00 | 7273.36 | 7410.91 | 1.89 | 7198.60 | 6994.79 | -2.83 >> RotateBenchmark.testRotateLeftI | 128.00 | 15.00 | 3674.98 | 3926.27 | 6.84 | 3549.90 | 3755.09 | 5.78 >> RotateBenchmark.testRotateLeftI | 128.00 | 15.00 | 1840.94 | 1882.25 | 2.24 | 1801.56 | 1872.89 | 3.96 >> RotateBenchmark.testRotateLeftI | 128.00 | 31.00 | 7457.11 | 7361.48 | -1.28 | 6975.33 | 7385.94 | 5.89 >> RotateBenchmark.testRotateLeftI | 128.00 | 31.00 | 3570.74 | 3929.30 | 10.04 | 3635.37 | 3736.67 | 2.79 >> RotateBenchmark.testRotateLeftI | 128.00 | 31.00 | 1902.32 | 1960.46 | 3.06 | 1812.32 | 1813.88 | 0.09 >> RotateBenchmark.testRotateLeftI | 256.00 | 7.00 | 11174.24 | 12044.52 | 7.79 | 11509.87 | 11273.44 | -2.05 >> RotateBenchmark.testRotateLeftI | 256.00 | 7.00 | 5981.47 | 6073.70 | 1.54 | 5593.66 | 5661.93 | 1.22 >> RotateBenchmark.testRotateLeftI | 256.00 | 7.00 | 2932.49 | 3069.54 | 4.67 | 2950.86 | 2892.42 | -1.98 >> RotateBenchmark.testRotateLeftI | 256.00 | 15.00 | 11764.11 | 12098.63 | 2.84 | 11069.52 | 11476.93 | 3.68 >> RotateBenchmark.testRotateLeftI | 256.00 | 15.00 | 5855.20 | 
6080.40 | 3.85 | 5919.11 | 5607.04 | -5.27 >> RotateBenchmark.testRotateLeftI | 256.00 | 15.00 | 2989.05 | 3048.56 | 1.99 | 2902.63 | 2821.83 | -2.78 >> RotateBenchmark.testRotateLeftI | 256.00 | 31.00 | 11652.84 | 11965.40 | 2.68 | 11525.62 | 11459.83 | -0.57 >> RotateBenchmark.testRotateLeftI | 256.00 | 31.00 | 5851.82 | 6164.94 | 5.35 | 5882.60 | 5842.30 | -0.69 >> RotateBenchmark.testRotateLeftI | 256.00 | 31.00 | 3015.99 | 3043.79 | 0.92 | 2963.71 | 2947.97 | -0.53 >> RotateBenchmark.testRotateLeftI | 512.00 | 7.00 | 16029.15 | 16189.79 | 1.00 | 860.43 | 2339.32 | 171.88 >> RotateBenchmark.testRotateLeftI | 512.00 | 7.00 | 8078.25 | 8081.84 | 0.04 | 427.39 | 1147.92 | 168.59 >> RotateBenchmark.testRotateLeftI | 512.00 | 7.00 | 4021.49 | 4294.03 | 6.78 | 209.25 | 582.28 | 178.27 >> RotateBenchmark.testRotateLeftI | 512.00 | 15.00 | 15912.98 | 16329.03 | 2.61 | 848.23 | 2296.78 | 170.77 >> RotateBenchmark.testRotateLeftI | 512.00 | 15.00 | 8054.10 | 8306.37 | 3.13 | 429.93 | 1146.90 | 166.77 >> RotateBenchmark.testRotateLeftI | 512.00 | 15.00 | 4102.58 | 4071.08 | -0.77 | 217.86 | 582.20 | 167.24 >> RotateBenchmark.testRotateLeftI | 512.00 | 31.00 | 16177.79 | 16287.85 | 0.68 | 857.84 | 2243.15 | 161.49 >> RotateBenchmark.testRotateLeftI | 512.00 | 31.00 | 8187.47 | 8410.48 | 2.72 | 434.60 | 1128.20 | 159.60 >> RotateBenchmark.testRotateLeftI | 512.00 | 31.00 | 4109.15 | 4233.80 | 3.03 | 208.71 | 572.43 | 174.27 >> RotateBenchmark.testRotateLeftL | 128.00 | 7.00 | 3755.09 | 3930.29 | 4.67 | 3604.19 | 3598.47 | -0.16 >> RotateBenchmark.testRotateLeftL | 128.00 | 7.00 | 1829.03 | 1957.39 | 7.02 | 1833.95 | 1808.38 | -1.39 >> RotateBenchmark.testRotateLeftL | 128.00 | 7.00 | 915.35 | 970.55 | 6.03 | 916.25 | 899.08 | -1.87 >> RotateBenchmark.testRotateLeftL | 128.00 | 15.00 | 3664.85 | 3812.26 | 4.02 | 3629.37 | 3579.23 | -1.38 >> RotateBenchmark.testRotateLeftL | 128.00 | 15.00 | 1829.51 | 1877.76 | 2.64 | 1781.05 | 1807.57 | 1.49 >> 
RotateBenchmark.testRotateLeftL | 128.00 | 15.00 | 913.37 | 953.42 | 4.38 | 912.26 | 908.73 | -0.39 >> RotateBenchmark.testRotateLeftL | 128.00 | 31.00 | 3648.45 | 3899.20 | 6.87 | 3552.67 | 3581.04 | 0.80 >> RotateBenchmark.testRotateLeftL | 128.00 | 31.00 | 1816.50 | 1959.68 | 7.88 | 1820.88 | 1819.71 | -0.06 >> RotateBenchmark.testRotateLeftL | 128.00 | 31.00 | 901.05 | 955.13 | 6.00 | 913.74 | 907.90 | -0.64 >> RotateBenchmark.testRotateLeftL | 256.00 | 7.00 | 5850.99 | 6108.64 | 4.40 | 5882.65 | 5755.21 | -2.17 >> RotateBenchmark.testRotateLeftL | 256.00 | 7.00 | 2962.21 | 3060.47 | 3.32 | 2955.20 | 2909.18 | -1.56 >> RotateBenchmark.testRotateLeftL | 256.00 | 7.00 | 1480.46 | 1534.72 | 3.66 | 1467.78 | 1430.60 | -2.53 >> RotateBenchmark.testRotateLeftL | 256.00 | 15.00 | 5858.23 | 6047.51 | 3.23 | 5770.02 | 5773.19 | 0.05 >> RotateBenchmark.testRotateLeftL | 256.00 | 15.00 | 2951.49 | 3096.53 | 4.91 | 2885.21 | 2899.31 | 0.49 >> RotateBenchmark.testRotateLeftL | 256.00 | 15.00 | 1486.26 | 1527.94 | 2.80 | 1441.93 | 1454.25 | 0.85 >> RotateBenchmark.testRotateLeftL | 256.00 | 31.00 | 5873.21 | 6089.75 | 3.69 | 5767.58 | 5664.11 | -1.79 >> RotateBenchmark.testRotateLeftL | 256.00 | 31.00 | 2969.67 | 3081.39 | 3.76 | 2878.50 | 2905.86 | 0.95 >> RotateBenchmark.testRotateLeftL | 256.00 | 31.00 | 1452.21 | 1520.03 | 4.67 | 1430.30 | 1485.63 | 3.87 >> RotateBenchmark.testRotateLeftL | 512.00 | 7.00 | 8088.65 | 8443.63 | 4.39 | 455.67 | 1226.33 | 169.13 >> RotateBenchmark.testRotateLeftL | 512.00 | 7.00 | 4011.95 | 4120.25 | 2.70 | 229.77 | 619.87 | 169.77 >> RotateBenchmark.testRotateLeftL | 512.00 | 7.00 | 2090.57 | 2109.53 | 0.91 | 115.21 | 310.36 | 169.37 >> RotateBenchmark.testRotateLeftL | 512.00 | 15.00 | 8166.84 | 8557.28 | 4.78 | 457.67 | 1242.86 | 171.56 >> RotateBenchmark.testRotateLeftL | 512.00 | 15.00 | 4137.02 | 4287.95 | 3.65 | 227.26 | 624.80 | 174.93 >> RotateBenchmark.testRotateLeftL | 512.00 | 15.00 | 2095.01 | 2102.86 | 0.37 | 114.26 | 310.83 | 
172.03 >> RotateBenchmark.testRotateLeftL | 512.00 | 31.00 | 8082.68 | 8400.56 | 3.93 | 459.59 | 1230.07 | 167.64 >> RotateBenchmark.testRotateLeftL | 512.00 | 31.00 | 4047.67 | 4147.58 | 2.47 | 229.01 | 606.38 | 164.78 >> RotateBenchmark.testRotateLeftL | 512.00 | 31.00 | 2086.83 | 2126.72 | 1.91 | 111.93 | 305.66 | 173.08 >> RotateBenchmark.testRotateLeftS | 128.00 | 7.00 | 13597.19 | 13255.09 | -2.52 | 13818.39 | 13242.40 | -4.17 >> RotateBenchmark.testRotateLeftS | 128.00 | 7.00 | 7028.26 | 6826.59 | -2.87 | 6765.15 | 6907.87 | 2.11 >> RotateBenchmark.testRotateLeftS | 128.00 | 7.00 | 3570.40 | 3468.01 | -2.87 | 3449.66 | 3533.50 | 2.43 >> RotateBenchmark.testRotateLeftS | 128.00 | 15.00 | 13615.99 | 13464.40 | -1.11 | 13330.02 | 13870.57 | 4.06 >> RotateBenchmark.testRotateLeftS | 128.00 | 15.00 | 7043.31 | 6763.34 | -3.97 | 6928.88 | 7063.57 | 1.94 >> RotateBenchmark.testRotateLeftS | 128.00 | 15.00 | 3495.12 | 3537.62 | 1.22 | 3503.41 | 3457.67 | -1.31 >> RotateBenchmark.testRotateLeftS | 128.00 | 31.00 | 13591.66 | 13665.84 | 0.55 | 13773.27 | 13126.08 | -4.70 >> RotateBenchmark.testRotateLeftS | 128.00 | 31.00 | 7027.08 | 7011.24 | -0.23 | 6974.98 | 6815.50 | -2.29 >> RotateBenchmark.testRotateLeftS | 128.00 | 31.00 | 3568.28 | 3569.62 | 0.04 | 3580.67 | 3463.58 | -3.27 >> RotateBenchmark.testRotateLeftS | 256.00 | 7.00 | 21154.03 | 21416.32 | 1.24 | 21187.01 | 21401.61 | 1.01 >> RotateBenchmark.testRotateLeftS | 256.00 | 7.00 | 11194.24 | 10865.47 | -2.94 | 11063.19 | 10977.60 | -0.77 >> RotateBenchmark.testRotateLeftS | 256.00 | 7.00 | 5797.80 | 5523.94 | -4.72 | 5654.63 | 5468.78 | -3.29 >> RotateBenchmark.testRotateLeftS | 256.00 | 15.00 | 21333.89 | 21412.74 | 0.37 | 21610.94 | 20908.96 | -3.25 >> RotateBenchmark.testRotateLeftS | 256.00 | 15.00 | 11327.07 | 11113.48 | -1.89 | 11148.25 | 10678.14 | -4.22 >> RotateBenchmark.testRotateLeftS | 256.00 | 15.00 | 5810.69 | 5569.72 | -4.15 | 5663.26 | 5618.87 | -0.78 >> RotateBenchmark.testRotateLeftS | 
256.00 | 31.00 | 21753.20 | 21198.43 | -2.55 | 21567.90 | 21929.81 | 1.68 >> RotateBenchmark.testRotateLeftS | 256.00 | 31.00 | 11517.08 | 11039.64 | -4.15 | 11103.08 | 10871.59 | -2.08 >> RotateBenchmark.testRotateLeftS | 256.00 | 31.00 | 5897.16 | 5606.75 | -4.92 | 5459.87 | 5604.12 | 2.64 >> RotateBenchmark.testRotateLeftS | 512.00 | 7.00 | 29748.53 | 28883.73 | -2.91 | 1549.02 | 3928.53 | 153.61 >> RotateBenchmark.testRotateLeftS | 512.00 | 7.00 | 15197.09 | 15878.19 | 4.48 | 772.59 | 1924.35 | 149.08 >> RotateBenchmark.testRotateLeftS | 512.00 | 7.00 | 8046.30 | 8081.19 | 0.43 | 388.11 | 990.28 | 155.16 >> RotateBenchmark.testRotateLeftS | 512.00 | 15.00 | 30618.04 | 29419.19 | -3.92 | 1524.22 | 3915.97 | 156.92 >> RotateBenchmark.testRotateLeftS | 512.00 | 15.00 | 15854.43 | 15846.37 | -0.05 | 766.09 | 1953.60 | 155.01 >> RotateBenchmark.testRotateLeftS | 512.00 | 15.00 | 7814.77 | 7899.30 | 1.08 | 390.82 | 970.37 | 148.29 >> RotateBenchmark.testRotateLeftS | 512.00 | 31.00 | 29596.82 | 28538.69 | -3.58 | 1530.45 | 3906.91 | 155.28 >> RotateBenchmark.testRotateLeftS | 512.00 | 31.00 | 15662.48 | 15849.25 | 1.19 | 778.08 | 1934.31 | 148.60 >> RotateBenchmark.testRotateLeftS | 512.00 | 31.00 | 8121.14 | 7758.59 | -4.46 | 392.78 | 959.73 | 144.34 >> RotateBenchmark.testRotateRightB | 128.00 | 7.00 | 17465.84 | 17069.34 | -2.27 | 16849.73 | 17842.08 | 5.89 >> RotateBenchmark.testRotateRightB | 128.00 | 7.00 | 9049.19 | 8864.15 | -2.04 | 8786.67 | 9105.34 | 3.63 >> RotateBenchmark.testRotateRightB | 128.00 | 15.00 | 17703.38 | 17070.98 | -3.57 | 16595.85 | 17784.68 | 7.16 >> RotateBenchmark.testRotateRightB | 128.00 | 15.00 | 9007.68 | 8817.41 | -2.11 | 8704.49 | 9185.87 | 5.53 >> RotateBenchmark.testRotateRightB | 128.00 | 31.00 | 17531.05 | 16983.40 | -3.12 | 16947.69 | 17655.40 | 4.18 >> RotateBenchmark.testRotateRightB | 128.00 | 31.00 | 8986.30 | 8794.15 | -2.14 | 8816.62 | 9225.95 | 4.64 >> RotateBenchmark.testRotateRightB | 256.00 | 7.00 | 21293.95 | 
21506.74 | 1.00 | 21163.29 | 21854.03 | 3.26 >> RotateBenchmark.testRotateRightB | 256.00 | 7.00 | 11258.47 | 11072.92 | -1.65 | 11118.12 | 11338.96 | 1.99 >> RotateBenchmark.testRotateRightB | 256.00 | 15.00 | 21253.36 | 21292.37 | 0.18 | 21224.39 | 21763.88 | 2.54 >> RotateBenchmark.testRotateRightB | 256.00 | 15.00 | 11064.80 | 11198.35 | 1.21 | 10960.98 | 11294.14 | 3.04 >> RotateBenchmark.testRotateRightB | 256.00 | 31.00 | 21358.14 | 21346.21 | -0.06 | 21487.25 | 21854.42 | 1.71 >> RotateBenchmark.testRotateRightB | 256.00 | 31.00 | 11045.61 | 11208.26 | 1.47 | 10907.03 | 11415.18 | 4.66 >> RotateBenchmark.testRotateRightB | 512.00 | 7.00 | 17898.61 | 18307.54 | 2.28 | 1214.65 | 2546.64 | 109.66 >> RotateBenchmark.testRotateRightB | 512.00 | 15.00 | 17909.25 | 18242.51 | 1.86 | 1215.05 | 2563.98 | 111.02 >> RotateBenchmark.testRotateRightB | 512.00 | 31.00 | 17883.35 | 17928.44 | 0.25 | 1220.77 | 2543.30 | 108.34 >> RotateBenchmark.testRotateRightI | 128.00 | 7.00 | 7139.97 | 7626.72 | 6.82 | 6994.86 | 7075.65 | 1.15 >> RotateBenchmark.testRotateRightI | 128.00 | 7.00 | 3657.37 | 3898.34 | 6.59 | 3617.06 | 3576.12 | -1.13 >> RotateBenchmark.testRotateRightI | 128.00 | 7.00 | 1804.26 | 1969.19 | 9.14 | 1796.62 | 1858.84 | 3.46 >> RotateBenchmark.testRotateRightI | 128.00 | 15.00 | 7404.31 | 7760.09 | 4.80 | 7036.77 | 7401.52 | 5.18 >> RotateBenchmark.testRotateRightI | 128.00 | 15.00 | 3600.52 | 3956.35 | 9.88 | 3595.28 | 3560.36 | -0.97 >> RotateBenchmark.testRotateRightI | 128.00 | 15.00 | 1813.32 | 1966.41 | 8.44 | 1839.95 | 1852.53 | 0.68 >> RotateBenchmark.testRotateRightI | 128.00 | 31.00 | 7118.48 | 7724.81 | 8.52 | 7151.56 | 7021.09 | -1.82 >> RotateBenchmark.testRotateRightI | 128.00 | 31.00 | 3529.70 | 3881.63 | 9.97 | 3623.08 | 3601.01 | -0.61 >> RotateBenchmark.testRotateRightI | 128.00 | 31.00 | 1823.61 | 1961.34 | 7.55 | 1786.86 | 1748.85 | -2.13 >> RotateBenchmark.testRotateRightI | 256.00 | 7.00 | 11697.98 | 11835.25 | 1.17 | 11513.16 | 
11184.87 | -2.85 >> RotateBenchmark.testRotateRightI | 256.00 | 7.00 | 5890.11 | 6102.57 | 3.61 | 5658.79 | 5696.08 | 0.66 >> RotateBenchmark.testRotateRightI | 256.00 | 7.00 | 2964.94 | 3070.26 | 3.55 | 2945.00 | 2962.08 | 0.58 >> RotateBenchmark.testRotateRightI | 256.00 | 15.00 | 11562.51 | 12151.29 | 5.09 | 11404.17 | 11120.28 | -2.49 >> RotateBenchmark.testRotateRightI | 256.00 | 15.00 | 5702.93 | 6130.57 | 7.50 | 5799.54 | 5779.08 | -0.35 >> RotateBenchmark.testRotateRightI | 256.00 | 15.00 | 2861.96 | 3051.44 | 6.62 | 2943.99 | 2860.65 | -2.83 >> RotateBenchmark.testRotateRightI | 256.00 | 31.00 | 11203.13 | 11710.59 | 4.53 | 11363.18 | 11112.16 | -2.21 >> RotateBenchmark.testRotateRightI | 256.00 | 31.00 | 5893.97 | 6070.71 | 3.00 | 5776.67 | 5648.84 | -2.21 >> RotateBenchmark.testRotateRightI | 256.00 | 31.00 | 2971.83 | 3046.76 | 2.52 | 2903.35 | 2833.88 | -2.39 >> RotateBenchmark.testRotateRightI | 512.00 | 7.00 | 16064.71 | 15851.35 | -1.33 | 861.93 | 2256.88 | 161.84 >> RotateBenchmark.testRotateRightI | 512.00 | 7.00 | 7916.80 | 8462.65 | 6.89 | 430.23 | 1147.30 | 166.67 >> RotateBenchmark.testRotateRightI | 512.00 | 7.00 | 4104.64 | 4068.28 | -0.89 | 216.30 | 572.86 | 164.84 >> RotateBenchmark.testRotateRightI | 512.00 | 15.00 | 16133.09 | 16281.59 | 0.92 | 856.36 | 2229.58 | 160.35 >> RotateBenchmark.testRotateRightI | 512.00 | 15.00 | 8127.26 | 8117.59 | -0.12 | 419.16 | 1176.42 | 180.66 >> RotateBenchmark.testRotateRightI | 512.00 | 15.00 | 4080.11 | 4063.26 | -0.41 | 218.32 | 571.93 | 161.97 >> RotateBenchmark.testRotateRightI | 512.00 | 31.00 | 15834.26 | 16314.64 | 3.03 | 865.96 | 2297.74 | 165.34 >> RotateBenchmark.testRotateRightI | 512.00 | 31.00 | 7965.62 | 8270.48 | 3.83 | 428.55 | 1148.87 | 168.08 >> RotateBenchmark.testRotateRightI | 512.00 | 31.00 | 4161.69 | 4034.76 | -3.05 | 215.63 | 570.19 | 164.43 >> RotateBenchmark.testRotateRightL | 128.00 | 7.00 | 3556.70 | 3877.08 | 9.01 | 3596.46 | 3558.32 | -1.06 >> 
RotateBenchmark.testRotateRightL | 128.00 | 7.00 | 1772.93 | 1993.86 | 12.46 | 1856.79 | 1783.22 | -3.96 >> RotateBenchmark.testRotateRightL | 128.00 | 7.00 | 908.66 | 1000.37 | 10.09 | 944.79 | 922.91 | -2.32 >> RotateBenchmark.testRotateRightL | 128.00 | 15.00 | 3742.44 | 3748.41 | 0.16 | 3788.07 | 3570.67 | -5.74 >> RotateBenchmark.testRotateRightL | 128.00 | 15.00 | 1817.53 | 1985.69 | 9.25 | 1892.38 | 1833.16 | -3.13 >> RotateBenchmark.testRotateRightL | 128.00 | 15.00 | 941.03 | 952.68 | 1.24 | 915.79 | 910.21 | -0.61 >> RotateBenchmark.testRotateRightL | 128.00 | 31.00 | 3649.48 | 3896.56 | 6.77 | 3637.59 | 3557.53 | -2.20 >> RotateBenchmark.testRotateRightL | 128.00 | 31.00 | 1840.12 | 1997.19 | 8.54 | 1821.47 | 1799.82 | -1.19 >> RotateBenchmark.testRotateRightL | 128.00 | 31.00 | 901.33 | 995.67 | 10.47 | 909.20 | 902.73 | -0.71 >> RotateBenchmark.testRotateRightL | 256.00 | 7.00 | 5789.93 | 5960.54 | 2.95 | 5758.14 | 5736.30 | -0.38 >> RotateBenchmark.testRotateRightL | 256.00 | 7.00 | 2963.20 | 3063.30 | 3.38 | 2943.48 | 2833.84 | -3.72 >> RotateBenchmark.testRotateRightL | 256.00 | 7.00 | 1501.81 | 1510.23 | 0.56 | 1463.85 | 1462.26 | -0.11 >> RotateBenchmark.testRotateRightL | 256.00 | 15.00 | 5870.05 | 5951.43 | 1.39 | 5794.74 | 5604.58 | -3.28 >> RotateBenchmark.testRotateRightL | 256.00 | 15.00 | 2971.36 | 3047.00 | 2.55 | 2931.19 | 2907.30 | -0.82 >> RotateBenchmark.testRotateRightL | 256.00 | 15.00 | 1473.97 | 1530.54 | 3.84 | 1473.45 | 1442.40 | -2.11 >> RotateBenchmark.testRotateRightL | 256.00 | 31.00 | 5858.08 | 6080.49 | 3.80 | 5863.69 | 5549.85 | -5.35 >> RotateBenchmark.testRotateRightL | 256.00 | 31.00 | 2916.24 | 3045.77 | 4.44 | 2981.59 | 2815.07 | -5.58 >> RotateBenchmark.testRotateRightL | 256.00 | 31.00 | 1441.20 | 1531.56 | 6.27 | 1492.47 | 1473.25 | -1.29 >> RotateBenchmark.testRotateRightL | 512.00 | 7.00 | 8147.24 | 8310.05 | 2.00 | 469.45 | 1235.21 | 163.12 >> RotateBenchmark.testRotateRightL | 512.00 | 7.00 | 4142.95 | 4258.86 
| 2.80 | 234.14 | 615.52 | 162.88 >> RotateBenchmark.testRotateRightL | 512.00 | 7.00 | 2095.48 | 2087.20 | -0.40 | 113.55 | 311.19 | 174.05 >> RotateBenchmark.testRotateRightL | 512.00 | 15.00 | 8222.94 | 8246.58 | 0.29 | 458.91 | 1244.32 | 171.15 >> RotateBenchmark.testRotateRightL | 512.00 | 15.00 | 4160.04 | 4226.46 | 1.60 | 227.78 | 625.38 | 174.56 >> RotateBenchmark.testRotateRightL | 512.00 | 15.00 | 2064.63 | 2162.44 | 4.74 | 113.27 | 314.15 | 177.36 >> RotateBenchmark.testRotateRightL | 512.00 | 31.00 | 8157.94 | 8466.90 | 3.79 | 450.26 | 1221.90 | 171.37 >> RotateBenchmark.testRotateRightL | 512.00 | 31.00 | 4039.74 | 4283.33 | 6.03 | 224.82 | 612.68 | 172.53 >> RotateBenchmark.testRotateRightL | 512.00 | 31.00 | 2066.88 | 2147.51 | 3.90 | 110.97 | 303.43 | 173.42 >> RotateBenchmark.testRotateRightS | 128.00 | 7.00 | 13548.39 | 13245.87 | -2.23 | 13490.93 | 13084.76 | -3.01 >> RotateBenchmark.testRotateRightS | 128.00 | 7.00 | 7020.16 | 6768.85 | -3.58 | 6991.39 | 7044.32 | 0.76 >> RotateBenchmark.testRotateRightS | 128.00 | 7.00 | 3550.50 | 3505.19 | -1.28 | 3507.12 | 3612.86 | 3.01 >> RotateBenchmark.testRotateRightS | 128.00 | 15.00 | 13743.43 | 13325.44 | -3.04 | 13696.15 | 13255.80 | -3.22 >> RotateBenchmark.testRotateRightS | 128.00 | 15.00 | 6856.02 | 6969.18 | 1.65 | 6886.29 | 6834.12 | -0.76 >> RotateBenchmark.testRotateRightS | 128.00 | 15.00 | 3569.53 | 3492.76 | -2.15 | 3539.02 | 3470.02 | -1.95 >> RotateBenchmark.testRotateRightS | 128.00 | 31.00 | 13704.18 | 13495.07 | -1.53 | 13649.14 | 13583.87 | -0.48 >> RotateBenchmark.testRotateRightS | 128.00 | 31.00 | 7011.77 | 6953.93 | -0.82 | 6978.28 | 6740.30 | -3.41 >> RotateBenchmark.testRotateRightS | 128.00 | 31.00 | 3591.62 | 3620.12 | 0.79 | 3502.04 | 3510.05 | 0.23 >> RotateBenchmark.testRotateRightS | 256.00 | 7.00 | 21950.71 | 22113.60 | 0.74 | 21484.27 | 21596.64 | 0.52 >> RotateBenchmark.testRotateRightS | 256.00 | 7.00 | 11616.88 | 11099.73 | -4.45 | 11188.29 | 10737.68 | -4.03 >> 
RotateBenchmark.testRotateRightS | 256.00 | 7.00 | 5872.72 | 5579.12 | -5.00 | 5784.05 | 5454.57 | -5.70 >> RotateBenchmark.testRotateRightS | 256.00 | 15.00 | 22017.83 | 20817.97 | -5.45 | 21934.65 | 21356.90 | -2.63 >> RotateBenchmark.testRotateRightS | 256.00 | 15.00 | 11414.27 | 11044.86 | -3.24 | 11454.35 | 11140.34 | -2.74 >> RotateBenchmark.testRotateRightS | 256.00 | 15.00 | 5786.64 | 5634.05 | -2.64 | 5724.93 | 5639.99 | -1.48 >> RotateBenchmark.testRotateRightS | 256.00 | 31.00 | 21754.77 | 21466.01 | -1.33 | 21140.67 | 21970.03 | 3.92 >> RotateBenchmark.testRotateRightS | 256.00 | 31.00 | 11676.46 | 11358.64 | -2.72 | 11204.90 | 11213.48 | 0.08 >> RotateBenchmark.testRotateRightS | 256.00 | 31.00 | 5728.20 | 5772.49 | 0.77 | 5594.33 | 5544.25 | -0.90 >> RotateBenchmark.testRotateRightS | 512.00 | 7.00 | 30247.03 | 30179.41 | -0.22 | 1538.75 | 3975.82 | 158.38 >> RotateBenchmark.testRotateRightS | 512.00 | 7.00 | 15988.73 | 15621.42 | -2.30 | 776.04 | 1910.91 | 146.24 >> RotateBenchmark.testRotateRightS | 512.00 | 7.00 | 8115.84 | 8025.28 | -1.12 | 389.12 | 984.46 | 152.99 >> RotateBenchmark.testRotateRightS | 512.00 | 15.00 | 30110.91 | 30200.69 | 0.30 | 1532.49 | 3983.77 | 159.95 >> RotateBenchmark.testRotateRightS | 512.00 | 15.00 | 15957.90 | 15690.73 | -1.67 | 774.90 | 1931.00 | 149.19 >> RotateBenchmark.testRotateRightS | 512.00 | 15.00 | 8113.26 | 8037.93 | -0.93 | 391.90 | 965.53 | 146.37 >> RotateBenchmark.testRotateRightS | 512.00 | 31.00 | 29816.97 | 29891.54 | 0.25 | 1538.12 | 3881.93 | 152.38 >> RotateBenchmark.testRotateRightS | 512.00 | 31.00 | 15405.95 | 15619.17 | 1.38 | 762.49 | 1871.00 | 145.38 >> RotateBenchmark.testRotateRightS | 512.00 | 31.00 | 7919.80 | 7957.35 | 0.47 | 393.63 | 972.49 | 147.06 > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > 8266054: Removing redundant teat templates. Looks good. Someone from the HotSpot side needs to review related changes. 
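As a side note on the shift-and-OR decomposition that the quoted PR description refers to, the identity can be sketched on scalars. This is only an illustration in plain Java, not the Vector API code or the C2 lowering from the patch; the class and method names are hypothetical. The `int` case is checked against `Integer.rotateLeft`/`Integer.rotateRight`, and the `byte` case stands in for a sub-word lane, where the value has to be widened and masked first.

```java
public class RotateSketch {
    // Rotate a 32-bit value left: bits shifted out on the left re-enter on the right.
    static int rotateLeft(int x, int n) {
        int s = n & 31;                              // rotation count modulo lane width
        return (x << s) | (x >>> ((32 - s) & 31));   // left shift OR'ed with logical right shift
    }

    // Rotate a 32-bit value right: the mirror image of rotateLeft.
    static int rotateRight(int x, int n) {
        int s = n & 31;
        return (x >>> s) | (x << ((32 - s) & 31));
    }

    // Sub-word sketch: rotate an 8-bit value left within its own 8 bits.
    static byte rotateLeftByte(byte x, int n) {
        int s = n & 7;
        int v = x & 0xFF;                            // widen without sign extension
        return (byte) ((v << s) | (v >>> (8 - s)));  // narrow back to 8 bits
    }

    public static void main(String[] args) {
        int x = 0x80000001;
        System.out.println(rotateLeft(x, 7) == Integer.rotateLeft(x, 7));
        System.out.println(rotateRight(x, 15) == Integer.rotateRight(x, 15));
        System.out.println(rotateLeftByte((byte) 0x81, 1));
    }
}
```

A target with a direct rotate instruction can replace the three operations with one; everywhere else the two shifts and the OR remain, which is the fallback the description mentions for sub-word types.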
The way I read the perf numbers is that on non-AVX512 systems the numbers are in the noise (no worse, no better), with significant improvement on AVX512. ------------- Marked as reviewed by psandoz (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/3720 From psandoz at openjdk.java.net Sat May 8 15:56:05 2021 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Sat, 8 May 2021 15:56:05 GMT Subject: RFR: 8265518: C1: Intrinsic support for Preconditions.checkIndex [v6] In-Reply-To: <4IY0_Zr94l_aZTe-fYIva28aZw8uYJ5k6d48uByI70E=.19f2b9e5-4958-4bb3-b016-d9f809fe3347@github.com> References: <9Z_DkUjmqefCjf9mvecHUtoLHhw1qGNWJPxufuwvXI0=.36498a86-d09f-4eea-ab89-74844dd862cf@github.com> <4IY0_Zr94l_aZTe-fYIva28aZw8uYJ5k6d48uByI70E=.19f2b9e5-4958-4bb3-b016-d9f809fe3347@github.com> Message-ID: On Sat, 8 May 2021 05:32:00 GMT, Yi Yang wrote: > It seems that Objects.checkIndex cannot meet our needs because it implicitly passes null to Preconditions.checkIndex, but we want to customize exception messages, so we might add extra APIs in Objects while doing the replacement. > It might be possible to directly use `Preconditions.checkIndex` for such purposes; its package is non-exported but public for reuse within `java.base`. It's used in `VarHandle` `byte` array access code, see [here](https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/invoke/X-VarHandleByteArrayView.java.template#L117). I would caution against adding a public API to support this. The work in `Preconditions` was the basis for a public API, but it proved complicated. I am glad we did not expose that; it would have made exposing the `long`-accepting `Objects.checkIndex` methods more difficult. Paul.
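To make the trade-off in this exchange concrete, here is a small sketch of the public entry point, `Objects.checkIndex(int, int)` (available since Java 9), next to a hand-written check that carries a custom message. The class `CheckIndexSketch` and the helper `checkIndexWithMessage` are hypothetical and exist only for illustration; the internal `Preconditions.checkIndex` is deliberately not called, since its package is not exported outside `java.base`.

```java
import java.util.Objects;

public class CheckIndexSketch {
    // Uses the public API: throws IndexOutOfBoundsException with a
    // JDK-defined message when index is outside [0, length).
    static int get(int[] data, int index) {
        Objects.checkIndex(index, data.length);
        return data[index];
    }

    // Hypothetical helper: the same range check, written by hand so that
    // the caller controls the exception message.
    static int checkIndexWithMessage(int index, int length, String what) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException(
                what + ": index " + index + " out of bounds for length " + length);
        }
        return index;
    }

    public static void main(String[] args) {
        int[] data = {10, 20, 30};
        System.out.println(get(data, 1));
        try {
            checkIndexWithMessage(3, data.length, "data");
        } catch (IndexOutOfBoundsException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Whether a compiler treats the hand-written form as favourably as an intrinsified `checkIndex` is exactly the kind of question the thread is weighing, so the helper should be read as a functional equivalent only, not as a performance claim.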
------------- PR: https://git.openjdk.java.net/jdk/pull/3615 From github.com+2249648+johntortugo at openjdk.java.net Sat May 8 20:58:36 2021 From: github.com+2249648+johntortugo at openjdk.java.net (John Tortugo) Date: Sat, 8 May 2021 20:58:36 GMT Subject: RFR: 8241502: C2: Migrate x86_64.ad to MacroAssembler [v8] In-Reply-To: References: Message-ID: <4Utpd3LgEfl5jVQ6Zx37ydToVt-Q77mo76ZRCpneIJg=.87f03eba-2c23-48d7-a8f4-3f8b72fdfb4e@github.com> On Tue, 4 May 2021 19:15:26 GMT, Vladimir Kozlov wrote: >> @vnkozlov - thank you so much for running the tests! The cause of the problems you reported were the last changes I made to the div/mod instructions. I fixed the code and ran all tests again on Linux, macOS, and Windows and they are looking good (jdk tier1, 2, 3, and hotspot_all. > > @JohnTortugo did you fix the last issue? Let me know when I should test it. Hi @vnkozlov - I just pushed a patch that will handle the assert triggering problem. Thanks again for helping! ------------- PR: https://git.openjdk.java.net/jdk/pull/2420 From github.com+2249648+johntortugo at openjdk.java.net Sat May 8 20:58:35 2021 From: github.com+2249648+johntortugo at openjdk.java.net (John Tortugo) Date: Sat, 8 May 2021 20:58:35 GMT Subject: RFR: 8241502: C2: Migrate x86_64.ad to MacroAssembler [v11] In-Reply-To: References: Message-ID: > Relates to: https://bugs.openjdk.java.net/browse/JDK-8241502 > Tested on: Linux tier1, 2 and 3 > > Can you please take a look whether these changes are going in the direction expected or not? If it is, I'll continue working on the `JDK-8241502` but I'd like to split it in a few PRs since it's a lot of changes. John Tortugo has updated the pull request incrementally with one additional commit since the last revision: Fixing shift immediate constants handling. 
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2420/files - new: https://git.openjdk.java.net/jdk/pull/2420/files/53a5e32b..7599ed70 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2420&range=10 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2420&range=09-10 Stats: 7 lines in 2 files changed: 0 ins; 0 del; 7 mod Patch: https://git.openjdk.java.net/jdk/pull/2420.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2420/head:pull/2420 PR: https://git.openjdk.java.net/jdk/pull/2420 From kvn at openjdk.java.net Sun May 9 19:47:15 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Sun, 9 May 2021 19:47:15 GMT Subject: RFR: 8241502: C2: Migrate x86_64.ad to MacroAssembler [v11] In-Reply-To: References: Message-ID: On Sat, 8 May 2021 20:58:35 GMT, John Tortugo wrote: >> Relates to: https://bugs.openjdk.java.net/browse/JDK-8241502 >> Tested on: Linux tier1, 2 and 3 >> >> Can you please take a look whether these changes are going in the direction expected or not? If it is, I'll continue working on the `JDK-8241502` but I'd like to split it in a few PRs since it's a lot of changes. > > John Tortugo has updated the pull request incrementally with one additional commit since the last revision: > > Fixing shift immediate constants handling. I started testing for latest changes (10). ------------- PR: https://git.openjdk.java.net/jdk/pull/2420 From kvn at openjdk.java.net Mon May 10 03:04:01 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 10 May 2021 03:04:01 GMT Subject: RFR: 8241502: C2: Migrate x86_64.ad to MacroAssembler [v11] In-Reply-To: References: Message-ID: On Sat, 8 May 2021 20:58:35 GMT, John Tortugo wrote: >> Relates to: https://bugs.openjdk.java.net/browse/JDK-8241502 >> Tested on: Linux tier1, 2 and 3 >> >> Can you please take a look whether these changes are going in the direction expected or not? 
If it is, I'll continue working on the `JDK-8241502` but I'd like to split it in a few PRs since it's a lot of changes. > > John Tortugo has updated the pull request incrementally with one additional commit since the last revision: > > Fixing shift immediate constants handling. tier1-5 passed clean ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2420 From jiefu at openjdk.java.net Mon May 10 03:47:03 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Mon, 10 May 2021 03:47:03 GMT Subject: RFR: 8265129: Add intrinsic support for JVM.getClassId [v3] In-Reply-To: References: Message-ID: On Sat, 1 May 2021 09:45:11 GMT, Denghui Dong wrote: >> Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: >> >> use new_pointer_register > > @egahlin Hi, could you help review this patch? Hi @D-D-H , Does it make sense to intrinsify it in C1? IMO, C1 is not easy to get it handled correctly. How about just intrinsifying it in C2? Maybe it will help the review process. Thanks. Best regards, Jie ------------- PR: https://git.openjdk.java.net/jdk/pull/3470 From github.com+2249648+johntortugo at openjdk.java.net Mon May 10 04:24:03 2021 From: github.com+2249648+johntortugo at openjdk.java.net (John Tortugo) Date: Mon, 10 May 2021 04:24:03 GMT Subject: RFR: 8241502: C2: Migrate x86_64.ad to MacroAssembler [v11] In-Reply-To: References: Message-ID: On Sun, 9 May 2021 19:43:04 GMT, Vladimir Kozlov wrote: >> John Tortugo has updated the pull request incrementally with one additional commit since the last revision: >> >> Fixing shift immediate constants handling. > > I started testing for latest changes (10). Thank you @vnkozlov ! 
------------- PR: https://git.openjdk.java.net/jdk/pull/2420 From ddong at openjdk.java.net Mon May 10 04:49:02 2021 From: ddong at openjdk.java.net (Denghui Dong) Date: Mon, 10 May 2021 04:49:02 GMT Subject: RFR: 8265129: Add intrinsic support for JVM.getClassId [v4] In-Reply-To: References: Message-ID: On Sat, 8 May 2021 06:52:13 GMT, Denghui Dong wrote: >> 8265129: Add intrinsic support for JVM.getClassId > > Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: > > swap the positions of two operands in cmp operation since the det register will be modified in 32 bit Thanks for your comment! > Does it make sense to intrinsify it in C1? > IMO, C1 is not easy to get it handled correctly. > > How about just intrinsifying it in C2? > Maybe it will help the review process. > You are right. I found that C1 cannot handle this correctly when there are many branches, which is why I made a platform-dependent code stub in C1. I agree that we should intrinsify this method only in C2 for now, and I will delete the C1 part. ------------- PR: https://git.openjdk.java.net/jdk/pull/3470 From ddong at openjdk.java.net Mon May 10 05:06:53 2021 From: ddong at openjdk.java.net (Denghui Dong) Date: Mon, 10 May 2021 05:06:53 GMT Subject: RFR: 8265129: Add intrinsic support for JVM.getClassId [v4] In-Reply-To: References: Message-ID: On Sat, 8 May 2021 06:52:13 GMT, Denghui Dong wrote: >> 8265129: Add intrinsic support for JVM.getClassId > > Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: > > swap the positions of two operands in cmp operation since the det register will be modified in 32 bit A reason why I think this method should be intrinsified: some of our instrumentation agents use JFR to record key information, for example the actual type of parameters or results, and in real scenarios a specific type is often used multiple times.
There are also clearly separate fast and slow paths in the implementation of JfrTraceId::load. Therefore, intrinsifying this method will decrease the overhead for this usage. ------------- PR: https://git.openjdk.java.net/jdk/pull/3470 From ddong at openjdk.java.net Mon May 10 05:33:35 2021 From: ddong at openjdk.java.net (Denghui Dong) Date: Mon, 10 May 2021 05:33:35 GMT Subject: RFR: 8265129: Add intrinsic support for JVM.getClassId [v5] In-Reply-To: References: Message-ID: > 8265129: Add intrinsic support for JVM.getClassId Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: remove c1 part ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/3470/files - new: https://git.openjdk.java.net/jdk/pull/3470/files/73da1108..73c1cc38 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3470&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3470&range=03-04 Stats: 207 lines in 7 files changed: 0 ins; 205 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/3470.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3470/head:pull/3470 PR: https://git.openjdk.java.net/jdk/pull/3470 From dongbo at openjdk.java.net Mon May 10 05:55:59 2021 From: dongbo at openjdk.java.net (Dong Bo) Date: Mon, 10 May 2021 05:55:59 GMT Subject: RFR: 8264973: AArch64: Optimize vector max/min/add reduction of two integers with NEON pairwise instructions [v2] In-Reply-To: <1ZfSsFKnXwnkqtxIGeyZyb6L9yFghYh-sTZUWTY3A5U=.1a518678-3fd9-4dcb-be04-20c0060bfe91@github.com> References: <1ZfSsFKnXwnkqtxIGeyZyb6L9yFghYh-sTZUWTY3A5U=.1a518678-3fd9-4dcb-be04-20c0060bfe91@github.com> Message-ID: On Mon, 26 Apr 2021 11:16:00 GMT, Dong Bo wrote: >> On aarch64, current implementations of vector reduce_add2I, reduce_max2I, reduce_min2I can be optimized with NEON pairwise instructions: >> >> >> ## reduce_add2I, before >> mov w10, v19.s[0] >> mov w2, v19.s[1] >> add w10, w0, w10 >> add w10, w10, w2 >> ## reduce_add2I,
optimized >> addp v23.2s, v24.2s, v24.2s >> mov w10, v23.s[0] >> add w10, w10, w2 >> >> ## reduce_max2I, before >> dup v16.2d, v23.d[0] >> sminv s16, v16.4s >> mov w10, v16.s[0] >> cmp w10, w0 >> csel w10, w10, w0, lt >> ## reduce_max2I, optimized >> sminp v16.2s, v23.2s, v23.2s >> mov w10, v16.s[0] >> cmp w10, w0 >> csel w10, w10, w0, lt >> >> >> I don't expect this to change anything of SuperWord, vectorizing reductions of two integers is disabled by [1]. >> This is useful for VectorAPI, tested benchmarks in [2], performance can improve ~51% and ~8% for `Int64Vector.ADD` and `Int64Vector.MAX` respectively. >> >> >> Benchmark (size) Mode Cnt Score Error Units >> # optimized >> Int64Vector.ADDLanes 1024 thrpt 10 2492.123 ± 23.561 ops/ms >> Int64Vector.ADDMaskedLanes 1024 thrpt 10 1825.882 ± 5.261 ops/ms >> Int64Vector.MAXLanes 1024 thrpt 10 1921.028 ± 3.253 ops/ms >> Int64Vector.MAXMaskedLanes 1024 thrpt 10 1588.575 ± 3.903 ops/ms >> Int64Vector.MINLanes 1024 thrpt 10 1923.913 ± 2.117 ops/ms >> Int64Vector.MINMaskedLanes 1024 thrpt 10 1596.875 ± 2.163 ops/ms >> # default >> Int64Vector.ADDLanes 1024 thrpt 10 1644.223 ± 1.885 ops/ms >> Int64Vector.ADDMaskedLanes 1024 thrpt 10 1491.502 ± 26.436 ops/ms >> Int64Vector.MAXLanes 1024 thrpt 10 1784.066 ± 3.816 ops/ms >> Int64Vector.MAXMaskedLanes 1024 thrpt 10 1494.750 ± 3.451 ops/ms >> Int64Vector.MINLanes 1024 thrpt 10 1785.266 ± 8.893 ops/ms >> Int64Vector.MINMaskedLanes 1024 thrpt 10 1499.233 ± 3.498 ops/ms >> >> >> Verified correctness with tests `test/jdk/jdk/incubator/vector/`. Also tested linux-aarch64-server-fastdebug tier1-3.
>> >> [1] https://github.com/openjdk/jdk/blob/3bf4c904fbbd87d4db18db22c1be384616483eed/src/hotspot/share/opto/superword.cpp#L2004 >> [2] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/jdk/jdk/incubator/vector/benchmark/src/main/java/benchmark/jdk/incubator/vector/Int64Vector.java > > Dong Bo has updated the pull request incrementally with one additional commit since the last revision: > > add assembler tests for smaxp/sminp PING? Any comments/suggestions are appreciated. Although this has been reviewed by Ningsheng, we still need help from reviewers here. ------------- PR: https://git.openjdk.java.net/jdk/pull/3683 From jiefu at openjdk.java.net Mon May 10 06:37:12 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Mon, 10 May 2021 06:37:12 GMT Subject: RFR: 8265956: JVM crashes when matching LShiftVB Node [v5] In-Reply-To: References: <0lX7U-sIV4c6Wi1Cyw3caRD6ZTVCIYzbiiji3uPXf4M=.c0494113-32c5-40c4-bdd7-93b226ffe236@github.com> Message-ID: On Sat, 8 May 2021 08:41:32 GMT, Wang Huang wrote: >> It was found that the rule `match(Set dst (LShiftVB src shift))` is missing on many CPUs, such as `aarch64` and `x86`. This is why the JVM crashes on the `JDK-8265956` test case. In this commit, I: >> * show the crash case `TestVectorShuffleIotaShort` >> * ~~solve the issue on `aarch64` and `x86` by adding the rule.~~ solve the issue by adding `LShiftCntVNode` without adding any rule >> * test after fixing on tier1~3 >> >> Thank you for your review. Any suggestions are welcome. >> Wang Huang > > Wang Huang has updated the pull request incrementally with one additional commit since the last revision: > > change to static final test/hotspot/jtreg/compiler/vectorapi/TestVectorShuffleIotaByte.java line 74: > 72: bv1.intoArray(ab_64, 0); > 73: } > 74: Assert.assertEquals(ab_64, expected_64); How about putting the expected value as the first parameter, as suggested by [1]?
[1] https://github.com/openjdk/jdk/blob/master/doc/hotspot-unit-tests.md#first-parameter-is-expected-value ------------- PR: https://git.openjdk.java.net/jdk/pull/3747 From jiefu at openjdk.java.net Mon May 10 07:18:08 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Mon, 10 May 2021 07:18:08 GMT Subject: RFR: 8265956: JVM crashes when matching LShiftVB Node [v5] In-Reply-To: References: <0lX7U-sIV4c6Wi1Cyw3caRD6ZTVCIYzbiiji3uPXf4M=.c0494113-32c5-40c4-bdd7-93b226ffe236@github.com> Message-ID: On Sat, 8 May 2021 08:41:32 GMT, Wang Huang wrote: >> It was found that the rule `match(Set dst (LShiftVB src shift))` is missing on many CPUs, such as `aarch64` and `x86`. This is why the JVM crashes on the `JDK-8265956` test case. In this commit, I: >> * show the crash case `TestVectorShuffleIotaShort` >> * ~~solve the issue on `aarch64` and `x86` by adding the rule.~~ solve the issue by adding `LShiftCntVNode` without adding any rule >> * test after fixing on tier1~3 >> >> Thank you for your review. Any suggestions are welcome. >> Wang Huang > > Wang Huang has updated the pull request incrementally with one additional commit since the last revision: > > change to static final test/hotspot/jtreg/compiler/vectorapi/TestVectorShuffleIotaByte.java line 42: > 40: > 41: @Test > 42: public class TestVectorShuffleIotaByte { Can this jtreg test reproduce the bug on x86? I tested it on our x86 platforms, but failed to reproduce it. Thanks.
------------- PR: https://git.openjdk.java.net/jdk/pull/3747 From thartmann at openjdk.java.net Mon May 10 09:43:56 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 10 May 2021 09:43:56 GMT Subject: RFR: 8266189: Remove C1 "IfInstanceOf" instruction In-Reply-To: References: Message-ID: <4BzwF3-l-pwl_vuBg6Jh70dJoiHKYOicNYCjLCOoWZU=.ffa79cbe-521e-4af2-b250-8ff031841b10@github.com> On Sat, 8 May 2021 10:54:40 GMT, Yi Yang wrote: > Remove IfInstanceOf instruction, it has been there for a long while(13yrs) and not implemented yet. Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/3935 From whuang at openjdk.java.net Mon May 10 09:46:22 2021 From: whuang at openjdk.java.net (Wang Huang) Date: Mon, 10 May 2021 09:46:22 GMT Subject: RFR: 8265956: JVM crashes when matching LShiftVB Node [v5] In-Reply-To: References: <0lX7U-sIV4c6Wi1Cyw3caRD6ZTVCIYzbiiji3uPXf4M=.c0494113-32c5-40c4-bdd7-93b226ffe236@github.com> Message-ID: On Mon, 10 May 2021 07:13:48 GMT, Jie Fu wrote: >> Wang Huang has updated the pull request incrementally with one additional commit since the last revision: >> >> change to static final > > test/hotspot/jtreg/compiler/vectorapi/TestVectorShuffleIotaByte.java line 42: > >> 40: >> 41: @Test >> 42: public class TestVectorShuffleIotaByte { > > Can this jtreg test reproduce the bug on x86? > > I tested it on our x86 platforms, but failed to reproduce it. > Thanks. Thank you for your wonderful review. I will change that. > test/hotspot/jtreg/compiler/vectorapi/TestVectorShuffleIotaByte.java line 74: > >> 72: bv1.intoArray(ab_64, 0); >> 73: } >> 74: Assert.assertEquals(ab_64, expected_64); > > How about putting the expected value as the first parameter, which is suggested by [1] ? > > [1] https://github.com/openjdk/jdk/blob/master/doc/hotspot-unit-tests.md#first-parameter-is-expected-value Thank you for your review. 
* I think this document gives advice for the GoogleTest tests in the JDK (`test/hotspot/gtest`) rather than for jtreg (`test/hotspot/jtreg`). * The common way to pass an expected result to a test method is to use `@DataProvider`. However, in this small test it would bloat the test case code. :-) ------------- PR: https://git.openjdk.java.net/jdk/pull/3747 From whuang at openjdk.java.net Mon May 10 10:15:40 2021 From: whuang at openjdk.java.net (Wang Huang) Date: Mon, 10 May 2021 10:15:40 GMT Subject: RFR: 8265956: JVM crashes when matching LShiftVB Node [v6] In-Reply-To: <0lX7U-sIV4c6Wi1Cyw3caRD6ZTVCIYzbiiji3uPXf4M=.c0494113-32c5-40c4-bdd7-93b226ffe236@github.com> References: <0lX7U-sIV4c6Wi1Cyw3caRD6ZTVCIYzbiiji3uPXf4M=.c0494113-32c5-40c4-bdd7-93b226ffe236@github.com> Message-ID: > It was found that the rule `match(Set dst (LShiftVB src shift))` is missing on many CPUs, such as `aarch64` and `x86`. This is why the JVM crashes on the `JDK-8265956` test case. In this commit, I: > * show the crash case `TestVectorShuffleIotaShort` > * ~~solve the issue on `aarch64` and `x86` by adding the rule.~~ solve the issue by adding `LShiftCntVNode` without adding any rule > * test after fixing on tier1~3 > > Thank you for your review. Any suggestions are welcome.
> Wang Huang Wang Huang has updated the pull request incrementally with one additional commit since the last revision: fix test case problem ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/3747/files - new: https://git.openjdk.java.net/jdk/pull/3747/files/bda9c29f..6dbb458d Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3747&range=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3747&range=04-05 Stats: 33 lines in 1 file changed: 21 ins; 5 del; 7 mod Patch: https://git.openjdk.java.net/jdk/pull/3747.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3747/head:pull/3747 PR: https://git.openjdk.java.net/jdk/pull/3747 From redestad at openjdk.java.net Mon May 10 13:56:18 2021 From: redestad at openjdk.java.net (Claes Redestad) Date: Mon, 10 May 2021 13:56:18 GMT Subject: RFR: 8266810: Move trivial Matcher code to cpu-specific header files Message-ID: This patch moves a number of constants and trivial methods to newly introduced matcher_.hpp files. This enables constant folding and dead code elimination on one hand, and improved code navigation in IDEs on the other. The effect of this refactoring is modest: on Linux-x64 Hotspot (libjvm.so) shrinks by ~10Kb and C2 initialization cost drops from 8.5M to 8.3M. 
Testing: tier1-3, GHA builds of all architectures on linux ------------- Commit messages: - Move more predicates from .ad files - Move a few more predicates to matcher_cpu files - Add the new matcher per-cpu files - Add matcher_cpu files and move const bools there - Define constants to allow Matcher::supports_scalable_vector() calls to be DCEd on non-aarch64 platforms Changes: https://git.openjdk.java.net/jdk/pull/3947/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3947&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8266810 Stats: 1459 lines in 15 files changed: 775 ins; 682 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/3947.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3947/head:pull/3947 PR: https://git.openjdk.java.net/jdk/pull/3947 From aph at openjdk.java.net Mon May 10 14:32:19 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 10 May 2021 14:32:19 GMT Subject: RFR: 8264973: AArch64: Optimize vector max/min/add reduction of two integers with NEON pairwise instructions [v2] In-Reply-To: <1ZfSsFKnXwnkqtxIGeyZyb6L9yFghYh-sTZUWTY3A5U=.1a518678-3fd9-4dcb-be04-20c0060bfe91@github.com> References: <1ZfSsFKnXwnkqtxIGeyZyb6L9yFghYh-sTZUWTY3A5U=.1a518678-3fd9-4dcb-be04-20c0060bfe91@github.com> Message-ID: On Mon, 26 Apr 2021 11:16:00 GMT, Dong Bo wrote: >> On aarch64, current implementations of vector reduce_add2I, reduce_max2I, reduce_min2I can be optimized with NEON pairwise instructions: >> >> >> ## reduce_add2I, before >> mov w10, v19.s[0] >> mov w2, v19.s[1] >> add w10, w0, w10 >> add w10, w10, w2 >> ## reduce_add2I, optimized >> addp v23.2s, v24.2s, v24.2s >> mov w10, v23.s[0] >> add w10, w10, w2 >> >> ## reduce_max2I, before >> dup v16.2d, v23.d[0] >> sminv s16, v16.4s >> mov w10, v16.s[0] >> cmp w10, w0 >> csel w10, w10, w0, lt >> ## reduce_max2I, optimized >> sminp v16.2s, v23.2s, v23.2s >> mov w10, v16.s[0] >> cmp w10, w0 >> csel w10, w10, w0, lt >> >> >> I don't expect this to change anything of 
SuperWord, vectorizing reductions of two integers is disabled by [1]. >> This is useful for VectorAPI, tested benchmarks in [2], performance can improve ~51% and ~8% for `Int64Vector.ADD` and `Int64Vector.MAX` respectively. >> >> >> Benchmark (size) Mode Cnt Score Error Units >> # optimized >> Int64Vector.ADDLanes 1024 thrpt 10 2492.123 ? 23.561 ops/ms >> Int64Vector.ADDMaskedLanes 1024 thrpt 10 1825.882 ? 5.261 ops/ms >> Int64Vector.MAXLanes 1024 thrpt 10 1921.028 ? 3.253 ops/ms >> Int64Vector.MAXMaskedLanes 1024 thrpt 10 1588.575 ? 3.903 ops/ms >> Int64Vector.MINLanes 1024 thrpt 10 1923.913 ? 2.117 ops/ms >> Int64Vector.MINMaskedLanes 1024 thrpt 10 1596.875 ? 2.163 ops/ms >> # default >> Int64Vector.ADDLanes 1024 thrpt 10 1644.223 ? 1.885 ops/ms >> Int64Vector.ADDMaskedLanes 1024 thrpt 10 1491.502 ? 26.436 ops/ms >> Int64Vector.MAXLanes 1024 thrpt 10 1784.066 ? 3.816 ops/ms >> Int64Vector.MAXMaskedLanes 1024 thrpt 10 1494.750 ? 3.451 ops/ms >> Int64Vector.MINLanes 1024 thrpt 10 1785.266 ? 8.893 ops/ms >> Int64Vector.MINMaskedLanes 1024 thrpt 10 1499.233 ? 3.498 ops/ms >> >> >> Verified correctness with tests `test/jdk/jdk/incubator/vector/`. Also tested linux-aarch64-server-fastdebug tier1-3. >> >> [1] https://github.com/openjdk/jdk/blob/3bf4c904fbbd87d4db18db22c1be384616483eed/src/hotspot/share/opto/superword.cpp#L2004 >> [2] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/jdk/jdk/incubator/vector/benchmark/src/main/java/benchmark/jdk/incubator/vector/Int64Vector.java > > Dong Bo has updated the pull request incrementally with one additional commit since the last revision: > > add assembler tests for smaxp/sminp Looking now. I can't quite understand how to run tests from panama-vector on JDK head. If the JMH test is relevant to JDK, not just panama-vector, why not add it to JDK? 
------------- PR: https://git.openjdk.java.net/jdk/pull/3683 From psandoz at openjdk.java.net Mon May 10 15:58:11 2021 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Mon, 10 May 2021 15:58:11 GMT Subject: RFR: 8264973: AArch64: Optimize vector max/min/add reduction of two integers with NEON pairwise instructions [v2] In-Reply-To: References: <1ZfSsFKnXwnkqtxIGeyZyb6L9yFghYh-sTZUWTY3A5U=.1a518678-3fd9-4dcb-be04-20c0060bfe91@github.com> Message-ID: On Mon, 10 May 2021 14:29:47 GMT, Andrew Haley wrote: >> Dong Bo has updated the pull request incrementally with one additional commit since the last revision: >> >> add assembler tests for smaxp/sminp > > Looking now. I can't quite understand how to run tests from panama-vector on JDK head. If the JMH test is relevant to JDK, not just panama-vector, why not add it to JDK? @theRealAph we are still working out how best to bring the vector performance tests over to the `test/micro` area of mainline. (Some preliminary work is [here](https://github.com/openjdk/panama-vector/pull/77)). The perf tests in the `panama-vector` are under a maven project and it should be possible to build/run that project with a mainline build of the JDK (the tests should be compatible). ------------- PR: https://git.openjdk.java.net/jdk/pull/3683 From aph at redhat.com Mon May 10 16:41:17 2021 From: aph at redhat.com (Andrew Haley) Date: Mon, 10 May 2021 17:41:17 +0100 Subject: RFR: 8264973: AArch64: Optimize vector max/min/add reduction of two integers with NEON pairwise instructions [v2] In-Reply-To: References: <1ZfSsFKnXwnkqtxIGeyZyb6L9yFghYh-sTZUWTY3A5U=.1a518678-3fd9-4dcb-be04-20c0060bfe91@github.com> Message-ID: <86888902-ae64-605b-6287-8b04d314797e@redhat.com> On 5/10/21 6:55 AM, Dong Bo wrote: > PING? Any comments/suggestions are appreciated. > Although this has been reviewed by Ningsheng, we still need help from reviewers here. I'm testing this now. 
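For readers following the 8264973 thread: the two-lane add reduction quoted in that review adds a scalar input to both lanes of a 2 x int vector. The following is a minimal scalar sketch in Java of the "before" and "optimized" data flow, assuming the usual NEON `addp` semantics (adjacent lanes are summed). The class and method names are hypothetical and are not taken from the patch or from HotSpot.

```java
// Hypothetical scalar model of the reduce_add2I sequences quoted in the thread.
// It mirrors the data flow only; it is not HotSpot code.
final class ReduceAdd2ISketch {

    // "before": move each lane to a general register, then do two scalar adds.
    static int reduceAddBefore(int isrc, int[] v) {
        int lane0 = v[0];            // mov w10, v.s[0]
        int lane1 = v[1];            // mov w2,  v.s[1]
        int sum = isrc + lane0;      // add w10, w0, w10
        return sum + lane1;          // add w10, w10, w2
    }

    // "optimized": one pairwise add, one lane move, one scalar add.
    static int reduceAddPairwise(int isrc, int[] v) {
        // addp vtmp.2s, v.2s, v.2s: with the same register as both sources,
        // each result lane holds v[0] + v[1].
        int[] pair = { v[0] + v[1], v[0] + v[1] };
        int lane0 = pair[0];         // mov w10, vtmp.s[0]
        return lane0 + isrc;         // add with the scalar input
    }

    public static void main(String[] args) {
        int[] v = { 7, -3 };
        System.out.println(reduceAddBefore(10, v));
        System.out.println(reduceAddPairwise(10, v));
    }
}
```

Both methods return the same value for every input, including on overflow, because `int` addition wraps and is associative; the optimized form simply needs one fewer vector-to-general-register move.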
From github.com+2249648+johntortugo at openjdk.java.net Mon May 10 17:13:04 2021 From: github.com+2249648+johntortugo at openjdk.java.net (John Tortugo) Date: Mon, 10 May 2021 17:13:04 GMT Subject: Integrated: 8241502: C2: Migrate x86_64.ad to MacroAssembler In-Reply-To: References: Message-ID: On Fri, 5 Feb 2021 03:15:15 GMT, John Tortugo wrote: > Relates to: https://bugs.openjdk.java.net/browse/JDK-8241502 > Tested on: Linux tier1, 2 and 3 > > Can you please take a look whether these changes are going in the direction expected or not? If it is, I'll continue working on the `JDK-8241502` but I'd like to split it in a few PRs since it's a lot of changes. This pull request has now been integrated. Changeset: de784312 Author: Cesar Committer: Vladimir Kozlov URL: https://git.openjdk.java.net/jdk/commit/de784312c340b4a4f4c4d11854bfbe9e9e826ea3 Stats: 1267 lines in 3 files changed: 622 ins; 87 del; 558 mod 8241502: C2: Migrate x86_64.ad to MacroAssembler Reviewed-by: vlivanov, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/2420 From kvn at openjdk.java.net Mon May 10 17:23:07 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 10 May 2021 17:23:07 GMT Subject: RFR: 8266810: Move trivial Matcher code to cpu-specific header files In-Reply-To: References: Message-ID: On Mon, 10 May 2021 11:03:06 GMT, Claes Redestad wrote: > This patch moves a number of constants and trivial methods to newly introduced matcher_.hpp files. > > This enables constant folding and dead code elimination on one hand, and improved code navigation in IDEs on the other. > > The effect of this refactoring is modest: on Linux-x64 Hotspot (libjvm.so) shrinks by ~10Kb and C2 initialization cost drops from 8.5M to 8.3M. > > Testing: tier1-3, GHA builds of all architectures on linux Nice. ------------- Marked as reviewed by kvn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/3947 From kvn at openjdk.java.net Mon May 10 18:28:56 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 10 May 2021 18:28:56 GMT Subject: RFR: 8265956: JVM crashes when matching LShiftVB Node [v6] In-Reply-To: References: <0lX7U-sIV4c6Wi1Cyw3caRD6ZTVCIYzbiiji3uPXf4M=.c0494113-32c5-40c4-bdd7-93b226ffe236@github.com> Message-ID: On Mon, 10 May 2021 10:15:40 GMT, Wang Huang wrote: >> It was found that the rule `match(Set dst (LShiftVB src shift))` is missing on many CPUs, such as `aarch64` and `x86`. This is why the JVM crashes on the `JDK-8265956` test case. In this commit, I: >> * show the crash case `TestVectorShuffleIotaShort` >> * ~~solve the issue on `aarch64` and `x86` by adding the rule.~~ solve the issue by adding `LShiftCntVNode` without adding any rule >> * test after fixing on tier1~3 >> >> Thank you for your review. Any suggestions are welcome. >> Wang Huang > > Wang Huang has updated the pull request incrementally with one additional commit since the last revision: > > fix test case problem The fix looks correct. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/3747 From sviswanathan at openjdk.java.net Mon May 10 18:31:30 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Mon, 10 May 2021 18:31:30 GMT Subject: RFR: 8265128: [REDO] Optimize Vector API slice and unslice operations [v4] In-Reply-To: References: Message-ID: > All the slice and unslice variants that take more than one argument can benefit from already intrinsic methods on similar lines as slice(origin) and unslice(origin).
> > Changes include: > * Rewrite Vector API slice/unslice using already intrinsic methods > * Fix in library_call.cpp:inline_preconditions_checkIndex() to not modify control if intrinsification fails > * Vector API conversion tests thresholds adjustment > > Base Performance: > Benchmark (size) Mode Cnt Score Error Units > TestSlice.vectorSliceOrigin 1024 thrpt 5 11763.372 ± 254.580 ops/ms > TestSlice.vectorSliceOriginVector 1024 thrpt 5 599.286 ± 326.770 ops/ms > TestSlice.vectorSliceUnsliceOrigin 1024 thrpt 5 6627.601 ± 22.060 ops/ms > TestSlice.vectorSliceUnsliceOriginVector 1024 thrpt 5 401.858 ± 220.340 ops/ms > TestSlice.vectorSliceUnsliceOriginVectorPart 1024 thrpt 5 421.993 ± 231.703 ops/ms > > Performance with patch: > Benchmark (size) Mode Cnt Score Error Units > TestSlice.vectorSliceOrigin 1024 thrpt 5 11792.091 ± 37.296 ops/ms > TestSlice.vectorSliceOriginVector 1024 thrpt 5 8388.174 ± 115.886 ops/ms > TestSlice.vectorSliceUnsliceOrigin 1024 thrpt 5 6662.159 ± 8.203 ops/ms > TestSlice.vectorSliceUnsliceOriginVector 1024 thrpt 5 5206.300 ± 43.637 ops/ms > TestSlice.vectorSliceUnsliceOriginVectorPart 1024 thrpt 5 5194.278 ±
13.376 ops/ms Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: library_call.cpp changes not needed after Objects.checkIndex arguments fixed ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/3804/files - new: https://git.openjdk.java.net/jdk/pull/3804/files/94f184ef..14439667 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3804&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3804&range=02-03 Stats: 5 lines in 1 file changed: 0 ins; 3 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/3804.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3804/head:pull/3804 PR: https://git.openjdk.java.net/jdk/pull/3804 From sviswanathan at openjdk.java.net Mon May 10 18:36:45 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Mon, 10 May 2021 18:36:45 GMT Subject: RFR: 8265128: [REDO] Optimize Vector API slice and unslice operations [v3] In-Reply-To: References: <-VJVRq98DoOIRYPdSB8b2k1oOAI2vHeyKgo00bvGbSE=.00ef20ca-d072-4426-9e8e-c0e5433d3440@github.com> Message-ID: <5EgHtRxBzhDqAo_lhlf8yYWw5J9yPaYsHUtlf7Db1mw=.836e61e6-0caf-4ff7-ad83-cfbcfb907723@github.com> On Fri, 30 Apr 2021 23:34:15 GMT, Paul Sandoz wrote: >>> @PaulSandoz would it be possible for you to run this through your testing? >> >> Started, will report back when done. > >> > @PaulSandoz would it be possible for you to run this through your testing? >> >> Started, will report back when done. > > Tier 1 to 3 tests all pass on build profiles linux-x64 linux-aarch64 macosx-x64 windows-x64 @PaulSandoz After we fixed the Objects.checkIndex arguments per your review comment, the changes in library_call.cpp are no more needed so I have backed them out. That leaves only changes in vector api code which you have reviewed. Please let me know if it is ok to push. 
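As background for the `Objects.checkIndex` remark above: the JDK method takes the index first and an exclusive upper bound second, and throws `IndexOutOfBoundsException` unless `0 <= index < length`. The sketch below only illustrates that contract; the class name, the helper, and the sample values are hypothetical and do not reproduce the arguments used in the patch.

```java
import java.util.Objects;

// Hypothetical illustration of the Objects.checkIndex(index, length) contract.
final class CheckIndexSketch {

    // Returns true when index lies in [0, length), false otherwise.
    static boolean inBounds(int index, int length) {
        try {
            Objects.checkIndex(index, length); // index first, exclusive bound second
            return true;
        } catch (IndexOutOfBoundsException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        int lanes = 4; // assumed lane count, for illustration only
        System.out.println(inBounds(0, lanes));      // first valid index
        System.out.println(inBounds(lanes, lanes));  // bound itself is rejected
        // Passing lanes + 1 as the bound is one way to accept an index equal to lanes.
        System.out.println(inBounds(lanes, lanes + 1));
    }
}
```

Because the bound is exclusive, a caller that wants to allow an index equal to the element count has to pass the count plus one, so swapping or mis-sizing the two arguments changes which indices are accepted.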
------------- PR: https://git.openjdk.java.net/jdk/pull/3804 From psandoz at openjdk.java.net Mon May 10 19:05:20 2021 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Mon, 10 May 2021 19:05:20 GMT Subject: RFR: 8265128: [REDO] Optimize Vector API slice and unslice operations [v4] In-Reply-To: References: Message-ID: On Mon, 10 May 2021 18:31:30 GMT, Sandhya Viswanathan wrote: >> All the slice and unslice variants that take more than one argument can benefit from already intrinsic methods on similar lines as slice(origin) and unslice(origin). >> >> Changes include: >> * Rewrite Vector API slice/unslice using already intrinsic methods >> * Fix in library_call.cpp:inline_preconditions_checkIndex() to not modify control if intrinsification fails >> * Vector API conversion tests thresholds adjustment >> >> Base Performance: >> Benchmark (size) Mode Cnt Score Error Units >> TestSlice.vectorSliceOrigin 1024 thrpt 5 11763.372 ? 254.580 ops/ms >> TestSlice.vectorSliceOriginVector 1024 thrpt 5 599.286 ? 326.770 ops/ms >> TestSlice.vectorSliceUnsliceOrigin 1024 thrpt 5 6627.601 ? 22.060 ops/ms >> TestSlice.vectorSliceUnsliceOriginVector 1024 thrpt 5 401.858 ? 220.340 ops/ms >> TestSlice.vectorSliceUnsliceOriginVectorPart 1024 thrpt 5 421.993 ? 231.703 ops/ms >> >> Performance with patch: >> Benchmark (size) Mode Cnt Score Error Units >> TestSlice.vectorSliceOrigin 1024 thrpt 5 11792.091 ? 37.296 ops/ms >> TestSlice.vectorSliceOriginVector 1024 thrpt 5 8388.174 ? 115.886 ops/ms >> TestSlice.vectorSliceUnsliceOrigin 1024 thrpt 5 6662.159 ? 8.203 ops/ms >> TestSlice.vectorSliceUnsliceOriginVector 1024 thrpt 5 5206.300 ? 43.637 ops/ms >> TestSlice.vectorSliceUnsliceOriginVectorPart 1024 thrpt 5 5194.278 ? 
13.376 ops/ms > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > library_call.cpp changes not needed after Objects.checkIndex arguments fixed I don't claim to understand the HotSpot details, but Java changes still look good. ------------- Marked as reviewed by psandoz (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/3804 From github.com+2249648+johntortugo at openjdk.java.net Mon May 10 19:43:14 2021 From: github.com+2249648+johntortugo at openjdk.java.net (John Tortugo) Date: Mon, 10 May 2021 19:43:14 GMT Subject: RFR: JDK-8266601: Fix bugs in AddLNode::Ideal transformations Message-ID: Some small fixes to the IR transformations performed in AddLNode::Ideal. Tested (x86) Fastdebug build with JDK-tier1/2/3 and hotspot {tier1, compiler, serviceability, runtime, misc, tier1_compiler} running on Linux, macOS and Windows. ------------- Commit messages: - Remove new line at end of file. - Fixes on AddLNode Ideal transformations - Merge pull request #2 from openjdk/master - Merge pull request #1 from openjdk/master Changes: https://git.openjdk.java.net/jdk/pull/3955/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3955&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8266601 Stats: 12 lines in 1 file changed: 0 ins; 10 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/3955.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3955/head:pull/3955 PR: https://git.openjdk.java.net/jdk/pull/3955 From vlivanov at openjdk.java.net Mon May 10 21:03:56 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Mon, 10 May 2021 21:03:56 GMT Subject: RFR: 8265128: [REDO] Optimize Vector API slice and unslice operations [v4] In-Reply-To: References: Message-ID: On Mon, 10 May 2021 18:31:30 GMT, Sandhya Viswanathan wrote: >> All the slice and unslice variants that take more than one argument can benefit from already intrinsic methods on similar lines as slice(origin) and 
unslice(origin). >> >> Changes include: >> * Rewrite Vector API slice/unslice using already intrinsic methods >> * Fix in library_call.cpp:inline_preconditions_checkIndex() to not modify control if intrinsification fails >> * Vector API conversion tests thresholds adjustment >> >> Base Performance: >> Benchmark (size) Mode Cnt Score Error Units >> TestSlice.vectorSliceOrigin 1024 thrpt 5 11763.372 ? 254.580 ops/ms >> TestSlice.vectorSliceOriginVector 1024 thrpt 5 599.286 ? 326.770 ops/ms >> TestSlice.vectorSliceUnsliceOrigin 1024 thrpt 5 6627.601 ? 22.060 ops/ms >> TestSlice.vectorSliceUnsliceOriginVector 1024 thrpt 5 401.858 ? 220.340 ops/ms >> TestSlice.vectorSliceUnsliceOriginVectorPart 1024 thrpt 5 421.993 ? 231.703 ops/ms >> >> Performance with patch: >> Benchmark (size) Mode Cnt Score Error Units >> TestSlice.vectorSliceOrigin 1024 thrpt 5 11792.091 ? 37.296 ops/ms >> TestSlice.vectorSliceOriginVector 1024 thrpt 5 8388.174 ? 115.886 ops/ms >> TestSlice.vectorSliceUnsliceOrigin 1024 thrpt 5 6662.159 ? 8.203 ops/ms >> TestSlice.vectorSliceUnsliceOriginVector 1024 thrpt 5 5206.300 ? 43.637 ops/ms >> TestSlice.vectorSliceUnsliceOriginVectorPart 1024 thrpt 5 5194.278 ? 13.376 ops/ms > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > library_call.cpp changes not needed after Objects.checkIndex arguments fixed Looks good. PS: I still think there's a problem with `LibraryCallKit::inline_preconditions_checkIndex`: it shouldn't bail out intrinsification in the middle of the process leaving a partially constructed graph behind. I don't see why short-circuiting the logic once the path is dead (`if (stopped()) return true;`) won't work. But that's a topic for another fix. ------------- Marked as reviewed by vlivanov (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/3804 From kvn at openjdk.java.net Mon May 10 21:29:55 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 10 May 2021 21:29:55 GMT Subject: RFR: JDK-8266601: Fix bugs in AddLNode::Ideal transformations In-Reply-To: References: Message-ID: On Mon, 10 May 2021 19:33:17 GMT, John Tortugo wrote: > Some small fixes to the IR transformations performed in AddLNode::Ideal. > Tested (x86) Fastdebug build with JDK-tier1/2/3 and hotspot {tier1, compiler, serviceability, runtime, misc, tier1_compiler} running on Linux, macOS and Windows. Good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/3955 From sviswanathan at openjdk.java.net Mon May 10 21:49:14 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Mon, 10 May 2021 21:49:14 GMT Subject: RFR: 8265128: [REDO] Optimize Vector API slice and unslice operations [v4] In-Reply-To: References: Message-ID: On Mon, 10 May 2021 18:31:30 GMT, Sandhya Viswanathan wrote: >> All the slice and unslice variants that take more than one argument can benefit from already intrinsic methods on similar lines as slice(origin) and unslice(origin). >> >> Changes include: >> * Rewrite Vector API slice/unslice using already intrinsic methods >> * Fix in library_call.cpp:inline_preconditions_checkIndex() to not modify control if intrinsification fails >> * Vector API conversion tests thresholds adjustment >> >> Base Performance: >> Benchmark (size) Mode Cnt Score Error Units >> TestSlice.vectorSliceOrigin 1024 thrpt 5 11763.372 ± 254.580 ops/ms >> TestSlice.vectorSliceOriginVector 1024 thrpt 5 599.286 ± 326.770 ops/ms >> TestSlice.vectorSliceUnsliceOrigin 1024 thrpt 5 6627.601 ± 22.060 ops/ms >> TestSlice.vectorSliceUnsliceOriginVector 1024 thrpt 5 401.858 ± 220.340 ops/ms >> TestSlice.vectorSliceUnsliceOriginVectorPart 1024 thrpt 5 421.993 ± 
231.703 ops/ms >> >> Performance with patch: >> Benchmark (size) Mode Cnt Score Error Units >> TestSlice.vectorSliceOrigin 1024 thrpt 5 11792.091 ± 37.296 ops/ms >> TestSlice.vectorSliceOriginVector 1024 thrpt 5 8388.174 ± 115.886 ops/ms >> TestSlice.vectorSliceUnsliceOrigin 1024 thrpt 5 6662.159 ± 8.203 ops/ms >> TestSlice.vectorSliceUnsliceOriginVector 1024 thrpt 5 5206.300 ± 43.637 ops/ms >> TestSlice.vectorSliceUnsliceOriginVectorPart 1024 thrpt 5 5194.278 ± 13.376 ops/ms > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > library_call.cpp changes not needed after Objects.checkIndex arguments fixed Thanks a lot for the review Paul and Vladimir. Vladimir, I will submit a separate patch for LibraryCallKit::inline_preconditions_checkIndex fix. ------------- PR: https://git.openjdk.java.net/jdk/pull/3804 From vlivanov at openjdk.java.net Mon May 10 21:50:55 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Mon, 10 May 2021 21:50:55 GMT Subject: RFR: JDK-8266601: Fix bugs in AddLNode::Ideal transformations In-Reply-To: References: Message-ID: On Mon, 10 May 2021 19:33:17 GMT, John Tortugo wrote: > Some small fixes to the IR transformations performed in AddLNode::Ideal. > Tested (x86) Fastdebug build with JDK-tier1/2/3 and hotspot {tier1, compiler, serviceability, runtime, misc, tier1_compiler} running on Linux, macOS and Windows. Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/3955 From sviswanathan at openjdk.java.net Mon May 10 21:52:56 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Mon, 10 May 2021 21:52:56 GMT Subject: Integrated: 8265128: [REDO] Optimize Vector API slice and unslice operations In-Reply-To: References: Message-ID: On Thu, 29 Apr 2021 21:29:03 GMT, Sandhya Viswanathan wrote: > All the slice and unslice variants that take more than one argument can benefit from already intrinsic methods on similar lines as slice(origin) and unslice(origin). > > Changes include: > * Rewrite Vector API slice/unslice using already intrinsic methods > * Fix in library_call.cpp:inline_preconditions_checkIndex() to not modify control if intrinsification fails > * Vector API conversion tests thresholds adjustment > > Base Performance: > Benchmark (size) Mode Cnt Score Error Units > TestSlice.vectorSliceOrigin 1024 thrpt 5 11763.372 ± 254.580 ops/ms > TestSlice.vectorSliceOriginVector 1024 thrpt 5 599.286 ± 326.770 ops/ms > TestSlice.vectorSliceUnsliceOrigin 1024 thrpt 5 6627.601 ± 22.060 ops/ms > TestSlice.vectorSliceUnsliceOriginVector 1024 thrpt 5 401.858 ± 220.340 ops/ms > TestSlice.vectorSliceUnsliceOriginVectorPart 1024 thrpt 5 421.993 ± 231.703 ops/ms > > Performance with patch: > Benchmark (size) Mode Cnt Score Error Units > TestSlice.vectorSliceOrigin 1024 thrpt 5 11792.091 ± 37.296 ops/ms > TestSlice.vectorSliceOriginVector 1024 thrpt 5 8388.174 ± 115.886 ops/ms > TestSlice.vectorSliceUnsliceOrigin 1024 thrpt 5 6662.159 ± 8.203 ops/ms > TestSlice.vectorSliceUnsliceOriginVector 1024 thrpt 5 5206.300 ± 43.637 ops/ms > TestSlice.vectorSliceUnsliceOriginVectorPart 1024 thrpt 5 5194.278 ± 13.376 ops/ms This pull request has now been integrated. 
Changeset: 23446f1f Author: Sandhya Viswanathan URL: https://git.openjdk.java.net/jdk/commit/23446f1f5ee087376bc1ab89413a011fc52bde1f Stats: 881 lines in 43 files changed: 172 ins; 518 del; 191 mod 8265128: [REDO] Optimize Vector API slice and unslice operations Reviewed-by: psandoz, vlivanov ------------- PR: https://git.openjdk.java.net/jdk/pull/3804 From sviswanathan at openjdk.java.net Mon May 10 23:41:16 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Mon, 10 May 2021 23:41:16 GMT Subject: RFR: 8266854: LibraryCallKit::inline_preconditions_checkIndex modifies control flow even if the intrinsic bailed out Message-ID: LibraryCallKit::inline_preconditions_checkIndex can result in the following assert sometimes: "# assert(ctrl == kit.control()) failed: Control flow was added although the intrinsic bailed out" Consider the following code snippet: ... set_control(_gvn.transform(new IfTrueNode(rc))); { PreserveJVMState pjvms(this); set_control(_gvn.transform(new IfFalseNode(rc))); uncommon_trap(Deoptimization::Reason_range_check, Deoptimization::Action_make_not_entrant); } .. Here the control is being modified by set_control even though a bailout is possible afterwards. Moving the set_control later in the intrinsic fixes this. This is a small fix. Please review. 
Best Regards, Sandhya ------------- Commit messages: - 8266854: LibraryCallKit::inline_preconditions_checkIndex modifies control flow even if the intrinsic bailed out Changes: https://git.openjdk.java.net/jdk/pull/3958/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3958&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8266854 Stats: 5 lines in 1 file changed: 3 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/3958.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3958/head:pull/3958 PR: https://git.openjdk.java.net/jdk/pull/3958 From kvn at openjdk.java.net Mon May 10 23:59:25 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 10 May 2021 23:59:25 GMT Subject: RFR: 8266854: LibraryCallKit::inline_preconditions_checkIndex modifies control flow even if the intrinsic bailed out In-Reply-To: References: Message-ID: On Mon, 10 May 2021 23:09:33 GMT, Sandhya Viswanathan wrote: > LibraryCallKit::inline_preconditions_checkIndex can result in the following assert sometimes: > "# assert(ctrl == kit.control()) failed: Control flow was added although the intrinsic bailed out" > > Consider the following code snippet: > ... > set_control(_gvn.transform(new IfTrueNode(rc))); > { > PreserveJVMState pjvms(this); > set_control(_gvn.transform(new IfFalseNode(rc))); > uncommon_trap(Deoptimization::Reason_range_check, > Deoptimization::Action_make_not_entrant); > } > .. > Here the control is being modified by set_control even though a bailout is possible afterwards. > Moving the set_control later in the intrinsic fixes this. > > This is a small fix. Please review. > > Best Regards, > Sandhya src/hotspot/share/opto/library_call.cpp line 1049: > 1047: record_for_igvn(rc); > 1048: } > 1049: set_control(_gvn.transform(new IfTrueNode(rc))); Or you can move `stopped()` check here (after `set_control()`) and avoid generation of useless `false` path. 
------------- PR: https://git.openjdk.java.net/jdk/pull/3958 From sviswanathan at openjdk.java.net Tue May 11 01:00:59 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Tue, 11 May 2021 01:00:59 GMT Subject: RFR: 8266854: LibraryCallKit::inline_preconditions_checkIndex modifies control flow even if the intrinsic bailed out In-Reply-To: References: Message-ID: On Mon, 10 May 2021 23:56:14 GMT, Vladimir Kozlov wrote: >> LibraryCallKit::inline_preconditions_checkIndex can result in the following assert sometimes: >> "# assert(ctrl == kit.control()) failed: Control flow was added although the intrinsic bailed out" >> >> Consider the following code snippet: >> ... >> set_control(_gvn.transform(new IfTrueNode(rc))); >> { >> PreserveJVMState pjvms(this); >> set_control(_gvn.transform(new IfFalseNode(rc))); >> uncommon_trap(Deoptimization::Reason_range_check, >> Deoptimization::Action_make_not_entrant); >> } >> .. >> Here the control is being modified by set_control even though a bailout is possible afterwards. >> Moving the set_control later in the intrinsic fixes this. >> >> This is a small fix. Please review. >> >> Best Regards, >> Sandhya > > src/hotspot/share/opto/library_call.cpp line 1049: > >> 1047: record_for_igvn(rc); >> 1048: } >> 1049: set_control(_gvn.transform(new IfTrueNode(rc))); > > Or you can move `stopped()` check here (after `set_control()`) and avoid generation of useless `false` path. Thanks @vnkozlov for the review. It is this set_control() that is resulting in an assert that control is changed but the intrinsic bailed out. Moving the stopped check right before set_control() would fix the issue. I will update the patch accordingly. 
------------- PR: https://git.openjdk.java.net/jdk/pull/3958 From sviswanathan at openjdk.java.net Tue May 11 01:06:25 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Tue, 11 May 2021 01:06:25 GMT Subject: RFR: 8266854: LibraryCallKit::inline_preconditions_checkIndex modifies control flow even if the intrinsic bailed out [v2] In-Reply-To: References: Message-ID: <7mjYr1XHcHP2aQq3I9rvx8EorlaiISkO2tf3ePZ9ACQ=.a41f5461-a6cd-49eb-b128-ad232c15da0a@github.com> > LibraryCallKit::inline_preconditions_checkIndex can result in the following assert sometimes: > "# assert(ctrl == kit.control()) failed: Control flow was added although the intrinsic bailed out" > > Consider the following code snippet: > ... > set_control(_gvn.transform(new IfTrueNode(rc))); > { > PreserveJVMState pjvms(this); > set_control(_gvn.transform(new IfFalseNode(rc))); > uncommon_trap(Deoptimization::Reason_range_check, > Deoptimization::Action_make_not_entrant); > } > .. > Here the control is being modified by set_control even though a bailout is possible afterwards. > Moving the set_control later in the intrinsic fixes this. > > This is a small fix. Please review. 
> > Best Regards, > Sandhya Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: Implement review comments from Vladimir Kozlov ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/3958/files - new: https://git.openjdk.java.net/jdk/pull/3958/files/40892c40..1283b4a6 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3958&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3958&range=00-01 Stats: 16 lines in 1 file changed: 7 ins; 8 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/3958.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3958/head:pull/3958 PR: https://git.openjdk.java.net/jdk/pull/3958 From yyang at openjdk.java.net Tue May 11 02:13:14 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Tue, 11 May 2021 02:13:14 GMT Subject: RFR: 8266189: Remove C1 "IfInstanceOf" instruction In-Reply-To: References: Message-ID: On Sat, 8 May 2021 10:54:40 GMT, Yi Yang wrote: > Remove IfInstanceOf instruction, it has been there for a long while(13yrs) and not implemented yet. Thank you Tobias! ------------- PR: https://git.openjdk.java.net/jdk/pull/3935 From yyang at openjdk.java.net Tue May 11 02:33:37 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Tue, 11 May 2021 02:33:37 GMT Subject: RFR: 8266798: C1: More types of instruction can also apply LoopInvariantCodeMotion Message-ID: C1 only applies LoopInvariantCodeMotion for instructions whose types are Constant/ArithmeticOp/LoadField/ArrayLength/LoadIndexed. We are possible to apply this optimization for more types of instruction. Candidates: NegateOp,Convert. Due to the lack of verification at IR level, it is difficult to write jtreg to check if it transformed, so I can only demonstrate it with a simple program: // run with -XX:+PrintValueNumbering static int foo10(int t){ int sum=12; for(int i=0;i<100;i++){ sum += 12; sum += -t; sum += (long)t; } return sum; } Before: [...] 
* loop invariant code motion for short loop B1 processing block B1 Value Numbering: insert Constant i10 (size 11, entries 3, nesting 2) Instruction i10 is loop invariant processing block B2 Value Numbering: Constant i12 equal to i5 (size 11, entries 3, nesting-diff 1) substitution for 12 set to 5 Instruction i12 is loop invariant // only 12 is recognized Value Numbering: insert Constant i20 (size 11, entries 4, nesting 2) Instruction i20 is loop invariant ** loop successfully optimized [...] After: [...] ** loop invariant code motion for short loop B1 processing block B1 Value Numbering: insert Constant i10 (size 11, entries 3, nesting 2) Instruction i10 is loop invariant 6 0 i10 100 processing block B2 Value Numbering: Constant i12 equal to i5 (size 11, entries 3, nesting-diff 1) substitution for i12 set to 105 Instruction i12 is loop invariant 11 0 i12 12 Instruction i14 is loop invariant . 16 0 i14 -i4 Value Numbering: insert Convert l17 (size 11, entries 4, nesting 2) Instruction l17 is loop invariant . 22 0 l17 i2l(i4) Value Numbering: insert Constant i20 (size 11, entries 5, nesting 2) Instruction i20 is loop invariant 26 0 i20 1 [...] 
------------- Commit messages: - more instructions Changes: https://git.openjdk.java.net/jdk/pull/3965/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3965&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8266798 Stats: 11 lines in 1 file changed: 7 ins; 3 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/3965.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3965/head:pull/3965 PR: https://git.openjdk.java.net/jdk/pull/3965 From yyang at openjdk.java.net Tue May 11 02:53:10 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Tue, 11 May 2021 02:53:10 GMT Subject: RFR: 8266874: Clean up C1 canonicalizer for TableSwitch/LookupSwitch Message-ID: Clean up C1 canonicalizer for TableSwitch/LookupSwitch ------------- Commit messages: - canonicalizer cleanup Changes: https://git.openjdk.java.net/jdk/pull/3966/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3966&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8266874 Stats: 34 lines in 1 file changed: 0 ins; 34 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/3966.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3966/head:pull/3966 PR: https://git.openjdk.java.net/jdk/pull/3966 From ddong at openjdk.java.net Tue May 11 02:55:59 2021 From: ddong at openjdk.java.net (Denghui Dong) Date: Tue, 11 May 2021 02:55:59 GMT Subject: RFR: 8265129: Add intrinsic support for JVM.getClassId [v5] In-Reply-To: References: Message-ID: On Mon, 10 May 2021 05:33:35 GMT, Denghui Dong wrote: >> 8265129: Add intrinsic support for JVM.getClassId > > Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: > > remove c1 part Gentle ping, could I have a review from the compiler team? 
------------- PR: https://git.openjdk.java.net/jdk/pull/3470 From jiefu at openjdk.java.net Tue May 11 03:14:57 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Tue, 11 May 2021 03:14:57 GMT Subject: RFR: 8265956: JVM crashes when matching LShiftVB Node [v6] In-Reply-To: References: <0lX7U-sIV4c6Wi1Cyw3caRD6ZTVCIYzbiiji3uPXf4M=.c0494113-32c5-40c4-bdd7-93b226ffe236@github.com> Message-ID: On Mon, 10 May 2021 10:15:40 GMT, Wang Huang wrote: >> It is found that the rule `match(Set dst (LShiftVB src shift))` is missing on many cpus, such as `aarch64` and `x86`. It is this reason that JVM will crash under `JDK-8265956`'s test case. In this commit, I : >> * show the crash case `TestVectorShuffleIotaShort` >> * ~~solve the issue on `aarch64` and `x86` by adding the rule.~~ solve the issue by adding `LShiftCntVNode` without adding any rule >> * test after fixing on tier1~3 >> >> Thank you for your review. Any suggestion is welcome. >> Wang Huang > > Wang Huang has updated the pull request incrementally with one additional commit since the last revision: > > fix test case problem Looks good to me. Thanks for your update. ------------- Marked as reviewed by jiefu (Committer). PR: https://git.openjdk.java.net/jdk/pull/3747 From jiefu at openjdk.java.net Tue May 11 03:20:58 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Tue, 11 May 2021 03:20:58 GMT Subject: RFR: 8265956: JVM crashes when matching LShiftVB Node [v6] In-Reply-To: References: <0lX7U-sIV4c6Wi1Cyw3caRD6ZTVCIYzbiiji3uPXf4M=.c0494113-32c5-40c4-bdd7-93b226ffe236@github.com> Message-ID: On Mon, 10 May 2021 10:15:40 GMT, Wang Huang wrote: >> It is found that the rule `match(Set dst (LShiftVB src shift))` is missing on many cpus, such as `aarch64` and `x86`. It is this reason that JVM will crash under `JDK-8265956`'s test case. 
In this commit, I : >> * show the crash case `TestVectorShuffleIotaShort` >> * ~~solve the issue on `aarch64` and `x86` by adding the rule.~~ solve the issue by adding `LShiftCntVNode` without adding any rule >> * test after fixing on tier1~3 >> >> Thank you for your review. Any suggestion is welcome. >> Wang Huang > > Wang Huang has updated the pull request incrementally with one additional commit since the last revision: > > fix test case problem All the vectorapi tests passed in our x86 platforms. So sponsor it. Thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/3747 From whuang at openjdk.java.net Tue May 11 03:22:58 2021 From: whuang at openjdk.java.net (Wang Huang) Date: Tue, 11 May 2021 03:22:58 GMT Subject: Integrated: 8265956: JVM crashes when matching LShiftVB Node In-Reply-To: <0lX7U-sIV4c6Wi1Cyw3caRD6ZTVCIYzbiiji3uPXf4M=.c0494113-32c5-40c4-bdd7-93b226ffe236@github.com> References: <0lX7U-sIV4c6Wi1Cyw3caRD6ZTVCIYzbiiji3uPXf4M=.c0494113-32c5-40c4-bdd7-93b226ffe236@github.com> Message-ID: On Wed, 28 Apr 2021 02:51:06 GMT, Wang Huang wrote: > It is found that the rule `match(Set dst (LShiftVB src shift))` is missing on many cpus, such as `aarch64` and `x86`. It is this reason that JVM will crash under `JDK-8265956`'s test case. In this commit, I : > * show the crash case `TestVectorShuffleIotaShort` > * ~~solve the issue on `aarch64` and `x86` by adding the rule.~~ solve the issue by adding `LShiftCntVNode` without adding any rule > * test after fixing on tier1~3 > > Thank you for your review. Any suggestion is welcome. > Wang Huang This pull request has now been integrated. 
Changeset: 10a049e1 Author: Wang Huang Committer: Jie Fu URL: https://git.openjdk.java.net/jdk/commit/10a049e1714bfe64f895177f4de7a31ad65f407a Stats: 122 lines in 2 files changed: 120 ins; 0 del; 2 mod 8265956: JVM crashes when matching LShiftVB Node Co-authored-by: Wang Huang Co-authored-by: Ai Jiaming Reviewed-by: kvn, jiefu ------------- PR: https://git.openjdk.java.net/jdk/pull/3747 From kvn at openjdk.java.net Tue May 11 04:33:55 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 11 May 2021 04:33:55 GMT Subject: RFR: 8266854: LibraryCallKit::inline_preconditions_checkIndex modifies control flow even if the intrinsic bailed out [v2] In-Reply-To: References: Message-ID: On Tue, 11 May 2021 00:57:55 GMT, Sandhya Viswanathan wrote: >> src/hotspot/share/opto/library_call.cpp line 1049: >> >>> 1047: record_for_igvn(rc); >>> 1048: } >>> 1049: set_control(_gvn.transform(new IfTrueNode(rc))); >> >> Or you can move `stopped()` check here (after `set_control()`) and avoid generation of useless `false` path. > > Thanks @vnkozlov for the review. It is this set_control() that is resulting in an assert that control is changed but the intrinsic bailed out. Moving the stopped check right before set_control() would fix the issue. I will update the patch accordingly. Strange. Actually `BuildCutout` also changes control and we have a `stopped()` check after it at line #1036. After that check there is no control change, so checking `stopped()` again here looks strange if it helps. Maybe this code is missing some compile-time checks for constants (like TOP) which collapse this graph so we can bail out without generating code. Note, the assert is correct - we should not change the graph too much. It should be dead if it returns `false`, and it will do normal compilation of the bytecode. Need more information about this case. 
------------- PR: https://git.openjdk.java.net/jdk/pull/3958 From jbhateja at openjdk.java.net Tue May 11 04:49:56 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Tue, 11 May 2021 04:49:56 GMT Subject: RFR: 8266054: VectorAPI rotate operation optimization [v5] In-Reply-To: References: Message-ID: On Sat, 8 May 2021 15:40:53 GMT, Paul Sandoz wrote: > Looks good. Someone from the HotSpot side needs to review related changes. > > The way i read the perf numbers is that on non AVX512 systems the numbers are in the noise (no worse, no better), with significant improvement on AVX512. Hi @PaulSandoz, thanks for your suggestions and review, yes AVX2 speedup is an artifact of the code re-organization, since C2 now directly dismantles the rotates thus this offer better in-lining behavior compared to existing implementation and this is what show up in the performance data. ------------- PR: https://git.openjdk.java.net/jdk/pull/3720 From dongbo at openjdk.java.net Tue May 11 06:05:10 2021 From: dongbo at openjdk.java.net (Dong Bo) Date: Tue, 11 May 2021 06:05:10 GMT Subject: RFR: 8264973: AArch64: Optimize vector max/min/add reduction of two integers with NEON pairwise instructions [v2] In-Reply-To: References: <1ZfSsFKnXwnkqtxIGeyZyb6L9yFghYh-sTZUWTY3A5U=.1a518678-3fd9-4dcb-be04-20c0060bfe91@github.com> Message-ID: <2KqJJ5tCZUT3xNU7YmS8PohUBhSEIP7nmlOKYnSIeAg=.8e58f09b-d6e9-4c57-810e-081d735c46dc@github.com> On Mon, 10 May 2021 14:29:47 GMT, Andrew Haley wrote: >> Dong Bo has updated the pull request incrementally with one additional commit since the last revision: >> >> add assembler tests for smaxp/sminp > > Looking now. I can't quite understand how to run tests from panama-vector on JDK head. If the JMH test is relevant to JDK, not just panama-vector, why not add it to JDK? > @theRealAph we are still working out how best to bring the vector performance tests over to the `test/micro` area of mainline. 
(Some preliminary work is [here](https://github.com/openjdk/panama-vector/pull/77)). The perf tests in the `panama-vector` are under a maven project and it should be possible to build/run that project with a mainline build of the JDK (the tests should be compatible). Hi, @theRealAph @PaulSandoz. Thanks for the comments. Because [1] has not been merged into mainline, we cannot build the tests directly with mainline JDK: [ERROR] panama-vector/test/jdk/jdk/incubator/vector/benchmark/src/main/java/benchmark/utf8/DecodeBench.java:[354,31] cannot find symbol symbol: method intoCharArray(char[],int) location: class jdk.incubator.vector.ShortVector When I tested this, the incompatible `DecodeBench.java` was deleted first since it is all about `ShortVector` and `ByteVector` rather than `Int64Vector`. Then the perf tests were built/run with mainline builds of JDK: mvn install java -jar target/vector-benchmarks.jar benchmark.jdk.incubator.vector.Int64Vector.["M"|"ADD"]+[AXIN]*["Masked"]*Lanes -wi 10 -w 1000ms -f 1 -i 10 -r 1000ms [1] https://github.com/openjdk/panama-vector/pull/22 ------------- PR: https://git.openjdk.java.net/jdk/pull/3683 From jbhateja at openjdk.java.net Tue May 11 06:16:53 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Tue, 11 May 2021 06:16:53 GMT Subject: RFR: 8256973: Intrinsic creation for VectorMask query (lastTrue, firstTrue, trueCount) APIs In-Reply-To: References: <73lFD51hzmiF_KrQyPyE5c7lbf-Bp6V5vptzGo7JioY=.f34509d0-04c1-4c6d-878f-baa433b315a7@github.com> Message-ID: On Fri, 7 May 2021 19:04:01 GMT, Paul Sandoz wrote: > > Hi @PaulSandoz , that's a nice suggestion, I think instead of reduction which may emit bulky sequence, VectorMask.toLong() + Long.bitCount() could have been used for trueCount. But since toLong may not work for ARM SVE, so in the mean time intrinsifying at the level of API looked reasonable. > > Do you mean that reusing `VectorSupport.reductionCoerced` as the intrinsic entry point may emit bulky sequence? 
Hi @PaulSandoz , semantically reductionCoerced could be used as an entry point for trueCount (VECTOR_OP_BITCOUNT) since we are iterating over each lane element (boolean type in this case) and returning the final set bits count, but for lastTrue and firstTrue operation are more like iterative operation on the lines of Vector.lane and Vector.withLane for which we have explicit entry points. Also VectorSupport.reductionCoerced adds a constraint on the type parameter V to have lower bound as Vector, VectorMask is not in the hierarchy of Vector class. We can relax that constraint though. In addition we may need bypass some portions in LibraryCallKit::inline_vector_reduction for mask query APIs, given all this does it sound reasonable to add a one different entry point (maskOp) for all the mask query APIs. Looking for your feedback. > > Note that i was not suggesting to reuse `Long.bitCount()` etc. i was just using that as a example that the bit-wise reduction operations on masks can also apply to integral vectors, suggesting there might be some sharing in C2 just like is done for binary-wise operations, such as logical AND. > > For example: > > ``` > @Override > @ForceInline > public Int256Mask and(VectorMask mask) { > Objects.requireNonNull(mask); > Int256Mask m = (Int256Mask)mask; > return VectorSupport.binaryOp(VECTOR_OP_AND, Int256Mask.class, int.class, VLENGTH, > this, m, > (m1, m2) -> m1.bOp(m2, (i, a, b) -> a & b)); > } > ``` > > And notice that `VECTOR_OP_AND` is reused for vector lane-wise binary and reduction operations on `IntVector` etc. Can we do the same for other bitwise reduction-like operations, first implementing optimal support for masks, then later expanding for integral vectors? > > So rather than introducing specific constants, such as `VECTOR_OP_MASK_TRUECOUNT` etc, we can generalize to `VECTOR_OP_BITCOUNT` etc that can apply to both masks and integral vectors, where for masks we interpret `BIT` appropriately to mean `boolean` true value. 
------------- PR: https://git.openjdk.java.net/jdk/pull/3916 From whuang at openjdk.java.net Tue May 11 06:26:34 2021 From: whuang at openjdk.java.net (Wang Huang) Date: Tue, 11 May 2021 06:26:34 GMT Subject: RFR: 8263006: Add optimization for Max(*)Node and Min(*)Node [v2] In-Reply-To: References: Message-ID: On Mon, 19 Apr 2021 19:55:08 GMT, Vladimir Kozlov wrote: >> Wang Huang has updated the pull request incrementally with one additional commit since the last revision: >> >> adjust code style > > Thank you for answers. > The test should be part of changes. Could you do me a favor to review the patch? @vnkozlov @mgkwill Thank you. ------------- PR: https://git.openjdk.java.net/jdk/pull/3513 From thartmann at openjdk.java.net Tue May 11 07:33:33 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 11 May 2021 07:33:33 GMT Subject: RFR: 8261158: JVMState should not be shared between SafePointNodes Message-ID: We often, for example with loop strip mining, clone `SafePointNodes` without cloning their `JVMState`, leading to the same state being shared by different nodes. With Valhalla, we then hit asserts when aggressively scalarizing inline types in safepoints during IGVN because `debug_end()` (`_endoff`) of the attached `JVMState` is larger than `SafePointNode::_max`. That happens because the same `JVMState` has been modified during scalarizing in another `SafePointNode`, the details are described in https://github.com/openjdk/valhalla/pull/322. I don't think `JVMStates` should be shared between safepoint nodes and added an assert to catch this. The fix is to always shallow clone the `JVMState` when cloning a `SafepointNode`. Sometimes, a deep clone is required already by current code for `CallNodes` (see `CallNode::needs_clone_jvms`), I left that code as is. 
Thanks, Tobias ------------- Commit messages: - 8261158: JVMState should not be shared between SafePointNodes Changes: https://git.openjdk.java.net/jdk/pull/3951/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3951&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261158 Stats: 24 lines in 3 files changed: 16 ins; 8 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/3951.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3951/head:pull/3951 PR: https://git.openjdk.java.net/jdk/pull/3951 From mbaesken at openjdk.java.net Tue May 11 09:03:13 2021 From: mbaesken at openjdk.java.net (Matthias Baesken) Date: Tue, 11 May 2021 09:03:13 GMT Subject: RFR: JDK-8266892: avoid maybe-uninitialized gcc warnings on linux s390x Message-ID: In the linux s390x hs code there are a few "maybe-uninitialized" gcc warnings with gcc 8 (those warning class is disabled currently in jdk17 but enabled e.g. in jdk11). It would be good to fix them anyway which is done by this small change. ------------- Commit messages: - JDK-8266892 Changes: https://git.openjdk.java.net/jdk/pull/3970/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3970&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8266892 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/3970.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3970/head:pull/3970 PR: https://git.openjdk.java.net/jdk/pull/3970 From thartmann at openjdk.java.net Tue May 11 09:10:57 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 11 May 2021 09:10:57 GMT Subject: RFR: 8252372: Check if cloning is required to move loads out of loops in PhaseIdealLoop::split_if_with_blocks_post() [v2] In-Reply-To: References: Message-ID: On Tue, 27 Apr 2021 15:57:07 GMT, Roland Westrelin wrote: >> Sinking data nodes out of a loop when all uses are out of a loop has >> several issues that this attempts to fix. 
>> >> 1- Only non control uses are considered which makes little sense (why >> not sink if the data node is an argument to a call or a returned >> value?) >> >> 2- Sinking of Loads is broken because of the handling of >> anti-dependence: the get_late_ctrl(n, n_ctrl) call returns a control >> in the loop because it takes all uses into account. >> >> 3- For data nodes for which a control edge can't be set, commoning of >> clones back in the loop is prevented with: >> _igvn._worklist.yank(x); >> which gives no guarantee >> >> This patch tries to address all issues: >> >> 1- it looks at all uses, not only non control uses >> >> 2- anti-dependences are computed for each use independently >> >> 3- Cast nodes are used to pin clones out of loop >> >> >> 2- requires refactoring of the PhaseIdealLoop::get_late_ctrl() >> logic. While working on this, I noticed a bug in anti-dependence >> analysis: when the use is a cfg node, the code sometimes looks at uses >> of the memory state of the cfg. The logic uses the use of the cfg >> which is a projection of adr_type identical to the cfg. It should >> instead look at the use of the memory projection. >> >> The existing logic for sinking loads calls clear_dom_lca_tags() for >> every load which seems like quite a waste. I added a >> _dom_lca_tags_round variable that's or'ed with the tag_node's _idx. By >> incrementing _dom_lca_tags_round, new tags that don't conflict with >> existing ones are produced and there's no need for >> clear_dom_lca_tags(). >> >> For anti-dependence analysis to return a correct result, early control >> of the load is needed. The only way to get it at this stage, AFAICT, >> is to compute it by following the load's input until a pinned node is >> reached. >> >> The existing logic pins cloned nodes next to their use. The logic I >> propose pins them right out of the loop. This could possibly avoid >> some redundant clones. It also makes some special handling for corner >> cases with loop strip mining useless. 
>> >> For 3-, I added extra Cast nodes for float types. If a chain of data >> nodes are sunk, the new logic tries to keep a single Cast for the >> entire chain rather than one Cast per node. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - CastVV > - Merge branch 'master' into JDK-8252372 > - extra comments > - fix This is hard to review but looks reasonable to me. Performance and correctness testing also looks good. src/hotspot/share/opto/loopopts.cpp line 1137: > 1135: //------------------------------place_near_use--------------------------------- > 1136: // Place some computation next to use but not inside inner loops. > 1137: Node* PhaseIdealLoop::place_near_use(Node* useblock, IdealLoopTree* loop) const { Maybe the name and comment should be adjusted since we no longer place it next to the use but right outside of the loop. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/3689 From shade at openjdk.java.net Tue May 11 09:14:55 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 11 May 2021 09:14:55 GMT Subject: RFR: JDK-8266892: avoid maybe-uninitialized gcc warnings on linux s390x In-Reply-To: References: Message-ID: On Tue, 11 May 2021 08:56:05 GMT, Matthias Baesken wrote: > In the linux s390x hs code there are a few "maybe-uninitialized" gcc warnings with gcc 8 (those warning class is disabled currently in jdk17 but enabled e.g. in jdk11). It would be good to fix them anyway which is done by this small change. Looks fine and trivial. Consider simplifying the comment. src/hotspot/cpu/s390/assembler_s390.inline.hpp line 1390: > 1388: // Having a default clause makes the compiler happy. 
> 1389: ShouldNotReachHere(); > 1390: *instr = 0L; // This assignment is there to make gcc8 happy. Maybe just move the comment to the comment block above: // Having a default clause and definite assignment makes the compiler happy. ------------- Marked as reviewed by shade (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/3970 From lucy at openjdk.java.net Tue May 11 09:14:55 2021 From: lucy at openjdk.java.net (Lutz Schmidt) Date: Tue, 11 May 2021 09:14:55 GMT Subject: RFR: JDK-8266892: avoid maybe-uninitialized gcc warnings on linux s390x In-Reply-To: References: Message-ID: <5D1fSVv3NsQzXtEFQo6-I4xpgKn0U-UhjbUT2f6mgog=.615d98ce-122a-48f4-90f1-f7a1fe3f8107@github.com> On Tue, 11 May 2021 08:56:05 GMT, Matthias Baesken wrote: > In the linux s390x hs code there are a few "maybe-uninitialized" gcc warnings with gcc 8 (those warning class is disabled currently in jdk17 but enabled e.g. in jdk11). It would be good to fix them anyway which is done by this small change. Looks good to me. Thanks for fixing. ------------- Marked as reviewed by lucy (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/3970 From thartmann at openjdk.java.net Tue May 11 09:24:55 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 11 May 2021 09:24:55 GMT Subject: RFR: 8266810: Move trivial Matcher code to cpu-specific header files In-Reply-To: References: Message-ID: On Mon, 10 May 2021 11:03:06 GMT, Claes Redestad wrote: > This patch moves a number of constants and trivial methods to newly introduced matcher_.hpp files. > > This enables constant folding and dead code elimination on one hand, and improved code navigation in IDEs on the other. > > The effect of this refactoring is modest: on Linux-x64 Hotspot (libjvm.so) shrinks by ~10Kb and C2 initialization cost drops from 8.5M to 8.3M. > > Testing: tier1-3, GHA builds of all architectures on linux Looks good. ------------- Marked as reviewed by thartmann (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/3947 From thartmann at openjdk.java.net Tue May 11 09:28:53 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 11 May 2021 09:28:53 GMT Subject: RFR: JDK-8266601: Fix bugs in AddLNode::Ideal transformations In-Reply-To: References: Message-ID: On Mon, 10 May 2021 19:33:17 GMT, John Tortugo wrote: > Some small fixes to the IR transformations performed in AddLNode::Ideal. > Tested (x86) Fastdebug build with JDK-tier1/2/3 and hotspot {tier1, compiler, serviceability, runtime, misc, tier1_compiler} running on Linux, macOS and Windows. Marked as reviewed by thartmann (Reviewer). Since the code is removed, JDK-8259624 should be closed as duplicate of this one then. ------------- PR: https://git.openjdk.java.net/jdk/pull/3955 From github.com+2249648+johntortugo at openjdk.java.net Tue May 11 09:31:55 2021 From: github.com+2249648+johntortugo at openjdk.java.net (John Tortugo) Date: Tue, 11 May 2021 09:31:55 GMT Subject: Integrated: JDK-8266601: Fix bugs in AddLNode::Ideal transformations In-Reply-To: References: Message-ID: <0AR2HhooquP-990-KK2oejGtKgFwhA-vCY6CxoocOds=.f7650f83-bf88-47f9-b0a6-8306cd9e1c12@github.com> On Mon, 10 May 2021 19:33:17 GMT, John Tortugo wrote: > Some small fixes to the IR transformations performed in AddLNode::Ideal. > Tested (x86) Fastdebug build with JDK-tier1/2/3 and hotspot {tier1, compiler, serviceability, runtime, misc, tier1_compiler} running on Linux, macOS and Windows. This pull request has now been integrated. 
Changeset: 67cb22af Author: Cesar Committer: Tobias Hartmann URL: https://git.openjdk.java.net/jdk/commit/67cb22af58c649e67f0b9f707a65389bcb39a205 Stats: 12 lines in 1 file changed: 0 ins; 10 del; 2 mod 8266601: Fix bugs in AddLNode::Ideal transformations Reviewed-by: kvn, vlivanov, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/3955 From thartmann at openjdk.java.net Tue May 11 09:47:57 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 11 May 2021 09:47:57 GMT Subject: RFR: 8266798: C1: More types of instruction can also apply LoopInvariantCodeMotion In-Reply-To: References: Message-ID: On Tue, 11 May 2021 02:26:04 GMT, Yi Yang wrote: > C1 only applies LoopInvariantCodeMotion for instructions whose types are Constant/ArithmeticOp/LoadField/ArrayLength/LoadIndexed. We are possible to apply this optimization for more types of instruction. > > Candidates: NegateOp,Convert. > > Due to the lack of verification at IR level, it is difficult to write jtreg to check if it transformed, so I can only demonstrate it with a simple program: > > // run with -XX:+PrintValueNumbering > static int foo10(int t){ > int sum=12; > for(int i=0;i<100;i++){ > sum += 12; > sum += -t; > sum += (long)t; > } > return sum; > } > > Before: > > [...] > * loop invariant code motion for short loop B1 > processing block B1 > Value Numbering: insert Constant i10 (size 11, entries 3, nesting 2) > Instruction i10 is loop invariant > processing block B2 > Value Numbering: Constant i12 equal to i5 (size 11, entries 3, nesting-diff 1) > substitution for 12 set to 5 > Instruction i12 is loop invariant // only 12 is recongized > Value Numbering: insert Constant i20 (size 11, entries 4, nesting 2) > Instruction i20 is loop invariant > ** loop successfully optimized > [...] > > After: > > [...] 
> ** loop invariant code motion for short loop B1 > processing block B1 > Value Numbering: insert Constant i10 (size 11, entries 3, nesting 2) > Instruction i10 is loop invariant > 6 0 i10 100 > processing block B2 > Value Numbering: Constant i12 equal to i5 (size 11, entries 3, nesting-diff 1) > substitution for i12 set to 105 > Instruction i12 is loop invariant > 11 0 i12 12 > Instruction i14 is loop invariant > . 16 0 i14 -i4 > Value Numbering: insert Convert l17 (size 11, entries 4, nesting 2) > Instruction l17 is loop invariant > . 22 0 l17 i2l(i4) > Value Numbering: insert Constant i20 (size 11, entries 5, nesting 2) > Instruction i20 is loop invariant > 26 0 i20 1 > [...] Changes requested by thartmann (Reviewer). src/hotspot/share/c1/c1_ValueMap.cpp line 592: > 590: assert(!subst->has_subst(), "can't have a substitution"); > 591: > 592: TRACE_VALUE_NUMBERING(tty->print_cr("substitution for %c%d set to %c%d", instr->type()->tchar(), instr->id(), subst->id(), subst->type()->tchar())); Shouldn't the third argument be a char and the fourth an int? ------------- PR: https://git.openjdk.java.net/jdk/pull/3965 From thartmann at openjdk.java.net Tue May 11 09:48:55 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 11 May 2021 09:48:55 GMT Subject: RFR: 8266874: Clean up C1 canonicalizer for TableSwitch/LookupSwitch In-Reply-To: References: Message-ID: <-G_C8KyjDpdO380HYhztHid2XXKMFRTmRHBkfNtbrg4=.6279438b-b306-4874-9ef9-e3acfb84de4f@github.com> On Tue, 11 May 2021 02:44:30 GMT, Yi Yang wrote: > Clean up C1 canonicalizer for TableSwitch/LookupSwitch Looks good. ------------- Marked as reviewed by thartmann (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/3966 From yyang at openjdk.java.net Tue May 11 10:43:55 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Tue, 11 May 2021 10:43:55 GMT Subject: RFR: 8266798: C1: More types of instruction can also apply LoopInvariantCodeMotion [v2] In-Reply-To: References: Message-ID: <761bky-U_daJtCpRxhPe8WeRHZAjRMwlRQn0ARdLINY=.29aaae5c-cc3b-4624-9725-69911bf9f612@github.com> > C1 only applies LoopInvariantCodeMotion for instructions whose types are Constant/ArithmeticOp/LoadField/ArrayLength/LoadIndexed. We are possible to apply this optimization for more types of instruction. > > Candidates: NegateOp,Convert. > > Due to the lack of verification at IR level, it is difficult to write jtreg to check if it transformed, so I can only demonstrate it with a simple program: > > // run with -XX:+PrintValueNumbering > static int foo10(int t){ > int sum=12; > for(int i=0;i<100;i++){ > sum += 12; > sum += -t; > sum += (long)t; > } > return sum; > } > > Before: > > [...] > * loop invariant code motion for short loop B1 > processing block B1 > Value Numbering: insert Constant i10 (size 11, entries 3, nesting 2) > Instruction i10 is loop invariant > processing block B2 > Value Numbering: Constant i12 equal to i5 (size 11, entries 3, nesting-diff 1) > substitution for 12 set to 5 > Instruction i12 is loop invariant // only 12 is recongized > Value Numbering: insert Constant i20 (size 11, entries 4, nesting 2) > Instruction i20 is loop invariant > ** loop successfully optimized > [...] > > After: > > [...] > ** loop invariant code motion for short loop B1 > processing block B1 > Value Numbering: insert Constant i10 (size 11, entries 3, nesting 2) > Instruction i10 is loop invariant > 6 0 i10 100 > processing block B2 > Value Numbering: Constant i12 equal to i5 (size 11, entries 3, nesting-diff 1) > substitution for i12 set to 105 > Instruction i12 is loop invariant > 11 0 i12 12 > Instruction i14 is loop invariant > . 
16 0 i14 -i4 > Value Numbering: insert Convert l17 (size 11, entries 4, nesting 2) > Instruction l17 is loop invariant > . 22 0 l17 i2l(i4) > Value Numbering: insert Constant i20 (size 11, entries 5, nesting 2) > Instruction i20 is loop invariant > 26 0 i20 1 > [...] Yi Yang has updated the pull request incrementally with one additional commit since the last revision: tchar() before id() ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/3965/files - new: https://git.openjdk.java.net/jdk/pull/3965/files/5690f5cc..447dc3b3 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3965&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3965&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/3965.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3965/head:pull/3965 PR: https://git.openjdk.java.net/jdk/pull/3965 From yyang at openjdk.java.net Tue May 11 10:43:56 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Tue, 11 May 2021 10:43:56 GMT Subject: RFR: 8266798: C1: More types of instruction can also apply LoopInvariantCodeMotion [v2] In-Reply-To: References: Message-ID: On Tue, 11 May 2021 09:45:05 GMT, Tobias Hartmann wrote: >> Yi Yang has updated the pull request incrementally with one additional commit since the last revision: >> >> tchar() before id() > > src/hotspot/share/c1/c1_ValueMap.cpp line 592: > >> 590: assert(!subst->has_subst(), "can't have a substitution"); >> 591: >> 592: TRACE_VALUE_NUMBERING(tty->print_cr("substitution for %c%d set to %c%d", instr->type()->tchar(), instr->id(), subst->id(), subst->type()->tchar())); > > Shouldn't the third argument be a char and the fourth an int? Good catch, fixed. 
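[Editorial aside] The argument-order slip discussed in this exchange can be reproduced outside HotSpot. The following is a minimal Java sketch, not the C1 code: `String.format` stands in for `tty->print_cr`, and the class and method names are hypothetical. With the arguments swapped, `%c` consumes the id (5) and `%d` consumes the type character (`'i'`, code 105), which appears consistent with the `set to 105` text in the quoted trace.

```java
// Hypothetical stand-in for the C1 trace statement; this is not HotSpot code.
final class TraceFormatSketch {
    // Swapped order: the numeric id is consumed by %c and the type character by %d.
    static String swapped(char type, int id) {
        return String.format("set to %c%d", id, (int) type);
    }

    // Intended order: type character first, then the numeric id.
    static String fixed(char type, int id) {
        return String.format("set to %c%d", type, id);
    }

    public static void main(String[] args) {
        System.out.println(fixed('i', 5));
        // Strip the control character produced by %c so the digits are easy to see.
        String control = String.valueOf((char) 5);
        System.out.println(swapped('i', 5).replace(control, ""));
    }
}
```

In C++ the mismatch is silent because varargs carry no type information; in this Java stand-in it only goes unnoticed because both arguments are passed as integers.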
------------- PR: https://git.openjdk.java.net/jdk/pull/3965 From mbaesken at openjdk.java.net Tue May 11 10:52:58 2021 From: mbaesken at openjdk.java.net (Matthias Baesken) Date: Tue, 11 May 2021 10:52:58 GMT Subject: Integrated: JDK-8266892: avoid maybe-uninitialized gcc warnings on linux s390x In-Reply-To: References: Message-ID: On Tue, 11 May 2021 08:56:05 GMT, Matthias Baesken wrote: > In the linux s390x hs code there are a few "maybe-uninitialized" gcc warnings with gcc 8 (those warning class is disabled currently in jdk17 but enabled e.g. in jdk11). It would be good to fix them anyway which is done by this small change. This pull request has now been integrated. Changeset: 9e6e2228 Author: Matthias Baesken URL: https://git.openjdk.java.net/jdk/commit/9e6e2228cba05ff2ee3a4014a0a92bdd08d016d9 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8266892: avoid maybe-uninitialized gcc warnings on linux s390x Reviewed-by: shade, lucy ------------- PR: https://git.openjdk.java.net/jdk/pull/3970 From thartmann at openjdk.java.net Tue May 11 11:07:28 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 11 May 2021 11:07:28 GMT Subject: RFR: 8266798: C1: More types of instruction can also apply LoopInvariantCodeMotion [v2] In-Reply-To: <761bky-U_daJtCpRxhPe8WeRHZAjRMwlRQn0ARdLINY=.29aaae5c-cc3b-4624-9725-69911bf9f612@github.com> References: <761bky-U_daJtCpRxhPe8WeRHZAjRMwlRQn0ARdLINY=.29aaae5c-cc3b-4624-9725-69911bf9f612@github.com> Message-ID: On Tue, 11 May 2021 10:43:55 GMT, Yi Yang wrote: >> C1 only applies LoopInvariantCodeMotion for instructions whose types are Constant/ArithmeticOp/LoadField/ArrayLength/LoadIndexed. We are possible to apply this optimization for more types of instruction. >> >> Candidates: NegateOp,Convert. 
>> >> Due to the lack of verification at IR level, it is difficult to write jtreg to check if it transformed, so I can only demonstrate it with a simple program: >> >> // run with -XX:+PrintValueNumbering >> static int foo10(int t){ >> int sum=12; >> for(int i=0;i<100;i++){ >> sum += 12; >> sum += -t; >> sum += (long)t; >> } >> return sum; >> } >> >> Before: >> >> [...] >> * loop invariant code motion for short loop B1 >> processing block B1 >> Value Numbering: insert Constant i10 (size 11, entries 3, nesting 2) >> Instruction i10 is loop invariant >> processing block B2 >> Value Numbering: Constant i12 equal to i5 (size 11, entries 3, nesting-diff 1) >> substitution for 12 set to 5 >> Instruction i12 is loop invariant // only 12 is recongized >> Value Numbering: insert Constant i20 (size 11, entries 4, nesting 2) >> Instruction i20 is loop invariant >> ** loop successfully optimized >> [...] >> >> After: >> >> [...] >> ** loop invariant code motion for short loop B1 >> processing block B1 >> Value Numbering: insert Constant i10 (size 11, entries 3, nesting 2) >> Instruction i10 is loop invariant >> 6 0 i10 100 >> processing block B2 >> Value Numbering: Constant i12 equal to i5 (size 11, entries 3, nesting-diff 1) >> substitution for i12 set to 105 >> Instruction i12 is loop invariant >> 11 0 i12 12 >> Instruction i14 is loop invariant >> . 16 0 i14 -i4 >> Value Numbering: insert Convert l17 (size 11, entries 4, nesting 2) >> Instruction l17 is loop invariant >> . 22 0 l17 i2l(i4) >> Value Numbering: insert Constant i20 (size 11, entries 5, nesting 2) >> Instruction i20 is loop invariant >> 26 0 i20 1 >> [...] > > Yi Yang has updated the pull request incrementally with one additional commit since the last revision: > > tchar() before id() Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/3965 From thartmann at openjdk.java.net Tue May 11 11:17:20 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 11 May 2021 11:17:20 GMT Subject: RFR: 8265129: Add intrinsic support for JVM.getClassId [v5] In-Reply-To: References: Message-ID: On Mon, 10 May 2021 05:33:35 GMT, Denghui Dong wrote: >> 8265129: Add intrinsic support for JVM.getClassId > > Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: > > remove c1 part I just gave this a quick run through testing and I'm seeing the following error with an internal test: # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (.../open/src/hotspot/share/opto/escape.cpp:1112), pid=31841, tid=41475 # fatal error: EA unexpected CallLeaf SharedRuntime::trace_id_load_barrier Current CompileTask: C2: 710665 26269 4 jdk.jfr.internal.instrument.ThrowableTracer::traceError (70 bytes) Stack: [0x000070000d6e3000,0x000070000d7e3000], sp=0x000070000d7de810, free space=1006k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.dylib+0x10f463c] VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x6dc V [libjvm.dylib+0x10f4c4b] VMError::report_and_die(Thread*, void*, char const*, int, char const*, char const*, __va_list_tag*)+0x3b V [libjvm.dylib+0x5eea59] report_fatal(char const*, int, char const*, ...)+0x199 V [libjvm.dylib+0x6aaf3f] ConnectionGraph::process_call_arguments(CallNode*)+0x116f V [libjvm.dylib+0x69fa13] ConnectionGraph::compute_escape()+0xcb3 V [libjvm.dylib+0x69ec35] ConnectionGraph::do_analysis(Compile*, PhaseIterGVN*)+0xd5 V [libjvm.dylib+0x5948f5] Compile::Optimize()+0x735 V [libjvm.dylib+0x592b8b] Compile::Compile(ciEnv*, ciMethod*, int, bool, bool, bool, bool, DirectiveSet*)+0x16fb V [libjvm.dylib+0x48534b] 
C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x10b V [libjvm.dylib+0x5af101] CompileBroker::invoke_compiler_on_method(CompileTask*)+0x801 V [libjvm.dylib+0x5ae6e2] CompileBroker::compiler_thread_loop()+0x322 V [libjvm.dylib+0x1053b6d] JavaThread::thread_main_inner()+0x26d V [libjvm.dylib+0x1050a77] Thread::call_run()+0x177 V [libjvm.dylib+0xe198bf] thread_native_entry(Thread*)+0x14f C [libsystem_pthread.dylib+0x6954] _pthread_start+0xe0 C [libsystem_pthread.dylib+0x24a7] thread_start+0xf ------------- PR: https://git.openjdk.java.net/jdk/pull/3470 From yyang at openjdk.java.net Tue May 11 11:39:57 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Tue, 11 May 2021 11:39:57 GMT Subject: RFR: 8266874: Clean up C1 canonicalizer for TableSwitch/LookupSwitch In-Reply-To: References: Message-ID: On Tue, 11 May 2021 02:44:30 GMT, Yi Yang wrote: > Clean up C1 canonicalizer for TableSwitch/LookupSwitch Thank you Tobias for the review! ------------- PR: https://git.openjdk.java.net/jdk/pull/3966 From yyang at openjdk.java.net Tue May 11 11:39:57 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Tue, 11 May 2021 11:39:57 GMT Subject: RFR: 8266798: C1: More types of instruction can also apply LoopInvariantCodeMotion [v2] In-Reply-To: <761bky-U_daJtCpRxhPe8WeRHZAjRMwlRQn0ARdLINY=.29aaae5c-cc3b-4624-9725-69911bf9f612@github.com> References: <761bky-U_daJtCpRxhPe8WeRHZAjRMwlRQn0ARdLINY=.29aaae5c-cc3b-4624-9725-69911bf9f612@github.com> Message-ID: On Tue, 11 May 2021 10:43:55 GMT, Yi Yang wrote: >> C1 only applies LoopInvariantCodeMotion for instructions whose types are Constant/ArithmeticOp/LoadField/ArrayLength/LoadIndexed. We are possible to apply this optimization for more types of instruction. >> >> Candidates: NegateOp,Convert. 
>> >> Due to the lack of verification at IR level, it is difficult to write jtreg to check if it transformed, so I can only demonstrate it with a simple program: >> >> // run with -XX:+PrintValueNumbering >> static int foo10(int t){ >> int sum=12; >> for(int i=0;i<100;i++){ >> sum += 12; >> sum += -t; >> sum += (long)t; >> } >> return sum; >> } >> >> Before: >> >> [...] >> * loop invariant code motion for short loop B1 >> processing block B1 >> Value Numbering: insert Constant i10 (size 11, entries 3, nesting 2) >> Instruction i10 is loop invariant >> processing block B2 >> Value Numbering: Constant i12 equal to i5 (size 11, entries 3, nesting-diff 1) >> substitution for 12 set to 5 >> Instruction i12 is loop invariant // only 12 is recongized >> Value Numbering: insert Constant i20 (size 11, entries 4, nesting 2) >> Instruction i20 is loop invariant >> ** loop successfully optimized >> [...] >> >> After: >> >> [...] >> ** loop invariant code motion for short loop B1 >> processing block B1 >> Value Numbering: insert Constant i10 (size 11, entries 3, nesting 2) >> Instruction i10 is loop invariant >> 6 0 i10 100 >> processing block B2 >> Value Numbering: Constant i12 equal to i5 (size 11, entries 3, nesting-diff 1) >> substitution for i12 set to 105 >> Instruction i12 is loop invariant >> 11 0 i12 12 >> Instruction i14 is loop invariant >> . 16 0 i14 -i4 >> Value Numbering: insert Convert l17 (size 11, entries 4, nesting 2) >> Instruction l17 is loop invariant >> . 22 0 l17 i2l(i4) >> Value Numbering: insert Constant i20 (size 11, entries 5, nesting 2) >> Instruction i20 is loop invariant >> 26 0 i20 1 >> [...] > > Yi Yang has updated the pull request incrementally with one additional commit since the last revision: > > tchar() before id() Thank you Tobias for the review! 
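[Editorial aside] What hoisting `NegateOp` and `Convert` means for the sample program can be sketched at source level. This is only an illustration under the assumption that the transformation is semantically equivalent to the hand-written variant below; C1 performs it on its HIR, and the names `LicmSketch` and `foo10Hoisted` are hypothetical.

```java
// Source-level illustration only; C1 works on its intermediate representation.
final class LicmSketch {
    // The loop from the pull request description.
    static int foo10(int t) {
        int sum = 12;
        for (int i = 0; i < 100; i++) {
            sum += 12;
            sum += -t;
            sum += (long) t;
        }
        return sum;
    }

    // Hand-hoisted variant: the negation and the int-to-long conversion depend
    // only on t, so they are computed once before the loop is entered.
    static int foo10Hoisted(int t) {
        final int negated = -t;
        final long widened = (long) t;
        int sum = 12;
        for (int i = 0; i < 100; i++) {
            sum += 12;
            sum += negated;
            sum += widened; // compound assignment narrows the result back to int
        }
        return sum;
    }

    public static void main(String[] args) {
        for (int t : new int[] {0, 7, -3, Integer.MAX_VALUE, Integer.MIN_VALUE}) {
            System.out.println(foo10(t) == foo10Hoisted(t));
        }
    }
}
```

Both expressions are free of side effects and cannot throw, which is what makes moving them out of the loop safe in this example.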
------------- PR: https://git.openjdk.java.net/jdk/pull/3965 From vlivanov at openjdk.java.net Tue May 11 12:48:56 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Tue, 11 May 2021 12:48:56 GMT Subject: RFR: 8261158: JVMState should not be shared between SafePointNodes In-Reply-To: References: Message-ID: On Mon, 10 May 2021 14:09:17 GMT, Tobias Hartmann wrote: > We often, for example with loop strip mining, clone `SafePointNodes` without cloning their `JVMState`, leading to the same state being shared by different nodes. With Valhalla, we then hit asserts when aggressively scalarizing inline types in safepoints during IGVN because `debug_end()` (`_endoff`) of the attached `JVMState` is larger than `SafePointNode::_max`. That happens because the same `JVMState` has been modified during scalarizing in another `SafePointNode`, the details are described in https://github.com/openjdk/valhalla/pull/322. I don't think `JVMStates` should be shared between safepoint nodes and added an assert to catch this. > > The fix is to always shallow clone the `JVMState` when cloning a `SafepointNode`. Sometimes, a deep clone is required already by current code for `CallNodes` (see `CallNode::needs_clone_jvms`), I left that code as is. > > Thanks, > Tobias Looks good. src/hotspot/share/opto/callnode.hpp line 355: > 353: > 354: JVMState* jvms() const { return _jvms; } > 355: virtual bool needs_clone_jvms(Compile* C) { return false; } Considering `clone_jvms()` always clones associated JVMS now, `needs_clone_jvms()` becomes confusing. A variant that explicitly mentions deep copy is reqruired would be a better alternative. ------------- Marked as reviewed by vlivanov (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/3951 From vlivanov at openjdk.java.net Tue May 11 13:01:55 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Tue, 11 May 2021 13:01:55 GMT Subject: RFR: 8266854: LibraryCallKit::inline_preconditions_checkIndex modifies control flow even if the intrinsic bailed out [v2] In-Reply-To: <7mjYr1XHcHP2aQq3I9rvx8EorlaiISkO2tf3ePZ9ACQ=.a41f5461-a6cd-49eb-b128-ad232c15da0a@github.com> References: <7mjYr1XHcHP2aQq3I9rvx8EorlaiISkO2tf3ePZ9ACQ=.a41f5461-a6cd-49eb-b128-ad232c15da0a@github.com> Message-ID: On Tue, 11 May 2021 01:06:25 GMT, Sandhya Viswanathan wrote: >> LibraryCallKit::inline_preconditions_checkIndex can result in the following assert sometimes: >> "# assert(ctrl == kit.control()) failed: Control flow was added although the intrinsic bailed out" >> >> Consider the following code snippet: >> ... >> set_control(_gvn.transform(new IfTrueNode(rc))); >> { >> PreserveJVMState pjvms(this); >> set_control(_gvn.transform(new IfFalseNode(rc))); >> uncommon_trap(Deoptimization::Reason_range_check, >> Deoptimization::Action_make_not_entrant); >> } >> .. >> Here the control is being modified by set_control even though a bailout is possible afterwards. >> Moving the set_control later in the intrinsic fixes this. >> >> This is a small fix. Please review. >> >> Best Regards, >> Sandhya > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > Implement review comments from Vladimir Kozlov src/hotspot/share/opto/library_call.cpp line 1057: > 1055: } > 1056: > 1057: if (stopped()) { Moving the check around doesn't make much sense to me. `stopped() == true` signals that the current control is effectively dead. It could happen when the range check (`RangeCheckNode`) always fails and execution unconditionally hits the uncommon trap. (I haven't double-checked myself whether it happens in practice or not.)
By bailing out from the intrinsic (`return false;`), the next thing C2 will attempt is to inline `Preconditions::checkIndex(). There were no attempts to clean up the graph built up to this point (with the uncommon trap). Instead, the fix can just be `if (stopped()) { return true; }`. The graph constructed so far is valid. ------------- PR: https://git.openjdk.java.net/jdk/pull/3958 From ddong at openjdk.java.net Tue May 11 13:08:14 2021 From: ddong at openjdk.java.net (Denghui Dong) Date: Tue, 11 May 2021 13:08:14 GMT Subject: RFR: 8265129: Add intrinsic support for JVM.getClassId [v5] In-Reply-To: References: Message-ID: <_6eAEkL7urt87cu00_vfwv6Bg2RrN3nXuuG3uC1XY9g=.9b60ccdb-8fc2-443c-b158-8ded3ceab539@github.com> On Tue, 11 May 2021 11:13:35 GMT, Tobias Hartmann wrote: > I just gave this a quick run through testing and I'm seeing the following error with an internal test: > > ``` > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (.../open/src/hotspot/share/opto/escape.cpp:1112), pid=31841, tid=41475 > # fatal error: EA unexpected CallLeaf SharedRuntime::trace_id_load_barrier > ``` > > ``` > Current CompileTask: > C2: 710665 26269 4 jdk.jfr.internal.instrument.ThrowableTracer::traceError (70 bytes) > > Stack: [0x000070000d6e3000,0x000070000d7e3000], sp=0x000070000d7de810, free space=1006k > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.dylib+0x10f463c] VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x6dc > V [libjvm.dylib+0x10f4c4b] VMError::report_and_die(Thread*, void*, char const*, int, char const*, char const*, __va_list_tag*)+0x3b > V [libjvm.dylib+0x5eea59] report_fatal(char const*, int, char const*, ...)+0x199 > V [libjvm.dylib+0x6aaf3f] ConnectionGraph::process_call_arguments(CallNode*)+0x116f > V [libjvm.dylib+0x69fa13] ConnectionGraph::compute_escape()+0xcb3 > V 
[libjvm.dylib+0x69ec35] ConnectionGraph::do_analysis(Compile*, PhaseIterGVN*)+0xd5 > V [libjvm.dylib+0x5948f5] Compile::Optimize()+0x735 > V [libjvm.dylib+0x592b8b] Compile::Compile(ciEnv*, ciMethod*, int, bool, bool, bool, bool, DirectiveSet*)+0x16fb > V [libjvm.dylib+0x48534b] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x10b > V [libjvm.dylib+0x5af101] CompileBroker::invoke_compiler_on_method(CompileTask*)+0x801 > V [libjvm.dylib+0x5ae6e2] CompileBroker::compiler_thread_loop()+0x322 > V [libjvm.dylib+0x1053b6d] JavaThread::thread_main_inner()+0x26d > V [libjvm.dylib+0x1050a77] Thread::call_run()+0x177 > V [libjvm.dylib+0xe198bf] thread_native_entry(Thread*)+0x14f > C [libsystem_pthread.dylib+0x6954] _pthread_start+0xe0 > C [libsystem_pthread.dylib+0x24a7] thread_start+0xf > ``` Hi Tobias, I cannot reproduce this error in my environment. Could you provide some more detailed information (e.g. JVM options) on this test? Denghui ------------- PR: https://git.openjdk.java.net/jdk/pull/3470 From thartmann at openjdk.java.net Tue May 11 13:21:33 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 11 May 2021 13:21:33 GMT Subject: RFR: 8261158: JVMState should not be shared between SafePointNodes [v2] In-Reply-To: References: Message-ID: > We often, for example with loop strip mining, clone `SafePointNodes` without cloning their `JVMState`, leading to the same state being shared by different nodes. With Valhalla, we then hit asserts when aggressively scalarizing inline types in safepoints during IGVN because `debug_end()` (`_endoff`) of the attached `JVMState` is larger than `SafePointNode::_max`. That happens because the same `JVMState` has been modified during scalarizing in another `SafePointNode`, the details are described in https://github.com/openjdk/valhalla/pull/322. I don't think `JVMStates` should be shared between safepoint nodes and added an assert to catch this. 
> > The fix is to always shallow clone the `JVMState` when cloning a `SafepointNode`. Sometimes, a deep clone is required already by current code for `CallNodes` (see `CallNode::needs_clone_jvms`), I left that code as is. > > Thanks, > Tobias Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: Rename needs_clone_jvms -> needs_deep_clone_jvms ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/3951/files - new: https://git.openjdk.java.net/jdk/pull/3951/files/03878127..74a29a0f Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3951&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3951&range=00-01 Stats: 25 lines in 6 files changed: 0 ins; 0 del; 25 mod Patch: https://git.openjdk.java.net/jdk/pull/3951.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3951/head:pull/3951 PR: https://git.openjdk.java.net/jdk/pull/3951 From thartmann at openjdk.java.net Tue May 11 13:21:36 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 11 May 2021 13:21:36 GMT Subject: RFR: 8261158: JVMState should not be shared between SafePointNodes [v2] In-Reply-To: References: Message-ID: On Tue, 11 May 2021 12:44:15 GMT, Vladimir Ivanov wrote: >> Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: >> >> Rename needs_clone_jvms -> needs_deep_clone_jvms > > src/hotspot/share/opto/callnode.hpp line 355: > >> 353: >> 354: JVMState* jvms() const { return _jvms; } >> 355: virtual bool needs_clone_jvms(Compile* C) { return false; } > > Considering `clone_jvms()` always clones associated JVMS now, `needs_clone_jvms()` becomes confusing. > A variant that explicitly mentions deep copy is reqruired would be a better alternative. Thanks Vladimir. I've just updated the PR, what do you think? 
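[Editorial aside] The aliasing hazard and the shallow versus deep distinction discussed in this thread can be modelled in a few lines. The sketch below is a simplified, hypothetical model and not HotSpot code: `FrameState` stands in for `JVMState`, `endOffset` for a mutable field such as `_endoff`, and `SafePointSketch` for a node that owns such a state.

```java
// Simplified, hypothetical model of a per-safepoint state chain; not HotSpot code.
final class FrameState {
    final FrameState caller; // state of the calling frame, or null for the root
    int endOffset;           // stands in for a mutable field such as _endoff

    FrameState(FrameState caller, int endOffset) {
        this.caller = caller;
        this.endOffset = endOffset;
    }

    // Copies only this frame; the caller chain stays shared.
    FrameState cloneShallow() {
        return new FrameState(caller, endOffset);
    }

    // Copies this frame and every frame in the caller chain.
    FrameState cloneDeep() {
        FrameState callerCopy = (caller == null) ? null : caller.cloneDeep();
        return new FrameState(callerCopy, endOffset);
    }
}

final class SafePointSketch {
    final FrameState state;

    SafePointSketch(FrameState state) {
        this.state = state;
    }

    // Problematic pattern: the copy aliases the original's state object.
    SafePointSketch cloneSharingState() {
        return new SafePointSketch(state);
    }

    // Pattern described in the fix: each copy gets its own youngest frame.
    SafePointSketch cloneWithOwnState() {
        return new SafePointSketch(state.cloneShallow());
    }

    public static void main(String[] args) {
        SafePointSketch original = new SafePointSketch(new FrameState(new FrameState(null, 1), 2));
        SafePointSketch alias = original.cloneSharingState();
        alias.state.endOffset = 9;
        System.out.println(original.state.endOffset); // the write is visible through the original
    }
}
```

In this model a shallow copy is enough to isolate updates to the youngest frame; a deep copy is only needed when the caller frames themselves may be modified.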
------------- PR: https://git.openjdk.java.net/jdk/pull/3951 From psandoz at openjdk.java.net Tue May 11 15:22:52 2021 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Tue, 11 May 2021 15:22:52 GMT Subject: RFR: 8256973: Intrinsic creation for VectorMask query (lastTrue, firstTrue, trueCount) APIs In-Reply-To: <73lFD51hzmiF_KrQyPyE5c7lbf-Bp6V5vptzGo7JioY=.f34509d0-04c1-4c6d-878f-baa433b315a7@github.com> References: <73lFD51hzmiF_KrQyPyE5c7lbf-Bp6V5vptzGo7JioY=.f34509d0-04c1-4c6d-878f-baa433b315a7@github.com> Message-ID: <_PK73SjQ7khqH0-Wd5aH_e1WQMxhTXA29X1zKLGCCpE=.3bb61289-4f86-4926-a214-998af2907f39@github.com> On Fri, 7 May 2021 14:23:38 GMT, Jatin Bhateja wrote: > This patch intrinsifies following mask query APIs using optimal instruction sequence for X86 target. > 1) VectorMask.firstTrue. > 2) VectorMask.lastTrue. > 3) VectorMask.trueCount. > > Current implementations of above APIs iterates over the underlined boolean array encapsulated in a mask instance to ascertain the count/position index of true bits. > X86 AVX2 and AVX512 targets offers direct instructions to populate the masks held in the byte vector to a GP or an opmask register there by accelerating further querying. > > Intrinsification is not performed for vector species containing less than two vector lanes. 
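[Editorial aside] The difference between iterating over the boolean array and querying a mask held in a general-purpose register can be sketched in plain Java. This is an analogy under stated assumptions, not the intrinsic's instruction sequence: the class `MaskQuerySketch` and its methods are hypothetical, the mask is limited to 64 lanes, and the return conventions (lane count when no lane is set for `firstTrue`, -1 for `lastTrue`) follow the Vector API documentation as far as I am aware.

```java
// Hypothetical sketch of the three mask queries; not the Vector API implementation.
final class MaskQuerySketch {
    // Scalar versions: walk the boolean array lane by lane.
    static int firstTrueScalar(boolean[] mask) {
        for (int i = 0; i < mask.length; i++) {
            if (mask[i]) return i;
        }
        return mask.length;
    }

    static int lastTrueScalar(boolean[] mask) {
        for (int i = mask.length - 1; i >= 0; i--) {
            if (mask[i]) return i;
        }
        return -1;
    }

    static int trueCountScalar(boolean[] mask) {
        int count = 0;
        for (boolean lane : mask) {
            if (lane) count++;
        }
        return count;
    }

    // Packs lane i into bit i, loosely analogous to moving a mask into a register.
    static long toBits(boolean[] mask) {
        if (mask.length > 64) {
            throw new IllegalArgumentException("at most 64 lanes supported in this sketch");
        }
        long bits = 0L;
        for (int i = 0; i < mask.length; i++) {
            if (mask[i]) bits |= 1L << i;
        }
        return bits;
    }

    // Bit-level versions: a single bit-manipulation operation on the packed mask.
    static int firstTrue(boolean[] mask) {
        long bits = toBits(mask);
        return bits == 0 ? mask.length : Long.numberOfTrailingZeros(bits);
    }

    static int lastTrue(boolean[] mask) {
        // numberOfLeadingZeros(0) is 64, so an empty mask yields -1.
        return 63 - Long.numberOfLeadingZeros(toBits(mask));
    }

    static int trueCount(boolean[] mask) {
        return Long.bitCount(toBits(mask));
    }

    public static void main(String[] args) {
        boolean[] mask = {false, true, false, true};
        System.out.println(firstTrue(mask) + " " + lastTrue(mask) + " " + trueCount(mask));
    }
}
```

The packing loop here is itself scalar; the point of the sketch is only that, once the mask is available as bits, each query reduces to one operation.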
> > Please find below the performance number for benchmark included in the patch: > Machine: Cascade Lake server (Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz 28C) > > > VectorMask.trueCount | VECTOR SIZE | ALGO | BASELINE AVX3 | WITH OPT AVX3 | GAIN > -- | -- | -- | -- | -- | -- > MaskQueryOperationsBenchmark.testFirstTrueByte | 128 | 1 | 338396.436 | 362711.622 | 1.071854143 > MaskQueryOperationsBenchmark.testFirstTrueByte | 128 | 2 | 205477.472 | 362668.035 | 1.765001445 > MaskQueryOperationsBenchmark.testFirstTrueByte | 128 | 3 | 185613.377 | 362518.206 | 1.953082326 > MaskQueryOperationsBenchmark.testFirstTrueByte | 256 | 1 | 338522.114 | 328751.231 | 0.971136648 > MaskQueryOperationsBenchmark.testFirstTrueByte | 256 | 2 | 148825.341 | 328783.35 | 2.209189294 > MaskQueryOperationsBenchmark.testFirstTrueByte | 256 | 3 | 200854.856 | 328784.24 | 1.636924526 > MaskQueryOperationsBenchmark.testFirstTrueByte | 512 | 1 | 338551.089 | 319908.361 | 0.944933782 > MaskQueryOperationsBenchmark.testFirstTrueByte | 512 | 2 | 116338.756 | 320026.839 | 2.750818816 > MaskQueryOperationsBenchmark.testFirstTrueByte | 512 | 3 | 200871.692 | 320008.208 | 1.593097588 > MaskQueryOperationsBenchmark.testFirstTrueInt | 128 | 1 | 338489.157 | 190221.57 | 0.561972418 > MaskQueryOperationsBenchmark.testFirstTrueInt | 128 | 2 | 205140.903 | 362387.766 | 1.766531007 > MaskQueryOperationsBenchmark.testFirstTrueInt | 128 | 3 | 185508.994 | 362566.265 | 1.95444036 > MaskQueryOperationsBenchmark.testFirstTrueInt | 256 | 1 | 338403.999 | 328829.751 | 0.971707639 > MaskQueryOperationsBenchmark.testFirstTrueInt | 256 | 2 | 148988.857 | 328835.479 | 2.207114583 > MaskQueryOperationsBenchmark.testFirstTrueInt | 256 | 3 | 200815.907 | 328778.266 | 1.637212265 > MaskQueryOperationsBenchmark.testFirstTrueInt | 512 | 1 | 338462.403 | 328796.84 | 0.971442728 > MaskQueryOperationsBenchmark.testFirstTrueInt | 512 | 2 | 116355.623 | 328811.386 | 2.825917455 > MaskQueryOperationsBenchmark.testFirstTrueInt 
| 512 | 3 | 200856.08 | 328773.859 | 1.636862867 > MaskQueryOperationsBenchmark.testFirstTrueLong | 128 | 1 | 338451.783 | 204432.394 | 0.60402221 > MaskQueryOperationsBenchmark.testFirstTrueLong | 128 | 2 | 204443.049 | 155670.633 | 0.761437641 > MaskQueryOperationsBenchmark.testFirstTrueLong | 128 | 3 | 207254.769 | 155672.842 | 0.751118263 > MaskQueryOperationsBenchmark.testFirstTrueLong | 256 | 1 | 338520.255 | 328789.176 | 0.971254072 > MaskQueryOperationsBenchmark.testFirstTrueLong | 256 | 2 | 205883.123 | 328742.103 | 1.596741385 > MaskQueryOperationsBenchmark.testFirstTrueLong | 256 | 3 | 185519.176 | 328733.537 | 1.771965271 > MaskQueryOperationsBenchmark.testFirstTrueLong | 512 | 1 | 338605.11 | 328694.935 | 0.970732353 > MaskQueryOperationsBenchmark.testFirstTrueLong | 512 | 2 | 148444.7 | 328352.346 | 2.211950619 > MaskQueryOperationsBenchmark.testFirstTrueLong | 512 | 3 | 200884.874 | 328814.376 | 1.636829939 > MaskQueryOperationsBenchmark.testFirstTrueShort | 128 | 1 | 338529.326 | 362293.877 | 1.070199387 > MaskQueryOperationsBenchmark.testFirstTrueShort | 128 | 2 | 204676.583 | 362428.992 | 1.770739899 > MaskQueryOperationsBenchmark.testFirstTrueShort | 128 | 3 | 185495.663 | 362422.835 | 1.953807594 > MaskQueryOperationsBenchmark.testFirstTrueShort | 256 | 1 | 338533.82 | 328635.479 | 0.970761146 > MaskQueryOperationsBenchmark.testFirstTrueShort | 256 | 2 | 148822.446 | 328803.55 | 2.209368001 > MaskQueryOperationsBenchmark.testFirstTrueShort | 256 | 3 | 200752.028 | 328805.974 | 1.637871245 > MaskQueryOperationsBenchmark.testFirstTrueShort | 512 | 1 | 338464.548 | 320054.91 | 0.945608371 > MaskQueryOperationsBenchmark.testFirstTrueShort | 512 | 2 | 116329.063 | 328763.508 | 2.826151088 > MaskQueryOperationsBenchmark.testFirstTrueShort | 512 | 3 | 199971.049 | 328819.066 | 1.644333355 > MaskQueryOperationsBenchmark.testLastTrueByte | 128 | 1 | 325618.244 | 337629.441 | 1.036887359 > MaskQueryOperationsBenchmark.testLastTrueByte | 128 | 2 | 
197655.729 | 337544.012 | 1.707737052 > MaskQueryOperationsBenchmark.testLastTrueByte | 128 | 3 | 325600.645 | 337256.796 | 1.035798919 > MaskQueryOperationsBenchmark.testLastTrueByte | 256 | 1 | 325677.144 | 308312.588 | 0.946681687 > MaskQueryOperationsBenchmark.testLastTrueByte | 256 | 2 | 138177.514 | 308293.997 | 2.231144476 > MaskQueryOperationsBenchmark.testLastTrueByte | 256 | 3 | 201281.142 | 308353.239 | 1.531952949 > MaskQueryOperationsBenchmark.testLastTrueByte | 512 | 1 | 325499.635 | 305103.491 | 0.937338965 > MaskQueryOperationsBenchmark.testLastTrueByte | 512 | 2 | 98267.327 | 304803.64 | 3.101780106 > MaskQueryOperationsBenchmark.testLastTrueByte | 512 | 3 | 201072.661 | 304969.972 | 1.516715253 > MaskQueryOperationsBenchmark.testLastTrueInt | 128 | 1 | 325286.171 | 337337.209 | 1.037047496 > MaskQueryOperationsBenchmark.testLastTrueInt | 128 | 2 | 197351.915 | 331432.723 | 1.679399579 > MaskQueryOperationsBenchmark.testLastTrueInt | 128 | 3 | 325173.097 | 337518.586 | 1.037965899 > MaskQueryOperationsBenchmark.testLastTrueInt | 256 | 1 | 325199.786 | 308436.805 | 0.948453284 > MaskQueryOperationsBenchmark.testLastTrueInt | 256 | 2 | 138200.527 | 308405.442 | 2.231579348 > MaskQueryOperationsBenchmark.testLastTrueInt | 256 | 3 | 201240.625 | 308234.527 | 1.531671485 > MaskQueryOperationsBenchmark.testLastTrueInt | 512 | 1 | 325590.639 | 308381.757 | 0.947145649 > MaskQueryOperationsBenchmark.testLastTrueInt | 512 | 2 | 98334.197 | 308440.373 | 3.13665421 > MaskQueryOperationsBenchmark.testLastTrueInt | 512 | 3 | 200832.953 | 308431.355 | 1.535760693 > MaskQueryOperationsBenchmark.testLastTrueLong | 128 | 1 | 325564.887 | 193981.861 | 0.595831641 > MaskQueryOperationsBenchmark.testLastTrueLong | 128 | 2 | 214005.351 | 153667.869 | 0.718056199 > MaskQueryOperationsBenchmark.testLastTrueLong | 128 | 3 | 214061.493 | 156337.24 | 0.730337988 > MaskQueryOperationsBenchmark.testLastTrueLong | 256 | 1 | 325601.502 | 308291.032 | 0.946835411 > 
MaskQueryOperationsBenchmark.testLastTrueLong | 256 | 2 | 197911.182 | 308292.149 | 1.557729815 > MaskQueryOperationsBenchmark.testLastTrueLong | 256 | 3 | 325608.187 | 308405.393 | 0.947167195 > MaskQueryOperationsBenchmark.testLastTrueLong | 512 | 1 | 325734.897 | 308321.619 | 0.946541564 > MaskQueryOperationsBenchmark.testLastTrueLong | 512 | 2 | 137974.465 | 308131.475 | 2.233250008 > MaskQueryOperationsBenchmark.testLastTrueLong | 512 | 3 | 205479.182 | 308311.636 | 1.500451934 > MaskQueryOperationsBenchmark.testLastTrueShort | 128 | 1 | 325681.411 | 337663.377 | 1.036790451 > MaskQueryOperationsBenchmark.testLastTrueShort | 128 | 2 | 198127.51 | 337287.453 | 1.702375672 > MaskQueryOperationsBenchmark.testLastTrueShort | 128 | 3 | 325519.01 | 337453.387 | 1.036662612 > MaskQueryOperationsBenchmark.testLastTrueShort | 256 | 1 | 325647.378 | 308266.5 | 0.946626691 > MaskQueryOperationsBenchmark.testLastTrueShort | 256 | 2 | 138287.837 | 308402.656 | 2.230150263 > MaskQueryOperationsBenchmark.testLastTrueShort | 256 | 3 | 205375.864 | 308418.101 | 1.501725154 > MaskQueryOperationsBenchmark.testLastTrueShort | 512 | 1 | 325548.631 | 308137.064 | 0.946516233 > MaskQueryOperationsBenchmark.testLastTrueShort | 512 | 2 | 98424.074 | 308145.17 | 3.130790644 > MaskQueryOperationsBenchmark.testLastTrueShort | 512 | 3 | 205381.622 | 308345.763 | 1.50133084 > MaskQueryOperationsBenchmark.testTrueCountByte | 128 | 1 | 197488.249 | 340490.471 | 1.724104967 > MaskQueryOperationsBenchmark.testTrueCountByte | 128 | 2 | 191307.785 | 354400.26 | 1.852513529 > MaskQueryOperationsBenchmark.testTrueCountByte | 128 | 3 | 181206.7 | 354512.75 | 1.956399791 > MaskQueryOperationsBenchmark.testTrueCountByte | 256 | 1 | 144485.784 | 328347.7 | 2.272525995 > MaskQueryOperationsBenchmark.testTrueCountByte | 256 | 2 | 136709.938 | 328318.229 | 2.401568122 > MaskQueryOperationsBenchmark.testTrueCountByte | 256 | 3 | 141501.903 | 328274.337 | 2.319928779 > 
MaskQueryOperationsBenchmark.testTrueCountByte | 512 | 1 | 108395.25 | 318599.11 | 2.939234976 > MaskQueryOperationsBenchmark.testTrueCountByte | 512 | 2 | 98731.287 | 318651.791 | 3.22746518 > MaskQueryOperationsBenchmark.testTrueCountByte | 512 | 3 | 106344.335 | 318657.098 | 2.99646519 > MaskQueryOperationsBenchmark.testTrueCountInt | 128 | 1 | 124691.716 | 354457.62 | 2.842671762 > MaskQueryOperationsBenchmark.testTrueCountInt | 128 | 2 | 191325.138 | 354360.523 | 1.852137815 > MaskQueryOperationsBenchmark.testTrueCountInt | 128 | 3 | 181480.334 | 353746.697 | 1.949228818 > MaskQueryOperationsBenchmark.testTrueCountInt | 256 | 1 | 144513.076 | 328404.916 | 2.27249274 > MaskQueryOperationsBenchmark.testTrueCountInt | 256 | 2 | 136710.717 | 328516.92 | 2.403007805 > MaskQueryOperationsBenchmark.testTrueCountInt | 256 | 3 | 141631.832 | 328432.841 | 2.318919669 > MaskQueryOperationsBenchmark.testTrueCountInt | 512 | 1 | 108479.473 | 328405.877 | 3.027355019 > MaskQueryOperationsBenchmark.testTrueCountInt | 512 | 2 | 98747.682 | 328300.378 | 3.324638831 > MaskQueryOperationsBenchmark.testTrueCountInt | 512 | 3 | 106378.04 | 328384.537 | 3.086957957 > MaskQueryOperationsBenchmark.testTrueCountLong | 128 | 1 | 213646.579 | 159098.437 | 0.74468048 > MaskQueryOperationsBenchmark.testTrueCountLong | 128 | 2 | 212671.379 | 162528.924 | 0.764225655 > MaskQueryOperationsBenchmark.testTrueCountLong | 128 | 3 | 212649.052 | 162530.898 | 0.764315178 > MaskQueryOperationsBenchmark.testTrueCountLong | 256 | 1 | 197350.819 | 328365.924 | 1.663869072 > MaskQueryOperationsBenchmark.testTrueCountLong | 256 | 2 | 191473.127 | 328501.883 | 1.715655289 > MaskQueryOperationsBenchmark.testTrueCountLong | 256 | 3 | 185529.513 | 328428.64 | 1.770223156 > MaskQueryOperationsBenchmark.testTrueCountLong | 512 | 1 | 144516.188 | 328334.76 | 2.27195835 > MaskQueryOperationsBenchmark.testTrueCountLong | 512 | 2 | 136752.367 | 328505.571 | 2.402192943 > 
MaskQueryOperationsBenchmark.testTrueCountLong | 512 | 3 | 141445.742 | 328392.887 | 2.321688036 > MaskQueryOperationsBenchmark.testTrueCountShort | 128 | 1 | 197863.202 | 354533.342 | 1.791810394 > MaskQueryOperationsBenchmark.testTrueCountShort | 128 | 2 | 191802.914 | 354377.939 | 1.84761499 > MaskQueryOperationsBenchmark.testTrueCountShort | 128 | 3 | 181773.298 | 354374.525 | 1.949541153 > MaskQueryOperationsBenchmark.testTrueCountShort | 256 | 1 | 144414.679 | 328435.088 | 2.27425003 > MaskQueryOperationsBenchmark.testTrueCountShort | 256 | 2 | 136923.991 | 328267.898 | 2.397446171 > MaskQueryOperationsBenchmark.testTrueCountShort | 256 | 3 | 141545.957 | 328308.681 | 2.319449371 > MaskQueryOperationsBenchmark.testTrueCountShort | 512 | 1 | 108420.143 | 328282.998 | 3.027878297 > MaskQueryOperationsBenchmark.testTrueCountShort | 512 | 2 | 98736.441 | 328420.616 | 3.326235103 > MaskQueryOperationsBenchmark.testTrueCountShort | 512 | 3 | 106432.386 | 328245.585 | 3.084076166 > > ALGO (1=bestcase, 2=worstcase, 3=avgcase) Java code looks good. Perhaps when we add bit-counting operations to vector we might find a way to consolidate. I don't wanna block progress based on something we might do in the future. test/micro/org/openjdk/bench/jdk/incubator/vector/MaskQueryOperationsBenchmark.java line 109: > 107: public void testTrueCountByte(Blackhole bh) { > 108: bh.consume(bmask.trueCount()); > 109: } No need to use a black hole. A returned value will be consumed by a black hole by the framework. Suggestion: @Benchmark public int testTrueCountByte() { return bmask.trueCount(); } ------------- Marked as reviewed by psandoz (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/3916 From psandoz at openjdk.java.net Tue May 11 15:22:53 2021 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Tue, 11 May 2021 15:22:53 GMT Subject: RFR: 8256973: Intrinsic creation for VectorMask query (lastTrue, firstTrue, trueCount) APIs In-Reply-To: References: <73lFD51hzmiF_KrQyPyE5c7lbf-Bp6V5vptzGo7JioY=.f34509d0-04c1-4c6d-878f-baa433b315a7@github.com> Message-ID: On Fri, 7 May 2021 17:45:26 GMT, Jatin Bhateja wrote: >> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractMask.java line 147: >> >>> 145: >>> 146: /*package-private*/ >>> 147: static int trueCountHelper(boolean[] bits) { >> >> Naming-wise i think you can drop `Helper` from such methods. > > This is indeed a Helper routine called from the lambda expression. Although we don't use that naming pattern in other places for the fallback Java code. It's just the scalar implementation. ------------- PR: https://git.openjdk.java.net/jdk/pull/3916 From dnsimon at openjdk.java.net Tue May 11 15:55:17 2021 From: dnsimon at openjdk.java.net (Doug Simon) Date: Tue, 11 May 2021 15:55:17 GMT Subject: RFR: 8266923: [JVMCI] expose StackOverflow::_stack_overflow_limit to JVMCI Message-ID: This PR exposes the `StackOverflow::_stack_overflow_limit` field to JVMCI Java code for use in Graal. 
------------- Commit messages: - expose StackOverflow::_stack_overflow_limit to JVMCI Changes: https://git.openjdk.java.net/jdk/pull/3982/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3982&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8266923 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/3982.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3982/head:pull/3982 PR: https://git.openjdk.java.net/jdk/pull/3982 From vlivanov at openjdk.java.net Tue May 11 16:59:57 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Tue, 11 May 2021 16:59:57 GMT Subject: RFR: 8261158: JVMState should not be shared between SafePointNodes [v2] In-Reply-To: References: Message-ID: On Tue, 11 May 2021 13:21:33 GMT, Tobias Hartmann wrote: >> We often, for example with loop strip mining, clone `SafePointNodes` without cloning their `JVMState`, leading to the same state being shared by different nodes. With Valhalla, we then hit asserts when aggressively scalarizing inline types in safepoints during IGVN because `debug_end()` (`_endoff`) of the attached `JVMState` is larger than `SafePointNode::_max`. That happens because the same `JVMState` has been modified during scalarizing in another `SafePointNode`, the details are described in https://github.com/openjdk/valhalla/pull/322. I don't think `JVMStates` should be shared between safepoint nodes and added an assert to catch this. >> >> The fix is to always shallow clone the `JVMState` when cloning a `SafePointNode`. Sometimes, a deep clone is required already by current code for `CallNodes` (see `CallNode::needs_clone_jvms`), I left that code as is. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Rename needs_clone_jvms -> needs_deep_clone_jvms Much better! Thanks. ------------- Marked as reviewed by vlivanov (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/3951 From sviswanathan at openjdk.java.net Tue May 11 18:20:24 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Tue, 11 May 2021 18:20:24 GMT Subject: RFR: 8266854: LibraryCallKit::inline_preconditions_checkIndex modifies control flow even if the intrinsic bailed out [v2] In-Reply-To: References: <7mjYr1XHcHP2aQq3I9rvx8EorlaiISkO2tf3ePZ9ACQ=.a41f5461-a6cd-49eb-b128-ad232c15da0a@github.com> Message-ID: On Tue, 11 May 2021 12:59:00 GMT, Vladimir Ivanov wrote: >> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: >> >> Implement review comments from Vladimir Kozlov > > src/hotspot/share/opto/library_call.cpp line 1057: > >> 1055: } >> 1056: >> 1057: if (stopped()) { > > Moving the check around doesn't make much sense to me. > > `stopped() == true` signals that the current control is effectively dead. It could happen when the range check (`RangeCheckNode`) always fails and execution unconditionally hits the uncommon trap. (I haven't double-checked myself whether it happens in practice or not.) > > By bailing out from the intrinsic (`return false;`), the next thing C2 will attempt is to inline `Preconditions::checkIndex()`. There were no attempts to clean up the graph built up to this point (with the uncommon trap). > > Instead, the fix can just be `if (stopped()) { return true; }`. The graph constructed so far is valid. 'if (stopped()) { return true; }' (i.e. only changing the return false to return true at original line 1058) also fixes the issue. It has something to do with the range check failing. I discovered this issue while working on the Vector API. I had a call to Objects.checkIndex(origin, length()) and came across the 'assert(ctrl == kit.control()) failed'. As part of code review we found that the call should have been Objects.checkIndex(origin, length()+1), so it looks like it has something to do with the range check failing. 
I am unable to create a standalone test case. Let me know if we should go ahead with the 'if (stopped()) { return true; }' fix. ------------- PR: https://git.openjdk.java.net/jdk/pull/3958 From mandy.chung at oracle.com Tue May 11 20:42:01 2021 From: mandy.chung at oracle.com (Mandy Chung) Date: Tue, 11 May 2021 13:42:01 -0700 Subject: Draft JEP: Reimplement Core Reflection on Method Handles Message-ID: This draft JEP is a proposal to reimplement core reflection on top of method handles: https://bugs.openjdk.java.net/browse/JDK-8266010 Feedback is welcome. The prototype is at [1]. Mandy [1] https://github.com/mlchung/jdk/tree/reimplement-method-invoke From brian.goetz at oracle.com Tue May 11 21:01:57 2021 From: brian.goetz at oracle.com (Brian Goetz) Date: Tue, 11 May 2021 17:01:57 -0400 Subject: Draft JEP: Reimplement Core Reflection on Method Handles In-Reply-To: References: Message-ID: <0dcdae1f-5d76-3c8e-eaf8-ce3b73cf7de4@oracle.com> Yes, please! To add to the list of motivations/things to remove: the current implementation relies on the special `MagicAccessorImpl` to relax accessibility. The notes in this class are frightening; getting rid of it would be a mercy. On 5/11/2021 4:42 PM, Mandy Chung wrote: > This draft JEP is a proposal to reimplement core reflection on top of > method handles: > https://bugs.openjdk.java.net/browse/JDK-8266010 > > Feedback is welcome. The prototype is at [1]. > > Mandy > [1] https://github.com/mlchung/jdk/tree/reimplement-method-invoke From kvn at openjdk.java.net Tue May 11 21:52:52 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 11 May 2021 21:52:52 GMT Subject: RFR: 8266074: Vtable-based CHA implementation [v4] In-Reply-To: References: Message-ID: On Mon, 3 May 2021 18:45:07 GMT, Vladimir Ivanov wrote: >> As of now, Class Hierarchy Analysis (CHA) employs an approximate algorithm to enumerate all non-abstract methods in a class hierarchy. 
>> >> It served quite well for many years, but it accumulated significant complexity >> to support different corner cases over time, and the inevitable evolution of the JVM >> stretched the whole approach way too much (to the point where it became almost >> impossible to extend the analysis any further). >> >> It turns out the root problem is the decision to reimplement method resolution >> and method selection logic from scratch and to perform it on the JVM's internal >> representation. It makes it very hard to reason about correctness and the >> implementation becomes sensitive to changes in the internal representation. >> >> So, the main motivation for the redesign is twofold: >> * reduce maintenance burden and increase confidence in the code; >> * unlock some long-awaited enhancements. >> >> Though I did experiment with relaxing existing constraints (e.g., enabling default method support), >> any possible enhancements are deliberately kept out of scope for the current PR. >> (It does deliver a few minor enhancements, as the changes in >> compiler/cha/StrengthReduceInterfaceCall.java show, but that is a side effect >> of the other changes and was not the goal of the current work.) >> >> The proposed implementation (`LinkedConcreteMethodFinder`) mimics method invocation >> and relies on vtable/itable information to detect the target method for every >> subclass it visits. It removes all the complexity associated with method >> resolution and method selection logic and leaves only the essential logic to prepare for method selection. >> >> Vtables are filled during class linkage, so the new logic doesn't work on not-yet-linked classes. >> Instead of supporting the not-yet-linked case, such classes are simply ignored. It is safe to >> skip them (treat as "effectively non-concrete") since it is guaranteed there >> are no instances created yet. But it requires the VM to check dependencies once a >> class is linked. 
>> >> I ended up with 2 separate dependency validation passes (when a class is loaded >> and when it is linked). To avoid duplicated work, only dependencies >> which may be affected by a class initialization state change >> (`unique_concrete_method_4`) are visited. >> >> (I experimented with merging the passes into a single pass (delaying the pass until >> linkage is over), but it severely affected other class-related dependencies and >> relevant optimizations.) >> >> The Compiler Interface (CI) is changed to require users to provide complete information about the call site being analyzed. >> >> The old implementation is kept intact for now (it will be removed later) because: >> - JVMCI hasn't been migrated to the new implementation yet; >> - it enables verification that the 2 implementations (old and new) agree on the results; >> - it temporarily keeps an option to revert to the original implementation in case any regressions show up. >> >> Testing: >> - [x] hs-tier1 - hs-tier9 >> - [x] hs-tier1 - hs-tier4 w/ `-XX:-UseVtableBasedCHA` >> - [x] performance testing >> >> Thanks! > > Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: > > Cover abstract method case Marked as reviewed by kvn (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/3727 From redestad at openjdk.java.net Tue May 11 22:07:54 2021 From: redestad at openjdk.java.net (Claes Redestad) Date: Tue, 11 May 2021 22:07:54 GMT Subject: RFR: 8266937: Remove Compile::reshape_address Message-ID: This method was introduced for aarch64 only in JDK-8154826 but the implementation was intentionally removed by JDK-8204348. The declarations and empty methods were left behind on all platforms, though. 
------------- Commit messages: - Remove Compile::reshape_address Changes: https://git.openjdk.java.net/jdk/pull/3988/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3988&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8266937 Stats: 20 lines in 7 files changed: 0 ins; 20 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/3988.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3988/head:pull/3988 PR: https://git.openjdk.java.net/jdk/pull/3988 From forax at univ-mlv.fr Tue May 11 22:14:27 2021 From: forax at univ-mlv.fr (Remi Forax) Date: Wed, 12 May 2021 00:14:27 +0200 (CEST) Subject: Draft JEP: Reimplement Core Reflection on Method Handles In-Reply-To: References: Message-ID: <1529811052.1759622.1620771267201.JavaMail.zimbra@u-pem.fr> Hi Mandy, impressive work! I think that the methods that are caller-sensitive adapters (the ones that take a supplementary class as the last parameter) should be annotated with a specific JDK internal annotation, so the link between the caller-sensitive method and its adapter is obvious for the humans that read the code. Otherwise, I've only taken a look at the parts of the code that are using ASM. This line is weird, it uses 52 which is Java 8 https://github.com/mlchung/jdk/commit/320efd2e5697627243f6fe058485fb8708a0cd41#diff-4e4fca8bb2eb6320ff485ee724248e1641b4bb3f6dbae8526e87c5cf15905d9aR1262 Perhaps all versions should be updated to 61 (Java 17), until the internal version of ASM is refreshed so the constant V17 can be used. Rémi ----- Original Message ----- > From: "mandy chung" > To: "core-libs-dev" , "hotspot compiler" > Sent: Tuesday, 11 May 2021 22:42:01 > Subject: Draft JEP: Reimplement Core Reflection on Method Handles > This draft JEP is a proposal to reimplement core reflection on top of > method handles: > https://bugs.openjdk.java.net/browse/JDK-8266010 > > Feedback is welcome. The prototype is at [1]. 
> > Mandy > [1] https://github.com/mlchung/jdk/tree/reimplement-method-invoke From mandy.chung at oracle.com Tue May 11 22:36:37 2021 From: mandy.chung at oracle.com (Mandy Chung) Date: Tue, 11 May 2021 15:36:37 -0700 Subject: Draft JEP: Reimplement Core Reflection on Method Handles In-Reply-To: <0dcdae1f-5d76-3c8e-eaf8-ce3b73cf7de4@oracle.com> References: <0dcdae1f-5d76-3c8e-eaf8-ce3b73cf7de4@oracle.com> Message-ID: <5a8859f2-5bcd-e5ec-cf82-bd530c9b7b52@oracle.com> On 5/11/21 2:01 PM, Brian Goetz wrote: > Yes, please! > > To add to the list of motivations/things to remove: the current > implementation relies on the special `MagicAccessorImpl` to relax > accessibility. The notes in this class are frightening; getting rid > of it would be a mercy. > > Thanks, great point. I will add that. Mandy > > On 5/11/2021 4:42 PM, Mandy Chung wrote: >> This draft JEP is a proposal to reimplement core reflection on top of >> method handles: >> https://bugs.openjdk.java.net/browse/JDK-8266010 >> >> Feedback is welcome. The prototype is at [1]. >> >> Mandy >> [1] https://github.com/mlchung/jdk/tree/reimplement-method-invoke > From mandy.chung at oracle.com Tue May 11 22:39:56 2021 From: mandy.chung at oracle.com (Mandy Chung) Date: Tue, 11 May 2021 15:39:56 -0700 Subject: Draft JEP: Reimplement Core Reflection on Method Handles In-Reply-To: <1529811052.1759622.1620771267201.JavaMail.zimbra@u-pem.fr> References: <1529811052.1759622.1620771267201.JavaMail.zimbra@u-pem.fr> Message-ID: <92148095-eef0-d6a5-a3d5-dfc9c0548c8e@oracle.com> On 5/11/21 3:14 PM, Remi Forax wrote: > Hi Mandy, > impressive work! > > I think that the methods that are caller-sensitive adapters (the ones that take a supplementary class as the last parameter) should be annotated with a specific JDK internal annotation, > so the link between the caller-sensitive method and its adapter is obvious for the humans that read the code. This is exactly what I am considering. May get to it soon. 
> Otherwise, I've only taken a look at the parts of the code that are using ASM. > > This line is weird, it uses 52 which is Java 8 > https://github.com/mlchung/jdk/commit/320efd2e5697627243f6fe058485fb8708a0cd41#diff-4e4fca8bb2eb6320ff485ee724248e1641b4bb3f6dbae8526e87c5cf15905d9aR1262 > > Perhaps all versions should be updated to 61 (Java 17), until the internal version of ASM is refreshed so the constant V17 can be used. Agree. I will fix that. Mandy > > Rémi > > ----- Original Message ----- >> From: "mandy chung" >> To: "core-libs-dev" , "hotspot compiler" >> Sent: Tuesday, 11 May 2021 22:42:01 >> Subject: Draft JEP: Reimplement Core Reflection on Method Handles >> This draft JEP is a proposal to reimplement core reflection on top of >> method handles: >> https://bugs.openjdk.java.net/browse/JDK-8266010 >> >> Feedback is welcome. The prototype is at [1]. >> >> Mandy >> [1] https://github.com/mlchung/jdk/tree/reimplement-method-invoke From kvn at openjdk.java.net Tue May 11 22:39:55 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 11 May 2021 22:39:55 GMT Subject: RFR: 8266937: Remove Compile::reshape_address In-Reply-To: References: Message-ID: On Tue, 11 May 2021 21:58:45 GMT, Claes Redestad wrote: > This method was introduced for aarch64 only in JDK-8154826 but the implementation was intentionally removed by JDK-8204348. > > The declarations and empty methods were left behind on all platforms, though. Trivial. ------------- Marked as reviewed by kvn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/3988 From kvn at openjdk.java.net Tue May 11 22:42:52 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 11 May 2021 22:42:52 GMT Subject: RFR: 8266923: [JVMCI] expose StackOverflow::_stack_overflow_limit to JVMCI In-Reply-To: References: Message-ID: On Tue, 11 May 2021 15:47:08 GMT, Doug Simon wrote: > This PR exposes the `StackOverflow::_stack_overflow_limit` field to JVMCI Java code for use in Graal. Trivial. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/3982 From redestad at openjdk.java.net Tue May 11 22:49:27 2021 From: redestad at openjdk.java.net (Claes Redestad) Date: Tue, 11 May 2021 22:49:27 GMT Subject: RFR: 8266937: Remove Compile::reshape_address In-Reply-To: References: Message-ID: On Tue, 11 May 2021 22:36:32 GMT, Vladimir Kozlov wrote: > Trivial. Yep. Thanks for reviewing! ------------- PR: https://git.openjdk.java.net/jdk/pull/3988 From redestad at openjdk.java.net Tue May 11 22:49:28 2021 From: redestad at openjdk.java.net (Claes Redestad) Date: Tue, 11 May 2021 22:49:28 GMT Subject: Integrated: 8266937: Remove Compile::reshape_address In-Reply-To: References: Message-ID: On Tue, 11 May 2021 21:58:45 GMT, Claes Redestad wrote: > This method was introduced for aarch64 only in JDK-8154826 but the implementation was intentionally removed by JDK-8204348 > > The declarations and empty methods were left behind on all platforms, though. This pull request has now been integrated. 
Changeset: 616244f4 Author: Claes Redestad URL: https://git.openjdk.java.net/jdk/commit/616244f43aa68543e20f1eefedd67ca8c81669e1 Stats: 20 lines in 7 files changed: 0 ins; 20 del; 0 mod 8266937: Remove Compile::reshape_address Reviewed-by: kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/3988 From kvn at openjdk.java.net Tue May 11 22:54:56 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 11 May 2021 22:54:56 GMT Subject: RFR: 8266854: LibraryCallKit::inline_preconditions_checkIndex modifies control flow even if the intrinsic bailed out [v2] In-Reply-To: References: <7mjYr1XHcHP2aQq3I9rvx8EorlaiISkO2tf3ePZ9ACQ=.a41f5461-a6cd-49eb-b128-ad232c15da0a@github.com> Message-ID: <5MOboTrs3a5BL1IHGWITmB0Ynb1EQ0HzsFy59RgkHjQ=.c0407d38-6693-41f7-b704-4a297480fa41@github.com> On Tue, 11 May 2021 18:16:52 GMT, Sandhya Viswanathan wrote: >> src/hotspot/share/opto/library_call.cpp line 1057: >> >>> 1055: } >>> 1056: >>> 1057: if (stopped()) { >> >> Moving the check around doesn't make much sense to me. >> >> `stopped() == true` signals that the current control is effectively dead. It could happen when the range check (`RangeCheckNode`) always fails and execution unconditionally hits the uncommon trap. (I haven't double-checked myself whether it happens in practice or not.) >> >> By bailing out from the intrinsic (`return false;`), the next thing C2 will attempt is to inline `Preconditions::checkIndex()`. There were no attempts to clean up the graph built up to this point (with the uncommon trap). >> >> Instead, the fix can just be `if (stopped()) { return true; }`. The graph constructed so far is valid. > > 'if (stopped()) { return true; }' (i.e. only changing the return false to return true at original line 1058) also fixes the issue. > It has something to do with the range check failing. > > I discovered this issue while working on the Vector API. I had a call to Objects.checkIndex(origin, length()) and came across the 'assert(ctrl == kit.control()) failed'. 
As part of code review we found that the call should have been Objects.checkIndex(origin, length()+1), so it looks like it has something to do with the range check failing. I am unable to create a standalone test case. > > Let me know if we should go ahead with the 'if (stopped()) { return true; }' fix. Yes, returning `true` could be correct: because of constant folding we end up in an uncommon trap, which is valid for the provided values (constants). It will be the same if the code is compiled normally. But we also need to move the other `stopped()` check from line 1036 to 1028, after the BuildCutout, which can also go to an uncommon trap (when length is known to be negative during compilation). And it should also return `true`. ------------- PR: https://git.openjdk.java.net/jdk/pull/3958 From kvn at openjdk.java.net Tue May 11 23:17:07 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 11 May 2021 23:17:07 GMT Subject: RFR: 8261158: JVMState should not be shared between SafePointNodes [v2] In-Reply-To: References: Message-ID: <-NO6RpvlaxPN4BYrQm0vrO6jZQw6x0EH6RYXJWjkWd0=.7d514827-548e-45a2-b7ca-4ca2ff46e849@github.com> On Tue, 11 May 2021 13:21:33 GMT, Tobias Hartmann wrote: >> We often, for example with loop strip mining, clone `SafePointNodes` without cloning their `JVMState`, leading to the same state being shared by different nodes. With Valhalla, we then hit asserts when aggressively scalarizing inline types in safepoints during IGVN because `debug_end()` (`_endoff`) of the attached `JVMState` is larger than `SafePointNode::_max`. That happens because the same `JVMState` has been modified during scalarizing in another `SafePointNode`, the details are described in https://github.com/openjdk/valhalla/pull/322. I don't think `JVMStates` should be shared between safepoint nodes and added an assert to catch this. >> >> The fix is to always shallow clone the `JVMState` when cloning a `SafePointNode`. 
Sometimes, a deep clone is required already by current code for `CallNodes` (see `CallNode::needs_clone_jvms`), I left that code as is. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Rename needs_clone_jvms -> needs_deep_clone_jvms Good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/3951 From thartmann at openjdk.java.net Wed May 12 07:25:21 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 12 May 2021 07:25:21 GMT Subject: RFR: 8261158: JVMState should not be shared between SafePointNodes [v2] In-Reply-To: References: Message-ID: On Tue, 11 May 2021 16:57:22 GMT, Vladimir Ivanov wrote: >> Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: >> >> Rename needs_clone_jvms -> needs_deep_clone_jvms > > Much better! Thanks. Thanks for the reviews, @iwanowww and @vnkozlov! ------------- PR: https://git.openjdk.java.net/jdk/pull/3951 From thartmann at openjdk.java.net Wed May 12 07:25:22 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 12 May 2021 07:25:22 GMT Subject: Integrated: 8261158: JVMState should not be shared between SafePointNodes In-Reply-To: References: Message-ID: On Mon, 10 May 2021 14:09:17 GMT, Tobias Hartmann wrote: > We often, for example with loop strip mining, clone `SafePointNodes` without cloning their `JVMState`, leading to the same state being shared by different nodes. With Valhalla, we then hit asserts when aggressively scalarizing inline types in safepoints during IGVN because `debug_end()` (`_endoff`) of the attached `JVMState` is larger than `SafePointNode::_max`. That happens because the same `JVMState` has been modified during scalarizing in another `SafePointNode`, the details are described in https://github.com/openjdk/valhalla/pull/322. 
I don't think `JVMStates` should be shared between safepoint nodes and added an assert to catch this. > > The fix is to always shallow clone the `JVMState` when cloning a `SafepointNode`. Sometimes, a deep clone is required already by current code for `CallNodes` (see `CallNode::needs_clone_jvms`), I left that code as is. > > Thanks, > Tobias This pull request has now been integrated. Changeset: 06d76028 Author: Tobias Hartmann URL: https://git.openjdk.java.net/jdk/commit/06d760283344a1d0fd510aed306e0efb76b51617 Stats: 47 lines in 8 files changed: 16 ins; 8 del; 23 mod 8261158: JVMState should not be shared between SafePointNodes Reviewed-by: vlivanov, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/3951 From thartmann at openjdk.java.net Wed May 12 07:29:19 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 12 May 2021 07:29:19 GMT Subject: RFR: 8265129: Add intrinsic support for JVM.getClassId [v5] In-Reply-To: References: Message-ID: <5ax-PzsNOkbq_9_TZZcksCmyDyyAk9YTNS_8qlQQq7Y=.241ab2e2-5580-405b-b93a-71f44f517a50@github.com> On Mon, 10 May 2021 05:33:35 GMT, Denghui Dong wrote: >> 8265129: Add intrinsic support for JVM.getClassId > > Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: > > remove c1 part Hi Denghui, I've attached the corresponding hs_err and replay files to the bug. Hope that helps! 
Tobias ------------- PR: https://git.openjdk.java.net/jdk/pull/3470 From thartmann at openjdk.java.net Wed May 12 08:37:06 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 12 May 2021 08:37:06 GMT Subject: RFR: 8266798: C1: More types of instruction can also apply LoopInvariantCodeMotion [v2] In-Reply-To: <761bky-U_daJtCpRxhPe8WeRHZAjRMwlRQn0ARdLINY=.29aaae5c-cc3b-4624-9725-69911bf9f612@github.com> References: <761bky-U_daJtCpRxhPe8WeRHZAjRMwlRQn0ARdLINY=.29aaae5c-cc3b-4624-9725-69911bf9f612@github.com> Message-ID: On Tue, 11 May 2021 10:43:55 GMT, Yi Yang wrote: >> C1 only applies LoopInvariantCodeMotion for instructions whose types are Constant/ArithmeticOp/LoadField/ArrayLength/LoadIndexed. We are possible to apply this optimization for more types of instruction. >> >> Candidates: NegateOp,Convert. >> >> Due to the lack of verification at IR level, it is difficult to write jtreg to check if it transformed, so I can only demonstrate it with a simple program: >> >> // run with -XX:+PrintValueNumbering >> static int foo10(int t){ >> int sum=12; >> for(int i=0;i<100;i++){ >> sum += 12; >> sum += -t; >> sum += (long)t; >> } >> return sum; >> } >> >> Before: >> >> [...] >> * loop invariant code motion for short loop B1 >> processing block B1 >> Value Numbering: insert Constant i10 (size 11, entries 3, nesting 2) >> Instruction i10 is loop invariant >> processing block B2 >> Value Numbering: Constant i12 equal to i5 (size 11, entries 3, nesting-diff 1) >> substitution for 12 set to 5 >> Instruction i12 is loop invariant // only 12 is recongized >> Value Numbering: insert Constant i20 (size 11, entries 4, nesting 2) >> Instruction i20 is loop invariant >> ** loop successfully optimized >> [...] >> >> After: >> >> [...] 
>> ** loop invariant code motion for short loop B1 >> processing block B1 >> Value Numbering: insert Constant i10 (size 11, entries 3, nesting 2) >> Instruction i10 is loop invariant >> 6 0 i10 100 >> processing block B2 >> Value Numbering: Constant i12 equal to i5 (size 11, entries 3, nesting-diff 1) >> substitution for i12 set to 105 >> Instruction i12 is loop invariant >> 11 0 i12 12 >> Instruction i14 is loop invariant >> . 16 0 i14 -i4 >> Value Numbering: insert Convert l17 (size 11, entries 4, nesting 2) >> Instruction l17 is loop invariant >> . 22 0 l17 i2l(i4) >> Value Numbering: insert Constant i20 (size 11, entries 5, nesting 2) >> Instruction i20 is loop invariant >> 26 0 i20 1 >> [...] > > Yi Yang has updated the pull request incrementally with one additional commit since the last revision: > > tchar() before id() This is non-trivial and requires a second review. ------------- PR: https://git.openjdk.java.net/jdk/pull/3965 From yyang at openjdk.java.net Wed May 12 08:39:10 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Wed, 12 May 2021 08:39:10 GMT Subject: Integrated: 8266189: Remove C1 "IfInstanceOf" instruction In-Reply-To: References: Message-ID: <8lggQ9n2pypr0D3jQG3gjLfWzpeOSru4uu9MbZRvVKQ=.5650b05e-0df9-4aa6-8da0-2ad6032a01b2@github.com> On Sat, 8 May 2021 10:54:40 GMT, Yi Yang wrote: > Remove IfInstanceOf instruction, it has been there for a long while(13yrs) and not implemented yet. This pull request has now been integrated. 
Changeset: 548899d4 Author: Yi Yang Committer: Tobias Hartmann URL: https://git.openjdk.java.net/jdk/commit/548899d40e10728cef2f9e5fa2e2f2b51a37ae35 Stats: 93 lines in 10 files changed: 0 ins; 93 del; 0 mod 8266189: Remove C1 "IfInstanceOf" instruction Reviewed-by: thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/3935 From yyang at openjdk.java.net Wed May 12 08:39:27 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Wed, 12 May 2021 08:39:27 GMT Subject: Integrated: 8266874: Clean up C1 canonicalizer for TableSwitch/LookupSwitch In-Reply-To: References: Message-ID: On Tue, 11 May 2021 02:44:30 GMT, Yi Yang wrote: > Clean up C1 canonicalizer for TableSwitch/LookupSwitch This pull request has now been integrated. Changeset: b46086d7 Author: Yi Yang Committer: Tobias Hartmann URL: https://git.openjdk.java.net/jdk/commit/b46086d777d6d051b6c599e040706efcd66d422c Stats: 34 lines in 1 file changed: 0 ins; 34 del; 0 mod 8266874: Clean up C1 canonicalizer for TableSwitch/LookupSwitch Reviewed-by: thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/3966 From dnsimon at openjdk.java.net Wed May 12 08:50:49 2021 From: dnsimon at openjdk.java.net (Doug Simon) Date: Wed, 12 May 2021 08:50:49 GMT Subject: Integrated: 8266923: [JVMCI] expose StackOverflow::_stack_overflow_limit to JVMCI In-Reply-To: References: Message-ID: On Tue, 11 May 2021 15:47:08 GMT, Doug Simon wrote: > This PR exposes the `StackOverflow::_stack_overflow_limit` field to JVMCI Java code for use in Graal. This pull request has now been integrated. 
Changeset: f3b510b9 Author: Doug Simon URL: https://git.openjdk.java.net/jdk/commit/f3b510b9aa540ae5fbda687d545e995c5622f971 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8266923: [JVMCI] expose StackOverflow::_stack_overflow_limit to JVMCI Reviewed-by: kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/3982 From neliasso at openjdk.java.net Wed May 12 08:57:21 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Wed, 12 May 2021 08:57:21 GMT Subject: RFR: 8266798: C1: More types of instruction can also apply LoopInvariantCodeMotion [v2] In-Reply-To: <761bky-U_daJtCpRxhPe8WeRHZAjRMwlRQn0ARdLINY=.29aaae5c-cc3b-4624-9725-69911bf9f612@github.com> References: <761bky-U_daJtCpRxhPe8WeRHZAjRMwlRQn0ARdLINY=.29aaae5c-cc3b-4624-9725-69911bf9f612@github.com> Message-ID: On Tue, 11 May 2021 10:43:55 GMT, Yi Yang wrote: >> C1 only applies LoopInvariantCodeMotion for instructions whose types are Constant/ArithmeticOp/LoadField/ArrayLength/LoadIndexed. We are possible to apply this optimization for more types of instruction. >> >> Candidates: NegateOp,Convert. >> >> Due to the lack of verification at IR level, it is difficult to write jtreg to check if it transformed, so I can only demonstrate it with a simple program: >> >> // run with -XX:+PrintValueNumbering >> static int foo10(int t){ >> int sum=12; >> for(int i=0;i<100;i++){ >> sum += 12; >> sum += -t; >> sum += (long)t; >> } >> return sum; >> } >> >> Before: >> >> [...] >> * loop invariant code motion for short loop B1 >> processing block B1 >> Value Numbering: insert Constant i10 (size 11, entries 3, nesting 2) >> Instruction i10 is loop invariant >> processing block B2 >> Value Numbering: Constant i12 equal to i5 (size 11, entries 3, nesting-diff 1) >> substitution for 12 set to 5 >> Instruction i12 is loop invariant // only 12 is recongized >> Value Numbering: insert Constant i20 (size 11, entries 4, nesting 2) >> Instruction i20 is loop invariant >> ** loop successfully optimized >> [...] 
>> >> After: >> >> [...] >> ** loop invariant code motion for short loop B1 >> processing block B1 >> Value Numbering: insert Constant i10 (size 11, entries 3, nesting 2) >> Instruction i10 is loop invariant >> 6 0 i10 100 >> processing block B2 >> Value Numbering: Constant i12 equal to i5 (size 11, entries 3, nesting-diff 1) >> substitution for i12 set to 105 >> Instruction i12 is loop invariant >> 11 0 i12 12 >> Instruction i14 is loop invariant >> . 16 0 i14 -i4 >> Value Numbering: insert Convert l17 (size 11, entries 4, nesting 2) >> Instruction l17 is loop invariant >> . 22 0 l17 i2l(i4) >> Value Numbering: insert Constant i20 (size 11, entries 5, nesting 2) >> Instruction i20 is loop invariant >> 26 0 i20 1 >> [...] > > Yi Yang has updated the pull request incrementally with one additional commit since the last revision: > > tchar() before id() Looks good. ------------- Marked as reviewed by neliasso (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/3965 From yyang at openjdk.java.net Wed May 12 09:10:57 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Wed, 12 May 2021 09:10:57 GMT Subject: Integrated: 8266798: C1: More types of instruction can also apply LoopInvariantCodeMotion In-Reply-To: References: Message-ID: On Tue, 11 May 2021 02:26:04 GMT, Yi Yang wrote: > C1 only applies LoopInvariantCodeMotion for instructions whose types are Constant/ArithmeticOp/LoadField/ArrayLength/LoadIndexed. We are possible to apply this optimization for more types of instruction. > > Candidates: NegateOp,Convert. > > Due to the lack of verification at IR level, it is difficult to write jtreg to check if it transformed, so I can only demonstrate it with a simple program: > > // run with -XX:+PrintValueNumbering > static int foo10(int t){ > int sum=12; > for(int i=0;i<100;i++){ > sum += 12; > sum += -t; > sum += (long)t; > } > return sum; > } > > Before: > > [...] 
> * loop invariant code motion for short loop B1 > processing block B1 > Value Numbering: insert Constant i10 (size 11, entries 3, nesting 2) > Instruction i10 is loop invariant > processing block B2 > Value Numbering: Constant i12 equal to i5 (size 11, entries 3, nesting-diff 1) > substitution for 12 set to 5 > Instruction i12 is loop invariant // only 12 is recongized > Value Numbering: insert Constant i20 (size 11, entries 4, nesting 2) > Instruction i20 is loop invariant > ** loop successfully optimized > [...] > > After: > > [...] > ** loop invariant code motion for short loop B1 > processing block B1 > Value Numbering: insert Constant i10 (size 11, entries 3, nesting 2) > Instruction i10 is loop invariant > 6 0 i10 100 > processing block B2 > Value Numbering: Constant i12 equal to i5 (size 11, entries 3, nesting-diff 1) > substitution for i12 set to 105 > Instruction i12 is loop invariant > 11 0 i12 12 > Instruction i14 is loop invariant > . 16 0 i14 -i4 > Value Numbering: insert Convert l17 (size 11, entries 4, nesting 2) > Instruction l17 is loop invariant > . 22 0 l17 i2l(i4) > Value Numbering: insert Constant i20 (size 11, entries 5, nesting 2) > Instruction i20 is loop invariant > 26 0 i20 1 > [...] This pull request has now been integrated. 
Changeset: 11759bfb Author: Yi Yang Committer: Tobias Hartmann URL: https://git.openjdk.java.net/jdk/commit/11759bfb2d8e0208ad56f9ad5a425067e66c2bc0 Stats: 11 lines in 1 file changed: 7 ins; 3 del; 1 mod 8266798: C1: More types of instruction can also apply LoopInvariantCodeMotion Reviewed-by: thartmann, neliasso ------------- PR: https://git.openjdk.java.net/jdk/pull/3965 From yyang at openjdk.java.net Wed May 12 09:53:53 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Wed, 12 May 2021 09:53:53 GMT Subject: RFR: 8266798: C1: More types of instruction can also apply LoopInvariantCodeMotion [v2] In-Reply-To: <761bky-U_daJtCpRxhPe8WeRHZAjRMwlRQn0ARdLINY=.29aaae5c-cc3b-4624-9725-69911bf9f612@github.com> References: <761bky-U_daJtCpRxhPe8WeRHZAjRMwlRQn0ARdLINY=.29aaae5c-cc3b-4624-9725-69911bf9f612@github.com> Message-ID: <1yZAg3-fAnF396huESs4dsih3UCSuvYDDO1PKf4f_1s=.90e56178-83ca-4308-b464-8e9744602006@github.com> On Tue, 11 May 2021 10:43:55 GMT, Yi Yang wrote: >> C1 only applies LoopInvariantCodeMotion for instructions whose types are Constant/ArithmeticOp/LoadField/ArrayLength/LoadIndexed. We are possible to apply this optimization for more types of instruction. >> >> Candidates: NegateOp,Convert. >> >> Due to the lack of verification at IR level, it is difficult to write jtreg to check if it transformed, so I can only demonstrate it with a simple program: >> >> // run with -XX:+PrintValueNumbering >> static int foo10(int t){ >> int sum=12; >> for(int i=0;i<100;i++){ >> sum += 12; >> sum += -t; >> sum += (long)t; >> } >> return sum; >> } >> >> Before: >> >> [...] 
>> * loop invariant code motion for short loop B1 >> processing block B1 >> Value Numbering: insert Constant i10 (size 11, entries 3, nesting 2) >> Instruction i10 is loop invariant >> processing block B2 >> Value Numbering: Constant i12 equal to i5 (size 11, entries 3, nesting-diff 1) >> substitution for 12 set to 5 >> Instruction i12 is loop invariant // only 12 is recongized >> Value Numbering: insert Constant i20 (size 11, entries 4, nesting 2) >> Instruction i20 is loop invariant >> ** loop successfully optimized >> [...] >> >> After: >> >> [...] >> ** loop invariant code motion for short loop B1 >> processing block B1 >> Value Numbering: insert Constant i10 (size 11, entries 3, nesting 2) >> Instruction i10 is loop invariant >> 6 0 i10 100 >> processing block B2 >> Value Numbering: Constant i12 equal to i5 (size 11, entries 3, nesting-diff 1) >> substitution for i12 set to 105 >> Instruction i12 is loop invariant >> 11 0 i12 12 >> Instruction i14 is loop invariant >> . 16 0 i14 -i4 >> Value Numbering: insert Convert l17 (size 11, entries 4, nesting 2) >> Instruction l17 is loop invariant >> . 22 0 l17 i2l(i4) >> Value Numbering: insert Constant i20 (size 11, entries 5, nesting 2) >> Instruction i20 is loop invariant >> 26 0 i20 1 >> [...] > > Yi Yang has updated the pull request incrementally with one additional commit since the last revision: > > tchar() before id() Thank you Tobias and Nils! ------------- PR: https://git.openjdk.java.net/jdk/pull/3965 From vlivanov at openjdk.java.net Wed May 12 13:42:10 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Wed, 12 May 2021 13:42:10 GMT Subject: RFR: 8266973: Migrate to ClassHierarchyIterator when enumerating subclasses Message-ID: Replace ad-hoc recursion when enumerating subclasses with `ClassHierarchyIterator`. 
Found 3 occurrences: - `Dependencies::find_finalizable_subclass()` - `reinitialize_vtable_of()` - `VM_RedefineClasses::increment_class_counter()` Testing: - [x] hs-tier1 - hs-tier4 ------------- Commit messages: - 8266973: Migrate to ClassHierarchyIterator when enumerating subclasses Changes: https://git.openjdk.java.net/jdk/pull/3995/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3995&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8266973 Stats: 55 lines in 7 files changed: 7 ins; 19 del; 29 mod Patch: https://git.openjdk.java.net/jdk/pull/3995.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3995/head:pull/3995 PR: https://git.openjdk.java.net/jdk/pull/3995 From kvn at openjdk.java.net Wed May 12 15:02:55 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 12 May 2021 15:02:55 GMT Subject: RFR: 8266973: Migrate to ClassHierarchyIterator when enumerating subclasses In-Reply-To: References: Message-ID: <8Q7v434yMq1-JtD6TcZdSTjv-XskLWX5VIFrdsoaB7I=.677070dc-9f80-4587-a4ac-362ec82e180f@github.com> On Wed, 12 May 2021 13:30:09 GMT, Vladimir Ivanov wrote: > Replace ad-hoc recursion when enumerating subclasses with `ClassHierarchyIterator`. > > Found 3 occurrences: > - `Dependencies::find_finalizable_subclass()` > - `reinitialize_vtable_of()` > - `VM_RedefineClasses::increment_class_counter()` > > Testing: > - [x] hs-tier1 - hs-tier4 Seems good. I don't see link to testing in RFE. ------------- Marked as reviewed by kvn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/3995 From vlivanov at openjdk.java.net Wed May 12 15:11:25 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Wed, 12 May 2021 15:11:25 GMT Subject: RFR: 8266973: Migrate to ClassHierarchyIterator when enumerating subclasses In-Reply-To: References: Message-ID: <2tu459STWm4pe5NkxcG6atYlSqqUtOihj9BGf0Z3DLU=.3313f2de-5926-456a-8b91-f3873fbf38ae@github.com> On Wed, 12 May 2021 13:30:09 GMT, Vladimir Ivanov wrote: > Replace ad-hoc recursion when enumerating subclasses with `ClassHierarchyIterator`. > > Found 3 occurrences: > - `Dependencies::find_finalizable_subclass()` > - `reinitialize_vtable_of()` > - `VM_RedefineClasses::increment_class_counter()` > > Testing: > - [x] hs-tier1 - hs-tier4 Thanks for the review, Vladimir. > I don't see link to testing in RFE. Fixed. ------------- PR: https://git.openjdk.java.net/jdk/pull/3995 From kvn at openjdk.java.net Wed May 12 15:21:00 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 12 May 2021 15:21:00 GMT Subject: RFR: 8263006: Add optimization for Max(*)Node and Min(*)Node [v4] In-Reply-To: References: Message-ID: On Mon, 26 Apr 2021 09:39:00 GMT, Wang Huang wrote: >> * I optimize `max` and `min` by using these identities >> - op(max(a,b), min(a,b)) === op(a,b) >> - if op is commutative >> - example : >> - max(a,b) + min(a,b) === a + b // op = add >> - max(a,b) * min(a,b) === a * b // op = mul >> - max(max(a,b), min(a,b)) === max(a,b) // op = max() >> - min(max(a,b), min(a,b)) === min(a,b) // op = min() >> * Test case >> ```java >> /* >> * Copyright (c) 2021, Huawei Technologies Co. Ltd. All rights reserved. >> * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. >> * >> * This code is free software; you can redistribute it and/or modify it >> * under the terms of the GNU General Public License version 2 only, as >> * published by the Free Software Foundation. 
>> * >> * This code is distributed in the hope that it will be useful, but WITHOUT >> * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or >> * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License >> * version 2 for more details (a copy is included in the LICENSE file that >> * accompanied this code). >> * >> * You should have received a copy of the GNU General Public License version >> * 2 along with this work; if not, write to the Free Software Foundation, >> * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA. >> * >> * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA >> * or visit www.oracle.com if you need additional information or have any >> * questions. >> */ >> package org.sample; >> >> import org.openjdk.jmh.annotations.Benchmark; >> import org.openjdk.jmh.annotations.*; >> >> import java.util.Random; >> import java.util.concurrent.TimeUnit; >> import org.openjdk.jmh.infra.Blackhole; >> >> @BenchmarkMode({Mode.AverageTime}) >> @OutputTimeUnit(TimeUnit.MICROSECONDS) >> public class MyBenchmark { >> >> static int length = 100000; >> static double[] data1 = new double[length]; >> static double[] data2 = new double[length]; >> static Random random = new Random(); >> >> static { >> for(int i = 0; i < length; ++i) { >> data1[i] = random.nextDouble(); >> data2[i] = random.nextDouble(); >> } >> } >> >> @Benchmark >> public void testAdd(Blackhole bh) { >> double sum = 0; >> for (int i = 0; i < length; i++) { >> sum += Math.max(data1[i], data2[i]) + Math.min(data1[i], data2[i]); >> } >> bh.consume(sum); >> } >> >> @Benchmark >> public void testMax(Blackhole bh) { >> double sum = 0; >> for (int i = 0; i < length; i++) { >> sum += Math.max(Math.max(data1[i], data2[i]), Math.min(data1[i], data2[i])); >> } >> bh.consume(sum); >> } >> >> @Benchmark >> public void testMin(Blackhole bh) { >> double sum = 0; >> for (int i = 0; i < length; i++) { >> sum += Math.min(Math.max(data1[i], data2[i]), 
Math.min(data1[i], data2[i])); >> } >> bh.consume(sum); >> } >> >> @Benchmark >> public void testMul(Blackhole bh) { >> double sum = 0; >> for (int i = 0; i < length; i++) { >> sum += (Math.max(data1[i], data2[i]) * Math.min(data1[i], data2[i])); >> } >> bh.consume(sum); >> } >> } >> ``` >> >> * The result is listed here (aarch64): >> >> before: >> >> |Benchmark| Mode| Samples| Score| Score error| Units| >> |---| ---| ---| ---| --- | ---| >> |o.s.MyBenchmark.testAdd |avgt | 10 | 556.048 | 32.368 | us/op | >> | o.s.MyBenchmark.testMax | avgt | 10 |543.065 | 54.221 | us/op | >> | o.s.MyBenchmark.testMin | avgt |10 |570.731 | 37.630 | us/op | >> | o.s.MyBenchmark.testMul | avgt | 10 | 531.906 | 20.518 | us/op | >> >> after: >> >> |Benchmark| Mode| Samples| Score| Score error| Units| >> |---| ---| ---| ---| --- | ---| >> | o.s.MyBenchmark.testAdd | avgt | 10 | 319.350 | 9.248 | us/op | >> | o.s.MyBenchmark.testMax | avgt | 10 | 356.138 | 10.736 | us/op | >> | o.s.MyBenchmark.testMin | avgt | 10 | 323.731 | 16.621 | us/op | >> | o.s.MyBenchmark.testMul | avgt | 10 | 338.458 | 23.755 | us/op | >> >> * I have tested `NaN` ` INFINITY` and `-INFINITY` and got same result (before/after) > > Wang Huang has updated the pull request incrementally with one additional commit since the last revision: > > add jmh test case Looks good. Thank you for adding tests. I will run some testing before approval. 
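
A side note on the identities quoted above: the following is a minimal sketch, not taken from the patch, that checks op(max(a,b), min(a,b)) against op(a,b) on a handful of doubles, including `NaN`, both infinities and signed zeros. The class name `MaxMinIdentities`, the helper `holdsForAllSamples` and the sample values are assumptions made for illustration only; the sketch says nothing about how C2 actually performs the transformation.

```java
import java.util.function.DoubleBinaryOperator;

final class MaxMinIdentities {
    // Illustrative sample values (an assumption of this sketch): ordinary numbers,
    // signed zeros, NaN and both infinities.
    static final double[] SAMPLES = {
        -3.5, -0.0, 0.0, 1.0, 2.25,
        Double.NaN, Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY
    };

    // Returns true when op(max(a,b), min(a,b)) agrees with op(a,b) for every sample pair.
    static boolean holdsForAllSamples(DoubleBinaryOperator op) {
        for (double a : SAMPLES) {
            for (double b : SAMPLES) {
                double viaMaxMin = op.applyAsDouble(Math.max(a, b), Math.min(a, b));
                double direct = op.applyAsDouble(a, b);
                // Double.compare treats NaN as equal to NaN and tells 0.0 from -0.0.
                if (Double.compare(viaMaxMin, direct) != 0) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("add: " + holdsForAllSamples((x, y) -> x + y));
        System.out.println("mul: " + holdsForAllSamples((x, y) -> x * y));
        System.out.println("max: " + holdsForAllSamples(Math::max));
        System.out.println("min: " + holdsForAllSamples(Math::min));
        // Subtraction is not commutative, so the rewrite is not valid for it.
        System.out.println("sub: " + holdsForAllSamples((x, y) -> x - y));
    }
}
```

The check only covers the listed samples, so it illustrates the identities rather than proving them; the commutativity requirement is what makes the subtraction case fail.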
------------- PR: https://git.openjdk.java.net/jdk/pull/3513 From sviswanathan at openjdk.java.net Wed May 12 16:57:17 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Wed, 12 May 2021 16:57:17 GMT Subject: RFR: 8266854: LibraryCallKit::inline_preconditions_checkIndex modifies control flow even if the intrinsic bailed out [v3] In-Reply-To: References: Message-ID: > LibraryCallKit::inline_preconditions_checkIndex can result in the following assert sometimes: > "# assert(ctrl == kit.control()) failed: Control flow was added although the intrinsic bailed out" > > Consider the following code snippet: > ... > set_control(_gvn.transform(new IfTrueNode(rc))); > { > PreserveJVMState pjvms(this); > set_control(_gvn.transform(new IfFalseNode(rc))); > uncommon_trap(Deoptimization::Reason_range_check, > Deoptimization::Action_make_not_entrant); > } > .. > Here the control is being modified by set_control even though a bailout is possible afterwards. > Moving the set_control later in the intrinsic fixes this. > > This is a small fix. Please review. 
> > Best Regards, > Sandhya Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: return true if stopped ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/3958/files - new: https://git.openjdk.java.net/jdk/pull/3958/files/1283b4a6..4f78a782 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3958&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3958&range=01-02 Stats: 18 lines in 1 file changed: 8 ins; 10 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/3958.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3958/head:pull/3958 PR: https://git.openjdk.java.net/jdk/pull/3958 From sviswanathan at openjdk.java.net Wed May 12 16:57:18 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Wed, 12 May 2021 16:57:18 GMT Subject: RFR: 8266854: LibraryCallKit::inline_preconditions_checkIndex modifies control flow even if the intrinsic bailed out [v2] In-Reply-To: <7mjYr1XHcHP2aQq3I9rvx8EorlaiISkO2tf3ePZ9ACQ=.a41f5461-a6cd-49eb-b128-ad232c15da0a@github.com> References: <7mjYr1XHcHP2aQq3I9rvx8EorlaiISkO2tf3ePZ9ACQ=.a41f5461-a6cd-49eb-b128-ad232c15da0a@github.com> Message-ID: On Tue, 11 May 2021 01:06:25 GMT, Sandhya Viswanathan wrote: >> LibraryCallKit::inline_preconditions_checkIndex can result in the following assert sometimes: >> "# assert(ctrl == kit.control()) failed: Control flow was added although the intrinsic bailed out" >> >> Consider the following code snippet: >> ... >> set_control(_gvn.transform(new IfTrueNode(rc))); >> { >> PreserveJVMState pjvms(this); >> set_control(_gvn.transform(new IfFalseNode(rc))); >> uncommon_trap(Deoptimization::Reason_range_check, >> Deoptimization::Action_make_not_entrant); >> } >> .. >> Here the control is being modified by set_control even though a bailout is possible afterwards. >> Moving the set_control later in the intrinsic fixes this. >> >> This is a small fix. Please review. 
>> >> Best Regards, >> Sandhya > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > Implement review comments from Vladimir Kozlov @vnkozlov @iwanowww I have implemented your review comments. Please let me know if the changes look good now. ------------- PR: https://git.openjdk.java.net/jdk/pull/3958 From jbhateja at openjdk.java.net Wed May 12 17:16:04 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Wed, 12 May 2021 17:16:04 GMT Subject: RFR: 8266951: Partial in-lining for vectorized mismatch operation using AVX512 masked instructions Message-ID: <0YtRuwnVZ-Ejs-22d0JDJeFzXiZ17XNuBT1o5Ma4ZkI=.9dd9e952-d452-4175-8ff5-8f41e990a555@github.com> ArraysSupport.vectorizedMismatch is a leaf-level comparison routine which gets called by various public Java APIs (Arrays.equals, Arrays.mismatch). The HotSpot C2 compiler intrinsifies the vectorizedMismatch routine and emits a call to a stub routine which uses vector instructions to compare the inputs. For small compare operations whose size fits in one vector register (i.e. < 32 bytes or <= 64 bytes), this patch employs a partial in-lining technique to emit the fast path code at the call site, which does the vector comparison under the influence of a predicate register/mask computed as a function of the comparison length. If the length of the comparison is greater than the vector register size, then the slow path consisting of the stub call is emitted. This avoids the call overhead associated with the stub call, which is significant compared to the actual comparison operation for small comparisons. Partial in-lining works under the influence of a run time flag -XX:UsePartialInlineSize=32/64 (default 32 bytes). 
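
To make the fast path / slow path split described above concrete, here is a rough scalar emulation in Java. It is only a sketch under stated assumptions: `PartialInlineSketch`, `PARTIAL_INLINE_SIZE`, `laneMask`, `mismatchSlow` and `slowPathCalls` are hypothetical names, the mask is modelled as bits in a `long`, and the return convention follows `Arrays.mismatch` (-1 when nothing differs) rather than the internal intrinsic. The real patch emits AVX512 masked instructions in C2; none of that is shown here.

```java
final class PartialInlineSketch {
    // Hypothetical stand-in for the flag value; 32 mirrors the default quoted above.
    static final int PARTIAL_INLINE_SIZE = 32;

    // Counts slow-path entries so the dispatch is observable in this sketch.
    static int slowPathCalls = 0;

    // Bit i is set for every lane i < length (at most 64 lanes are modelled).
    static long laneMask(int length) {
        return length >= 64 ? -1L : (1L << length) - 1;
    }

    // Returns the index of the first differing byte in [0, length), or -1 if none.
    // Assumes both arrays hold at least `length` bytes.
    static int mismatch(byte[] a, byte[] b, int length) {
        if (length <= PARTIAL_INLINE_SIZE) {
            // Fast path: emulate one masked vector compare. Lanes at or beyond
            // `length` are masked off and never touch memory.
            long mask = laneMask(length);
            long diff = 0;
            for (int lane = 0; lane < PARTIAL_INLINE_SIZE; lane++) {
                boolean active = (mask & (1L << lane)) != 0;
                if (active && a[lane] != b[lane]) {
                    diff |= 1L << lane;
                }
            }
            return diff == 0 ? -1 : Long.numberOfTrailingZeros(diff);
        }
        // Slow path: stands in for the call to the out-of-line stub.
        return mismatchSlow(a, b, length);
    }

    static int mismatchSlow(byte[] a, byte[] b, int length) {
        slowPathCalls++;
        for (int i = 0; i < length; i++) {
            if (a[i] != b[i]) {
                return i;
            }
        }
        return -1;
    }
}
```

The point of the mask is that a single compare handles any length up to the vector size without a scalar tail loop, which is why the gain would be expected mainly for sizes at or below the configured partial inline size.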
Following are performance numbers for an existing JMH benchmark (test/micro/org/openjdk/bench/java/util/ArraysMismatch.java):

Machine: Cascade Lake server (Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz)

JMH Benchmark | Size | BaseLine (ops/ms) | PI32 (ops/ms) | Gain (PI32/Baseline) | PI64 (ops/ms) | Gain (PI64/Baseline)
-- | -- | -- | -- | -- | -- | --
ArraysMismatch.Byte.differentSubrangeMatches | 16 | 129196.612 | 165376.715 | 1.2800391 | 157553.42 | 1.219485694
ArraysMismatch.Byte.differentSubrangeMatches | 32 | 125583.404 | 163645.759 | 1.303084275 | 157645.879 | 1.255308217
ArraysMismatch.Byte.differentSubrangeMatches | 64 | 121969.731 | 170648.152 | 1.399102471 | 157993.449 | 1.295349655
ArraysMismatch.Byte.differentSubrangeMatches | 90 | 91819.571 | 96154.479 | 1.047211155 | 157983.324 | 1.720584427
ArraysMismatch.Byte.differentSubrangeMatches | 800 | 65236.047 | 67243.131 | 1.030766487 | 67759.48 | 1.038681574
ArraysMismatch.Byte.matches | 16 | 151805.68 | 203802.717 | 1.342523659 | 188334.618 | 1.240629586
ArraysMismatch.Byte.matches | 32 | 151624.747 | 203731.315 | 1.343654773 | 185719.086 | 1.224859989
ArraysMismatch.Byte.matches | 64 | 138350.648 | 124158.139 | 0.897416389 | 188935.388 | 1.365627055
ArraysMismatch.Byte.matches | 90 | 102366.983 | 101474.688 | 0.991283371 | 100674.414 | 0.983465675
ArraysMismatch.Byte.matches | 800 | 46319.352 | 49585.514 | 1.070513983 | 49594.262 | 1.070702846
ArraysMismatch.Byte.mismatchEnd | 16 | 162382.057 | 191602.366 | 1.179947893 | 182425.362 | 1.123433003
ArraysMismatch.Byte.mismatchEnd | 32 | 146656.702 | 193510.637 | 1.319480354 | 182571.741 | 1.244891904
ArraysMismatch.Byte.mismatchEnd | 64 | 140799.385 | 122505.816 | 0.870073516 | 182360.435 | 1.295179201
ArraysMismatch.Byte.mismatchEnd | 90 | 117439.002 | 107296.27 | 0.913634041 | 108081.174 | 0.920317545
ArraysMismatch.Byte.mismatchEnd | 800 | 47542.975 | 47456.106 | 0.998172832 | 47289.082 | 0.994659716
ArraysMismatch.Byte.mismatchMid | 16 | 143112.591 | 189653.41 | 1.325204223 | 182411.81 | 1.274603504
ArraysMismatch.Byte.mismatchMid | 32 | 151759.608 | 193712.64 | 1.276443993 | 182689.18 | 1.203806351
ArraysMismatch.Byte.mismatchMid | 64 | 140756.035 | 122017.013 | 0.866868785 | 182508.473 | 1.296629825
ArraysMismatch.Byte.mismatchMid | 90 | 134230.235 | 122213.804 | 0.910478954 | 122566.133 | 0.913103765
ArraysMismatch.Byte.mismatchMid | 800 | 75512.985 | 64861.716 | 0.858947849 | 71607.794 | 0.94828451
ArraysMismatch.Byte.mismatchStart | 16 | 160628.501 | 193722.299 | 1.206026937 | 183190.972 | 1.140463684
ArraysMismatch.Byte.mismatchStart | 32 | 151629.56 | 193633.36 | 1.277015906 | 183230.666 | 1.20840993
ArraysMismatch.Byte.mismatchStart | 64 | 143345.272 | 130754.305 | 0.91216336 | 181837.864 | 1.268530601
ArraysMismatch.Byte.mismatchStart | 90 | 151557.205 | 130724.926 | 0.86254511 | 130962.682 | 0.864113864
ArraysMismatch.Byte.mismatchStart | 800 | 149416.06 | 130847.301 | 0.875724477 | 130952.683 | 0.876429769
ArraysMismatch.Char.differentSubrangeMatches | 16 | 124936.905 | 152375.103 | 1.219616438 | 146062.997 | 1.169094088
ArraysMismatch.Char.differentSubrangeMatches | 32 | 118878.291 | 158770.285 | 1.33557005 | 146561.488 | 1.232870079
ArraysMismatch.Char.differentSubrangeMatches | 64 | 110296.975 | 104885.041 | 0.95093307 | 146102.313 | 1.324626655
ArraysMismatch.Char.differentSubrangeMatches | 90 | 88056.395 | 90133.489 | 1.023588224 | 87883.169 | 0.998032783
ArraysMismatch.Char.differentSubrangeMatches | 800 | 41319.787 | 46257.464 | 1.119499091 | 46090.56 | 1.115459767
ArraysMismatch.Char.matches | 16 | 150428.182 | 197311.356 | 1.311664832 | 187199.805 | 1.24444637
ArraysMismatch.Char.matches | 32 | 132718.181 | 126373.231 | 0.952192307 | 187008.811 | 1.409067014
ArraysMismatch.Char.matches | 64 | 111659.84 | 107182.982 | 0.959906283 | 109772.951 | 0.983101453
ArraysMismatch.Char.matches | 90 | 86184.209 | 91977.05 | 1.067214645 | 90389.147 | 1.048790121
ArraysMismatch.Char.matches | 800 | 26332.084 | 25284.001 | 0.960197491 | 25855.38 | 0.981896458
ArraysMismatch.Char.mismatchEnd | 16 | 148547.251 | 189151.018 | 1.273339067 | 179675.328 | 1.209550004
ArraysMismatch.Char.mismatchEnd | 32 | 138219.785 | 119017.203 | 0.861072118 | 178701.685 | 1.292880647
ArraysMismatch.Char.mismatchEnd | 64 | 110435.452 | 103940.023 | 0.94118348 | 102078.889 | 0.924330794
ArraysMismatch.Char.mismatchEnd | 90 | 89375.63 | 87698.736 | 0.981237682 | 88037.787 | 0.985031233
ArraysMismatch.Char.mismatchEnd | 800 | 23632.584 | 22963.757 | 0.971698948 | 20497.605 | 0.867345061
ArraysMismatch.Char.mismatchMid | 16 | 148666.26 | 189258.721 | 1.273044207 | 178820.938 | 1.202834712
ArraysMismatch.Char.mismatchMid | 32 | 131949.59 | 119320.489 | 0.904288441 | 178579.245 | 1.35338992
ArraysMismatch.Char.mismatchMid | 64 | 122148.315 | 111033.597 | 0.909006375 | 109455.953 | 0.896090568
ArraysMismatch.Char.mismatchMid | 90 | 125032.714 | 109837.581 | 0.878470742 | 110283.097 | 0.882033937
ArraysMismatch.Char.mismatchMid | 800 | 42255.059 | 48153.688 | 1.139595806 | 43087.476 | 1.019699819
ArraysMismatch.Char.mismatchStart | 16 | 148493.976 | 189247.176 | 1.274443456 | 178915.503 | 1.204867078
ArraysMismatch.Char.mismatchStart | 32 | 148724.462 | 126724.721 | 0.852077186 | 178887.041 | 1.202808459
ArraysMismatch.Char.mismatchStart | 64 | 148635.338 | 126716.274 | 0.852531274 | 126747.94 | 0.852744318
ArraysMismatch.Char.mismatchStart | 90 | 140359.351 | 126708.588 | 0.902744186 | 125618.245 | 0.894975961
ArraysMismatch.Char.mismatchStart | 800 | 144649.46 | 125727.381 | 0.86918666 | 126664.011 | 0.875661831
ArraysMismatch.Double.differentSubrangeMatches | 16 | 116255.827 | 116156.952 | 0.999149505 | 116557.568 | 1.002595491
ArraysMismatch.Double.differentSubrangeMatches | 32 | 91940.498 | 97299.205 | 1.058284511 | 97466.224 | 1.06010111
ArraysMismatch.Double.differentSubrangeMatches | 64 | 78205.807 | 78189.378 | 0.999789926 | 78133.649 | 0.999077332
ArraysMismatch.Double.differentSubrangeMatches | 90 | 61330.454 | 68798.235 | 1.121763015 | 68524.188 | 1.117294648
ArraysMismatch.Double.differentSubrangeMatches | 800 | 14996.315 | 14979.647 | 0.998888527 | 15072.825 | 1.00510192
ArraysMismatch.Double.matches | 16 | 119342.024 | 120322.671 | 1.008217114 | 119531.315 | 1.001586122
ArraysMismatch.Double.matches | 32 | 88179.448 | 89069.505 | 1.010093701 | 88141.626 | 0.999571079
ArraysMismatch.Double.matches | 64 | 62622.253 | 62433.512 | 0.996986039 | 63041.774 | 1.006699232
ArraysMismatch.Double.matches | 90 | 49579.305 | 50632.739 | 1.021247454 | 46548.486 | 0.938869272
ArraysMismatch.Double.matches | 800 | 8850.013 | 8505.296 | 0.961048984 | 8490.327 | 0.959357574
ArraysMismatch.Double.mismatchEnd | 16 | 116594.224 | 119025.382 | 1.020851445 | 116310.567 | 0.997567144
ArraysMismatch.Double.mismatchEnd | 32 | 86183.542 | 86814.706 | 1.007323486 | 86258.696 | 1.000872023
ArraysMismatch.Double.mismatchEnd | 64 | 62695.058 | 62794.552 | 1.001586951 | 62769 | 1.001179391
ArraysMismatch.Double.mismatchEnd | 90 | 46899.021 | 47692.984 | 1.016929202 | 47598.715 | 1.01491916
ArraysMismatch.Double.mismatchEnd | 800 | 8132.64 | 8141.465 | 1.001085133 | 7176.583 | 0.882441987
ArraysMismatch.Double.mismatchMid | 16 | 110505.284 | 113732.521 | 1.029204368 | 113249.451 | 1.024832903
ArraysMismatch.Double.mismatchMid | 32 | 94259.439 | 93242.776 | 0.989214205 | 94420.206 | 1.00170558
ArraysMismatch.Double.mismatchMid | 64 | 76392.603 | 76344.962 | 0.999376366 | 76369.689 | 0.999700049
ArraysMismatch.Double.mismatchMid | 90 | 71578.538 | 71637.235 | 1.000820036 | 71582.34 | 1.000053116
ArraysMismatch.Double.mismatchMid | 800 | 14993.414 | 12701.251 | 0.84712201 | 14998.937 | 1.000368362
ArraysMismatch.Double.mismatchStart | 16 | 141188.616 | 141430.91 | 1.001716102 | 141517.873 | 1.002332036
ArraysMismatch.Double.mismatchStart | 32 | 141489.906 | 139633.297 | 0.986878152 | 141729.555 | 1.001693753
ArraysMismatch.Double.mismatchStart | 64 | 141502.44 | 139656.902 | 0.986957554 | 141488.272 | 0.999899875 ArraysMismatch.Double.mismatchStart | 90 | 141782.57 | 141508.142 | 0.998064445 | 141579.135 | 0.998565162 ArraysMismatch.Double.mismatchStart | 800 | 144565.191 | 139525.413 | 0.965138371 | 144607.95 | 1.000295777 ArraysMismatch.Float.differentSubrangeMatches | 16 | 120041.868 | 119986.512 | 0.999538861 | 120009.683 | 0.999731885 ArraysMismatch.Float.differentSubrangeMatches | 32 | 111402.873 | 111414.633 | 1.000105563 | 111442.964 | 1.000359874 ArraysMismatch.Float.differentSubrangeMatches | 64 | 85388.728 | 93884.13 | 1.099490907 | 95120.892 | 1.113974809 ArraysMismatch.Float.differentSubrangeMatches | 90 | 67617.865 | 75865.226 | 1.121970148 | 76179.814 | 1.126622587 ArraysMismatch.Float.differentSubrangeMatches | 800 | 24994.376 | 25011.775 | 1.000696117 | 24944.2 | 0.997992508 ArraysMismatch.Float.matches | 16 | 133159.39 | 137937.688 | 1.035884048 | 139461.652 | 1.047328709 ArraysMismatch.Float.matches | 32 | 111959.987 | 115420.6 | 1.030909373 | 117002.141 | 1.045035321 ArraysMismatch.Float.matches | 64 | 86892.65 | 87395.62 | 1.005788407 | 87345.458 | 1.00521112 ArraysMismatch.Float.matches | 90 | 67690.279 | 69156.772 | 1.02166475 | 69082.962 | 1.020574343 ArraysMismatch.Float.matches | 800 | 14894.94 | 15341.034 | 1.029949365 | 15779.117 | 1.059360897 ArraysMismatch.Float.mismatchEnd | 16 | 128854.048 | 128925.913 | 1.000557724 | 128985.299 | 1.001018602 ArraysMismatch.Float.mismatchEnd | 32 | 99825.842 | 104613.873 | 1.047963843 | 103876.271 | 1.040574955 ArraysMismatch.Float.mismatchEnd | 64 | 80190.706 | 84665.053 | 1.055796329 | 84582.712 | 1.054769514 ArraysMismatch.Float.mismatchEnd | 90 | 71406.594 | 76730.083 | 1.074551784 | 76596.258 | 1.072677658 ArraysMismatch.Float.mismatchEnd | 800 | 14348.159 | 14306.535 | 0.997099001 | 14360.603 | 1.000867289 ArraysMismatch.Float.mismatchMid | 16 | 123753.791 | 124291.601 | 1.004345806 | 123649.378 | 
0.999156284 ArraysMismatch.Float.mismatchMid | 32 | 109105.215 | 111447.183 | 1.021465225 | 111494.37 | 1.021897716 ArraysMismatch.Float.mismatchMid | 64 | 93600.363 | 93741.993 | 1.001513135 | 93658.042 | 1.000616226 ArraysMismatch.Float.mismatchMid | 90 | 89991.128 | 89712.471 | 0.996903506 | 90031.763 | 1.000451545 ArraysMismatch.Float.mismatchMid | 800 | 23974.331 | 24301.075 | 1.01362891 | 24354.29 | 1.015848576 ArraysMismatch.Float.mismatchStart | 16 | 140889.393 | 140535.617 | 0.997488981 | 140222.656 | 0.995267657 ArraysMismatch.Float.mismatchStart | 32 | 140871.915 | 140318.765 | 0.996073383 | 140242.783 | 0.995534014 ArraysMismatch.Float.mismatchStart | 64 | 141197.313 | 140413.639 | 0.994449795 | 140792.879 | 0.997135682 ArraysMismatch.Float.mismatchStart | 90 | 139663.079 | 139775.065 | 1.00080183 | 143880.133 | 1.03019448 ArraysMismatch.Float.mismatchStart | 800 | 143930.882 | 143878.412 | 0.99963545 | 143923.022 | 0.99994539 ArraysMismatch.Int.differentSubrangeMatches | 16 | 110820.026 | 130943.67 | 1.181588515 | 131076.904 | 1.182790771 ArraysMismatch.Int.differentSubrangeMatches | 32 | 111706.868 | 121119.544 | 1.084262285 | 122049.921 | 1.092591021 ArraysMismatch.Int.differentSubrangeMatches | 64 | 93916.026 | 101624.789 | 1.082081444 | 100103.617 | 1.065884293 ArraysMismatch.Int.differentSubrangeMatches | 90 | 67478.955 | 83517.957 | 1.237688951 | 83549.562 | 1.238157319 ArraysMismatch.Int.differentSubrangeMatches | 800 | 24920.868 | 25100.838 | 1.007221659 | 25376.679 | 1.018290334 ArraysMismatch.Int.matches | 16 | 138004.078 | 142579.711 | 1.033155781 | 143465.516 | 1.039574468 ArraysMismatch.Int.matches | 32 | 111790.949 | 119018.169 | 1.06464942 | 119864.971 | 1.072224291 ArraysMismatch.Int.matches | 64 | 86997.004 | 88476.088 | 1.017001551 | 87755.688 | 1.008720806 ArraysMismatch.Int.matches | 90 | 69366.581 | 71427.315 | 1.029707879 | 71203.035 | 1.026474622 ArraysMismatch.Int.matches | 800 | 15119.02 | 15529.095 | 1.02712312 | 15828.336 | 
1.046915475 ArraysMismatch.Int.mismatchEnd | 16 | 139862.143 | 135639.435 | 0.96980807 | 135661.244 | 0.969964002 ArraysMismatch.Int.mismatchEnd | 32 | 114870.328 | 115455.901 | 1.005097687 | 114992.965 | 1.001067613 ArraysMismatch.Int.mismatchEnd | 64 | 85291.637 | 85115.665 | 0.99793682 | 85179.114 | 0.998680726 ArraysMismatch.Int.mismatchEnd | 90 | 73049.868 | 78798.949 | 1.078700772 | 73365.106 | 1.004315381 ArraysMismatch.Int.mismatchEnd | 800 | 14597.509 | 12861.87 | 0.88110033 | 12845.178 | 0.879956847 ArraysMismatch.Int.mismatchMid | 16 | 131615.489 | 134691.219 | 1.023369058 | 134503.225 | 1.0219407 ArraysMismatch.Int.mismatchMid | 32 | 119291.19 | 121970.431 | 1.022459672 | 120647.357 | 1.011368543 ArraysMismatch.Int.mismatchMid | 64 | 100133.019 | 99827.03 | 0.996944175 | 98327.743 | 0.981971222 ArraysMismatch.Int.mismatchMid | 90 | 93062.689 | 95269.725 | 1.023715584 | 95457.632 | 1.025734728 ArraysMismatch.Int.mismatchMid | 800 | 24614.985 | 20853.102 | 0.847171022 | 20857.528 | 0.847350831 ArraysMismatch.Int.mismatchStart | 16 | 140229.222 | 147607.561 | 1.052616273 | 146278.15 | 1.043136002 ArraysMismatch.Int.mismatchStart | 32 | 140354.53 | 147448.421 | 1.050542658 | 146287.931 | 1.042274382 ArraysMismatch.Int.mismatchStart | 64 | 140256.12 | 147353.466 | 1.050602754 | 146094.059 | 1.041623417 ArraysMismatch.Int.mismatchStart | 90 | 135753.229 | 151205.439 | 1.113825727 | 152070.776 | 1.120200065 ArraysMismatch.Int.mismatchStart | 800 | 151565.887 | 145991.819 | 0.963223466 | 152020.842 | 1.003001698 ArraysMismatch.Long.differentSubrangeMatches | 16 | 125569.009 | 121469.175 | 0.967349953 | 121319.155 | 0.966155232 ArraysMismatch.Long.differentSubrangeMatches | 32 | 100126.557 | 103303.047 | 1.03172475 | 101476.788 | 1.013485243 ArraysMismatch.Long.differentSubrangeMatches | 64 | 80870.342 | 82334.336 | 1.018102978 | 82395.962 | 1.018865012 ArraysMismatch.Long.differentSubrangeMatches | 90 | 70673.831 | 72440.193 | 1.024993155 | 72067.497 | 
1.019719689 ArraysMismatch.Long.differentSubrangeMatches | 800 | 15224.864 | 15077.429 | 0.99031617 | 15163.827 | 0.995990966 ArraysMismatch.Long.matches | 16 | 119857.871 | 123784.673 | 1.032762154 | 122968.267 | 1.025950703 ArraysMismatch.Long.matches | 32 | 88284.162 | 90825.719 | 1.028788369 | 91303.549 | 1.034200778 ArraysMismatch.Long.matches | 64 | 62827.102 | 63614.876 | 1.012538761 | 64469.82 | 1.026146646 ArraysMismatch.Long.matches | 90 | 49351.299 | 51199.947 | 1.037458953 | 51103.813 | 1.035511 ArraysMismatch.Long.matches | 800 | 8822.867 | 8512.064 | 0.964773015 | 8848.35 | 1.00288829 ArraysMismatch.Long.mismatchEnd | 16 | 124902.804 | 128237.911 | 1.026701618 | 128410.897 | 1.028086583 ArraysMismatch.Long.mismatchEnd | 32 | 86728.545 | 90519.608 | 1.043711825 | 88782.445 | 1.023681938 ArraysMismatch.Long.mismatchEnd | 64 | 64431.36 | 62735.702 | 0.973682722 | 64766.52 | 1.005201815 ArraysMismatch.Long.mismatchEnd | 90 | 47764.996 | 47635.982 | 0.997298984 | 47562.461 | 0.995759761 ArraysMismatch.Long.mismatchEnd | 800 | 8124.901 | 7194.444 | 0.88548082 | 7197.163 | 0.88581547 ArraysMismatch.Long.mismatchMid | 16 | 122857.442 | 121708.317 | 0.99064668 | 121071.994 | 0.985467319 ArraysMismatch.Long.mismatchMid | 32 | 99406.603 | 99376.972 | 0.999701921 | 97379.046 | 0.979603397 ArraysMismatch.Long.mismatchMid | 64 | 78596.148 | 76559.205 | 0.974083425 | 76538.811 | 0.973823946 ArraysMismatch.Long.mismatchMid | 90 | 74253.699 | 73267.252 | 0.98671518 | 74874.856 | 1.008365334 ArraysMismatch.Long.mismatchMid | 800 | 12739.526 | 12773.563 | 1.002671763 | 15215.721 | 1.194371046 ArraysMismatch.Long.mismatchStart | 16 | 143429.003 | 147610.51 | 1.029153846 | 146953.182 | 1.024570895 ArraysMismatch.Long.mismatchStart | 32 | 149771.413 | 149898.955 | 1.000851578 | 147743.864 | 0.986462377 ArraysMismatch.Long.mismatchStart | 64 | 149812.094 | 147738.977 | 0.986161885 | 147818.236 | 0.986690941 ArraysMismatch.Long.mismatchStart | 90 | 149834.855 | 147878.978 | 
0.986946448 | 149768.864 | 0.999559575 ArraysMismatch.Long.mismatchStart | 800 | 150266.332 | 147175.353 | 0.979429996 | 153305.049 | 1.020222208 ArraysMismatch.Short.differentSubrangeMatches | 16 | 124956.808 | 152398.079 | 1.21960605 | 146222.898 | 1.170187526 ArraysMismatch.Short.differentSubrangeMatches | 32 | 118644.114 | 158832.405 | 1.338729749 | 146589.485 | 1.235539464 ArraysMismatch.Short.differentSubrangeMatches | 64 | 111036.197 | 106078.375 | 0.955349497 | 146122.18 | 1.315986894 ArraysMismatch.Short.differentSubrangeMatches | 90 | 79114.347 | 90244.347 | 1.140682448 | 91059.171 | 1.150981768 ArraysMismatch.Short.differentSubrangeMatches | 800 | 44794.065 | 46302.944 | 1.033684797 | 46086.671 | 1.028856635 ArraysMismatch.Short.matches | 16 | 150201.123 | 193264.21 | 1.28670283 | 185129.029 | 1.232540911 ArraysMismatch.Short.matches | 32 | 137672.122 | 126543.04 | 0.919162414 | 187187.586 | 1.359662242 ArraysMismatch.Short.matches | 64 | 113952.11 | 110124.025 | 0.966406195 | 109228.551 | 0.958547858 ArraysMismatch.Short.matches | 90 | 89491.351 | 91045.251 | 1.017363689 | 90362.175 | 1.009730817 ArraysMismatch.Short.matches | 800 | 25941.449 | 25887.28 | 0.997911875 | 25191.983 | 0.971109324 ArraysMismatch.Short.mismatchEnd | 16 | 142494.648 | 189203.368 | 1.327792802 | 176318.454 | 1.237368957 ArraysMismatch.Short.mismatchEnd | 32 | 139928.97 | 119098.052 | 0.851132199 | 178840.438 | 1.278080143 ArraysMismatch.Short.mismatchEnd | 64 | 115583.3 | 104264.811 | 0.902075049 | 102376.369 | 0.885736685 ArraysMismatch.Short.mismatchEnd | 90 | 86641.922 | 87669.462 | 1.011859617 | 87745.796 | 1.012740645 ArraysMismatch.Short.mismatchEnd | 800 | 23741.295 | 22911.558 | 0.965050895 | 22937.297 | 0.96613504 ArraysMismatch.Short.mismatchMid | 16 | 148684.747 | 189160.851 | 1.272227682 | 178776.065 | 1.202383355 ArraysMismatch.Short.mismatchMid | 32 | 133281.625 | 118690.88 | 0.890526957 | 178478.46 | 1.339107773 ArraysMismatch.Short.mismatchMid | 64 | 122399.072 
| 110333.504 | 0.901424351 | 111504.705 | 0.910993059 ArraysMismatch.Short.mismatchMid | 90 | 119317.633 | 110483.29 | 0.925959451 | 111346.724 | 0.933195884 ArraysMismatch.Short.mismatchMid | 800 | 50742.831 | 43058.305 | 0.848559376 | 47917.118 | 0.94431306 ArraysMismatch.Short.mismatchStart | 16 | 148861.935 | 191984.933 | 1.289684519 | 178706.176 | 1.200482689 ArraysMismatch.Short.mismatchStart | 32 | 148701.043 | 126690.118 | 0.851978678 | 178702.06 | 1.201753911 ArraysMismatch.Short.mismatchStart | 64 | 148560.877 | 126747.337 | 0.853167668 | 126657.473 | 0.852562771 ArraysMismatch.Short.mismatchStart | 90 | 149824.411 | 126605.818 | 0.845027971 | 125719.231 | 0.839110464 ArraysMismatch.Short.mismatchStart | 800 | 152583.036 | 126437.329 | 0.828646043 | 126698.741 | 0.830359287 ------------- Commit messages: - 8266951: Partial in-lining for vectorized mismatch operation using AVX512 masked instructions Changes: https://git.openjdk.java.net/jdk/pull/3999/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3999&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8266951 Stats: 462 lines in 18 files changed: 198 ins; 124 del; 140 mod Patch: https://git.openjdk.java.net/jdk/pull/3999.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3999/head:pull/3999 PR: https://git.openjdk.java.net/jdk/pull/3999 From coleenp at openjdk.java.net Wed May 12 18:53:02 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Wed, 12 May 2021 18:53:02 GMT Subject: RFR: 8266973: Migrate to ClassHierarchyIterator when enumerating subclasses In-Reply-To: References: Message-ID: <1n4hp0FnDEtYEhg3p-RIW5NlkXoJoFej5tGBgIAVCMc=.35c4d7ab-0d66-497c-90d9-54c08c35abd1@github.com> On Wed, 12 May 2021 13:30:09 GMT, Vladimir Ivanov wrote: > Replace ad-hoc recursion when enumerating subclasses with `ClassHierarchyIterator`. 
> > Found 3 occurrences: > - `Dependencies::find_finalizable_subclass()` > - `reinitialize_vtable_of()` > - `VM_RedefineClasses::increment_class_counter()` > > Testing: > - [x] hs-tier1 - hs-tier4 This looks really good. I wonder if you can now make subklass() and next_sibling() functions private to Klass with it having ClassHierarchyIterator as a friend. If not, I still approve. ------------- Marked as reviewed by coleenp (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/3995 From kvn at openjdk.java.net Wed May 12 19:08:08 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 12 May 2021 19:08:08 GMT Subject: RFR: 8266854: LibraryCallKit::inline_preconditions_checkIndex modifies control flow even if the intrinsic bailed out [v3] In-Reply-To: References: Message-ID: On Wed, 12 May 2021 16:57:17 GMT, Sandhya Viswanathan wrote: >> LibraryCallKit::inline_preconditions_checkIndex can result in the following assert sometimes: >> "# assert(ctrl == kit.control()) failed: Control flow was added although the intrinsic bailed out" >> >> Consider the following code snippet: >> ... >> set_control(_gvn.transform(new IfTrueNode(rc))); >> { >> PreserveJVMState pjvms(this); >> set_control(_gvn.transform(new IfFalseNode(rc))); >> uncommon_trap(Deoptimization::Reason_range_check, >> Deoptimization::Action_make_not_entrant); >> } >> .. >> Here the control is being modified by set_control even though a bailout is possible afterwards. >> Moving the set_control later in the intrinsic fixes this. >> >> This is a small fix. Please review. >> >> Best Regards, >> Sandhya > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > return true if stopped Looks good but it would be nice to have a test which verifies such edge cases for this intrinsic. 
------------- PR: https://git.openjdk.java.net/jdk/pull/3958 From psandoz at openjdk.java.net Wed May 12 19:21:53 2021 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Wed, 12 May 2021 19:21:53 GMT Subject: RFR: 8266951: Partial in-lining for vectorized mismatch operation using AVX512 masked instructions In-Reply-To: <0YtRuwnVZ-Ejs-22d0JDJeFzXiZ17XNuBT1o5Ma4ZkI=.9dd9e952-d452-4175-8ff5-8f41e990a555@github.com> References: <0YtRuwnVZ-Ejs-22d0JDJeFzXiZ17XNuBT1o5Ma4ZkI=.9dd9e952-d452-4175-8ff5-8f41e990a555@github.com> Message-ID: <6BAOWhh_4s5-flV2niYNQ_k3xjV_h7CCLN9Jd69evOk=.a68c2d4c-db3f-41b7-82c2-0b78c0fd5671@github.com> On Wed, 12 May 2021 17:02:25 GMT, Jatin Bhateja wrote: > ArraysSupport.vectorizedMismatch is a leaf-level comparison routine which gets called by various public Java APIs (Arrays.equals, Arrays.mismatch). The HotSpot C2 compiler intrinsifies the vectorizedMismatch routine and emits a call to a stub routine which uses vector instructions to compare the inputs. > > For small compare operations whose size fits in one vector register, i.e. < 32 bytes or <= 64 bytes, this patch employs a partial in-lining technique to emit the fast-path code at the call site, which does a vector comparison under the influence of a predicate register/mask computed as a function of the comparison length. > > If the length of the comparison is greater than the vector register size, then the slow path comprising a stub call is emitted. > > This avoids the call overhead associated with the stub call, which is significant compared to the actual comparison operation for small-sized comparisons. > > Partial in-lining works under the influence of a runtime flag -XX:UsePartialInlineSize=32/64 (default 32 bytes).
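The fast-path/slow-path split described above can be modeled in plain Java. This is only a conceptual sketch, not the C2 code from the patch: the class `PartialInlineSketch`, the constant `PARTIAL_INLINE_SIZE` (standing in for the flag value), and the helper `maskedCompare` are hypothetical names, and a `long` bitmask stands in for the AVX512 predicate register. The exact boundary condition (`<=` here) is also an assumption.

```java
import java.util.Arrays;

final class PartialInlineSketch {
    // Hypothetical stand-in for the partial in-line size flag (32 or 64 bytes).
    static final int PARTIAL_INLINE_SIZE = 32;

    // Returns the index of the first mismatching byte in a[0..len) vs b[0..len), or -1.
    static int mismatch(byte[] a, byte[] b, int len) {
        if (len <= PARTIAL_INLINE_SIZE) {
            // Fast path: one masked "vector" comparison, no call to the stub.
            return maskedCompare(a, b, len);
        }
        // Slow path: models the stub call for longer comparisons.
        return Arrays.mismatch(a, 0, len, b, 0, len);
    }

    // Models a masked compare over at most 64 byte lanes.
    static int maskedCompare(byte[] a, byte[] b, int len) {
        // Mask with the low `len` bits set, computed from the comparison length.
        long mask = (len >= 64) ? -1L : (1L << len) - 1;
        long diff = 0L;
        for (int lane = 0; lane < len; lane++) {
            if (a[lane] != b[lane]) {
                diff |= 1L << lane; // set the bit of each unequal lane
            }
        }
        diff &= mask; // only lanes selected by the mask participate
        return diff == 0 ? -1 : Long.numberOfTrailingZeros(diff);
    }
}
```

In this model the lowest set bit of the masked difference gives the first mismatching lane, which is the role the trailing-zero count plays after a masked vector compare.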
> > Following are performance numbers for an existing JMH benchmark (test/micro/org/openjdk/bench/java/util/ArraysMismatch.java): > > Machine: Cascade Lake server (Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz) > > JMH Benchmark | Size | Baseline (ops/ms) | PI32 (ops/ms) | Gain (PI32/Baseline) | PI64 (ops/ms) | Gain (PI64/Baseline) > -- | -- | -- | -- | -- | -- | -- > ArraysMismatch.Byte.differentSubrangeMatches | 16 | 129196.612 | 165376.715 | 1.2800391 | 157553.42 | 1.219485694 > ArraysMismatch.Byte.differentSubrangeMatches | 32 | 125583.404 | 163645.759 | 1.303084275 | 157645.879 | 1.255308217 > ArraysMismatch.Byte.differentSubrangeMatches | 64 | 121969.731 | 170648.152 | 1.399102471 | 157993.449 | 1.295349655 > ArraysMismatch.Byte.differentSubrangeMatches | 90 | 91819.571 | 96154.479 | 1.047211155 | 157983.324 | 1.720584427 > ArraysMismatch.Byte.differentSubrangeMatches | 800 | 65236.047 | 67243.131 | 1.030766487 | 67759.48 | 1.038681574 > ArraysMismatch.Byte.matches | 16 | 151805.68 | 203802.717 | 1.342523659 | 188334.618 | 1.240629586 > ArraysMismatch.Byte.matches | 32 | 151624.747 | 203731.315 | 1.343654773 | 185719.086 | 1.224859989 > ArraysMismatch.Byte.matches | 64 | 138350.648 | 124158.139 | 0.897416389 | 188935.388 | 1.365627055 > ArraysMismatch.Byte.matches | 90 | 102366.983 | 101474.688 | 0.991283371 | 100674.414 | 0.983465675 > ArraysMismatch.Byte.matches | 800 | 46319.352 | 49585.514 | 1.070513983 | 49594.262 | 1.070702846 > ArraysMismatch.Byte.mismatchEnd | 16 | 162382.057 | 191602.366 | 1.179947893 | 182425.362 | 1.123433003 > ArraysMismatch.Byte.mismatchEnd | 32 | 146656.702 | 193510.637 | 1.319480354 | 182571.741 | 1.244891904 > ArraysMismatch.Byte.mismatchEnd | 64 | 140799.385 | 122505.816 | 0.870073516 | 182360.435 | 1.295179201 > ArraysMismatch.Byte.mismatchEnd | 90 | 117439.002 | 107296.27 | 0.913634041 | 108081.174 | 0.920317545 > ArraysMismatch.Byte.mismatchEnd | 800 | 47542.975 | 47456.106 | 0.998172832 | 
47289.082 | 0.994659716 > ArraysMismatch.Byte.mismatchMid | 16 | 143112.591 | 189653.41 | 1.325204223 | 182411.81 | 1.274603504 > ArraysMismatch.Byte.mismatchMid | 32 | 151759.608 | 193712.64 | 1.276443993 | 182689.18 | 1.203806351 > ArraysMismatch.Byte.mismatchMid | 64 | 140756.035 | 122017.013 | 0.866868785 | 182508.473 | 1.296629825 > ArraysMismatch.Byte.mismatchMid | 90 | 134230.235 | 122213.804 | 0.910478954 | 122566.133 | 0.913103765 > ArraysMismatch.Byte.mismatchMid | 800 | 75512.985 | 64861.716 | 0.858947849 | 71607.794 | 0.94828451 > ArraysMismatch.Byte.mismatchStart | 16 | 160628.501 | 193722.299 | 1.206026937 | 183190.972 | 1.140463684 > ArraysMismatch.Byte.mismatchStart | 32 | 151629.56 | 193633.36 | 1.277015906 | 183230.666 | 1.20840993 > ArraysMismatch.Byte.mismatchStart | 64 | 143345.272 | 130754.305 | 0.91216336 | 181837.864 | 1.268530601 > ArraysMismatch.Byte.mismatchStart | 90 | 151557.205 | 130724.926 | 0.86254511 | 130962.682 | 0.864113864 > ArraysMismatch.Byte.mismatchStart | 800 | 149416.06 | 130847.301 | 0.875724477 | 130952.683 | 0.876429769 > ArraysMismatch.Char.differentSubrangeMatches | 16 | 124936.905 | 152375.103 | 1.219616438 | 146062.997 | 1.169094088 > ArraysMismatch.Char.differentSubrangeMatches | 32 | 118878.291 | 158770.285 | 1.33557005 | 146561.488 | 1.232870079 > ArraysMismatch.Char.differentSubrangeMatches | 64 | 110296.975 | 104885.041 | 0.95093307 | 146102.313 | 1.324626655 > ArraysMismatch.Char.differentSubrangeMatches | 90 | 88056.395 | 90133.489 | 1.023588224 | 87883.169 | 0.998032783 > ArraysMismatch.Char.differentSubrangeMatches | 800 | 41319.787 | 46257.464 | 1.119499091 | 46090.56 | 1.115459767 > ArraysMismatch.Char.matches | 16 | 150428.182 | 197311.356 | 1.311664832 | 187199.805 | 1.24444637 > ArraysMismatch.Char.matches | 32 | 132718.181 | 126373.231 | 0.952192307 | 187008.811 | 1.409067014 > ArraysMismatch.Char.matches | 64 | 111659.84 | 107182.982 | 0.959906283 | 109772.951 | 0.983101453 > 
ArraysMismatch.Char.matches | 90 | 86184.209 | 91977.05 | 1.067214645 | 90389.147 | 1.048790121 > ArraysMismatch.Char.matches | 800 | 26332.084 | 25284.001 | 0.960197491 | 25855.38 | 0.981896458 > ArraysMismatch.Char.mismatchEnd | 16 | 148547.251 | 189151.018 | 1.273339067 | 179675.328 | 1.209550004 > ArraysMismatch.Char.mismatchEnd | 32 | 138219.785 | 119017.203 | 0.861072118 | 178701.685 | 1.292880647 > ArraysMismatch.Char.mismatchEnd | 64 | 110435.452 | 103940.023 | 0.94118348 | 102078.889 | 0.924330794 > ArraysMismatch.Char.mismatchEnd | 90 | 89375.63 | 87698.736 | 0.981237682 | 88037.787 | 0.985031233 > ArraysMismatch.Char.mismatchEnd | 800 | 23632.584 | 22963.757 | 0.971698948 | 20497.605 | 0.867345061 > ArraysMismatch.Char.mismatchMid | 16 | 148666.26 | 189258.721 | 1.273044207 | 178820.938 | 1.202834712 > ArraysMismatch.Char.mismatchMid | 32 | 131949.59 | 119320.489 | 0.904288441 | 178579.245 | 1.35338992 > ArraysMismatch.Char.mismatchMid | 64 | 122148.315 | 111033.597 | 0.909006375 | 109455.953 | 0.896090568 > ArraysMismatch.Char.mismatchMid | 90 | 125032.714 | 109837.581 | 0.878470742 | 110283.097 | 0.882033937 > ArraysMismatch.Char.mismatchMid | 800 | 42255.059 | 48153.688 | 1.139595806 | 43087.476 | 1.019699819 > ArraysMismatch.Char.mismatchStart | 16 | 148493.976 | 189247.176 | 1.274443456 | 178915.503 | 1.204867078 > ArraysMismatch.Char.mismatchStart | 32 | 148724.462 | 126724.721 | 0.852077186 | 178887.041 | 1.202808459 > ArraysMismatch.Char.mismatchStart | 64 | 148635.338 | 126716.274 | 0.852531274 | 126747.94 | 0.852744318 > ArraysMismatch.Char.mismatchStart | 90 | 140359.351 | 126708.588 | 0.902744186 | 125618.245 | 0.894975961 > ArraysMismatch.Char.mismatchStart | 800 | 144649.46 | 125727.381 | 0.86918666 | 126664.011 | 0.875661831 > ArraysMismatch.Double.differentSubrangeMatches | 16 | 116255.827 | 116156.952 | 0.999149505 | 116557.568 | 1.002595491 > ArraysMismatch.Double.differentSubrangeMatches | 32 | 91940.498 | 97299.205 | 1.058284511 | 
97466.224 | 1.06010111 > ArraysMismatch.Double.differentSubrangeMatches | 64 | 78205.807 | 78189.378 | 0.999789926 | 78133.649 | 0.999077332 > ArraysMismatch.Double.differentSubrangeMatches | 90 | 61330.454 | 68798.235 | 1.121763015 | 68524.188 | 1.117294648 > ArraysMismatch.Double.differentSubrangeMatches | 800 | 14996.315 | 14979.647 | 0.998888527 | 15072.825 | 1.00510192 > ArraysMismatch.Double.matches | 16 | 119342.024 | 120322.671 | 1.008217114 | 119531.315 | 1.001586122 > ArraysMismatch.Double.matches | 32 | 88179.448 | 89069.505 | 1.010093701 | 88141.626 | 0.999571079 > ArraysMismatch.Double.matches | 64 | 62622.253 | 62433.512 | 0.996986039 | 63041.774 | 1.006699232 > ArraysMismatch.Double.matches | 90 | 49579.305 | 50632.739 | 1.021247454 | 46548.486 | 0.938869272 > ArraysMismatch.Double.matches | 800 | 8850.013 | 8505.296 | 0.961048984 | 8490.327 | 0.959357574 > ArraysMismatch.Double.mismatchEnd | 16 | 116594.224 | 119025.382 | 1.020851445 | 116310.567 | 0.997567144 > ArraysMismatch.Double.mismatchEnd | 32 | 86183.542 | 86814.706 | 1.007323486 | 86258.696 | 1.000872023 > ArraysMismatch.Double.mismatchEnd | 64 | 62695.058 | 62794.552 | 1.001586951 | 62769 | 1.001179391 > ArraysMismatch.Double.mismatchEnd | 90 | 46899.021 | 47692.984 | 1.016929202 | 47598.715 | 1.01491916 > ArraysMismatch.Double.mismatchEnd | 800 | 8132.64 | 8141.465 | 1.001085133 | 7176.583 | 0.882441987 > ArraysMismatch.Double.mismatchMid | 16 | 110505.284 | 113732.521 | 1.029204368 | 113249.451 | 1.024832903 > ArraysMismatch.Double.mismatchMid | 32 | 94259.439 | 93242.776 | 0.989214205 | 94420.206 | 1.00170558 > ArraysMismatch.Double.mismatchMid | 64 | 76392.603 | 76344.962 | 0.999376366 | 76369.689 | 0.999700049 > ArraysMismatch.Double.mismatchMid | 90 | 71578.538 | 71637.235 | 1.000820036 | 71582.34 | 1.000053116 > ArraysMismatch.Double.mismatchMid | 800 | 14993.414 | 12701.251 | 0.84712201 | 14998.937 | 1.000368362 > ArraysMismatch.Double.mismatchStart | 16 | 141188.616 | 141430.91 | 
1.001716102 | 141517.873 | 1.002332036 > ArraysMismatch.Double.mismatchStart | 32 | 141489.906 | 139633.297 | 0.986878152 | 141729.555 | 1.001693753 > ArraysMismatch.Double.mismatchStart | 64 | 141502.44 | 139656.902 | 0.986957554 | 141488.272 | 0.999899875 > ArraysMismatch.Double.mismatchStart | 90 | 141782.57 | 141508.142 | 0.998064445 | 141579.135 | 0.998565162 > ArraysMismatch.Double.mismatchStart | 800 | 144565.191 | 139525.413 | 0.965138371 | 144607.95 | 1.000295777 > ArraysMismatch.Float.differentSubrangeMatches | 16 | 120041.868 | 119986.512 | 0.999538861 | 120009.683 | 0.999731885 > ArraysMismatch.Float.differentSubrangeMatches | 32 | 111402.873 | 111414.633 | 1.000105563 | 111442.964 | 1.000359874 > ArraysMismatch.Float.differentSubrangeMatches | 64 | 85388.728 | 93884.13 | 1.099490907 | 95120.892 | 1.113974809 > ArraysMismatch.Float.differentSubrangeMatches | 90 | 67617.865 | 75865.226 | 1.121970148 | 76179.814 | 1.126622587 > ArraysMismatch.Float.differentSubrangeMatches | 800 | 24994.376 | 25011.775 | 1.000696117 | 24944.2 | 0.997992508 > ArraysMismatch.Float.matches | 16 | 133159.39 | 137937.688 | 1.035884048 | 139461.652 | 1.047328709 > ArraysMismatch.Float.matches | 32 | 111959.987 | 115420.6 | 1.030909373 | 117002.141 | 1.045035321 > ArraysMismatch.Float.matches | 64 | 86892.65 | 87395.62 | 1.005788407 | 87345.458 | 1.00521112 > ArraysMismatch.Float.matches | 90 | 67690.279 | 69156.772 | 1.02166475 | 69082.962 | 1.020574343 > ArraysMismatch.Float.matches | 800 | 14894.94 | 15341.034 | 1.029949365 | 15779.117 | 1.059360897 > ArraysMismatch.Float.mismatchEnd | 16 | 128854.048 | 128925.913 | 1.000557724 | 128985.299 | 1.001018602 > ArraysMismatch.Float.mismatchEnd | 32 | 99825.842 | 104613.873 | 1.047963843 | 103876.271 | 1.040574955 > ArraysMismatch.Float.mismatchEnd | 64 | 80190.706 | 84665.053 | 1.055796329 | 84582.712 | 1.054769514 > ArraysMismatch.Float.mismatchEnd | 90 | 71406.594 | 76730.083 | 1.074551784 | 76596.258 | 1.072677658 > 
ArraysMismatch.Float.mismatchEnd | 800 | 14348.159 | 14306.535 | 0.997099001 | 14360.603 | 1.000867289 > ArraysMismatch.Float.mismatchMid | 16 | 123753.791 | 124291.601 | 1.004345806 | 123649.378 | 0.999156284 > ArraysMismatch.Float.mismatchMid | 32 | 109105.215 | 111447.183 | 1.021465225 | 111494.37 | 1.021897716 > ArraysMismatch.Float.mismatchMid | 64 | 93600.363 | 93741.993 | 1.001513135 | 93658.042 | 1.000616226 > ArraysMismatch.Float.mismatchMid | 90 | 89991.128 | 89712.471 | 0.996903506 | 90031.763 | 1.000451545 > ArraysMismatch.Float.mismatchMid | 800 | 23974.331 | 24301.075 | 1.01362891 | 24354.29 | 1.015848576 > ArraysMismatch.Float.mismatchStart | 16 | 140889.393 | 140535.617 | 0.997488981 | 140222.656 | 0.995267657 > ArraysMismatch.Float.mismatchStart | 32 | 140871.915 | 140318.765 | 0.996073383 | 140242.783 | 0.995534014 > ArraysMismatch.Float.mismatchStart | 64 | 141197.313 | 140413.639 | 0.994449795 | 140792.879 | 0.997135682 > ArraysMismatch.Float.mismatchStart | 90 | 139663.079 | 139775.065 | 1.00080183 | 143880.133 | 1.03019448 > ArraysMismatch.Float.mismatchStart | 800 | 143930.882 | 143878.412 | 0.99963545 | 143923.022 | 0.99994539 > ArraysMismatch.Int.differentSubrangeMatches | 16 | 110820.026 | 130943.67 | 1.181588515 | 131076.904 | 1.182790771 > ArraysMismatch.Int.differentSubrangeMatches | 32 | 111706.868 | 121119.544 | 1.084262285 | 122049.921 | 1.092591021 > ArraysMismatch.Int.differentSubrangeMatches | 64 | 93916.026 | 101624.789 | 1.082081444 | 100103.617 | 1.065884293 > ArraysMismatch.Int.differentSubrangeMatches | 90 | 67478.955 | 83517.957 | 1.237688951 | 83549.562 | 1.238157319 > ArraysMismatch.Int.differentSubrangeMatches | 800 | 24920.868 | 25100.838 | 1.007221659 | 25376.679 | 1.018290334 > ArraysMismatch.Int.matches | 16 | 138004.078 | 142579.711 | 1.033155781 | 143465.516 | 1.039574468 > ArraysMismatch.Int.matches | 32 | 111790.949 | 119018.169 | 1.06464942 | 119864.971 | 1.072224291 > ArraysMismatch.Int.matches | 64 | 86997.004 
| 88476.088 | 1.017001551 | 87755.688 | 1.008720806 > ArraysMismatch.Int.matches | 90 | 69366.581 | 71427.315 | 1.029707879 | 71203.035 | 1.026474622 > ArraysMismatch.Int.matches | 800 | 15119.02 | 15529.095 | 1.02712312 | 15828.336 | 1.046915475 > ArraysMismatch.Int.mismatchEnd | 16 | 139862.143 | 135639.435 | 0.96980807 | 135661.244 | 0.969964002 > ArraysMismatch.Int.mismatchEnd | 32 | 114870.328 | 115455.901 | 1.005097687 | 114992.965 | 1.001067613 > ArraysMismatch.Int.mismatchEnd | 64 | 85291.637 | 85115.665 | 0.99793682 | 85179.114 | 0.998680726 > ArraysMismatch.Int.mismatchEnd | 90 | 73049.868 | 78798.949 | 1.078700772 | 73365.106 | 1.004315381 > ArraysMismatch.Int.mismatchEnd | 800 | 14597.509 | 12861.87 | 0.88110033 | 12845.178 | 0.879956847 > ArraysMismatch.Int.mismatchMid | 16 | 131615.489 | 134691.219 | 1.023369058 | 134503.225 | 1.0219407 > ArraysMismatch.Int.mismatchMid | 32 | 119291.19 | 121970.431 | 1.022459672 | 120647.357 | 1.011368543 > ArraysMismatch.Int.mismatchMid | 64 | 100133.019 | 99827.03 | 0.996944175 | 98327.743 | 0.981971222 > ArraysMismatch.Int.mismatchMid | 90 | 93062.689 | 95269.725 | 1.023715584 | 95457.632 | 1.025734728 > ArraysMismatch.Int.mismatchMid | 800 | 24614.985 | 20853.102 | 0.847171022 | 20857.528 | 0.847350831 > ArraysMismatch.Int.mismatchStart | 16 | 140229.222 | 147607.561 | 1.052616273 | 146278.15 | 1.043136002 > ArraysMismatch.Int.mismatchStart | 32 | 140354.53 | 147448.421 | 1.050542658 | 146287.931 | 1.042274382 > ArraysMismatch.Int.mismatchStart | 64 | 140256.12 | 147353.466 | 1.050602754 | 146094.059 | 1.041623417 > ArraysMismatch.Int.mismatchStart | 90 | 135753.229 | 151205.439 | 1.113825727 | 152070.776 | 1.120200065 > ArraysMismatch.Int.mismatchStart | 800 | 151565.887 | 145991.819 | 0.963223466 | 152020.842 | 1.003001698 > ArraysMismatch.Long.differentSubrangeMatches | 16 | 125569.009 | 121469.175 | 0.967349953 | 121319.155 | 0.966155232 > ArraysMismatch.Long.differentSubrangeMatches | 32 | 100126.557 | 
103303.047 | 1.03172475 | 101476.788 | 1.013485243 > ArraysMismatch.Long.differentSubrangeMatches | 64 | 80870.342 | 82334.336 | 1.018102978 | 82395.962 | 1.018865012 > ArraysMismatch.Long.differentSubrangeMatches | 90 | 70673.831 | 72440.193 | 1.024993155 | 72067.497 | 1.019719689 > ArraysMismatch.Long.differentSubrangeMatches | 800 | 15224.864 | 15077.429 | 0.99031617 | 15163.827 | 0.995990966 > ArraysMismatch.Long.matches | 16 | 119857.871 | 123784.673 | 1.032762154 | 122968.267 | 1.025950703 > ArraysMismatch.Long.matches | 32 | 88284.162 | 90825.719 | 1.028788369 | 91303.549 | 1.034200778 > ArraysMismatch.Long.matches | 64 | 62827.102 | 63614.876 | 1.012538761 | 64469.82 | 1.026146646 > ArraysMismatch.Long.matches | 90 | 49351.299 | 51199.947 | 1.037458953 | 51103.813 | 1.035511 > ArraysMismatch.Long.matches | 800 | 8822.867 | 8512.064 | 0.964773015 | 8848.35 | 1.00288829 > ArraysMismatch.Long.mismatchEnd | 16 | 124902.804 | 128237.911 | 1.026701618 | 128410.897 | 1.028086583 > ArraysMismatch.Long.mismatchEnd | 32 | 86728.545 | 90519.608 | 1.043711825 | 88782.445 | 1.023681938 > ArraysMismatch.Long.mismatchEnd | 64 | 64431.36 | 62735.702 | 0.973682722 | 64766.52 | 1.005201815 > ArraysMismatch.Long.mismatchEnd | 90 | 47764.996 | 47635.982 | 0.997298984 | 47562.461 | 0.995759761 > ArraysMismatch.Long.mismatchEnd | 800 | 8124.901 | 7194.444 | 0.88548082 | 7197.163 | 0.88581547 > ArraysMismatch.Long.mismatchMid | 16 | 122857.442 | 121708.317 | 0.99064668 | 121071.994 | 0.985467319 > ArraysMismatch.Long.mismatchMid | 32 | 99406.603 | 99376.972 | 0.999701921 | 97379.046 | 0.979603397 > ArraysMismatch.Long.mismatchMid | 64 | 78596.148 | 76559.205 | 0.974083425 | 76538.811 | 0.973823946 > ArraysMismatch.Long.mismatchMid | 90 | 74253.699 | 73267.252 | 0.98671518 | 74874.856 | 1.008365334 > ArraysMismatch.Long.mismatchMid | 800 | 12739.526 | 12773.563 | 1.002671763 | 15215.721 | 1.194371046 > ArraysMismatch.Long.mismatchStart | 16 | 143429.003 | 147610.51 | 1.029153846 | 
146953.182 | 1.024570895 > ArraysMismatch.Long.mismatchStart | 32 | 149771.413 | 149898.955 | 1.000851578 | 147743.864 | 0.986462377 > ArraysMismatch.Long.mismatchStart | 64 | 149812.094 | 147738.977 | 0.986161885 | 147818.236 | 0.986690941 > ArraysMismatch.Long.mismatchStart | 90 | 149834.855 | 147878.978 | 0.986946448 | 149768.864 | 0.999559575 > ArraysMismatch.Long.mismatchStart | 800 | 150266.332 | 147175.353 | 0.979429996 | 153305.049 | 1.020222208 > ArraysMismatch.Short.differentSubrangeMatches | 16 | 124956.808 | 152398.079 | 1.21960605 | 146222.898 | 1.170187526 > ArraysMismatch.Short.differentSubrangeMatches | 32 | 118644.114 | 158832.405 | 1.338729749 | 146589.485 | 1.235539464 > ArraysMismatch.Short.differentSubrangeMatches | 64 | 111036.197 | 106078.375 | 0.955349497 | 146122.18 | 1.315986894 > ArraysMismatch.Short.differentSubrangeMatches | 90 | 79114.347 | 90244.347 | 1.140682448 | 91059.171 | 1.150981768 > ArraysMismatch.Short.differentSubrangeMatches | 800 | 44794.065 | 46302.944 | 1.033684797 | 46086.671 | 1.028856635 > ArraysMismatch.Short.matches | 16 | 150201.123 | 193264.21 | 1.28670283 | 185129.029 | 1.232540911 > ArraysMismatch.Short.matches | 32 | 137672.122 | 126543.04 | 0.919162414 | 187187.586 | 1.359662242 > ArraysMismatch.Short.matches | 64 | 113952.11 | 110124.025 | 0.966406195 | 109228.551 | 0.958547858 > ArraysMismatch.Short.matches | 90 | 89491.351 | 91045.251 | 1.017363689 | 90362.175 | 1.009730817 > ArraysMismatch.Short.matches | 800 | 25941.449 | 25887.28 | 0.997911875 | 25191.983 | 0.971109324 > ArraysMismatch.Short.mismatchEnd | 16 | 142494.648 | 189203.368 | 1.327792802 | 176318.454 | 1.237368957 > ArraysMismatch.Short.mismatchEnd | 32 | 139928.97 | 119098.052 | 0.851132199 | 178840.438 | 1.278080143 > ArraysMismatch.Short.mismatchEnd | 64 | 115583.3 | 104264.811 | 0.902075049 | 102376.369 | 0.885736685 > ArraysMismatch.Short.mismatchEnd | 90 | 86641.922 | 87669.462 | 1.011859617 | 87745.796 | 1.012740645 > 
ArraysMismatch.Short.mismatchEnd | 800 | 23741.295 | 22911.558 | 0.965050895 | 22937.297 | 0.96613504
> ArraysMismatch.Short.mismatchMid | 16 | 148684.747 | 189160.851 | 1.272227682 | 178776.065 | 1.202383355
> ArraysMismatch.Short.mismatchMid | 32 | 133281.625 | 118690.88 | 0.890526957 | 178478.46 | 1.339107773
> ArraysMismatch.Short.mismatchMid | 64 | 122399.072 | 110333.504 | 0.901424351 | 111504.705 | 0.910993059
> ArraysMismatch.Short.mismatchMid | 90 | 119317.633 | 110483.29 | 0.925959451 | 111346.724 | 0.933195884
> ArraysMismatch.Short.mismatchMid | 800 | 50742.831 | 43058.305 | 0.848559376 | 47917.118 | 0.94431306
> ArraysMismatch.Short.mismatchStart | 16 | 148861.935 | 191984.933 | 1.289684519 | 178706.176 | 1.200482689
> ArraysMismatch.Short.mismatchStart | 32 | 148701.043 | 126690.118 | 0.851978678 | 178702.06 | 1.201753911
> ArraysMismatch.Short.mismatchStart | 64 | 148560.877 | 126747.337 | 0.853167668 | 126657.473 | 0.852562771
> ArraysMismatch.Short.mismatchStart | 90 | 149824.411 | 126605.818 | 0.845027971 | 125719.231 | 0.839110464
> ArraysMismatch.Short.mismatchStart | 800 | 152583.036 | 126437.329 | 0.828646043 | 126698.741 | 0.830359287

Is this optimization general for x86 platforms, i.e. is it applicable to AVX2/SSE in addition to AVX-512?

I notice there are some performance regressions in the data you presented. Do you know why?

The specification of `ArraysSupport.vectorizedMismatch` has changed so that it no longer returns the bitwise complement of the remaining tail to check. Did you encounter any performance issues that motivated the change?

I would prefer to leave the specification of `ArraysSupport.vectorizedMismatch` unchanged, even though the x86 implementation always returns a non-negative value. That gives other platforms the flexibility to choose whether or not to add more complex optimizations like the one you are proposing, i.e. I think the approach you are taking is biased too much towards one implementation.
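The return convention under discussion can be sketched in plain Java. This is only a scalar model under stated assumptions, not the JDK implementation or the proposed intrinsic: the class `MismatchContractSketch` and the methods `chunkedMismatch` and `mismatch` are hypothetical names, and the sketch handles byte arrays only. It shows a chunked comparison that returns either the index of the first mismatch or the bitwise complement of the number of tail elements left unchecked, and a caller-side tail loop that finishes the job.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical model of the "index, or ~remaining" return convention.
final class MismatchContractSketch {

    // Compares 8 bytes at a time. Returns the index of the first mismatching
    // byte, or ~remaining when every full 8-byte chunk matched and
    // `remaining` (fewer than 8) tail bytes are still unchecked.
    // Assumes length <= a.length and length <= b.length.
    static int chunkedMismatch(byte[] a, byte[] b, int length) {
        ByteBuffer ba = ByteBuffer.wrap(a).order(ByteOrder.LITTLE_ENDIAN);
        ByteBuffer bb = ByteBuffer.wrap(b).order(ByteOrder.LITTLE_ENDIAN);
        int i = 0;
        for (; i + Long.BYTES <= length; i += Long.BYTES) {
            long diff = ba.getLong(i) ^ bb.getLong(i);
            if (diff != 0) {
                // Little-endian: the lowest set bit lies in the first differing byte.
                return i + (Long.numberOfTrailingZeros(diff) >>> 3);
            }
        }
        return ~(length - i);
    }

    // Caller side: a negative result means "check the tail yourself".
    static int mismatch(byte[] a, byte[] b, int length) {
        int r = chunkedMismatch(a, b, length);
        if (r >= 0) {
            return r;
        }
        for (int i = length - ~r; i < length; i++) {
            if (a[i] != b[i]) {
                return i;
            }
        }
        return -1; // no mismatch in the first `length` bytes
    }

    public static void main(String[] args) {
        byte[] a = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        byte[] b = a.clone();
        b[9] = 42;
        System.out.println(chunkedMismatch(a, b, a.length)); // prints -4 (3 tail bytes left)
        System.out.println(mismatch(a, b, a.length));        // prints 9
    }
}
```

Keeping the negative "tail remaining" result in the contract is what lets the Java caller own the tail loop, so a platform whose intrinsic never leaves a tail simply never takes that branch.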
Removal of the threshold check could result in performance regressions on various platforms, and so potentially could the removal of the tail loop (and modifying the Unsafe implementation to check bytes). I think we need to performance test small sizes, just below and above the current threshold, with the intrinsic enabled and disabled. Note that the Java code as written attempts to strike a delicate cross-platform balance in combination with an intrinsic, when enabled.

My general preference is to retain the existing specification and tail loops. To do that it may be necessary to add platform-specific threshold values. Can we investigate whether you can achieve such performance when threshold values are set to zero on platforms that support partial inlining of vectorizedMismatch?

-------------

PR: https://git.openjdk.java.net/jdk/pull/3999

From psandoz at openjdk.java.net Wed May 12 19:28:52 2021
From: psandoz at openjdk.java.net (Paul Sandoz)
Date: Wed, 12 May 2021 19:28:52 GMT
Subject: RFR: 8266951: Partial in-lining for vectorized mismatch operation using AVX512 masked instructions
In-Reply-To: <0YtRuwnVZ-Ejs-22d0JDJeFzXiZ17XNuBT1o5Ma4ZkI=.9dd9e952-d452-4175-8ff5-8f41e990a555@github.com>
References: <0YtRuwnVZ-Ejs-22d0JDJeFzXiZ17XNuBT1o5Ma4ZkI=.9dd9e952-d452-4175-8ff5-8f41e990a555@github.com>
Message-ID:

On Wed, 12 May 2021 17:02:25 GMT, Jatin Bhateja wrote:

> ArraysSupport.vectorizedMismatch is a leaf-level comparison routine which gets called by various public Java APIs (Arrays.equals, Arrays.mismatch). The HotSpot C2 compiler intrinsifies the vectorizedMismatch routine and emits a call to a stub routine which uses vector instructions to compare the inputs.
>
> For small compare operations whose size fits in one vector register, i.e. < 32 bytes or <= 64 bytes, this patch employs a partial in-lining technique to emit the fast path code at the call site, which does the vector comparison under the influence of a predicate register/mask computed as a function of the comparison length.
>
> If the length of the comparison is greater than the vector register size then the slow path, consisting of a stub call, is emitted.
>
> This avoids the call overhead associated with the stub call, which is significant compared to the actual comparison operation for small comparisons.
>
> Partial in-lining works under the influence of a run time flag -XX:UsePartialInlineSize=32/64 (default 32 bytes).
>
> Following are performance numbers for an existing JMH benchmark (test/micro/org/openjdk/bench/java/util/ArraysMismatch.java):
>
> Machine: Cascade Lake server (Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz)
>
> JMH Benchmark | Size | Baseline (ops/ms) | PI32 (ops/ms) | Gain (PI32/Baseline) | PI64 (ops/ms) | Gain (PI64/Baseline)
> -- | -- | -- | -- | -- | -- | --
> ArraysMismatch.Byte.differentSubrangeMatches | 16 | 129196.612 | 165376.715 | 1.2800391 | 157553.42 | 1.219485694
> ArraysMismatch.Byte.differentSubrangeMatches | 32 | 125583.404 | 163645.759 | 1.303084275 | 157645.879 | 1.255308217
> ArraysMismatch.Byte.differentSubrangeMatches | 64 | 121969.731 | 170648.152 | 1.399102471 | 157993.449 | 1.295349655
> ArraysMismatch.Byte.differentSubrangeMatches | 90 | 91819.571 | 96154.479 | 1.047211155 | 157983.324 | 1.720584427
> ArraysMismatch.Byte.differentSubrangeMatches | 800 | 65236.047 | 67243.131 | 1.030766487 | 67759.48 | 1.038681574
> ArraysMismatch.Byte.matches | 16 | 151805.68 | 203802.717 | 1.342523659 | 188334.618 | 1.240629586
> ArraysMismatch.Byte.matches | 32 | 151624.747 | 203731.315 | 1.343654773 | 185719.086 | 1.224859989
> ArraysMismatch.Byte.matches | 64 | 138350.648 | 124158.139 | 0.897416389 | 188935.388 | 1.365627055
> ArraysMismatch.Byte.matches | 90 | 102366.983 | 101474.688 | 0.991283371 | 100674.414 | 0.983465675
> ArraysMismatch.Byte.matches | 800 | 46319.352 | 49585.514 | 1.070513983 | 49594.262 | 1.070702846
> ArraysMismatch.Byte.mismatchEnd | 16 | 162382.057 | 191602.366 | 1.179947893 | 182425.362 | 1.123433003
ArraysMismatch.Byte.mismatchEnd | 32 | 146656.702 | 193510.637 | 1.319480354 | 182571.741 | 1.244891904 > ArraysMismatch.Byte.mismatchEnd | 64 | 140799.385 | 122505.816 | 0.870073516 | 182360.435 | 1.295179201 > ArraysMismatch.Byte.mismatchEnd | 90 | 117439.002 | 107296.27 | 0.913634041 | 108081.174 | 0.920317545 > ArraysMismatch.Byte.mismatchEnd | 800 | 47542.975 | 47456.106 | 0.998172832 | 47289.082 | 0.994659716 > ArraysMismatch.Byte.mismatchMid | 16 | 143112.591 | 189653.41 | 1.325204223 | 182411.81 | 1.274603504 > ArraysMismatch.Byte.mismatchMid | 32 | 151759.608 | 193712.64 | 1.276443993 | 182689.18 | 1.203806351 > ArraysMismatch.Byte.mismatchMid | 64 | 140756.035 | 122017.013 | 0.866868785 | 182508.473 | 1.296629825 > ArraysMismatch.Byte.mismatchMid | 90 | 134230.235 | 122213.804 | 0.910478954 | 122566.133 | 0.913103765 > ArraysMismatch.Byte.mismatchMid | 800 | 75512.985 | 64861.716 | 0.858947849 | 71607.794 | 0.94828451 > ArraysMismatch.Byte.mismatchStart | 16 | 160628.501 | 193722.299 | 1.206026937 | 183190.972 | 1.140463684 > ArraysMismatch.Byte.mismatchStart | 32 | 151629.56 | 193633.36 | 1.277015906 | 183230.666 | 1.20840993 > ArraysMismatch.Byte.mismatchStart | 64 | 143345.272 | 130754.305 | 0.91216336 | 181837.864 | 1.268530601 > ArraysMismatch.Byte.mismatchStart | 90 | 151557.205 | 130724.926 | 0.86254511 | 130962.682 | 0.864113864 > ArraysMismatch.Byte.mismatchStart | 800 | 149416.06 | 130847.301 | 0.875724477 | 130952.683 | 0.876429769 > ArraysMismatch.Char.differentSubrangeMatches | 16 | 124936.905 | 152375.103 | 1.219616438 | 146062.997 | 1.169094088 > ArraysMismatch.Char.differentSubrangeMatches | 32 | 118878.291 | 158770.285 | 1.33557005 | 146561.488 | 1.232870079 > ArraysMismatch.Char.differentSubrangeMatches | 64 | 110296.975 | 104885.041 | 0.95093307 | 146102.313 | 1.324626655 > ArraysMismatch.Char.differentSubrangeMatches | 90 | 88056.395 | 90133.489 | 1.023588224 | 87883.169 | 0.998032783 > ArraysMismatch.Char.differentSubrangeMatches | 
800 | 41319.787 | 46257.464 | 1.119499091 | 46090.56 | 1.115459767 > ArraysMismatch.Char.matches | 16 | 150428.182 | 197311.356 | 1.311664832 | 187199.805 | 1.24444637 > ArraysMismatch.Char.matches | 32 | 132718.181 | 126373.231 | 0.952192307 | 187008.811 | 1.409067014 > ArraysMismatch.Char.matches | 64 | 111659.84 | 107182.982 | 0.959906283 | 109772.951 | 0.983101453 > ArraysMismatch.Char.matches | 90 | 86184.209 | 91977.05 | 1.067214645 | 90389.147 | 1.048790121 > ArraysMismatch.Char.matches | 800 | 26332.084 | 25284.001 | 0.960197491 | 25855.38 | 0.981896458 > ArraysMismatch.Char.mismatchEnd | 16 | 148547.251 | 189151.018 | 1.273339067 | 179675.328 | 1.209550004 > ArraysMismatch.Char.mismatchEnd | 32 | 138219.785 | 119017.203 | 0.861072118 | 178701.685 | 1.292880647 > ArraysMismatch.Char.mismatchEnd | 64 | 110435.452 | 103940.023 | 0.94118348 | 102078.889 | 0.924330794 > ArraysMismatch.Char.mismatchEnd | 90 | 89375.63 | 87698.736 | 0.981237682 | 88037.787 | 0.985031233 > ArraysMismatch.Char.mismatchEnd | 800 | 23632.584 | 22963.757 | 0.971698948 | 20497.605 | 0.867345061 > ArraysMismatch.Char.mismatchMid | 16 | 148666.26 | 189258.721 | 1.273044207 | 178820.938 | 1.202834712 > ArraysMismatch.Char.mismatchMid | 32 | 131949.59 | 119320.489 | 0.904288441 | 178579.245 | 1.35338992 > ArraysMismatch.Char.mismatchMid | 64 | 122148.315 | 111033.597 | 0.909006375 | 109455.953 | 0.896090568 > ArraysMismatch.Char.mismatchMid | 90 | 125032.714 | 109837.581 | 0.878470742 | 110283.097 | 0.882033937 > ArraysMismatch.Char.mismatchMid | 800 | 42255.059 | 48153.688 | 1.139595806 | 43087.476 | 1.019699819 > ArraysMismatch.Char.mismatchStart | 16 | 148493.976 | 189247.176 | 1.274443456 | 178915.503 | 1.204867078 > ArraysMismatch.Char.mismatchStart | 32 | 148724.462 | 126724.721 | 0.852077186 | 178887.041 | 1.202808459 > ArraysMismatch.Char.mismatchStart | 64 | 148635.338 | 126716.274 | 0.852531274 | 126747.94 | 0.852744318 > ArraysMismatch.Char.mismatchStart | 90 | 140359.351 | 
126708.588 | 0.902744186 | 125618.245 | 0.894975961 > ArraysMismatch.Char.mismatchStart | 800 | 144649.46 | 125727.381 | 0.86918666 | 126664.011 | 0.875661831 > ArraysMismatch.Double.differentSubrangeMatches | 16 | 116255.827 | 116156.952 | 0.999149505 | 116557.568 | 1.002595491 > ArraysMismatch.Double.differentSubrangeMatches | 32 | 91940.498 | 97299.205 | 1.058284511 | 97466.224 | 1.06010111 > ArraysMismatch.Double.differentSubrangeMatches | 64 | 78205.807 | 78189.378 | 0.999789926 | 78133.649 | 0.999077332 > ArraysMismatch.Double.differentSubrangeMatches | 90 | 61330.454 | 68798.235 | 1.121763015 | 68524.188 | 1.117294648 > ArraysMismatch.Double.differentSubrangeMatches | 800 | 14996.315 | 14979.647 | 0.998888527 | 15072.825 | 1.00510192 > ArraysMismatch.Double.matches | 16 | 119342.024 | 120322.671 | 1.008217114 | 119531.315 | 1.001586122 > ArraysMismatch.Double.matches | 32 | 88179.448 | 89069.505 | 1.010093701 | 88141.626 | 0.999571079 > ArraysMismatch.Double.matches | 64 | 62622.253 | 62433.512 | 0.996986039 | 63041.774 | 1.006699232 > ArraysMismatch.Double.matches | 90 | 49579.305 | 50632.739 | 1.021247454 | 46548.486 | 0.938869272 > ArraysMismatch.Double.matches | 800 | 8850.013 | 8505.296 | 0.961048984 | 8490.327 | 0.959357574 > ArraysMismatch.Double.mismatchEnd | 16 | 116594.224 | 119025.382 | 1.020851445 | 116310.567 | 0.997567144 > ArraysMismatch.Double.mismatchEnd | 32 | 86183.542 | 86814.706 | 1.007323486 | 86258.696 | 1.000872023 > ArraysMismatch.Double.mismatchEnd | 64 | 62695.058 | 62794.552 | 1.001586951 | 62769 | 1.001179391 > ArraysMismatch.Double.mismatchEnd | 90 | 46899.021 | 47692.984 | 1.016929202 | 47598.715 | 1.01491916 > ArraysMismatch.Double.mismatchEnd | 800 | 8132.64 | 8141.465 | 1.001085133 | 7176.583 | 0.882441987 > ArraysMismatch.Double.mismatchMid | 16 | 110505.284 | 113732.521 | 1.029204368 | 113249.451 | 1.024832903 > ArraysMismatch.Double.mismatchMid | 32 | 94259.439 | 93242.776 | 0.989214205 | 94420.206 | 1.00170558 > 
ArraysMismatch.Double.mismatchMid | 64 | 76392.603 | 76344.962 | 0.999376366 | 76369.689 | 0.999700049 > ArraysMismatch.Double.mismatchMid | 90 | 71578.538 | 71637.235 | 1.000820036 | 71582.34 | 1.000053116 > ArraysMismatch.Double.mismatchMid | 800 | 14993.414 | 12701.251 | 0.84712201 | 14998.937 | 1.000368362 > ArraysMismatch.Double.mismatchStart | 16 | 141188.616 | 141430.91 | 1.001716102 | 141517.873 | 1.002332036 > ArraysMismatch.Double.mismatchStart | 32 | 141489.906 | 139633.297 | 0.986878152 | 141729.555 | 1.001693753 > ArraysMismatch.Double.mismatchStart | 64 | 141502.44 | 139656.902 | 0.986957554 | 141488.272 | 0.999899875 > ArraysMismatch.Double.mismatchStart | 90 | 141782.57 | 141508.142 | 0.998064445 | 141579.135 | 0.998565162 > ArraysMismatch.Double.mismatchStart | 800 | 144565.191 | 139525.413 | 0.965138371 | 144607.95 | 1.000295777 > ArraysMismatch.Float.differentSubrangeMatches | 16 | 120041.868 | 119986.512 | 0.999538861 | 120009.683 | 0.999731885 > ArraysMismatch.Float.differentSubrangeMatches | 32 | 111402.873 | 111414.633 | 1.000105563 | 111442.964 | 1.000359874 > ArraysMismatch.Float.differentSubrangeMatches | 64 | 85388.728 | 93884.13 | 1.099490907 | 95120.892 | 1.113974809 > ArraysMismatch.Float.differentSubrangeMatches | 90 | 67617.865 | 75865.226 | 1.121970148 | 76179.814 | 1.126622587 > ArraysMismatch.Float.differentSubrangeMatches | 800 | 24994.376 | 25011.775 | 1.000696117 | 24944.2 | 0.997992508 > ArraysMismatch.Float.matches | 16 | 133159.39 | 137937.688 | 1.035884048 | 139461.652 | 1.047328709 > ArraysMismatch.Float.matches | 32 | 111959.987 | 115420.6 | 1.030909373 | 117002.141 | 1.045035321 > ArraysMismatch.Float.matches | 64 | 86892.65 | 87395.62 | 1.005788407 | 87345.458 | 1.00521112 > ArraysMismatch.Float.matches | 90 | 67690.279 | 69156.772 | 1.02166475 | 69082.962 | 1.020574343 > ArraysMismatch.Float.matches | 800 | 14894.94 | 15341.034 | 1.029949365 | 15779.117 | 1.059360897 > ArraysMismatch.Float.mismatchEnd | 16 | 128854.048 
| 128925.913 | 1.000557724 | 128985.299 | 1.001018602 > ArraysMismatch.Float.mismatchEnd | 32 | 99825.842 | 104613.873 | 1.047963843 | 103876.271 | 1.040574955 > ArraysMismatch.Float.mismatchEnd | 64 | 80190.706 | 84665.053 | 1.055796329 | 84582.712 | 1.054769514 > ArraysMismatch.Float.mismatchEnd | 90 | 71406.594 | 76730.083 | 1.074551784 | 76596.258 | 1.072677658 > ArraysMismatch.Float.mismatchEnd | 800 | 14348.159 | 14306.535 | 0.997099001 | 14360.603 | 1.000867289 > ArraysMismatch.Float.mismatchMid | 16 | 123753.791 | 124291.601 | 1.004345806 | 123649.378 | 0.999156284 > ArraysMismatch.Float.mismatchMid | 32 | 109105.215 | 111447.183 | 1.021465225 | 111494.37 | 1.021897716 > ArraysMismatch.Float.mismatchMid | 64 | 93600.363 | 93741.993 | 1.001513135 | 93658.042 | 1.000616226 > ArraysMismatch.Float.mismatchMid | 90 | 89991.128 | 89712.471 | 0.996903506 | 90031.763 | 1.000451545 > ArraysMismatch.Float.mismatchMid | 800 | 23974.331 | 24301.075 | 1.01362891 | 24354.29 | 1.015848576 > ArraysMismatch.Float.mismatchStart | 16 | 140889.393 | 140535.617 | 0.997488981 | 140222.656 | 0.995267657 > ArraysMismatch.Float.mismatchStart | 32 | 140871.915 | 140318.765 | 0.996073383 | 140242.783 | 0.995534014 > ArraysMismatch.Float.mismatchStart | 64 | 141197.313 | 140413.639 | 0.994449795 | 140792.879 | 0.997135682 > ArraysMismatch.Float.mismatchStart | 90 | 139663.079 | 139775.065 | 1.00080183 | 143880.133 | 1.03019448 > ArraysMismatch.Float.mismatchStart | 800 | 143930.882 | 143878.412 | 0.99963545 | 143923.022 | 0.99994539 > ArraysMismatch.Int.differentSubrangeMatches | 16 | 110820.026 | 130943.67 | 1.181588515 | 131076.904 | 1.182790771 > ArraysMismatch.Int.differentSubrangeMatches | 32 | 111706.868 | 121119.544 | 1.084262285 | 122049.921 | 1.092591021 > ArraysMismatch.Int.differentSubrangeMatches | 64 | 93916.026 | 101624.789 | 1.082081444 | 100103.617 | 1.065884293 > ArraysMismatch.Int.differentSubrangeMatches | 90 | 67478.955 | 83517.957 | 1.237688951 | 83549.562 | 
1.238157319 > ArraysMismatch.Int.differentSubrangeMatches | 800 | 24920.868 | 25100.838 | 1.007221659 | 25376.679 | 1.018290334 > ArraysMismatch.Int.matches | 16 | 138004.078 | 142579.711 | 1.033155781 | 143465.516 | 1.039574468 > ArraysMismatch.Int.matches | 32 | 111790.949 | 119018.169 | 1.06464942 | 119864.971 | 1.072224291 > ArraysMismatch.Int.matches | 64 | 86997.004 | 88476.088 | 1.017001551 | 87755.688 | 1.008720806 > ArraysMismatch.Int.matches | 90 | 69366.581 | 71427.315 | 1.029707879 | 71203.035 | 1.026474622 > ArraysMismatch.Int.matches | 800 | 15119.02 | 15529.095 | 1.02712312 | 15828.336 | 1.046915475 > ArraysMismatch.Int.mismatchEnd | 16 | 139862.143 | 135639.435 | 0.96980807 | 135661.244 | 0.969964002 > ArraysMismatch.Int.mismatchEnd | 32 | 114870.328 | 115455.901 | 1.005097687 | 114992.965 | 1.001067613 > ArraysMismatch.Int.mismatchEnd | 64 | 85291.637 | 85115.665 | 0.99793682 | 85179.114 | 0.998680726 > ArraysMismatch.Int.mismatchEnd | 90 | 73049.868 | 78798.949 | 1.078700772 | 73365.106 | 1.004315381 > ArraysMismatch.Int.mismatchEnd | 800 | 14597.509 | 12861.87 | 0.88110033 | 12845.178 | 0.879956847 > ArraysMismatch.Int.mismatchMid | 16 | 131615.489 | 134691.219 | 1.023369058 | 134503.225 | 1.0219407 > ArraysMismatch.Int.mismatchMid | 32 | 119291.19 | 121970.431 | 1.022459672 | 120647.357 | 1.011368543 > ArraysMismatch.Int.mismatchMid | 64 | 100133.019 | 99827.03 | 0.996944175 | 98327.743 | 0.981971222 > ArraysMismatch.Int.mismatchMid | 90 | 93062.689 | 95269.725 | 1.023715584 | 95457.632 | 1.025734728 > ArraysMismatch.Int.mismatchMid | 800 | 24614.985 | 20853.102 | 0.847171022 | 20857.528 | 0.847350831 > ArraysMismatch.Int.mismatchStart | 16 | 140229.222 | 147607.561 | 1.052616273 | 146278.15 | 1.043136002 > ArraysMismatch.Int.mismatchStart | 32 | 140354.53 | 147448.421 | 1.050542658 | 146287.931 | 1.042274382 > ArraysMismatch.Int.mismatchStart | 64 | 140256.12 | 147353.466 | 1.050602754 | 146094.059 | 1.041623417 > 
ArraysMismatch.Int.mismatchStart | 90 | 135753.229 | 151205.439 | 1.113825727 | 152070.776 | 1.120200065 > ArraysMismatch.Int.mismatchStart | 800 | 151565.887 | 145991.819 | 0.963223466 | 152020.842 | 1.003001698 > ArraysMismatch.Long.differentSubrangeMatches | 16 | 125569.009 | 121469.175 | 0.967349953 | 121319.155 | 0.966155232 > ArraysMismatch.Long.differentSubrangeMatches | 32 | 100126.557 | 103303.047 | 1.03172475 | 101476.788 | 1.013485243 > ArraysMismatch.Long.differentSubrangeMatches | 64 | 80870.342 | 82334.336 | 1.018102978 | 82395.962 | 1.018865012 > ArraysMismatch.Long.differentSubrangeMatches | 90 | 70673.831 | 72440.193 | 1.024993155 | 72067.497 | 1.019719689 > ArraysMismatch.Long.differentSubrangeMatches | 800 | 15224.864 | 15077.429 | 0.99031617 | 15163.827 | 0.995990966 > ArraysMismatch.Long.matches | 16 | 119857.871 | 123784.673 | 1.032762154 | 122968.267 | 1.025950703 > ArraysMismatch.Long.matches | 32 | 88284.162 | 90825.719 | 1.028788369 | 91303.549 | 1.034200778 > ArraysMismatch.Long.matches | 64 | 62827.102 | 63614.876 | 1.012538761 | 64469.82 | 1.026146646 > ArraysMismatch.Long.matches | 90 | 49351.299 | 51199.947 | 1.037458953 | 51103.813 | 1.035511 > ArraysMismatch.Long.matches | 800 | 8822.867 | 8512.064 | 0.964773015 | 8848.35 | 1.00288829 > ArraysMismatch.Long.mismatchEnd | 16 | 124902.804 | 128237.911 | 1.026701618 | 128410.897 | 1.028086583 > ArraysMismatch.Long.mismatchEnd | 32 | 86728.545 | 90519.608 | 1.043711825 | 88782.445 | 1.023681938 > ArraysMismatch.Long.mismatchEnd | 64 | 64431.36 | 62735.702 | 0.973682722 | 64766.52 | 1.005201815 > ArraysMismatch.Long.mismatchEnd | 90 | 47764.996 | 47635.982 | 0.997298984 | 47562.461 | 0.995759761 > ArraysMismatch.Long.mismatchEnd | 800 | 8124.901 | 7194.444 | 0.88548082 | 7197.163 | 0.88581547 > ArraysMismatch.Long.mismatchMid | 16 | 122857.442 | 121708.317 | 0.99064668 | 121071.994 | 0.985467319 > ArraysMismatch.Long.mismatchMid | 32 | 99406.603 | 99376.972 | 0.999701921 | 97379.046 | 
0.979603397 > ArraysMismatch.Long.mismatchMid | 64 | 78596.148 | 76559.205 | 0.974083425 | 76538.811 | 0.973823946 > ArraysMismatch.Long.mismatchMid | 90 | 74253.699 | 73267.252 | 0.98671518 | 74874.856 | 1.008365334 > ArraysMismatch.Long.mismatchMid | 800 | 12739.526 | 12773.563 | 1.002671763 | 15215.721 | 1.194371046 > ArraysMismatch.Long.mismatchStart | 16 | 143429.003 | 147610.51 | 1.029153846 | 146953.182 | 1.024570895 > ArraysMismatch.Long.mismatchStart | 32 | 149771.413 | 149898.955 | 1.000851578 | 147743.864 | 0.986462377 > ArraysMismatch.Long.mismatchStart | 64 | 149812.094 | 147738.977 | 0.986161885 | 147818.236 | 0.986690941 > ArraysMismatch.Long.mismatchStart | 90 | 149834.855 | 147878.978 | 0.986946448 | 149768.864 | 0.999559575 > ArraysMismatch.Long.mismatchStart | 800 | 150266.332 | 147175.353 | 0.979429996 | 153305.049 | 1.020222208 > ArraysMismatch.Short.differentSubrangeMatches | 16 | 124956.808 | 152398.079 | 1.21960605 | 146222.898 | 1.170187526 > ArraysMismatch.Short.differentSubrangeMatches | 32 | 118644.114 | 158832.405 | 1.338729749 | 146589.485 | 1.235539464 > ArraysMismatch.Short.differentSubrangeMatches | 64 | 111036.197 | 106078.375 | 0.955349497 | 146122.18 | 1.315986894 > ArraysMismatch.Short.differentSubrangeMatches | 90 | 79114.347 | 90244.347 | 1.140682448 | 91059.171 | 1.150981768 > ArraysMismatch.Short.differentSubrangeMatches | 800 | 44794.065 | 46302.944 | 1.033684797 | 46086.671 | 1.028856635 > ArraysMismatch.Short.matches | 16 | 150201.123 | 193264.21 | 1.28670283 | 185129.029 | 1.232540911 > ArraysMismatch.Short.matches | 32 | 137672.122 | 126543.04 | 0.919162414 | 187187.586 | 1.359662242 > ArraysMismatch.Short.matches | 64 | 113952.11 | 110124.025 | 0.966406195 | 109228.551 | 0.958547858 > ArraysMismatch.Short.matches | 90 | 89491.351 | 91045.251 | 1.017363689 | 90362.175 | 1.009730817 > ArraysMismatch.Short.matches | 800 | 25941.449 | 25887.28 | 0.997911875 | 25191.983 | 0.971109324 > ArraysMismatch.Short.mismatchEnd | 16 
| 142494.648 | 189203.368 | 1.327792802 | 176318.454 | 1.237368957
> ArraysMismatch.Short.mismatchEnd | 32 | 139928.97 | 119098.052 | 0.851132199 | 178840.438 | 1.278080143
> ArraysMismatch.Short.mismatchEnd | 64 | 115583.3 | 104264.811 | 0.902075049 | 102376.369 | 0.885736685
> ArraysMismatch.Short.mismatchEnd | 90 | 86641.922 | 87669.462 | 1.011859617 | 87745.796 | 1.012740645
> ArraysMismatch.Short.mismatchEnd | 800 | 23741.295 | 22911.558 | 0.965050895 | 22937.297 | 0.96613504
> ArraysMismatch.Short.mismatchMid | 16 | 148684.747 | 189160.851 | 1.272227682 | 178776.065 | 1.202383355
> ArraysMismatch.Short.mismatchMid | 32 | 133281.625 | 118690.88 | 0.890526957 | 178478.46 | 1.339107773
> ArraysMismatch.Short.mismatchMid | 64 | 122399.072 | 110333.504 | 0.901424351 | 111504.705 | 0.910993059
> ArraysMismatch.Short.mismatchMid | 90 | 119317.633 | 110483.29 | 0.925959451 | 111346.724 | 0.933195884
> ArraysMismatch.Short.mismatchMid | 800 | 50742.831 | 43058.305 | 0.848559376 | 47917.118 | 0.94431306
> ArraysMismatch.Short.mismatchStart | 16 | 148861.935 | 191984.933 | 1.289684519 | 178706.176 | 1.200482689
> ArraysMismatch.Short.mismatchStart | 32 | 148701.043 | 126690.118 | 0.851978678 | 178702.06 | 1.201753911
> ArraysMismatch.Short.mismatchStart | 64 | 148560.877 | 126747.337 | 0.853167668 | 126657.473 | 0.852562771
> ArraysMismatch.Short.mismatchStart | 90 | 149824.411 | 126605.818 | 0.845027971 | 125719.231 | 0.839110464
> ArraysMismatch.Short.mismatchStart | 800 | 152583.036 | 126437.329 | 0.828646043 | 126698.741 | 0.830359287

src/hotspot/share/opto/c2_globals.hpp line 85:

> 83:           range(0, max_jint)                                      \
> 84:                                                                   \
> 85:   product(intx, UsePartialInlineSize, -1, DIAGNOSTIC,             \

Unsure if the name change requires a CSR. Members of HotSpot can advise. Also, please check for any tests that might use this flag.
src/hotspot/share/opto/c2_globals.hpp line 86:

> 84:                                                                   \
> 85:   product(intx, UsePartialInlineSize, -1, DIAGNOSTIC,             \
> 86:           "Partial inline size used for array copy acceleration.")  \

Description requires updating.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3999

From dlong at openjdk.java.net Wed May 12 21:41:54 2021
From: dlong at openjdk.java.net (Dean Long)
Date: Wed, 12 May 2021 21:41:54 GMT
Subject: RFR: 8266074: Vtable-based CHA implementation [v4]
In-Reply-To:
References:
Message-ID:

On Mon, 3 May 2021 18:45:07 GMT, Vladimir Ivanov wrote:

>> As of now, Class Hierarchy Analysis (CHA) employs an approximate algorithm to enumerate all non-abstract methods in a class hierarchy.
>>
>> It served quite well for many years, but it accumulated significant complexity
>> to support different corner cases over time, and the inevitable evolution of the JVM
>> stretched the whole approach way too much (to the point where it became almost
>> impossible to extend the analysis any further).
>>
>> It turns out the root problem is the decision to reimplement method resolution
>> and method selection logic from scratch and to perform it on the JVM internal
>> representation. It makes it very hard to reason about correctness, and the
>> implementation becomes sensitive to changes in the internal representation.
>>
>> So, the main motivation for the redesign is twofold:
>> * reduce maintenance burden and increase confidence in the code;
>> * unlock some long-awaited enhancements.
>>
>> Though I did experiment with relaxing existing constraints (e.g., enable default method support),
>> any possible enhancements are deliberately kept out of scope for the current PR.
>> (It does deliver a bit on the minor enhancements front, as the changes in
>> compiler/cha/StrengthReduceInterfaceCall.java manifest, but it's a side effect
>> of the other changes and was not the goal of the current work.)
>>
>> The proposed implementation (`LinkedConcreteMethodFinder`) mimics method invocation
>> and relies on vtable/itable information to detect the target method for every
>> subclass it visits. It removes all the complexity associated with method
>> resolution and method selection logic and leaves only the essential logic to prepare for method selection.
>>
>> Vtables are filled during class linkage, so the new logic doesn't work on not yet linked classes.
>> Instead of supporting the not yet linked case, it is simply ignored. It is safe to
>> skip such classes (treat them as "effectively non-concrete") since it is guaranteed there
>> are no instances created yet. But it requires the VM to check dependencies once a
>> class is linked.
>>
>> I ended up with 2 separate dependency validation passes (when a class is loaded
>> and when it is linked). To avoid duplicated work, only dependencies
>> which may be affected by a class initialization state change
>> (`unique_concrete_method_4`) are visited.
>>
>> (I experimented with merging the passes into a single pass (delaying the pass until
>> linkage is over), but it severely affected other class-related dependencies and
>> relevant optimizations.)
>>
>> The Compiler Interface (CI) is changed to require users to provide complete information about the call site being analyzed.
>>
>> The old implementation is kept intact for now (it will be removed later) to:
>> - support JVMCI, which hasn't been migrated to the new implementation yet;
>> - enable verification that the 2 implementations (old and new) agree on the results;
>> - temporarily keep an option to revert to the original implementation in case any regressions show up.
>>
>> Testing:
>> - [x] hs-tier1 - hs-tier9
>> - [x] hs-tier1 - hs-tier4 w/ `-XX:-UseVtableBasedCHA`
>> - [x] performance testing
>>
>> Thanks!
>
> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision:
>
> Cover abstract method case

Looks good.

-------------

Marked as reviewed by dlong (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/3727

From kvn at openjdk.java.net Wed May 12 23:29:07 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Wed, 12 May 2021 23:29:07 GMT
Subject: RFR: 8263006: Add optimization for Max(*)Node and Min(*)Node [v4]
In-Reply-To:
References:
Message-ID:

On Mon, 26 Apr 2021 09:39:00 GMT, Wang Huang wrote:

>> * I optimize `max` and `min` by using these identities
>>     - op(max(a,b), min(a,b)) === op(a,b)
>>     - if op is commutative
>>     - examples:
>>         - max(a,b) + min(a,b) === a + b // op = add
>>         - max(a,b) * min(a,b) === a * b // op = mul
>>         - max(max(a,b), min(a,b)) === max(a,b) // op = max()
>>         - min(max(a,b), min(a,b)) === min(a,b) // op = min()
>> * Test case
>> ```java
>> /*
>>  * Copyright (c) 2021, Huawei Technologies Co. Ltd. All rights reserved.
>>  * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
>>  *
>>  * This code is free software; you can redistribute it and/or modify it
>>  * under the terms of the GNU General Public License version 2 only, as
>>  * published by the Free Software Foundation.
>>  *
>>  * This code is distributed in the hope that it will be useful, but WITHOUT
>>  * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>>  * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
>>  * version 2 for more details (a copy is included in the LICENSE file that
>>  * accompanied this code).
>>  *
>>  * You should have received a copy of the GNU General Public License version
>>  * 2 along with this work; if not, write to the Free Software Foundation,
>>  * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
>>  *
>>  * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA
>>  * or visit www.oracle.com if you need additional information or have any
>>  * questions.
>> */ >> package org.sample; >> >> import org.openjdk.jmh.annotations.Benchmark; >> import org.openjdk.jmh.annotations.*; >> >> import java.util.Random; >> import java.util.concurrent.TimeUnit; >> import org.openjdk.jmh.infra.Blackhole; >> >> @BenchmarkMode({Mode.AverageTime}) >> @OutputTimeUnit(TimeUnit.MICROSECONDS) >> public class MyBenchmark { >> >> static int length = 100000; >> static double[] data1 = new double[length]; >> static double[] data2 = new double[length]; >> static Random random = new Random(); >> >> static { >> for(int i = 0; i < length; ++i) { >> data1[i] = random.nextDouble(); >> data2[i] = random.nextDouble(); >> } >> } >> >> @Benchmark >> public void testAdd(Blackhole bh) { >> double sum = 0; >> for (int i = 0; i < length; i++) { >> sum += Math.max(data1[i], data2[i]) + Math.min(data1[i], data2[i]); >> } >> bh.consume(sum); >> } >> >> @Benchmark >> public void testMax(Blackhole bh) { >> double sum = 0; >> for (int i = 0; i < length; i++) { >> sum += Math.max(Math.max(data1[i], data2[i]), Math.min(data1[i], data2[i])); >> } >> bh.consume(sum); >> } >> >> @Benchmark >> public void testMin(Blackhole bh) { >> double sum = 0; >> for (int i = 0; i < length; i++) { >> sum += Math.min(Math.max(data1[i], data2[i]), Math.min(data1[i], data2[i])); >> } >> bh.consume(sum); >> } >> >> @Benchmark >> public void testMul(Blackhole bh) { >> double sum = 0; >> for (int i = 0; i < length; i++) { >> sum += (Math.max(data1[i], data2[i]) * Math.min(data1[i], data2[i])); >> } >> bh.consume(sum); >> } >> } >> ``` >> >> * The result is listed here (aarch64): >> >> before: >> >> |Benchmark| Mode| Samples| Score| Score error| Units| >> |---| ---| ---| ---| --- | ---| >> |o.s.MyBenchmark.testAdd |avgt | 10 | 556.048 | 32.368 | us/op | >> | o.s.MyBenchmark.testMax | avgt | 10 |543.065 | 54.221 | us/op | >> | o.s.MyBenchmark.testMin | avgt |10 |570.731 | 37.630 | us/op | >> | o.s.MyBenchmark.testMul | avgt | 10 | 531.906 | 20.518 | us/op | >> >> after: >> >> |Benchmark| 
Mode| Samples| Score| Score error| Units| >> |---| ---| ---| ---| --- | ---| >> | o.s.MyBenchmark.testAdd | avgt | 10 | 319.350 | 9.248 | us/op | >> | o.s.MyBenchmark.testMax | avgt | 10 | 356.138 | 10.736 | us/op | >> | o.s.MyBenchmark.testMin | avgt | 10 | 323.731 | 16.621 | us/op | >> | o.s.MyBenchmark.testMul | avgt | 10 | 338.458 | 23.755 | us/op | >> >> * I have tested `NaN` ` INFINITY` and `-INFINITY` and got same result (before/after) > > Wang Huang has updated the pull request incrementally with one additional commit since the last revision: > > add jmh test case t1-4 testing passed. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/3513 From whuang at openjdk.java.net Thu May 13 01:34:55 2021 From: whuang at openjdk.java.net (Wang Huang) Date: Thu, 13 May 2021 01:34:55 GMT Subject: RFR: 8263006: Add optimization for Max(*)Node and Min(*)Node [v4] In-Reply-To: References: Message-ID: On Wed, 12 May 2021 23:25:59 GMT, Vladimir Kozlov wrote: > t1-4 testing passed. Thank you for your review and approval! ------------- PR: https://git.openjdk.java.net/jdk/pull/3513 From whuang at openjdk.java.net Thu May 13 01:50:01 2021 From: whuang at openjdk.java.net (Wang Huang) Date: Thu, 13 May 2021 01:50:01 GMT Subject: RFR: 8266720: Wrong implementation in LibraryCallKit::inline_vector_shuffle_iota [v2] In-Reply-To: References: Message-ID: On Sat, 8 May 2021 03:30:06 GMT, Wang Huang wrote: >> Dear All, >> Here is the patch of JDK-8266720. Could you do me a favor to review this? >> * Reproduce: >> * cherry-pick JDK-8265956 >> * run patch's `TestVectorShuffleIotaByteWrongImpl.java` >> * However, this wrong of this code is obvious. >> * Reason : >> 1. 
In interpreter: >> >> static int partiallyWrapIndex(int index, int laneCount) { >> return checkIndex0(index, laneCount, (byte)-1); >> } >> >> @ForceInline >> static int checkIndex0(int index, int laneCount, byte mode) { >> int wrapped = VectorIntrinsics.wrapToRange(index, laneCount); >> if (mode == 0 || wrapped == index) { // NOTE here >> return wrapped; >> } >> if (mode < 0) { >> return wrapped - laneCount; // special mode for internal storage >> } >> throw checkIndexFailed(index, laneCount); >> } >> >> @ForceInline >> static int wrapToRange(int index, int size) { >> if ((size & (size - 1)) == 0) { >> // Size is zero or a power of two, so we got this. >> return index & (size - 1); >> } else { >> return wrapToRangeNPOT(index, size); >> } >> } >> >> 2. However, we have this intrinsics in >> src/hotspot/share/opto/vectorIntrinsics.cpp [jdk/jdk] >> ```c++ >> 386 } else { >> 387 ConINode* pred_node = (ConINode*)gvn().makecon(TypeInt::make(1)); // BoolTest::gt here >> 388 Node * lane_cnt = gvn().makecon(TypeInt::make(num_elem)); >> 389 Node * bcast_lane_cnt = gvn().transform(VectorNode::scalar2vector(lane_cnt, num_elem, type_bt)); >> // here BoolTest::ge != 1 (which means BoolTest::gt) >> 390 Node* mask = gvn().transform(new VectorMaskCmpNode(BoolTest::ge, bcast_lane_cnt, res, pred_node, vt)); >> >> 3. 
In `aarch64` neon backend, we use `BoolTest::ge` for generated code: >> ```c++ >> // cond is useless here >> instruct vcmge8B(vecD dst, vecD src1, vecD src2, immI cond) >> %{ >> predicate(n->as_Vector()->length() == 8 && >> n->as_VectorMaskCmp()->get_predicate() == BoolTest::ge && >> n->in(1)->in(1)->bottom_type()->is_vect()->element_basic_type() == T_BYTE); >> match(Set dst (VectorMaskCmp (Binary src1 src2) cond)); >> format %{ "cmge $dst, T8B, $src1, $src2\t# vector cmp (8B)" %} >> ins_cost(INSN_COST); >> ins_encode %{ >> __ cmge(as_FloatRegister($dst$$reg), __ T8B, >> as_FloatRegister($src1$$reg), as_FloatRegister($src2$$reg)); >> %} >> ins_pipe(vdop64); >> %} >> >> >> However, we use cond (=1 or BoolTest::gt). So X86 is **right** on jdk/jdk >> ```c++ >> instruct vcmp(legVec dst, legVec src1, legVec src2, immI8 cond, rRegP scratch) %{ >> predicate(vector_length_in_bytes(n->in(1)->in(1)) >= 8 && // src1 >> vector_length_in_bytes(n->in(1)->in(1)) <= 32 && // src1 >> is_integral_type(vector_element_basic_type(n->in(1)->in(1)))); // src1 >> match(Set dst (VectorMaskCmp (Binary src1 src2) cond)); >> effect(TEMP scratch); >> format %{ "vector_compare $dst,$src1,$src2,$cond\t! using $scratch as TEMP" %} >> ins_encode %{ >> int vlen_enc = vector_length_encoding(this, $src1); >> Assembler::ComparisonPredicate cmp = booltest_pred_to_comparison_pred($cond$$constant); >> Assembler::Width ww = widthForType(vector_element_basic_type(this, $src1)); >> __ vpcmpCCW($dst$$XMMRegister, $src1$$XMMRegister, $src2$$XMMRegister, cmp, ww, vlen_enc, $scratch$$Register); >> %} >> ins_pipe( pipe_slow ); >> %} >> >> 4. 
In repo `panama-vector`, both of them are wrong, because the IR is fixed: >> ```c++ >> 455 } else { >> 456 ConINode* pred_node = (ConINode*)gvn().makecon(TypeInt::make(BoolTest::ge));// WRONG here >> 457 Node * lane_cnt = gvn().makecon(TypeInt::make(num_elem)); >> 458 Node * bcast_lane_cnt = gvn().transform(VectorNode::scalar2vector(lane_cnt, num_elem, type_bt)); >> 459 Node* mask = gvn().transform(new VectorMaskCmpNode(BoolTest::ge, bcast_lane_cnt, res, pred_node, vt)); >> >> Yours, >> Wang Huang > > Wang Huang has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > 8266720: Wrong implementation in LibraryCallKit::inline_vector_shuffle_iota This issue will be closed because I will fix it on panama-vector since #3803 has not been merged. ------------- PR: https://git.openjdk.java.net/jdk/pull/3933 From whuang at openjdk.java.net Thu May 13 01:50:01 2021 From: whuang at openjdk.java.net (Wang Huang) Date: Thu, 13 May 2021 01:50:01 GMT Subject: Withdrawn: 8266720: Wrong implementation in LibraryCallKit::inline_vector_shuffle_iota In-Reply-To: References: Message-ID: On Sat, 8 May 2021 02:30:06 GMT, Wang Huang wrote: > Dear All, > Here is the patch of JDK-8266720. Could you do me a favor to review this? > * Reproduce: > * cherry-pick JDK-8265956 > * run patch's `TestVectorShuffleIotaByteWrongImpl.java` > * However, this wrong of this code is obvious. > * Reason : > 1. 
In interpreter: > > static int partiallyWrapIndex(int index, int laneCount) { > return checkIndex0(index, laneCount, (byte)-1); > } > > @ForceInline > static int checkIndex0(int index, int laneCount, byte mode) { > int wrapped = VectorIntrinsics.wrapToRange(index, laneCount); > if (mode == 0 || wrapped == index) { // NOTE here > return wrapped; > } > if (mode < 0) { > return wrapped - laneCount; // special mode for internal storage > } > throw checkIndexFailed(index, laneCount); > } > > @ForceInline > static int wrapToRange(int index, int size) { > if ((size & (size - 1)) == 0) { > // Size is zero or a power of two, so we got this. > return index & (size - 1); > } else { > return wrapToRangeNPOT(index, size); > } > } > > 2. However, we have this intrinsics in > src/hotspot/share/opto/vectorIntrinsics.cpp [jdk/jdk] > ```c++ > 386 } else { > 387 ConINode* pred_node = (ConINode*)gvn().makecon(TypeInt::make(1)); // BoolTest::gt here > 388 Node * lane_cnt = gvn().makecon(TypeInt::make(num_elem)); > 389 Node * bcast_lane_cnt = gvn().transform(VectorNode::scalar2vector(lane_cnt, num_elem, type_bt)); > // here BoolTest::ge != 1 (which means BoolTest::gt) > 390 Node* mask = gvn().transform(new VectorMaskCmpNode(BoolTest::ge, bcast_lane_cnt, res, pred_node, vt)); > > 3. In `aarch64` neon backend, we use `BoolTest::ge` for generated code: > ```c++ > // cond is useless here > instruct vcmge8B(vecD dst, vecD src1, vecD src2, immI cond) > %{ > predicate(n->as_Vector()->length() == 8 && > n->as_VectorMaskCmp()->get_predicate() == BoolTest::ge && > n->in(1)->in(1)->bottom_type()->is_vect()->element_basic_type() == T_BYTE); > match(Set dst (VectorMaskCmp (Binary src1 src2) cond)); > format %{ "cmge $dst, T8B, $src1, $src2\t# vector cmp (8B)" %} > ins_cost(INSN_COST); > ins_encode %{ > __ cmge(as_FloatRegister($dst$$reg), __ T8B, > as_FloatRegister($src1$$reg), as_FloatRegister($src2$$reg)); > %} > ins_pipe(vdop64); > %} > > > However, we use cond (=1 or BoolTest::gt). 
So X86 is **right** on jdk/jdk > ```c++ > instruct vcmp(legVec dst, legVec src1, legVec src2, immI8 cond, rRegP scratch) %{ > predicate(vector_length_in_bytes(n->in(1)->in(1)) >= 8 && // src1 > vector_length_in_bytes(n->in(1)->in(1)) <= 32 && // src1 > is_integral_type(vector_element_basic_type(n->in(1)->in(1)))); // src1 > match(Set dst (VectorMaskCmp (Binary src1 src2) cond)); > effect(TEMP scratch); > format %{ "vector_compare $dst,$src1,$src2,$cond\t! using $scratch as TEMP" %} > ins_encode %{ > int vlen_enc = vector_length_encoding(this, $src1); > Assembler::ComparisonPredicate cmp = booltest_pred_to_comparison_pred($cond$$constant); > Assembler::Width ww = widthForType(vector_element_basic_type(this, $src1)); > __ vpcmpCCW($dst$$XMMRegister, $src1$$XMMRegister, $src2$$XMMRegister, cmp, ww, vlen_enc, $scratch$$Register); > %} > ins_pipe( pipe_slow ); > %} > > 4. In repo `panama-vector`, both of them are wrong, because the IR is fixed: > ```c++ > 455 } else { > 456 ConINode* pred_node = (ConINode*)gvn().makecon(TypeInt::make(BoolTest::ge));// WRONG here > 457 Node * lane_cnt = gvn().makecon(TypeInt::make(num_elem)); > 458 Node * bcast_lane_cnt = gvn().transform(VectorNode::scalar2vector(lane_cnt, num_elem, type_bt)); > 459 Node* mask = gvn().transform(new VectorMaskCmpNode(BoolTest::ge, bcast_lane_cnt, res, pred_node, vt)); > > Yours, > Wang Huang This pull request has been closed without being integrated. 
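The quoted Java-side logic can be exercised on its own. The following is a minimal sketch, not the JDK source: the class name `PartialWrapSketch` is hypothetical, the `mode` parameter is fixed to the "partial wrap" case, and the non-power-of-two helper (`wrapToRangeNPOT`) is left out on purpose. It shows that an index equal to the lane count is already out of range, so the in-range test is a strict `laneCount > index`; a `>=` test would differ on exactly that lane, which appears to be the point of the `gt` versus `ge` discussion above.

```java
public class PartialWrapSketch {
    // Wraps index into [0, size). Assumption of this sketch: size is a power of two.
    static int wrapToRange(int index, int size) {
        if (size <= 0 || (size & (size - 1)) != 0) {
            throw new IllegalArgumentException("sketch supports power-of-two sizes only");
        }
        return index & (size - 1);
    }

    // Mirrors the quoted checkIndex0 with mode < 0: in-range indexes are kept,
    // out-of-range indexes are wrapped and then biased by -laneCount.
    static int partiallyWrapIndex(int index, int laneCount) {
        int wrapped = wrapToRange(index, laneCount);
        if (wrapped == index) {
            return wrapped;            // index was already in [0, laneCount)
        }
        return wrapped - laneCount;    // negative marker for an out-of-range lane
    }

    public static void main(String[] args) {
        int laneCount = 8;
        for (int index : new int[] {0, 7, 8, 9, -1}) {
            System.out.println(index + " -> " + partiallyWrapIndex(index, laneCount));
        }
    }
}
```

With `laneCount = 8`, index 7 is kept as 7 while index 8 becomes -8, so the boundary lane is the one that distinguishes the two predicates.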
-------------

PR: https://git.openjdk.java.net/jdk/pull/3933

From jbhateja at openjdk.java.net Thu May 13 07:29:53 2021
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Thu, 13 May 2021 07:29:53 GMT
Subject: RFR: 8266951: Partial in-lining for vectorized mismatch operation using AVX512 masked instructions
In-Reply-To: <6BAOWhh_4s5-flV2niYNQ_k3xjV_h7CCLN9Jd69evOk=.a68c2d4c-db3f-41b7-82c2-0b78c0fd5671@github.com>
References: <0YtRuwnVZ-Ejs-22d0JDJeFzXiZ17XNuBT1o5Ma4ZkI=.9dd9e952-d452-4175-8ff5-8f41e990a555@github.com> <6BAOWhh_4s5-flV2niYNQ_k3xjV_h7CCLN9Jd69evOk=.a68c2d4c-db3f-41b7-82c2-0b78c0fd5671@github.com>
Message-ID:

On Wed, 12 May 2021 19:18:45 GMT, Paul Sandoz wrote:

> My general preference is to retain the existing specification and tail loops. To do that it may be necessary to add platform specific threshold values. Can we investigate whether you can achieve such performance when threshold values are set to zero on platforms that support partial inlining of vectorizedMismatch?

Hi @PaulSandoz,

We do have an AVX3Threshold value for platforms supporting the AVX512 feature; its default value is currently set to 4096 bytes. Through partial in-lining we are attempting to generate the comparison code at the call site without calling the stub. Performance data shows gains for comparisons of sub-word types if the size is less than 32/64 bytes.

The following algorithm briefly describes the existing implementation of the ArraysSupport.mismatch routines.

    ArraysSupport.mismatch() {
      if (length > THRESHOLD) {
        call ArraysSupport.vectorizedMismatch() // This performs comparison using unsafe APIs at the granularity of 8 bytes.
      } else {
        for (i = 0; i < length; i++)
          scalar_comparison
      }
    }

Java THRESHOLD values for various primitive types and the extra headroom which partial inlining offers:

    Type        Java THRESHOLD      UsePartialInlineSize=32        UsePartialInlineSize=64
                (elem cnt/bytes)    AVX3 - YMM register size       AVX3 - ZMM register size
                                    = 32 bytes                     = 64 bytes
    Byte        7 (7 bytes)         25 (25 bytes)                  57 (57 bytes)
    Short       3 (6 bytes)         13 (26 bytes)                  29 (58 bytes)
    Int/Float   1 (4 bytes)         7 (28 bytes)                   15 (60 bytes)
    Long        0 (0 bytes)         4 (32 bytes)                   8 (64 bytes)

Thus we can see that by JITing the comparison code at the call site we save the call overhead associated with stub calls, which dominates the cost of small-sized compare operations. The only penalty, which is also visible in the above performance data, is for comparison sizes above UsePartialInlineSize: there we do an extra threshold comparison in the JITed code and probably pay a branch misprediction penalty, since the fast path is the immediate block after the comparison.

I can try to limit the patch to only exploit the extra headroom size as shown above, since those cases should get the direct benefit out of partial inlining.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3999

From vlivanov at openjdk.java.net Thu May 13 09:36:40 2021
From: vlivanov at openjdk.java.net (Vladimir Ivanov)
Date: Thu, 13 May 2021 09:36:40 GMT
Subject: RFR: 8266973: Migrate to ClassHierarchyIterator when enumerating subclasses [v2]
In-Reply-To:
References:
Message-ID:

> Replace ad-hoc recursion when enumerating subclasses with `ClassHierarchyIterator`.
> > Found 3 occurrences: > - `Dependencies::find_finalizable_subclass()` > - `reinitialize_vtable_of()` > - `VM_RedefineClasses::increment_class_counter()` > > Testing: > - [x] hs-tier1 - hs-tier4 Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: JFR ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/3995/files - new: https://git.openjdk.java.net/jdk/pull/3995/files/cb586871..54a4bddd Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3995&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3995&range=00-01 Stats: 46 lines in 2 files changed: 1 ins; 35 del; 10 mod Patch: https://git.openjdk.java.net/jdk/pull/3995.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3995/head:pull/3995 PR: https://git.openjdk.java.net/jdk/pull/3995 From vlivanov at openjdk.java.net Thu May 13 09:39:58 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Thu, 13 May 2021 09:39:58 GMT Subject: RFR: 8266973: Migrate to ClassHierarchyIterator when enumerating subclasses [v2] In-Reply-To: <1n4hp0FnDEtYEhg3p-RIW5NlkXoJoFej5tGBgIAVCMc=.35c4d7ab-0d66-497c-90d9-54c08c35abd1@github.com> References: <1n4hp0FnDEtYEhg3p-RIW5NlkXoJoFej5tGBgIAVCMc=.35c4d7ab-0d66-497c-90d9-54c08c35abd1@github.com> Message-ID: On Wed, 12 May 2021 18:49:41 GMT, Coleen Phillimore wrote: > I wonder if you can now make subklass() and next_sibling() functions private to Klass with it having ClassHierarchyIterator as a friend. There are some usages of `subklass()`/`next_sibling()` which don't benefit much from migration to `ClassHierarchyIterator` (e.g., `Dependencies::check_leaf_type()` and `ciInstanceKlass::dump_replay_data()`). I decided to leave them intact. Nevertheless, I spotted 2 more occurrences (in JFR code) that benefit from migration to `ClassHierarchyIterator`. (Passed `hs-tier5-rt` - `hs-tier7-rt` testing.) 
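The idea of replacing recursion over `subklass()`/`next_sibling()` links with an iterator can be sketched in Java. This is only an analogy under stated assumptions: `HierarchyWalkSketch`, its `Klass` node, and `HierarchyIterator` are hypothetical names that model the first-child/next-sibling layout; they are not HotSpot's C++ `ClassHierarchyIterator` or its API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class HierarchyWalkSketch {
    // Hypothetical node with first-subclass and next-sibling links.
    static final class Klass {
        final String name;
        Klass superKlass;
        Klass subklass;      // first subclass, or null
        Klass nextSibling;   // next subclass of the same superclass, or null

        Klass(String name) { this.name = name; }

        // New subclasses are prepended to the subclass list.
        Klass addSubclass(String subName) {
            Klass k = new Klass(subName);
            k.superKlass = this;
            k.nextSibling = subklass;
            subklass = k;
            return k;
        }
    }

    // Pre-order walk over root and all transitive subclasses, without recursion.
    static final class HierarchyIterator implements Iterator<Klass> {
        private final Klass root;
        private Klass current;

        HierarchyIterator(Klass root) {
            this.root = root;
            this.current = root;
        }

        @Override
        public boolean hasNext() { return current != null; }

        @Override
        public Klass next() {
            if (current == null) throw new NoSuchElementException();
            Klass result = current;
            current = successor(current);
            return result;
        }

        private Klass successor(Klass k) {
            if (k.subklass != null) return k.subklass;      // go down first
            while (k != root) {                             // never leave the root's subtree
                if (k.nextSibling != null) return k.nextSibling;
                k = k.superKlass;                           // climb and retry
            }
            return null;
        }
    }

    // Collects the names in visiting order.
    static List<String> names(Klass root) {
        List<String> out = new ArrayList<>();
        for (Iterator<Klass> it = new HierarchyIterator(root); it.hasNext(); ) {
            out.add(it.next().name);
        }
        return out;
    }
}
```

A caller then writes one flat loop instead of a hand-rolled recursive helper at each site, which is the kind of simplification the three listed occurrences would get.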
-------------

PR: https://git.openjdk.java.net/jdk/pull/3995

From vlivanov at openjdk.java.net Thu May 13 11:02:11 2021
From: vlivanov at openjdk.java.net (Vladimir Ivanov)
Date: Thu, 13 May 2021 11:02:11 GMT
Subject: RFR: 8266074: Vtable-based CHA implementation [v4]
In-Reply-To:
References:
Message-ID:

On Mon, 3 May 2021 18:45:07 GMT, Vladimir Ivanov wrote:

>> As of now, Class Hierarchy Analysis (CHA) employs an approximate algorithm to enumerate all non-abstract methods in a class hierarchy.
>>
>> It served quite well for many years, but it accumulated significant complexity
>> to support different corner cases over time, and the inevitable evolution of the JVM
>> stretched the whole approach way too much (to the point where it became almost
>> impossible to extend the analysis any further).
>>
>> It turns out the root problem is the decision to reimplement method resolution
>> and method selection logic from scratch and to perform it on the JVM internal
>> representation. It makes it very hard to reason about correctness and the
>> implementation becomes sensitive to changes in the internal representation.
>>
>> So, the main motivation for the redesign is twofold:
>> * reduce maintenance burden and increase confidence in the code;
>> * unlock some long-awaited enhancements.
>>
>> Though I did experiment with relaxing existing constraints (e.g., enable default method support),
>> any possible enhancements are deliberately kept out of scope for the current PR.
>> (It does deliver a few minor enhancements, as the changes in
>> compiler/cha/StrengthReduceInterfaceCall.java show, but that is a side effect
>> of the other changes and was not the goal of the current work.)
>>
>> Proposed implementation (`LinkedConcreteMethodFinder`) mimics method invocation
>> and relies on vtable/itable information to detect the target method for every
>> subclass it visits. It removes all the complexity associated with method
>> resolution and method selection logic and leaves only the essential logic to prepare for method selection.
>>
>> Vtables are filled during class linkage, so the new logic doesn't work on not yet linked classes.
>> Instead of supporting the not-yet-linked case, such classes are simply ignored. It is safe to
>> skip them (treat as "effectively non-concrete") since it is guaranteed there
>> are no instances created yet. But it requires the VM to check dependencies once a
>> class is linked.
>>
>> I ended up with 2 separate dependency validation passes (when a class is loaded
>> and when it is linked). To avoid duplicated work, only dependencies
>> which may be affected by a class initialization state change
>> (`unique_concrete_method_4`) are visited.
>>
>> (I experimented with merging the passes into a single pass (delay the pass until
>> linkage is over), but it severely affected other class-related dependencies and
>> relevant optimizations.)
>>
>> Compiler Interface (CI) is changed to require users to provide complete information about the call site being analyzed.
>>
>> Old implementation is kept intact for now (will be removed later) because:
>> - JVMCI hasn't been migrated to the new implementation yet;
>> - it enables verification that the 2 implementations (old and new) agree on the results;
>> - it temporarily keeps an option to revert to the original implementation in case any regressions show up.
>>
>> Testing:
>> - [x] hs-tier1 - hs-tier9
>> - [x] hs-tier1 - hs-tier4 w/ `-XX:-UseVtableBasedCHA`
>> - [x] performance testing
>>
>> Thanks!
>
> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision:
>
> Cover abstract method case

Thanks for the reviews, John, Vladimir, and Dean.
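The approach described in the quoted PR text can be illustrated with a toy model. This is a sketch under loud assumptions, not the HotSpot code: `VtableChaSketch`, its `Klass` type, the single `vtableTarget` slot, and `uniqueConcreteMethod` are all hypothetical and model only one virtual method. It shows the two properties the description relies on: the target is read from each linked, non-abstract class's vtable slot rather than re-derived by resolution logic, and not yet linked classes are skipped, which is why the answer has to be re-validated when a class gets linked.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class VtableChaSketch {
    // Hypothetical stand-in for a loaded class; models a single vtable slot.
    static final class Klass {
        final String name;
        final boolean isAbstract;
        final String vtableTarget;   // what the slot selects once linked, e.g. "B::m"
        boolean linked;              // the slot is only trusted when this is true
        final List<Klass> subclasses = new ArrayList<>();

        Klass(String name, boolean isAbstract, String vtableTarget, Klass superKlass) {
            this.name = name;
            this.isAbstract = isAbstract;
            this.vtableTarget = vtableTarget;
            if (superKlass != null) {
                superKlass.subclasses.add(this);
            }
        }
    }

    // Returns the only target selected by any linked, instantiable class under root,
    // or null when there is none or more than one.
    static String uniqueConcreteMethod(Klass root) {
        Set<String> targets = new LinkedHashSet<>();
        Deque<Klass> work = new ArrayDeque<>();
        work.push(root);
        while (!work.isEmpty()) {
            Klass k = work.pop();
            // Unlinked classes have no instances yet, so they are skipped.
            if (k.linked && !k.isAbstract) {
                targets.add(k.vtableTarget);
            }
            for (Klass sub : k.subclasses) {
                work.push(sub);
            }
        }
        return targets.size() == 1 ? targets.iterator().next() : null;
    }

    public static void main(String[] args) {
        Klass a = new Klass("A", true, null, null);
        Klass b = new Klass("B", false, "B::m", a);
        Klass c = new Klass("C", false, "C::m", a);
        a.linked = true;
        b.linked = true;
        System.out.println(uniqueConcreteMethod(a));   // C is loaded but not linked
        c.linked = true;                               // linking C changes the answer
        System.out.println(uniqueConcreteMethod(a));
    }
}
```

In this model the first query finds a single target and the second finds none, which corresponds to the second dependency validation pass the description says is needed at link time.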
-------------

PR: https://git.openjdk.java.net/jdk/pull/3727

From vlivanov at openjdk.java.net Thu May 13 11:02:12 2021
From: vlivanov at openjdk.java.net (Vladimir Ivanov)
Date: Thu, 13 May 2021 11:02:12 GMT
Subject: Integrated: 8266074: Vtable-based CHA implementation
In-Reply-To:
References:
Message-ID: <-7UZAV286GR2XjsxtWGQ0dT8N5mN1VjNvksABZQk4rU=.6c7d869b-4f1c-4c4a-96eb-f697b14bf5c3@github.com>

On Tue, 27 Apr 2021 20:15:25 GMT, Vladimir Ivanov wrote:

> As of now, Class Hierarchy Analysis (CHA) employs an approximate algorithm to enumerate all non-abstract methods in a class hierarchy.
>
> It served quite well for many years, but it accumulated significant complexity
> to support different corner cases over time, and the inevitable evolution of the JVM
> stretched the whole approach way too much (to the point where it became almost
> impossible to extend the analysis any further).
>
> It turns out the root problem is the decision to reimplement method resolution
> and method selection logic from scratch and to perform it on the JVM internal
> representation. It makes it very hard to reason about correctness and the
> implementation becomes sensitive to changes in the internal representation.
>
> So, the main motivation for the redesign is twofold:
> * reduce maintenance burden and increase confidence in the code;
> * unlock some long-awaited enhancements.
>
> Though I did experiment with relaxing existing constraints (e.g., enable default method support),
> any possible enhancements are deliberately kept out of scope for the current PR.
> (It does deliver a few minor enhancements, as the changes in
> compiler/cha/StrengthReduceInterfaceCall.java show, but that is a side effect
> of the other changes and was not the goal of the current work.)
>
> Proposed implementation (`LinkedConcreteMethodFinder`) mimics method invocation
> and relies on vtable/itable information to detect the target method for every
> subclass it visits. It removes all the complexity associated with method
> resolution and method selection logic and leaves only the essential logic to prepare for method selection.
>
> Vtables are filled during class linkage, so the new logic doesn't work on not yet linked classes.
> Instead of supporting the not-yet-linked case, such classes are simply ignored. It is safe to
> skip them (treat as "effectively non-concrete") since it is guaranteed there
> are no instances created yet. But it requires the VM to check dependencies once a
> class is linked.
>
> I ended up with 2 separate dependency validation passes (when a class is loaded
> and when it is linked). To avoid duplicated work, only dependencies
> which may be affected by a class initialization state change
> (`unique_concrete_method_4`) are visited.
>
> (I experimented with merging the passes into a single pass (delay the pass until
> linkage is over), but it severely affected other class-related dependencies and
> relevant optimizations.)
>
> Compiler Interface (CI) is changed to require users to provide complete information about the call site being analyzed.
>
> Old implementation is kept intact for now (will be removed later) because:
> - JVMCI hasn't been migrated to the new implementation yet;
> - it enables verification that the 2 implementations (old and new) agree on the results;
> - it temporarily keeps an option to revert to the original implementation in case any regressions show up.
>
> Testing:
> - [x] hs-tier1 - hs-tier9
> - [x] hs-tier1 - hs-tier4 w/ `-XX:-UseVtableBasedCHA`
> - [x] performance testing
>
> Thanks!

This pull request has now been integrated.
Changeset: 127bfe44 Author: Vladimir Ivanov URL: https://git.openjdk.java.net/jdk/commit/127bfe44f7d09f272a08f97bfc5d168eb22474a2 Stats: 557 lines in 11 files changed: 482 ins; 6 del; 69 mod 8266074: Vtable-based CHA implementation Reviewed-by: kvn, jrose, dlong ------------- PR: https://git.openjdk.java.net/jdk/pull/3727 From dholmes at openjdk.java.net Thu May 13 12:46:29 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 13 May 2021 12:46:29 GMT Subject: RFR: 8266950: Remove vestigial support for non-strict floating-point execution Message-ID: As part of JEP 306, the vestiges of HotSpot support for non-strict floating-point execution can be removed. All methods implicitly have strictfp semantics so the explicit checks for is_strict() can be replaced by true and the code reformulated accordingly. There are still some names that include "strict" that could potentially be renamed to remove it, but the fact we have to have strict fp semantics is still important on some platforms, so the names help reinforce that IMO. 
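What "all methods implicitly have strictfp semantics" means at the Java level can be shown with a small sketch. The class and method names here (`StrictfpSketch`, `scaled`, `scaledStrict`) are hypothetical, and the example assumes a JDK where JEP 306 applies (or any platform that already evaluates `double` arithmetic strictly); a recent `javac` may warn that the modifier is redundant.

```java
public class StrictfpSketch {
    // No modifier: under JEP 306 this is evaluated with strict IEEE 754 double semantics.
    static double scaled(double x) {
        return x * 1e308 / 1e308;
    }

    // Explicit modifier: the same semantics, so the keyword adds nothing here.
    static strictfp double scaledStrict(double x) {
        return x * 1e308 / 1e308;
    }

    public static void main(String[] args) {
        // The intermediate product 10.0 * 1e308 overflows the double range.
        // Strict evaluation keeps the overflow; an extended-exponent intermediate
        // (the old non-strict behavior on some hardware) could have hidden it.
        System.out.println(scaled(10.0));
        System.out.println(scaledStrict(10.0));
    }
}
```

Both methods return the same value for every input once strict evaluation is the only mode, which is why VM-side checks such as `is_strict()` can be treated as always true.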
Testing: tiers 1-3 Thanks, David ------------- Commit messages: - Removed divDPR_reg_round as it has a false predicate and so is now unused - Revert classFileParser changes as they will be handled by JDK-8266530 - 8266530: HotSpot changes for JEP 306 Changes: https://git.openjdk.java.net/jdk/pull/3991/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3991&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8266950 Stats: 203 lines in 27 files changed: 4 ins; 153 del; 46 mod Patch: https://git.openjdk.java.net/jdk/pull/3991.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3991/head:pull/3991 PR: https://git.openjdk.java.net/jdk/pull/3991 From psandoz at openjdk.java.net Thu May 13 16:20:55 2021 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Thu, 13 May 2021 16:20:55 GMT Subject: RFR: 8266951: Partial in-lining for vectorized mismatch operation using AVX512 masked instructions In-Reply-To: <0YtRuwnVZ-Ejs-22d0JDJeFzXiZ17XNuBT1o5Ma4ZkI=.9dd9e952-d452-4175-8ff5-8f41e990a555@github.com> References: <0YtRuwnVZ-Ejs-22d0JDJeFzXiZ17XNuBT1o5Ma4ZkI=.9dd9e952-d452-4175-8ff5-8f41e990a555@github.com> Message-ID: On Wed, 12 May 2021 17:02:25 GMT, Jatin Bhateja wrote: > ArraySupport.vectorizedMismatch is a leaf level comparison routine which gets called by various public Java APIs (Arrays.equals, Arrays.mismatch). Hotspot C2 compiler intrinsifies vectorizedMismatch routine and emits a call to a stub routine which uses vector instruction to compare the inputs. > > For small compare operation whose size fits in one vector register i.e. < 32 bytes or <= 64 bytes, this patch employ partial in-lining technique to emit the fast path code at the call site which does vector comparison under the influence of a predicate register/mask computed as a function of comparison length. > > If the length of comparison is greater than the vector register size then the slow path comprising of stub call is emitted. 
> > This prevents the call overhead associated with stub call which is significant compared to actual comparison operation for small sized comparisons. > > Partial in-lining works under the influence of a run time flag -XX:UsePartialInlineSize=32/64 (default 32 bytes). > > Following are performance number for an existing JMH benchmark (test/micro/org/openjdk/bench/java/util//ArrayMismatch.java) :- > > Machine : Cascade Lake server (Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz) > > JMH Benchmark | Size | BaseLine (ops/ms) | PI32 (ops/ms) | Gain (PI32/Baseline) | PI64 (ops/ms) | Gain (PI64/Baseline) > -- | -- | -- | -- | -- | -- | -- > ? | ? | ? | ? | ? | ? | ? > ArraysMismatch.Byte.differentSubrangeMatches | 16 | 129196.612 | 165376.715 | 1.2800391 | 157553.42 | 1.219485694 > ArraysMismatch.Byte.differentSubrangeMatches | 32 | 125583.404 | 163645.759 | 1.303084275 | 157645.879 | 1.255308217 > ArraysMismatch.Byte.differentSubrangeMatches | 64 | 121969.731 | 170648.152 | 1.399102471 | 157993.449 | 1.295349655 > ArraysMismatch.Byte.differentSubrangeMatches | 90 | 91819.571 | 96154.479 | 1.047211155 | 157983.324 | 1.720584427 > ArraysMismatch.Byte.differentSubrangeMatches | 800 | 65236.047 | 67243.131 | 1.030766487 | 67759.48 | 1.038681574 > ArraysMismatch.Byte.matches | 16 | 151805.68 | 203802.717 | 1.342523659 | 188334.618 | 1.240629586 > ArraysMismatch.Byte.matches | 32 | 151624.747 | 203731.315 | 1.343654773 | 185719.086 | 1.224859989 > ArraysMismatch.Byte.matches | 64 | 138350.648 | 124158.139 | 0.897416389 | 188935.388 | 1.365627055 > ArraysMismatch.Byte.matches | 90 | 102366.983 | 101474.688 | 0.991283371 | 100674.414 | 0.983465675 > ArraysMismatch.Byte.matches | 800 | 46319.352 | 49585.514 | 1.070513983 | 49594.262 | 1.070702846 > ArraysMismatch.Byte.mismatchEnd | 16 | 162382.057 | 191602.366 | 1.179947893 | 182425.362 | 1.123433003 > ArraysMismatch.Byte.mismatchEnd | 32 | 146656.702 | 193510.637 | 1.319480354 | 182571.741 | 1.244891904 > 
ArraysMismatch.Byte.mismatchEnd | 64 | 140799.385 | 122505.816 | 0.870073516 | 182360.435 | 1.295179201 > ArraysMismatch.Byte.mismatchEnd | 90 | 117439.002 | 107296.27 | 0.913634041 | 108081.174 | 0.920317545 > ArraysMismatch.Byte.mismatchEnd | 800 | 47542.975 | 47456.106 | 0.998172832 | 47289.082 | 0.994659716 > ArraysMismatch.Byte.mismatchMid | 16 | 143112.591 | 189653.41 | 1.325204223 | 182411.81 | 1.274603504 > ArraysMismatch.Byte.mismatchMid | 32 | 151759.608 | 193712.64 | 1.276443993 | 182689.18 | 1.203806351 > ArraysMismatch.Byte.mismatchMid | 64 | 140756.035 | 122017.013 | 0.866868785 | 182508.473 | 1.296629825 > ArraysMismatch.Byte.mismatchMid | 90 | 134230.235 | 122213.804 | 0.910478954 | 122566.133 | 0.913103765 > ArraysMismatch.Byte.mismatchMid | 800 | 75512.985 | 64861.716 | 0.858947849 | 71607.794 | 0.94828451 > ArraysMismatch.Byte.mismatchStart | 16 | 160628.501 | 193722.299 | 1.206026937 | 183190.972 | 1.140463684 > ArraysMismatch.Byte.mismatchStart | 32 | 151629.56 | 193633.36 | 1.277015906 | 183230.666 | 1.20840993 > ArraysMismatch.Byte.mismatchStart | 64 | 143345.272 | 130754.305 | 0.91216336 | 181837.864 | 1.268530601 > ArraysMismatch.Byte.mismatchStart | 90 | 151557.205 | 130724.926 | 0.86254511 | 130962.682 | 0.864113864 > ArraysMismatch.Byte.mismatchStart | 800 | 149416.06 | 130847.301 | 0.875724477 | 130952.683 | 0.876429769 > ArraysMismatch.Char.differentSubrangeMatches | 16 | 124936.905 | 152375.103 | 1.219616438 | 146062.997 | 1.169094088 > ArraysMismatch.Char.differentSubrangeMatches | 32 | 118878.291 | 158770.285 | 1.33557005 | 146561.488 | 1.232870079 > ArraysMismatch.Char.differentSubrangeMatches | 64 | 110296.975 | 104885.041 | 0.95093307 | 146102.313 | 1.324626655 > ArraysMismatch.Char.differentSubrangeMatches | 90 | 88056.395 | 90133.489 | 1.023588224 | 87883.169 | 0.998032783 > ArraysMismatch.Char.differentSubrangeMatches | 800 | 41319.787 | 46257.464 | 1.119499091 | 46090.56 | 1.115459767 > ArraysMismatch.Char.matches | 16 | 
150428.182 | 197311.356 | 1.311664832 | 187199.805 | 1.24444637 > ArraysMismatch.Char.matches | 32 | 132718.181 | 126373.231 | 0.952192307 | 187008.811 | 1.409067014 > ArraysMismatch.Char.matches | 64 | 111659.84 | 107182.982 | 0.959906283 | 109772.951 | 0.983101453 > ArraysMismatch.Char.matches | 90 | 86184.209 | 91977.05 | 1.067214645 | 90389.147 | 1.048790121 > ArraysMismatch.Char.matches | 800 | 26332.084 | 25284.001 | 0.960197491 | 25855.38 | 0.981896458 > ArraysMismatch.Char.mismatchEnd | 16 | 148547.251 | 189151.018 | 1.273339067 | 179675.328 | 1.209550004 > ArraysMismatch.Char.mismatchEnd | 32 | 138219.785 | 119017.203 | 0.861072118 | 178701.685 | 1.292880647 > ArraysMismatch.Char.mismatchEnd | 64 | 110435.452 | 103940.023 | 0.94118348 | 102078.889 | 0.924330794 > ArraysMismatch.Char.mismatchEnd | 90 | 89375.63 | 87698.736 | 0.981237682 | 88037.787 | 0.985031233 > ArraysMismatch.Char.mismatchEnd | 800 | 23632.584 | 22963.757 | 0.971698948 | 20497.605 | 0.867345061 > ArraysMismatch.Char.mismatchMid | 16 | 148666.26 | 189258.721 | 1.273044207 | 178820.938 | 1.202834712 > ArraysMismatch.Char.mismatchMid | 32 | 131949.59 | 119320.489 | 0.904288441 | 178579.245 | 1.35338992 > ArraysMismatch.Char.mismatchMid | 64 | 122148.315 | 111033.597 | 0.909006375 | 109455.953 | 0.896090568 > ArraysMismatch.Char.mismatchMid | 90 | 125032.714 | 109837.581 | 0.878470742 | 110283.097 | 0.882033937 > ArraysMismatch.Char.mismatchMid | 800 | 42255.059 | 48153.688 | 1.139595806 | 43087.476 | 1.019699819 > ArraysMismatch.Char.mismatchStart | 16 | 148493.976 | 189247.176 | 1.274443456 | 178915.503 | 1.204867078 > ArraysMismatch.Char.mismatchStart | 32 | 148724.462 | 126724.721 | 0.852077186 | 178887.041 | 1.202808459 > ArraysMismatch.Char.mismatchStart | 64 | 148635.338 | 126716.274 | 0.852531274 | 126747.94 | 0.852744318 > ArraysMismatch.Char.mismatchStart | 90 | 140359.351 | 126708.588 | 0.902744186 | 125618.245 | 0.894975961 > ArraysMismatch.Char.mismatchStart | 800 | 144649.46 | 
125727.381 | 0.86918666 | 126664.011 | 0.875661831 > ArraysMismatch.Double.differentSubrangeMatches | 16 | 116255.827 | 116156.952 | 0.999149505 | 116557.568 | 1.002595491 > ArraysMismatch.Double.differentSubrangeMatches | 32 | 91940.498 | 97299.205 | 1.058284511 | 97466.224 | 1.06010111 > ArraysMismatch.Double.differentSubrangeMatches | 64 | 78205.807 | 78189.378 | 0.999789926 | 78133.649 | 0.999077332 > ArraysMismatch.Double.differentSubrangeMatches | 90 | 61330.454 | 68798.235 | 1.121763015 | 68524.188 | 1.117294648 > ArraysMismatch.Double.differentSubrangeMatches | 800 | 14996.315 | 14979.647 | 0.998888527 | 15072.825 | 1.00510192 > ArraysMismatch.Double.matches | 16 | 119342.024 | 120322.671 | 1.008217114 | 119531.315 | 1.001586122 > ArraysMismatch.Double.matches | 32 | 88179.448 | 89069.505 | 1.010093701 | 88141.626 | 0.999571079 > ArraysMismatch.Double.matches | 64 | 62622.253 | 62433.512 | 0.996986039 | 63041.774 | 1.006699232 > ArraysMismatch.Double.matches | 90 | 49579.305 | 50632.739 | 1.021247454 | 46548.486 | 0.938869272 > ArraysMismatch.Double.matches | 800 | 8850.013 | 8505.296 | 0.961048984 | 8490.327 | 0.959357574 > ArraysMismatch.Double.mismatchEnd | 16 | 116594.224 | 119025.382 | 1.020851445 | 116310.567 | 0.997567144 > ArraysMismatch.Double.mismatchEnd | 32 | 86183.542 | 86814.706 | 1.007323486 | 86258.696 | 1.000872023 > ArraysMismatch.Double.mismatchEnd | 64 | 62695.058 | 62794.552 | 1.001586951 | 62769 | 1.001179391 > ArraysMismatch.Double.mismatchEnd | 90 | 46899.021 | 47692.984 | 1.016929202 | 47598.715 | 1.01491916 > ArraysMismatch.Double.mismatchEnd | 800 | 8132.64 | 8141.465 | 1.001085133 | 7176.583 | 0.882441987 > ArraysMismatch.Double.mismatchMid | 16 | 110505.284 | 113732.521 | 1.029204368 | 113249.451 | 1.024832903 > ArraysMismatch.Double.mismatchMid | 32 | 94259.439 | 93242.776 | 0.989214205 | 94420.206 | 1.00170558 > ArraysMismatch.Double.mismatchMid | 64 | 76392.603 | 76344.962 | 0.999376366 | 76369.689 | 0.999700049 > 
ArraysMismatch.Double.mismatchMid | 90 | 71578.538 | 71637.235 | 1.000820036 | 71582.34 | 1.000053116 > ArraysMismatch.Double.mismatchMid | 800 | 14993.414 | 12701.251 | 0.84712201 | 14998.937 | 1.000368362 > ArraysMismatch.Double.mismatchStart | 16 | 141188.616 | 141430.91 | 1.001716102 | 141517.873 | 1.002332036 > ArraysMismatch.Double.mismatchStart | 32 | 141489.906 | 139633.297 | 0.986878152 | 141729.555 | 1.001693753 > ArraysMismatch.Double.mismatchStart | 64 | 141502.44 | 139656.902 | 0.986957554 | 141488.272 | 0.999899875 > ArraysMismatch.Double.mismatchStart | 90 | 141782.57 | 141508.142 | 0.998064445 | 141579.135 | 0.998565162 > ArraysMismatch.Double.mismatchStart | 800 | 144565.191 | 139525.413 | 0.965138371 | 144607.95 | 1.000295777 > ArraysMismatch.Float.differentSubrangeMatches | 16 | 120041.868 | 119986.512 | 0.999538861 | 120009.683 | 0.999731885 > ArraysMismatch.Float.differentSubrangeMatches | 32 | 111402.873 | 111414.633 | 1.000105563 | 111442.964 | 1.000359874 > ArraysMismatch.Float.differentSubrangeMatches | 64 | 85388.728 | 93884.13 | 1.099490907 | 95120.892 | 1.113974809 > ArraysMismatch.Float.differentSubrangeMatches | 90 | 67617.865 | 75865.226 | 1.121970148 | 76179.814 | 1.126622587 > ArraysMismatch.Float.differentSubrangeMatches | 800 | 24994.376 | 25011.775 | 1.000696117 | 24944.2 | 0.997992508 > ArraysMismatch.Float.matches | 16 | 133159.39 | 137937.688 | 1.035884048 | 139461.652 | 1.047328709 > ArraysMismatch.Float.matches | 32 | 111959.987 | 115420.6 | 1.030909373 | 117002.141 | 1.045035321 > ArraysMismatch.Float.matches | 64 | 86892.65 | 87395.62 | 1.005788407 | 87345.458 | 1.00521112 > ArraysMismatch.Float.matches | 90 | 67690.279 | 69156.772 | 1.02166475 | 69082.962 | 1.020574343 > ArraysMismatch.Float.matches | 800 | 14894.94 | 15341.034 | 1.029949365 | 15779.117 | 1.059360897 > ArraysMismatch.Float.mismatchEnd | 16 | 128854.048 | 128925.913 | 1.000557724 | 128985.299 | 1.001018602 > ArraysMismatch.Float.mismatchEnd | 32 | 
99825.842 | 104613.873 | 1.047963843 | 103876.271 | 1.040574955 > ArraysMismatch.Float.mismatchEnd | 64 | 80190.706 | 84665.053 | 1.055796329 | 84582.712 | 1.054769514 > ArraysMismatch.Float.mismatchEnd | 90 | 71406.594 | 76730.083 | 1.074551784 | 76596.258 | 1.072677658 > ArraysMismatch.Float.mismatchEnd | 800 | 14348.159 | 14306.535 | 0.997099001 | 14360.603 | 1.000867289 > ArraysMismatch.Float.mismatchMid | 16 | 123753.791 | 124291.601 | 1.004345806 | 123649.378 | 0.999156284 > ArraysMismatch.Float.mismatchMid | 32 | 109105.215 | 111447.183 | 1.021465225 | 111494.37 | 1.021897716 > ArraysMismatch.Float.mismatchMid | 64 | 93600.363 | 93741.993 | 1.001513135 | 93658.042 | 1.000616226 > ArraysMismatch.Float.mismatchMid | 90 | 89991.128 | 89712.471 | 0.996903506 | 90031.763 | 1.000451545 > ArraysMismatch.Float.mismatchMid | 800 | 23974.331 | 24301.075 | 1.01362891 | 24354.29 | 1.015848576 > ArraysMismatch.Float.mismatchStart | 16 | 140889.393 | 140535.617 | 0.997488981 | 140222.656 | 0.995267657 > ArraysMismatch.Float.mismatchStart | 32 | 140871.915 | 140318.765 | 0.996073383 | 140242.783 | 0.995534014 > ArraysMismatch.Float.mismatchStart | 64 | 141197.313 | 140413.639 | 0.994449795 | 140792.879 | 0.997135682 > ArraysMismatch.Float.mismatchStart | 90 | 139663.079 | 139775.065 | 1.00080183 | 143880.133 | 1.03019448 > ArraysMismatch.Float.mismatchStart | 800 | 143930.882 | 143878.412 | 0.99963545 | 143923.022 | 0.99994539 > ArraysMismatch.Int.differentSubrangeMatches | 16 | 110820.026 | 130943.67 | 1.181588515 | 131076.904 | 1.182790771 > ArraysMismatch.Int.differentSubrangeMatches | 32 | 111706.868 | 121119.544 | 1.084262285 | 122049.921 | 1.092591021 > ArraysMismatch.Int.differentSubrangeMatches | 64 | 93916.026 | 101624.789 | 1.082081444 | 100103.617 | 1.065884293 > ArraysMismatch.Int.differentSubrangeMatches | 90 | 67478.955 | 83517.957 | 1.237688951 | 83549.562 | 1.238157319 > ArraysMismatch.Int.differentSubrangeMatches | 800 | 24920.868 | 25100.838 | 1.007221659 
| 25376.679 | 1.018290334 > ArraysMismatch.Int.matches | 16 | 138004.078 | 142579.711 | 1.033155781 | 143465.516 | 1.039574468 > ArraysMismatch.Int.matches | 32 | 111790.949 | 119018.169 | 1.06464942 | 119864.971 | 1.072224291 > ArraysMismatch.Int.matches | 64 | 86997.004 | 88476.088 | 1.017001551 | 87755.688 | 1.008720806 > ArraysMismatch.Int.matches | 90 | 69366.581 | 71427.315 | 1.029707879 | 71203.035 | 1.026474622 > ArraysMismatch.Int.matches | 800 | 15119.02 | 15529.095 | 1.02712312 | 15828.336 | 1.046915475 > ArraysMismatch.Int.mismatchEnd | 16 | 139862.143 | 135639.435 | 0.96980807 | 135661.244 | 0.969964002 > ArraysMismatch.Int.mismatchEnd | 32 | 114870.328 | 115455.901 | 1.005097687 | 114992.965 | 1.001067613 > ArraysMismatch.Int.mismatchEnd | 64 | 85291.637 | 85115.665 | 0.99793682 | 85179.114 | 0.998680726 > ArraysMismatch.Int.mismatchEnd | 90 | 73049.868 | 78798.949 | 1.078700772 | 73365.106 | 1.004315381 > ArraysMismatch.Int.mismatchEnd | 800 | 14597.509 | 12861.87 | 0.88110033 | 12845.178 | 0.879956847 > ArraysMismatch.Int.mismatchMid | 16 | 131615.489 | 134691.219 | 1.023369058 | 134503.225 | 1.0219407 > ArraysMismatch.Int.mismatchMid | 32 | 119291.19 | 121970.431 | 1.022459672 | 120647.357 | 1.011368543 > ArraysMismatch.Int.mismatchMid | 64 | 100133.019 | 99827.03 | 0.996944175 | 98327.743 | 0.981971222 > ArraysMismatch.Int.mismatchMid | 90 | 93062.689 | 95269.725 | 1.023715584 | 95457.632 | 1.025734728 > ArraysMismatch.Int.mismatchMid | 800 | 24614.985 | 20853.102 | 0.847171022 | 20857.528 | 0.847350831 > ArraysMismatch.Int.mismatchStart | 16 | 140229.222 | 147607.561 | 1.052616273 | 146278.15 | 1.043136002 > ArraysMismatch.Int.mismatchStart | 32 | 140354.53 | 147448.421 | 1.050542658 | 146287.931 | 1.042274382 > ArraysMismatch.Int.mismatchStart | 64 | 140256.12 | 147353.466 | 1.050602754 | 146094.059 | 1.041623417 > ArraysMismatch.Int.mismatchStart | 90 | 135753.229 | 151205.439 | 1.113825727 | 152070.776 | 1.120200065 > 
ArraysMismatch.Int.mismatchStart | 800 | 151565.887 | 145991.819 | 0.963223466 | 152020.842 | 1.003001698 > ArraysMismatch.Long.differentSubrangeMatches | 16 | 125569.009 | 121469.175 | 0.967349953 | 121319.155 | 0.966155232 > ArraysMismatch.Long.differentSubrangeMatches | 32 | 100126.557 | 103303.047 | 1.03172475 | 101476.788 | 1.013485243 > ArraysMismatch.Long.differentSubrangeMatches | 64 | 80870.342 | 82334.336 | 1.018102978 | 82395.962 | 1.018865012 > ArraysMismatch.Long.differentSubrangeMatches | 90 | 70673.831 | 72440.193 | 1.024993155 | 72067.497 | 1.019719689 > ArraysMismatch.Long.differentSubrangeMatches | 800 | 15224.864 | 15077.429 | 0.99031617 | 15163.827 | 0.995990966 > ArraysMismatch.Long.matches | 16 | 119857.871 | 123784.673 | 1.032762154 | 122968.267 | 1.025950703 > ArraysMismatch.Long.matches | 32 | 88284.162 | 90825.719 | 1.028788369 | 91303.549 | 1.034200778 > ArraysMismatch.Long.matches | 64 | 62827.102 | 63614.876 | 1.012538761 | 64469.82 | 1.026146646 > ArraysMismatch.Long.matches | 90 | 49351.299 | 51199.947 | 1.037458953 | 51103.813 | 1.035511 > ArraysMismatch.Long.matches | 800 | 8822.867 | 8512.064 | 0.964773015 | 8848.35 | 1.00288829 > ArraysMismatch.Long.mismatchEnd | 16 | 124902.804 | 128237.911 | 1.026701618 | 128410.897 | 1.028086583 > ArraysMismatch.Long.mismatchEnd | 32 | 86728.545 | 90519.608 | 1.043711825 | 88782.445 | 1.023681938 > ArraysMismatch.Long.mismatchEnd | 64 | 64431.36 | 62735.702 | 0.973682722 | 64766.52 | 1.005201815 > ArraysMismatch.Long.mismatchEnd | 90 | 47764.996 | 47635.982 | 0.997298984 | 47562.461 | 0.995759761 > ArraysMismatch.Long.mismatchEnd | 800 | 8124.901 | 7194.444 | 0.88548082 | 7197.163 | 0.88581547 > ArraysMismatch.Long.mismatchMid | 16 | 122857.442 | 121708.317 | 0.99064668 | 121071.994 | 0.985467319 > ArraysMismatch.Long.mismatchMid | 32 | 99406.603 | 99376.972 | 0.999701921 | 97379.046 | 0.979603397 > ArraysMismatch.Long.mismatchMid | 64 | 78596.148 | 76559.205 | 0.974083425 | 76538.811 | 
0.973823946 > ArraysMismatch.Long.mismatchMid | 90 | 74253.699 | 73267.252 | 0.98671518 | 74874.856 | 1.008365334 > ArraysMismatch.Long.mismatchMid | 800 | 12739.526 | 12773.563 | 1.002671763 | 15215.721 | 1.194371046 > ArraysMismatch.Long.mismatchStart | 16 | 143429.003 | 147610.51 | 1.029153846 | 146953.182 | 1.024570895 > ArraysMismatch.Long.mismatchStart | 32 | 149771.413 | 149898.955 | 1.000851578 | 147743.864 | 0.986462377 > ArraysMismatch.Long.mismatchStart | 64 | 149812.094 | 147738.977 | 0.986161885 | 147818.236 | 0.986690941 > ArraysMismatch.Long.mismatchStart | 90 | 149834.855 | 147878.978 | 0.986946448 | 149768.864 | 0.999559575 > ArraysMismatch.Long.mismatchStart | 800 | 150266.332 | 147175.353 | 0.979429996 | 153305.049 | 1.020222208 > ArraysMismatch.Short.differentSubrangeMatches | 16 | 124956.808 | 152398.079 | 1.21960605 | 146222.898 | 1.170187526 > ArraysMismatch.Short.differentSubrangeMatches | 32 | 118644.114 | 158832.405 | 1.338729749 | 146589.485 | 1.235539464 > ArraysMismatch.Short.differentSubrangeMatches | 64 | 111036.197 | 106078.375 | 0.955349497 | 146122.18 | 1.315986894 > ArraysMismatch.Short.differentSubrangeMatches | 90 | 79114.347 | 90244.347 | 1.140682448 | 91059.171 | 1.150981768 > ArraysMismatch.Short.differentSubrangeMatches | 800 | 44794.065 | 46302.944 | 1.033684797 | 46086.671 | 1.028856635 > ArraysMismatch.Short.matches | 16 | 150201.123 | 193264.21 | 1.28670283 | 185129.029 | 1.232540911 > ArraysMismatch.Short.matches | 32 | 137672.122 | 126543.04 | 0.919162414 | 187187.586 | 1.359662242 > ArraysMismatch.Short.matches | 64 | 113952.11 | 110124.025 | 0.966406195 | 109228.551 | 0.958547858 > ArraysMismatch.Short.matches | 90 | 89491.351 | 91045.251 | 1.017363689 | 90362.175 | 1.009730817 > ArraysMismatch.Short.matches | 800 | 25941.449 | 25887.28 | 0.997911875 | 25191.983 | 0.971109324 > ArraysMismatch.Short.mismatchEnd | 16 | 142494.648 | 189203.368 | 1.327792802 | 176318.454 | 1.237368957 > ArraysMismatch.Short.mismatchEnd | 
32 | 139928.97 | 119098.052 | 0.851132199 | 178840.438 | 1.278080143 > ArraysMismatch.Short.mismatchEnd | 64 | 115583.3 | 104264.811 | 0.902075049 | 102376.369 | 0.885736685 > ArraysMismatch.Short.mismatchEnd | 90 | 86641.922 | 87669.462 | 1.011859617 | 87745.796 | 1.012740645 > ArraysMismatch.Short.mismatchEnd | 800 | 23741.295 | 22911.558 | 0.965050895 | 22937.297 | 0.96613504 > ArraysMismatch.Short.mismatchMid | 16 | 148684.747 | 189160.851 | 1.272227682 | 178776.065 | 1.202383355 > ArraysMismatch.Short.mismatchMid | 32 | 133281.625 | 118690.88 | 0.890526957 | 178478.46 | 1.339107773 > ArraysMismatch.Short.mismatchMid | 64 | 122399.072 | 110333.504 | 0.901424351 | 111504.705 | 0.910993059 > ArraysMismatch.Short.mismatchMid | 90 | 119317.633 | 110483.29 | 0.925959451 | 111346.724 | 0.933195884 > ArraysMismatch.Short.mismatchMid | 800 | 50742.831 | 43058.305 | 0.848559376 | 47917.118 | 0.94431306 > ArraysMismatch.Short.mismatchStart | 16 | 148861.935 | 191984.933 | 1.289684519 | 178706.176 | 1.200482689 > ArraysMismatch.Short.mismatchStart | 32 | 148701.043 | 126690.118 | 0.851978678 | 178702.06 | 1.201753911 > ArraysMismatch.Short.mismatchStart | 64 | 148560.877 | 126747.337 | 0.853167668 | 126657.473 | 0.852562771 > ArraysMismatch.Short.mismatchStart | 90 | 149824.411 | 126605.818 | 0.845027971 | 125719.231 | 0.839110464 > ArraysMismatch.Short.mismatchStart | 800 | 152583.036 | 126437.329 | 0.828646043 | 126698.741 | 0.830359287 Thanks for the explanations on why partial inlining can be beneficial. Ideally it would be great if the only changes we made to the Java code were to the threshold values. 
For example:

    public static int mismatch(byte[] a, byte[] b, int length) {
        // ISSUE: defer to index receiving methods if performance is good
        // assert length <= a.length
        // assert length <= b.length
        int i = 0;
        if (length > BYTE_THRESHOLD) {
            if (a[0] != b[0])
                return 0;
            i = vectorizedMismatch(
                    a, Unsafe.ARRAY_BYTE_BASE_OFFSET,
                    b, Unsafe.ARRAY_BYTE_BASE_OFFSET,
                    length, LOG2_ARRAY_BYTE_INDEX_SCALE);
            if (i >= 0)
                return i;
            // Align to tail
            i = length - ~i;
            // assert i >= 0 && i <= 7;
        }
        // Tail < 8 bytes
        for (; i < length; i++) {
            if (a[i] != b[i])
                return i;
        }
        return -1;
    }

Where `BYTE_THRESHOLD` is initialized to 7 or 0, based on querying some HotSpot runtime property. When `BYTE_THRESHOLD == 0`, I hope the `length > BYTE_THRESHOLD` check is strength reduced in many cases. That does leave the `i >= 0` check of the result from `vectorizedMismatch`; perhaps that also has some minor impact? However, since you are doing partial inlining and you know that your `vectorizedMismatch` intrinsic never returns a negative value, maybe you could elide that check?

A quick experiment would be to apply your HotSpot changes and use the existing Java code, replacing the constant threshold values with 0. Then we can carefully look at the code gen and perf results.

------------- PR: https://git.openjdk.java.net/jdk/pull/3999
From vlivanov at openjdk.java.net Thu May 13 22:10:45 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Thu, 13 May 2021 22:10:45 GMT Subject: RFR: 8267117: sun/hotspot/whitebox/CPUInfoTest.java fails on Ice Lake Message-ID: Fix a typo in the test.
Testing: - [x] failing test on Ice Lake host ------------- Commit messages: - Fix Changes: https://git.openjdk.java.net/jdk/pull/4017/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4017&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8267117 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/4017.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4017/head:pull/4017 PR: https://git.openjdk.java.net/jdk/pull/4017 From kvn at openjdk.java.net Thu May 13 22:54:59 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 13 May 2021 22:54:59 GMT Subject: RFR: 8267117: sun/hotspot/whitebox/CPUInfoTest.java fails on Ice Lake In-Reply-To: References: Message-ID: On Thu, 13 May 2021 20:55:04 GMT, Vladimir Ivanov wrote: > Fix a typo in the test. > > Testing: > - [x] failing test on Ice Lake host Trivial. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/4017 From vlivanov at openjdk.java.net Thu May 13 23:29:38 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Thu, 13 May 2021 23:29:38 GMT Subject: RFR: 8267117: sun/hotspot/whitebox/CPUInfoTest.java fails on Ice Lake In-Reply-To: References: Message-ID: <4cFKVtqTRsVqZ3eJnGQN5XCPZFcMdYz02jm6OiiICn4=.f47b147e-4973-48a8-9d36-7818151169a9@github.com> On Thu, 13 May 2021 20:55:04 GMT, Vladimir Ivanov wrote: > Fix a typo in the test. > > Testing: > - [x] failing test on Ice Lake host Thanks for the review, Vladimir. ------------- PR: https://git.openjdk.java.net/jdk/pull/4017 From vlivanov at openjdk.java.net Thu May 13 23:29:38 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Thu, 13 May 2021 23:29:38 GMT Subject: Integrated: 8267117: sun/hotspot/whitebox/CPUInfoTest.java fails on Ice Lake In-Reply-To: References: Message-ID: On Thu, 13 May 2021 20:55:04 GMT, Vladimir Ivanov wrote: > Fix a typo in the test. 
> > Testing: > - [x] failing test on Ice Lake host This pull request has now been integrated. Changeset: 2a2f105a Author: Vladimir Ivanov URL: https://git.openjdk.java.net/jdk/commit/2a2f105a56bba3a180658f0b0151240676478ba4 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8267117: sun/hotspot/whitebox/CPUInfoTest.java fails on Ice Lake Reviewed-by: kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/4017 From whuang at openjdk.java.net Fri May 14 02:24:01 2021 From: whuang at openjdk.java.net (Wang Huang) Date: Fri, 14 May 2021 02:24:01 GMT Subject: RFR: 8267130: Memory Overflow in Disassembler::load_library Message-ID: <5d59O0bG7Vu16Ub4lzq2ISDo8MJLXfv35SZ8iLzHzs4=.bc2c4cda-54ee-4610-b849-71cf42a94003@github.com> * reproduce: put your libjvm.so in a long enough path, such like ------------- Commit messages: - 8267130: Memory Overflow in Disassembler::load_library Changes: https://git.openjdk.java.net/jdk/pull/4020/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4020&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8267130 Stats: 29 lines in 1 file changed: 17 ins; 5 del; 7 mod Patch: https://git.openjdk.java.net/jdk/pull/4020.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4020/head:pull/4020 PR: https://git.openjdk.java.net/jdk/pull/4020 From xgong at openjdk.java.net Fri May 14 06:13:02 2021 From: xgong at openjdk.java.net (Xiaohong Gong) Date: Fri, 14 May 2021 06:13:02 GMT Subject: RFR: 8266962: Add arch supporting check for "Op_VectorLoadConst" before creating the node Message-ID: When creating the vector shuffle, the `"VectorLoadConstNode"` will be created to get an initial index vector. Before creating it, the compiler should check whether the current platform supports this opcode in case the jvm crashes with `"bad ad file"`. The compiler should finish the intrinsification and go back to the default java implementation if the backend doesn't support it. Tested tier1 and jdk::tier3. 
------------- Commit messages: - 8266962: Add arch supporting check for "Op_VectorLoadConst" before creating the node Changes: https://git.openjdk.java.net/jdk/pull/4023/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4023&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8266962 Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/4023.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4023/head:pull/4023 PR: https://git.openjdk.java.net/jdk/pull/4023 From yyang at openjdk.java.net Fri May 14 06:45:39 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Fri, 14 May 2021 06:45:39 GMT Subject: RFR: 8265711: C1: Intrinsify Class.getModifier method [v3] In-Reply-To: References: Message-ID: On Wed, 28 Apr 2021 06:43:19 GMT, Yi Yang wrote: >> It's relatively a common case to get modifiers from a constant Class instance, i.e. ThirdPartyClass.class.getModifiers(). Currently, C1 Canonicalizer missed the opportunity of replacing Class.getModifiers intrinsic calls with compile-time constants. > > Yi Yang has updated the pull request incrementally with one additional commit since the last revision: > > rename; redundant reloading PING?Can I have a second review of this PR? ------------- PR: https://git.openjdk.java.net/jdk/pull/3616 From redestad at openjdk.java.net Fri May 14 07:07:53 2021 From: redestad at openjdk.java.net (Claes Redestad) Date: Fri, 14 May 2021 07:07:53 GMT Subject: RFR: 8266810: Move trivial Matcher code to cpu-specific header files [v2] In-Reply-To: References: Message-ID: > This patch moves a number of constants and trivial methods to newly introduced matcher_.hpp files. > > This enables constant folding and dead code elimination on one hand, and improved code navigation in IDEs on the other. > > The effect of this refactoring is modest: on Linux-x64 Hotspot (libjvm.so) shrinks by ~10Kb and C2 initialization cost drops from 8.5M to 8.3M. 
> > Testing: tier1-3, GHA builds of all architectures on linux Claes Redestad has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: - merge and resolve conflict in arm.ad - Move more predicates from .ad files - Move a few more predicates to matcher_cpu files - Add the new matcher per-cpu files - Add matcher_cpu files and move const bools there - Define constants to allow Matcher::supports_scalable_vector() calls to be DCEd on non-aarch64 platforms ------------- Changes: https://git.openjdk.java.net/jdk/pull/3947/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3947&range=01 Stats: 1459 lines in 15 files changed: 775 ins; 682 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/3947.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3947/head:pull/3947 PR: https://git.openjdk.java.net/jdk/pull/3947 From stuefe at openjdk.java.net Fri May 14 07:49:37 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Fri, 14 May 2021 07:49:37 GMT Subject: RFR: 8267130: Memory Overflow in Disassembler::load_library In-Reply-To: <5d59O0bG7Vu16Ub4lzq2ISDo8MJLXfv35SZ8iLzHzs4=.bc2c4cda-54ee-4610-b849-71cf42a94003@github.com> References: <5d59O0bG7Vu16Ub4lzq2ISDo8MJLXfv35SZ8iLzHzs4=.bc2c4cda-54ee-4610-b849-71cf42a94003@github.com> Message-ID: On Fri, 14 May 2021 02:17:29 GMT, Wang Huang wrote: > * reproduce: > put your libjvm.so in a long enough path, such like Just a side note, I wonder whether using stringStream would not be better suited for this. 
------------- PR: https://git.openjdk.java.net/jdk/pull/4020 From jbhateja at openjdk.java.net Fri May 14 08:11:25 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Fri, 14 May 2021 08:11:25 GMT Subject: RFR: 8256973: Intrinsic creation for VectorMask query (lastTrue, firstTrue, trueCount) APIs [v2] In-Reply-To: <73lFD51hzmiF_KrQyPyE5c7lbf-Bp6V5vptzGo7JioY=.f34509d0-04c1-4c6d-878f-baa433b315a7@github.com> References: <73lFD51hzmiF_KrQyPyE5c7lbf-Bp6V5vptzGo7JioY=.f34509d0-04c1-4c6d-878f-baa433b315a7@github.com> Message-ID: > This patch intrinsifies the following mask query APIs using an optimal instruction sequence for the X86 target. > 1) VectorMask.firstTrue. > 2) VectorMask.lastTrue. > 3) VectorMask.trueCount. > > The current implementations of the above APIs iterate over the underlying boolean array encapsulated in a mask instance to determine the count or position index of true bits. > X86 AVX2 and AVX512 targets offer direct instructions to move the mask held in a byte vector into a GP or an opmask register, thereby accelerating further querying. > > Intrinsification is not performed for vector species containing fewer than two vector lanes.
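For reference, the scalar behaviour described above (walking the boolean array behind a mask) can be sketched roughly as follows. This is a simplified illustration and not the JDK source: the class `MaskQueries` and its static methods are hypothetical names, and a plain `boolean[]` stands in for the mask's lanes. The conventions that `firstTrue` yields the lane count and `lastTrue` yields -1 when no lane is set follow the documented `VectorMask` API.

```java
// Hypothetical stand-in for the scalar (non-intrinsified) mask queries.
// A boolean[] plays the role of the mask's lanes; none of these names
// come from the patch under review.
public class MaskQueries {

    // Number of lanes that are set.
    static int trueCount(boolean[] bits) {
        int count = 0;
        for (boolean b : bits) {
            if (b) {
                count++;
            }
        }
        return count;
    }

    // Index of the first set lane, or the lane count if none is set.
    static int firstTrue(boolean[] bits) {
        for (int i = 0; i < bits.length; i++) {
            if (bits[i]) {
                return i;
            }
        }
        return bits.length;
    }

    // Index of the last set lane, or -1 if none is set.
    static int lastTrue(boolean[] bits) {
        for (int i = bits.length - 1; i >= 0; i--) {
            if (bits[i]) {
                return i;
            }
        }
        return -1;
    }
}
```

Each query is a loop over all lanes in the worst case, which is the cost the intrinsics are meant to avoid.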
> > Please find below the performance number for benchmark included in the patch: > Machine: Cascade Lake server (Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz 28C) > > > VectorMask.trueCount | VECTOR SIZE | ALGO | BASELINE AVX3 | WITH OPT AVX3 | GAIN > -- | -- | -- | -- | -- | -- > MaskQueryOperationsBenchmark.testFirstTrueByte | 128 | 1 | 338396.436 | 362711.622 | 1.071854143 > MaskQueryOperationsBenchmark.testFirstTrueByte | 128 | 2 | 205477.472 | 362668.035 | 1.765001445 > MaskQueryOperationsBenchmark.testFirstTrueByte | 128 | 3 | 185613.377 | 362518.206 | 1.953082326 > MaskQueryOperationsBenchmark.testFirstTrueByte | 256 | 1 | 338522.114 | 328751.231 | 0.971136648 > MaskQueryOperationsBenchmark.testFirstTrueByte | 256 | 2 | 148825.341 | 328783.35 | 2.209189294 > MaskQueryOperationsBenchmark.testFirstTrueByte | 256 | 3 | 200854.856 | 328784.24 | 1.636924526 > MaskQueryOperationsBenchmark.testFirstTrueByte | 512 | 1 | 338551.089 | 319908.361 | 0.944933782 > MaskQueryOperationsBenchmark.testFirstTrueByte | 512 | 2 | 116338.756 | 320026.839 | 2.750818816 > MaskQueryOperationsBenchmark.testFirstTrueByte | 512 | 3 | 200871.692 | 320008.208 | 1.593097588 > MaskQueryOperationsBenchmark.testFirstTrueInt | 128 | 1 | 338489.157 | 190221.57 | 0.561972418 > MaskQueryOperationsBenchmark.testFirstTrueInt | 128 | 2 | 205140.903 | 362387.766 | 1.766531007 > MaskQueryOperationsBenchmark.testFirstTrueInt | 128 | 3 | 185508.994 | 362566.265 | 1.95444036 > MaskQueryOperationsBenchmark.testFirstTrueInt | 256 | 1 | 338403.999 | 328829.751 | 0.971707639 > MaskQueryOperationsBenchmark.testFirstTrueInt | 256 | 2 | 148988.857 | 328835.479 | 2.207114583 > MaskQueryOperationsBenchmark.testFirstTrueInt | 256 | 3 | 200815.907 | 328778.266 | 1.637212265 > MaskQueryOperationsBenchmark.testFirstTrueInt | 512 | 1 | 338462.403 | 328796.84 | 0.971442728 > MaskQueryOperationsBenchmark.testFirstTrueInt | 512 | 2 | 116355.623 | 328811.386 | 2.825917455 > MaskQueryOperationsBenchmark.testFirstTrueInt 
| 512 | 3 | 200856.08 | 328773.859 | 1.636862867 > MaskQueryOperationsBenchmark.testFirstTrueLong | 128 | 1 | 338451.783 | 204432.394 | 0.60402221 > MaskQueryOperationsBenchmark.testFirstTrueLong | 128 | 2 | 204443.049 | 155670.633 | 0.761437641 > MaskQueryOperationsBenchmark.testFirstTrueLong | 128 | 3 | 207254.769 | 155672.842 | 0.751118263 > MaskQueryOperationsBenchmark.testFirstTrueLong | 256 | 1 | 338520.255 | 328789.176 | 0.971254072 > MaskQueryOperationsBenchmark.testFirstTrueLong | 256 | 2 | 205883.123 | 328742.103 | 1.596741385 > MaskQueryOperationsBenchmark.testFirstTrueLong | 256 | 3 | 185519.176 | 328733.537 | 1.771965271 > MaskQueryOperationsBenchmark.testFirstTrueLong | 512 | 1 | 338605.11 | 328694.935 | 0.970732353 > MaskQueryOperationsBenchmark.testFirstTrueLong | 512 | 2 | 148444.7 | 328352.346 | 2.211950619 > MaskQueryOperationsBenchmark.testFirstTrueLong | 512 | 3 | 200884.874 | 328814.376 | 1.636829939 > MaskQueryOperationsBenchmark.testFirstTrueShort | 128 | 1 | 338529.326 | 362293.877 | 1.070199387 > MaskQueryOperationsBenchmark.testFirstTrueShort | 128 | 2 | 204676.583 | 362428.992 | 1.770739899 > MaskQueryOperationsBenchmark.testFirstTrueShort | 128 | 3 | 185495.663 | 362422.835 | 1.953807594 > MaskQueryOperationsBenchmark.testFirstTrueShort | 256 | 1 | 338533.82 | 328635.479 | 0.970761146 > MaskQueryOperationsBenchmark.testFirstTrueShort | 256 | 2 | 148822.446 | 328803.55 | 2.209368001 > MaskQueryOperationsBenchmark.testFirstTrueShort | 256 | 3 | 200752.028 | 328805.974 | 1.637871245 > MaskQueryOperationsBenchmark.testFirstTrueShort | 512 | 1 | 338464.548 | 320054.91 | 0.945608371 > MaskQueryOperationsBenchmark.testFirstTrueShort | 512 | 2 | 116329.063 | 328763.508 | 2.826151088 > MaskQueryOperationsBenchmark.testFirstTrueShort | 512 | 3 | 199971.049 | 328819.066 | 1.644333355 > MaskQueryOperationsBenchmark.testLastTrueByte | 128 | 1 | 325618.244 | 337629.441 | 1.036887359 > MaskQueryOperationsBenchmark.testLastTrueByte | 128 | 2 | 
197655.729 | 337544.012 | 1.707737052 > MaskQueryOperationsBenchmark.testLastTrueByte | 128 | 3 | 325600.645 | 337256.796 | 1.035798919 > MaskQueryOperationsBenchmark.testLastTrueByte | 256 | 1 | 325677.144 | 308312.588 | 0.946681687 > MaskQueryOperationsBenchmark.testLastTrueByte | 256 | 2 | 138177.514 | 308293.997 | 2.231144476 > MaskQueryOperationsBenchmark.testLastTrueByte | 256 | 3 | 201281.142 | 308353.239 | 1.531952949 > MaskQueryOperationsBenchmark.testLastTrueByte | 512 | 1 | 325499.635 | 305103.491 | 0.937338965 > MaskQueryOperationsBenchmark.testLastTrueByte | 512 | 2 | 98267.327 | 304803.64 | 3.101780106 > MaskQueryOperationsBenchmark.testLastTrueByte | 512 | 3 | 201072.661 | 304969.972 | 1.516715253 > MaskQueryOperationsBenchmark.testLastTrueInt | 128 | 1 | 325286.171 | 337337.209 | 1.037047496 > MaskQueryOperationsBenchmark.testLastTrueInt | 128 | 2 | 197351.915 | 331432.723 | 1.679399579 > MaskQueryOperationsBenchmark.testLastTrueInt | 128 | 3 | 325173.097 | 337518.586 | 1.037965899 > MaskQueryOperationsBenchmark.testLastTrueInt | 256 | 1 | 325199.786 | 308436.805 | 0.948453284 > MaskQueryOperationsBenchmark.testLastTrueInt | 256 | 2 | 138200.527 | 308405.442 | 2.231579348 > MaskQueryOperationsBenchmark.testLastTrueInt | 256 | 3 | 201240.625 | 308234.527 | 1.531671485 > MaskQueryOperationsBenchmark.testLastTrueInt | 512 | 1 | 325590.639 | 308381.757 | 0.947145649 > MaskQueryOperationsBenchmark.testLastTrueInt | 512 | 2 | 98334.197 | 308440.373 | 3.13665421 > MaskQueryOperationsBenchmark.testLastTrueInt | 512 | 3 | 200832.953 | 308431.355 | 1.535760693 > MaskQueryOperationsBenchmark.testLastTrueLong | 128 | 1 | 325564.887 | 193981.861 | 0.595831641 > MaskQueryOperationsBenchmark.testLastTrueLong | 128 | 2 | 214005.351 | 153667.869 | 0.718056199 > MaskQueryOperationsBenchmark.testLastTrueLong | 128 | 3 | 214061.493 | 156337.24 | 0.730337988 > MaskQueryOperationsBenchmark.testLastTrueLong | 256 | 1 | 325601.502 | 308291.032 | 0.946835411 > 
MaskQueryOperationsBenchmark.testLastTrueLong | 256 | 2 | 197911.182 | 308292.149 | 1.557729815 > MaskQueryOperationsBenchmark.testLastTrueLong | 256 | 3 | 325608.187 | 308405.393 | 0.947167195 > MaskQueryOperationsBenchmark.testLastTrueLong | 512 | 1 | 325734.897 | 308321.619 | 0.946541564 > MaskQueryOperationsBenchmark.testLastTrueLong | 512 | 2 | 137974.465 | 308131.475 | 2.233250008 > MaskQueryOperationsBenchmark.testLastTrueLong | 512 | 3 | 205479.182 | 308311.636 | 1.500451934 > MaskQueryOperationsBenchmark.testLastTrueShort | 128 | 1 | 325681.411 | 337663.377 | 1.036790451 > MaskQueryOperationsBenchmark.testLastTrueShort | 128 | 2 | 198127.51 | 337287.453 | 1.702375672 > MaskQueryOperationsBenchmark.testLastTrueShort | 128 | 3 | 325519.01 | 337453.387 | 1.036662612 > MaskQueryOperationsBenchmark.testLastTrueShort | 256 | 1 | 325647.378 | 308266.5 | 0.946626691 > MaskQueryOperationsBenchmark.testLastTrueShort | 256 | 2 | 138287.837 | 308402.656 | 2.230150263 > MaskQueryOperationsBenchmark.testLastTrueShort | 256 | 3 | 205375.864 | 308418.101 | 1.501725154 > MaskQueryOperationsBenchmark.testLastTrueShort | 512 | 1 | 325548.631 | 308137.064 | 0.946516233 > MaskQueryOperationsBenchmark.testLastTrueShort | 512 | 2 | 98424.074 | 308145.17 | 3.130790644 > MaskQueryOperationsBenchmark.testLastTrueShort | 512 | 3 | 205381.622 | 308345.763 | 1.50133084 > MaskQueryOperationsBenchmark.testTrueCountByte | 128 | 1 | 197488.249 | 340490.471 | 1.724104967 > MaskQueryOperationsBenchmark.testTrueCountByte | 128 | 2 | 191307.785 | 354400.26 | 1.852513529 > MaskQueryOperationsBenchmark.testTrueCountByte | 128 | 3 | 181206.7 | 354512.75 | 1.956399791 > MaskQueryOperationsBenchmark.testTrueCountByte | 256 | 1 | 144485.784 | 328347.7 | 2.272525995 > MaskQueryOperationsBenchmark.testTrueCountByte | 256 | 2 | 136709.938 | 328318.229 | 2.401568122 > MaskQueryOperationsBenchmark.testTrueCountByte | 256 | 3 | 141501.903 | 328274.337 | 2.319928779 > 
MaskQueryOperationsBenchmark.testTrueCountByte | 512 | 1 | 108395.25 | 318599.11 | 2.939234976 > MaskQueryOperationsBenchmark.testTrueCountByte | 512 | 2 | 98731.287 | 318651.791 | 3.22746518 > MaskQueryOperationsBenchmark.testTrueCountByte | 512 | 3 | 106344.335 | 318657.098 | 2.99646519 > MaskQueryOperationsBenchmark.testTrueCountInt | 128 | 1 | 124691.716 | 354457.62 | 2.842671762 > MaskQueryOperationsBenchmark.testTrueCountInt | 128 | 2 | 191325.138 | 354360.523 | 1.852137815 > MaskQueryOperationsBenchmark.testTrueCountInt | 128 | 3 | 181480.334 | 353746.697 | 1.949228818 > MaskQueryOperationsBenchmark.testTrueCountInt | 256 | 1 | 144513.076 | 328404.916 | 2.27249274 > MaskQueryOperationsBenchmark.testTrueCountInt | 256 | 2 | 136710.717 | 328516.92 | 2.403007805 > MaskQueryOperationsBenchmark.testTrueCountInt | 256 | 3 | 141631.832 | 328432.841 | 2.318919669 > MaskQueryOperationsBenchmark.testTrueCountInt | 512 | 1 | 108479.473 | 328405.877 | 3.027355019 > MaskQueryOperationsBenchmark.testTrueCountInt | 512 | 2 | 98747.682 | 328300.378 | 3.324638831 > MaskQueryOperationsBenchmark.testTrueCountInt | 512 | 3 | 106378.04 | 328384.537 | 3.086957957 > MaskQueryOperationsBenchmark.testTrueCountLong | 128 | 1 | 213646.579 | 159098.437 | 0.74468048 > MaskQueryOperationsBenchmark.testTrueCountLong | 128 | 2 | 212671.379 | 162528.924 | 0.764225655 > MaskQueryOperationsBenchmark.testTrueCountLong | 128 | 3 | 212649.052 | 162530.898 | 0.764315178 > MaskQueryOperationsBenchmark.testTrueCountLong | 256 | 1 | 197350.819 | 328365.924 | 1.663869072 > MaskQueryOperationsBenchmark.testTrueCountLong | 256 | 2 | 191473.127 | 328501.883 | 1.715655289 > MaskQueryOperationsBenchmark.testTrueCountLong | 256 | 3 | 185529.513 | 328428.64 | 1.770223156 > MaskQueryOperationsBenchmark.testTrueCountLong | 512 | 1 | 144516.188 | 328334.76 | 2.27195835 > MaskQueryOperationsBenchmark.testTrueCountLong | 512 | 2 | 136752.367 | 328505.571 | 2.402192943 > 
MaskQueryOperationsBenchmark.testTrueCountLong | 512 | 3 | 141445.742 | 328392.887 | 2.321688036 > MaskQueryOperationsBenchmark.testTrueCountShort | 128 | 1 | 197863.202 | 354533.342 | 1.791810394 > MaskQueryOperationsBenchmark.testTrueCountShort | 128 | 2 | 191802.914 | 354377.939 | 1.84761499 > MaskQueryOperationsBenchmark.testTrueCountShort | 128 | 3 | 181773.298 | 354374.525 | 1.949541153 > MaskQueryOperationsBenchmark.testTrueCountShort | 256 | 1 | 144414.679 | 328435.088 | 2.27425003 > MaskQueryOperationsBenchmark.testTrueCountShort | 256 | 2 | 136923.991 | 328267.898 | 2.397446171 > MaskQueryOperationsBenchmark.testTrueCountShort | 256 | 3 | 141545.957 | 328308.681 | 2.319449371 > MaskQueryOperationsBenchmark.testTrueCountShort | 512 | 1 | 108420.143 | 328282.998 | 3.027878297 > MaskQueryOperationsBenchmark.testTrueCountShort | 512 | 2 | 98736.441 | 328420.616 | 3.326235103 > MaskQueryOperationsBenchmark.testTrueCountShort | 512 | 3 | 106432.386 | 328245.585 | 3.084076166
>
> ALGO (1=best case, 2=worst case, 3=average case)

Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:

  8256973: Review comments resolution.
-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3916/files
  - new: https://git.openjdk.java.net/jdk/pull/3916/files/40448854..15e3ffd3

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3916&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3916&range=00-01

  Stats: 24 lines in 1 file changed: 0 ins; 0 del; 24 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3916.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3916/head:pull/3916

PR: https://git.openjdk.java.net/jdk/pull/3916

From jbhateja at openjdk.java.net Fri May 14 08:11:26 2021
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Fri, 14 May 2021 08:11:26 GMT
Subject: RFR: 8256973: Intrinsic creation for VectorMask query (lastTrue, firstTrue, trueCount) APIs [v2]
In-Reply-To:
References: <73lFD51hzmiF_KrQyPyE5c7lbf-Bp6V5vptzGo7JioY=.f34509d0-04c1-4c6d-878f-baa433b315a7@github.com>
Message-ID: <-4nyqsPsNWtgNJOXfIxu6I_Z66n3Dbbk5MtdOj01FeM=.39c75830-23ae-4a93-a303-5038eabeb28d@github.com>

On Fri, 7 May 2021 19:04:01 GMT, Paul Sandoz wrote:

>>> These mask operations can be considered a form of reduction.
>>>
>>> Do you think it makes sense to reuse `VectorSupport.reductionCoerced` instead of adding a new intrinsic? (Note that we reuse `VectorSupport.binaryOp` for mask logical binary operations).
>>>
>>> Perhaps that allows for further reuse later if/when we add operations to integral vectors to count bits like we already have with scalars, such as `Integer.bitCount`, `Integer.numberOfLeadingZeros` etc?
>>
>> Hi @PaulSandoz , that's a nice suggestion, I think instead of reduction which may emit bulky sequence, VectorMask.toLong() + Long.bitCount() could have been used for trueCount. But since toLong may not work for ARM SVE, so in the mean time intrinsifying at the level of API looked reasonable.
>
>> Hi @PaulSandoz , that's a nice suggestion, I think instead of reduction which may emit bulky sequence, VectorMask.toLong() + Long.bitCount() could have been used for trueCount. But since toLong may not work for ARM SVE, so in the mean time intrinsifying at the level of API looked reasonable.
>
> Do you mean that reusing `VectorSupport.reductionCoerced` as the intrinsic entry point may emit a bulky sequence?
>
> Note that I was not suggesting to reuse `Long.bitCount()` etc. I was just using that as an example that the bit-wise reduction operations on masks can also apply to integral vectors, suggesting there might be some sharing in C2 just like is done for binary-wise operations, such as logical AND.
>
> For example:
>
>     @Override
>     @ForceInline
>     public Int256Mask and(VectorMask mask) {
>         Objects.requireNonNull(mask);
>         Int256Mask m = (Int256Mask)mask;
>         return VectorSupport.binaryOp(VECTOR_OP_AND, Int256Mask.class, int.class, VLENGTH,
>                                       this, m,
>                                       (m1, m2) -> m1.bOp(m2, (i, a, b) -> a & b));
>     }
>
>
> And notice that `VECTOR_OP_AND` is reused for vector lane-wise binary and reduction operations on `IntVector` etc. Can we do the same for other bitwise reduction-like operations, first implementing optimal support for masks, then later expanding for integral vectors?
>
> So rather than introducing specific constants, such as `VECTOR_OP_MASK_TRUECOUNT` etc, we can generalize to `VECTOR_OP_BITCOUNT` etc that can apply to both masks and integral vectors, where for masks we interpret `BIT` appropriately to mean `boolean` true value.

Hi @PaulSandoz, thanks, your comments on JMH have been addressed.
@neliasso @iwanowww kindly share your feedback/comments on the compiler-side changes.
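As an illustration of the `VectorMask.toLong()` + `Long.bitCount()` idea mentioned in the thread, here is a minimal scalar sketch that models a mask as the low bits of a `long` (bit i set means lane i is true). The class `MaskQuerySketch`, its method signatures and the return conventions for an empty mask are assumptions made for this example; this is not the intrinsic from the pull request and not the JDK implementation.

```java
// Hypothetical sketch: a vector mask modelled as a long, bit i set <=> lane i is true.
// It only illustrates the scalar bit tricks discussed above.
final class MaskQuerySketch {
    /** Number of true lanes, following the toLong() + Long.bitCount() idea. */
    static int trueCount(long bits) {
        return Long.bitCount(bits);
    }

    /** Index of the lowest true lane, or laneCount if no lane is true (convention assumed here). */
    static int firstTrue(long bits, int laneCount) {
        return bits == 0 ? laneCount : Long.numberOfTrailingZeros(bits);
    }

    /** Index of the highest true lane, or -1 if no lane is true (convention assumed here). */
    static int lastTrue(long bits) {
        // Long.numberOfLeadingZeros(0) is 64, so an empty mask yields -1.
        return 63 - Long.numberOfLeadingZeros(bits);
    }

    public static void main(String[] args) {
        long bits = 0b0110_1000L; // lanes 3, 5 and 6 are true
        System.out.println("trueCount = " + trueCount(bits));
        System.out.println("firstTrue = " + firstTrue(bits, 8));
        System.out.println("lastTrue  = " + lastTrue(bits));
    }
}
```

The sketch assumes the mask has at most 64 lanes and that no bits above the lane count are set, which is also the limitation that makes a `toLong()`-based approach awkward for wider or scalable vectors.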
-------------

PR: https://git.openjdk.java.net/jdk/pull/3916

From stuefe at openjdk.java.net Fri May 14 08:11:37 2021
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Fri, 14 May 2021 08:11:37 GMT
Subject: RFR: 8267130: Memory Overflow in Disassembler::load_library
In-Reply-To: <5d59O0bG7Vu16Ub4lzq2ISDo8MJLXfv35SZ8iLzHzs4=.bc2c4cda-54ee-4610-b849-71cf42a94003@github.com>
References: <5d59O0bG7Vu16Ub4lzq2ISDo8MJLXfv35SZ8iLzHzs4=.bc2c4cda-54ee-4610-b849-71cf42a94003@github.com>
Message-ID:

On Fri, 14 May 2021 02:17:29 GMT, Wang Huang wrote:

> * reproduce:
> put your libjvm.so in a long enough path, such like

Hi @Wanghuang-Huawei ,

Long term, this code may benefit from using stringStream, which takes care of truncating (or dynamically allocating) memory as well as string appending.

But for this fix, I think this is mostly fine. See inline remarks.

Cheers, Thomas

src/hotspot/share/compiler/disassembler.cpp line 807:

> 805: if (jvm_offset >= 0) {
> 806: // 1. /lib//libhsdis-.so
> 807: if (jvm_offset + strlen(hsdis_library_name) + strlen(os::dll_file_extension()) < JVM_MAXPATHLEN) {

Don't we need space for the terminating zero here?

-------------

Changes requested by stuefe (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/4020

From whuang at openjdk.java.net Fri May 14 08:39:42 2021
From: whuang at openjdk.java.net (Wang Huang)
Date: Fri, 14 May 2021 08:39:42 GMT
Subject: RFR: 8267130: Memory Overflow in Disassembler::load_library
In-Reply-To:
References: <5d59O0bG7Vu16Ub4lzq2ISDo8MJLXfv35SZ8iLzHzs4=.bc2c4cda-54ee-4610-b849-71cf42a94003@github.com>
Message-ID:

On Fri, 14 May 2021 08:03:43 GMT, Thomas Stuefe wrote:

>> * reproduce:
>> put your libjvm.so in a long enough path, such like
>
> src/hotspot/share/compiler/disassembler.cpp line 807:
>
>> 805: if (jvm_offset >= 0) {
>> 806: // 1.
/lib//libhsdis-.so
>> 807: if (jvm_offset + strlen(hsdis_library_name) + strlen(os::dll_file_extension()) < JVM_MAXPATHLEN) {
>
> Don't we need space for the terminating zero here?

We use `<` here. If we used `<=`, we would have to account for the terminating zero ;-)

-------------

PR: https://git.openjdk.java.net/jdk/pull/4020

From redestad at openjdk.java.net Fri May 14 10:40:39 2021
From: redestad at openjdk.java.net (Claes Redestad)
Date: Fri, 14 May 2021 10:40:39 GMT
Subject: Integrated: 8266810: Move trivial Matcher code to cpu-specific header files
In-Reply-To:
References:
Message-ID:

On Mon, 10 May 2021 11:03:06 GMT, Claes Redestad wrote:

> This patch moves a number of constants and trivial methods to newly introduced matcher_.hpp files.
>
> This enables constant folding and dead code elimination on one hand, and improved code navigation in IDEs on the other.
>
> The effect of this refactoring is modest: on Linux-x64 Hotspot (libjvm.so) shrinks by ~10Kb and C2 initialization cost drops from 8.5M to 8.3M.
>
> Testing: tier1-3, GHA builds of all architectures on linux

This pull request has now been integrated.
Changeset: 644f28c0
Author:    Claes Redestad
URL:       https://git.openjdk.java.net/jdk/commit/644f28c0ead18a37d7996ec30b49718a2f6aa189
Stats:     1459 lines in 15 files changed: 775 ins; 682 del; 2 mod

8266810: Move trivial Matcher code to cpu-specific header files

Reviewed-by: kvn, thartmann

-------------

PR: https://git.openjdk.java.net/jdk/pull/3947

From jbhateja at openjdk.java.net Fri May 14 11:37:08 2021
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Fri, 14 May 2021 11:37:08 GMT
Subject: RFR: 8266951: Partial in-lining for vectorized mismatch operation using AVX512 masked instructions [v2]
In-Reply-To: <0YtRuwnVZ-Ejs-22d0JDJeFzXiZ17XNuBT1o5Ma4ZkI=.9dd9e952-d452-4175-8ff5-8f41e990a555@github.com>
References: <0YtRuwnVZ-Ejs-22d0JDJeFzXiZ17XNuBT1o5Ma4ZkI=.9dd9e952-d452-4175-8ff5-8f41e990a555@github.com>
Message-ID:

> ArraySupport.vectorizedMismatch is a leaf-level comparison routine which gets called by various public Java APIs (Arrays.equals, Arrays.mismatch). The Hotspot C2 compiler intrinsifies the vectorizedMismatch routine and emits a call to a stub routine which uses vector instructions to compare the inputs.
>
> For small compare operations whose size fits in one vector register, i.e. < 32 bytes or <= 64 bytes, this patch employs a partial in-lining technique to emit the fast path code at the call site, which does the vector comparison under the influence of a predicate register/mask computed as a function of the comparison length.
>
> If the length of the comparison is greater than the vector register size, then the slow path comprising a stub call is emitted.
>
> This prevents the call overhead associated with the stub call, which is significant compared to the actual comparison operation for small sized comparisons.
>
> Partial in-lining works under the influence of a run time flag -XX:UsePartialInlineSize=32/64 (default 32 bytes).
>
> Following are performance numbers for an existing JMH benchmark (test/micro/org/openjdk/bench/java/util//ArrayMismatch.java):
>
> Machine : Cascade Lake server (Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz)
>
> JMH Benchmark | Size | BaseLine (ops/ms) | PI32 (ops/ms) | Gain (PI32/Baseline) | PI64 (ops/ms) | Gain (PI64/Baseline)
> -- | -- | -- | -- | -- | -- | --
> ArraysMismatch.Byte.differentSubrangeMatches | 16 | 129196.612 | 165376.715 | 1.2800391 | 157553.42 | 1.219485694 > ArraysMismatch.Byte.differentSubrangeMatches | 32 | 125583.404 | 163645.759 | 1.303084275 | 157645.879 | 1.255308217 > ArraysMismatch.Byte.differentSubrangeMatches | 64 | 121969.731 | 170648.152 | 1.399102471 | 157993.449 | 1.295349655 > ArraysMismatch.Byte.differentSubrangeMatches | 90 | 91819.571 | 96154.479 | 1.047211155 | 157983.324 | 1.720584427 > ArraysMismatch.Byte.differentSubrangeMatches | 800 | 65236.047 | 67243.131 | 1.030766487 | 67759.48 | 1.038681574 > ArraysMismatch.Byte.matches | 16 | 151805.68 | 203802.717 | 1.342523659 | 188334.618 | 1.240629586 > ArraysMismatch.Byte.matches | 32 | 151624.747 | 203731.315 | 1.343654773 | 185719.086 | 1.224859989 > ArraysMismatch.Byte.matches | 64 | 138350.648 | 124158.139 | 0.897416389 | 188935.388 | 1.365627055 > ArraysMismatch.Byte.matches | 90 | 102366.983 | 101474.688 | 0.991283371 | 100674.414 | 0.983465675 > ArraysMismatch.Byte.matches | 800 | 46319.352 | 49585.514 | 1.070513983 | 49594.262 | 1.070702846 > ArraysMismatch.Byte.mismatchEnd | 16 | 162382.057 | 191602.366 | 1.179947893 | 182425.362 | 1.123433003 > ArraysMismatch.Byte.mismatchEnd | 32 | 146656.702 | 193510.637 | 1.319480354 | 182571.741 | 1.244891904 > ArraysMismatch.Byte.mismatchEnd | 64 | 140799.385 | 122505.816 | 0.870073516 | 182360.435 | 1.295179201 > ArraysMismatch.Byte.mismatchEnd | 90 | 117439.002 | 107296.27 | 0.913634041 | 108081.174 | 0.920317545 > ArraysMismatch.Byte.mismatchEnd | 800 | 47542.975 | 47456.106 | 0.998172832 | 
47289.082 | 0.994659716 > ArraysMismatch.Byte.mismatchMid | 16 | 143112.591 | 189653.41 | 1.325204223 | 182411.81 | 1.274603504 > ArraysMismatch.Byte.mismatchMid | 32 | 151759.608 | 193712.64 | 1.276443993 | 182689.18 | 1.203806351 > ArraysMismatch.Byte.mismatchMid | 64 | 140756.035 | 122017.013 | 0.866868785 | 182508.473 | 1.296629825 > ArraysMismatch.Byte.mismatchMid | 90 | 134230.235 | 122213.804 | 0.910478954 | 122566.133 | 0.913103765 > ArraysMismatch.Byte.mismatchMid | 800 | 75512.985 | 64861.716 | 0.858947849 | 71607.794 | 0.94828451 > ArraysMismatch.Byte.mismatchStart | 16 | 160628.501 | 193722.299 | 1.206026937 | 183190.972 | 1.140463684 > ArraysMismatch.Byte.mismatchStart | 32 | 151629.56 | 193633.36 | 1.277015906 | 183230.666 | 1.20840993 > ArraysMismatch.Byte.mismatchStart | 64 | 143345.272 | 130754.305 | 0.91216336 | 181837.864 | 1.268530601 > ArraysMismatch.Byte.mismatchStart | 90 | 151557.205 | 130724.926 | 0.86254511 | 130962.682 | 0.864113864 > ArraysMismatch.Byte.mismatchStart | 800 | 149416.06 | 130847.301 | 0.875724477 | 130952.683 | 0.876429769 > ArraysMismatch.Char.differentSubrangeMatches | 16 | 124936.905 | 152375.103 | 1.219616438 | 146062.997 | 1.169094088 > ArraysMismatch.Char.differentSubrangeMatches | 32 | 118878.291 | 158770.285 | 1.33557005 | 146561.488 | 1.232870079 > ArraysMismatch.Char.differentSubrangeMatches | 64 | 110296.975 | 104885.041 | 0.95093307 | 146102.313 | 1.324626655 > ArraysMismatch.Char.differentSubrangeMatches | 90 | 88056.395 | 90133.489 | 1.023588224 | 87883.169 | 0.998032783 > ArraysMismatch.Char.differentSubrangeMatches | 800 | 41319.787 | 46257.464 | 1.119499091 | 46090.56 | 1.115459767 > ArraysMismatch.Char.matches | 16 | 150428.182 | 197311.356 | 1.311664832 | 187199.805 | 1.24444637 > ArraysMismatch.Char.matches | 32 | 132718.181 | 126373.231 | 0.952192307 | 187008.811 | 1.409067014 > ArraysMismatch.Char.matches | 64 | 111659.84 | 107182.982 | 0.959906283 | 109772.951 | 0.983101453 > 
ArraysMismatch.Char.matches | 90 | 86184.209 | 91977.05 | 1.067214645 | 90389.147 | 1.048790121 > ArraysMismatch.Char.matches | 800 | 26332.084 | 25284.001 | 0.960197491 | 25855.38 | 0.981896458 > ArraysMismatch.Char.mismatchEnd | 16 | 148547.251 | 189151.018 | 1.273339067 | 179675.328 | 1.209550004 > ArraysMismatch.Char.mismatchEnd | 32 | 138219.785 | 119017.203 | 0.861072118 | 178701.685 | 1.292880647 > ArraysMismatch.Char.mismatchEnd | 64 | 110435.452 | 103940.023 | 0.94118348 | 102078.889 | 0.924330794 > ArraysMismatch.Char.mismatchEnd | 90 | 89375.63 | 87698.736 | 0.981237682 | 88037.787 | 0.985031233 > ArraysMismatch.Char.mismatchEnd | 800 | 23632.584 | 22963.757 | 0.971698948 | 20497.605 | 0.867345061 > ArraysMismatch.Char.mismatchMid | 16 | 148666.26 | 189258.721 | 1.273044207 | 178820.938 | 1.202834712 > ArraysMismatch.Char.mismatchMid | 32 | 131949.59 | 119320.489 | 0.904288441 | 178579.245 | 1.35338992 > ArraysMismatch.Char.mismatchMid | 64 | 122148.315 | 111033.597 | 0.909006375 | 109455.953 | 0.896090568 > ArraysMismatch.Char.mismatchMid | 90 | 125032.714 | 109837.581 | 0.878470742 | 110283.097 | 0.882033937 > ArraysMismatch.Char.mismatchMid | 800 | 42255.059 | 48153.688 | 1.139595806 | 43087.476 | 1.019699819 > ArraysMismatch.Char.mismatchStart | 16 | 148493.976 | 189247.176 | 1.274443456 | 178915.503 | 1.204867078 > ArraysMismatch.Char.mismatchStart | 32 | 148724.462 | 126724.721 | 0.852077186 | 178887.041 | 1.202808459 > ArraysMismatch.Char.mismatchStart | 64 | 148635.338 | 126716.274 | 0.852531274 | 126747.94 | 0.852744318 > ArraysMismatch.Char.mismatchStart | 90 | 140359.351 | 126708.588 | 0.902744186 | 125618.245 | 0.894975961 > ArraysMismatch.Char.mismatchStart | 800 | 144649.46 | 125727.381 | 0.86918666 | 126664.011 | 0.875661831 > ArraysMismatch.Double.differentSubrangeMatches | 16 | 116255.827 | 116156.952 | 0.999149505 | 116557.568 | 1.002595491 > ArraysMismatch.Double.differentSubrangeMatches | 32 | 91940.498 | 97299.205 | 1.058284511 | 
97466.224 | 1.06010111 > ArraysMismatch.Double.differentSubrangeMatches | 64 | 78205.807 | 78189.378 | 0.999789926 | 78133.649 | 0.999077332 > ArraysMismatch.Double.differentSubrangeMatches | 90 | 61330.454 | 68798.235 | 1.121763015 | 68524.188 | 1.117294648 > ArraysMismatch.Double.differentSubrangeMatches | 800 | 14996.315 | 14979.647 | 0.998888527 | 15072.825 | 1.00510192 > ArraysMismatch.Double.matches | 16 | 119342.024 | 120322.671 | 1.008217114 | 119531.315 | 1.001586122 > ArraysMismatch.Double.matches | 32 | 88179.448 | 89069.505 | 1.010093701 | 88141.626 | 0.999571079 > ArraysMismatch.Double.matches | 64 | 62622.253 | 62433.512 | 0.996986039 | 63041.774 | 1.006699232 > ArraysMismatch.Double.matches | 90 | 49579.305 | 50632.739 | 1.021247454 | 46548.486 | 0.938869272 > ArraysMismatch.Double.matches | 800 | 8850.013 | 8505.296 | 0.961048984 | 8490.327 | 0.959357574 > ArraysMismatch.Double.mismatchEnd | 16 | 116594.224 | 119025.382 | 1.020851445 | 116310.567 | 0.997567144 > ArraysMismatch.Double.mismatchEnd | 32 | 86183.542 | 86814.706 | 1.007323486 | 86258.696 | 1.000872023 > ArraysMismatch.Double.mismatchEnd | 64 | 62695.058 | 62794.552 | 1.001586951 | 62769 | 1.001179391 > ArraysMismatch.Double.mismatchEnd | 90 | 46899.021 | 47692.984 | 1.016929202 | 47598.715 | 1.01491916 > ArraysMismatch.Double.mismatchEnd | 800 | 8132.64 | 8141.465 | 1.001085133 | 7176.583 | 0.882441987 > ArraysMismatch.Double.mismatchMid | 16 | 110505.284 | 113732.521 | 1.029204368 | 113249.451 | 1.024832903 > ArraysMismatch.Double.mismatchMid | 32 | 94259.439 | 93242.776 | 0.989214205 | 94420.206 | 1.00170558 > ArraysMismatch.Double.mismatchMid | 64 | 76392.603 | 76344.962 | 0.999376366 | 76369.689 | 0.999700049 > ArraysMismatch.Double.mismatchMid | 90 | 71578.538 | 71637.235 | 1.000820036 | 71582.34 | 1.000053116 > ArraysMismatch.Double.mismatchMid | 800 | 14993.414 | 12701.251 | 0.84712201 | 14998.937 | 1.000368362 > ArraysMismatch.Double.mismatchStart | 16 | 141188.616 | 141430.91 | 
1.001716102 | 141517.873 | 1.002332036 > ArraysMismatch.Double.mismatchStart | 32 | 141489.906 | 139633.297 | 0.986878152 | 141729.555 | 1.001693753 > ArraysMismatch.Double.mismatchStart | 64 | 141502.44 | 139656.902 | 0.986957554 | 141488.272 | 0.999899875 > ArraysMismatch.Double.mismatchStart | 90 | 141782.57 | 141508.142 | 0.998064445 | 141579.135 | 0.998565162 > ArraysMismatch.Double.mismatchStart | 800 | 144565.191 | 139525.413 | 0.965138371 | 144607.95 | 1.000295777 > ArraysMismatch.Float.differentSubrangeMatches | 16 | 120041.868 | 119986.512 | 0.999538861 | 120009.683 | 0.999731885 > ArraysMismatch.Float.differentSubrangeMatches | 32 | 111402.873 | 111414.633 | 1.000105563 | 111442.964 | 1.000359874 > ArraysMismatch.Float.differentSubrangeMatches | 64 | 85388.728 | 93884.13 | 1.099490907 | 95120.892 | 1.113974809 > ArraysMismatch.Float.differentSubrangeMatches | 90 | 67617.865 | 75865.226 | 1.121970148 | 76179.814 | 1.126622587 > ArraysMismatch.Float.differentSubrangeMatches | 800 | 24994.376 | 25011.775 | 1.000696117 | 24944.2 | 0.997992508 > ArraysMismatch.Float.matches | 16 | 133159.39 | 137937.688 | 1.035884048 | 139461.652 | 1.047328709 > ArraysMismatch.Float.matches | 32 | 111959.987 | 115420.6 | 1.030909373 | 117002.141 | 1.045035321 > ArraysMismatch.Float.matches | 64 | 86892.65 | 87395.62 | 1.005788407 | 87345.458 | 1.00521112 > ArraysMismatch.Float.matches | 90 | 67690.279 | 69156.772 | 1.02166475 | 69082.962 | 1.020574343 > ArraysMismatch.Float.matches | 800 | 14894.94 | 15341.034 | 1.029949365 | 15779.117 | 1.059360897 > ArraysMismatch.Float.mismatchEnd | 16 | 128854.048 | 128925.913 | 1.000557724 | 128985.299 | 1.001018602 > ArraysMismatch.Float.mismatchEnd | 32 | 99825.842 | 104613.873 | 1.047963843 | 103876.271 | 1.040574955 > ArraysMismatch.Float.mismatchEnd | 64 | 80190.706 | 84665.053 | 1.055796329 | 84582.712 | 1.054769514 > ArraysMismatch.Float.mismatchEnd | 90 | 71406.594 | 76730.083 | 1.074551784 | 76596.258 | 1.072677658 > 
ArraysMismatch.Float.mismatchEnd | 800 | 14348.159 | 14306.535 | 0.997099001 | 14360.603 | 1.000867289 > ArraysMismatch.Float.mismatchMid | 16 | 123753.791 | 124291.601 | 1.004345806 | 123649.378 | 0.999156284 > ArraysMismatch.Float.mismatchMid | 32 | 109105.215 | 111447.183 | 1.021465225 | 111494.37 | 1.021897716 > ArraysMismatch.Float.mismatchMid | 64 | 93600.363 | 93741.993 | 1.001513135 | 93658.042 | 1.000616226 > ArraysMismatch.Float.mismatchMid | 90 | 89991.128 | 89712.471 | 0.996903506 | 90031.763 | 1.000451545 > ArraysMismatch.Float.mismatchMid | 800 | 23974.331 | 24301.075 | 1.01362891 | 24354.29 | 1.015848576 > ArraysMismatch.Float.mismatchStart | 16 | 140889.393 | 140535.617 | 0.997488981 | 140222.656 | 0.995267657 > ArraysMismatch.Float.mismatchStart | 32 | 140871.915 | 140318.765 | 0.996073383 | 140242.783 | 0.995534014 > ArraysMismatch.Float.mismatchStart | 64 | 141197.313 | 140413.639 | 0.994449795 | 140792.879 | 0.997135682 > ArraysMismatch.Float.mismatchStart | 90 | 139663.079 | 139775.065 | 1.00080183 | 143880.133 | 1.03019448 > ArraysMismatch.Float.mismatchStart | 800 | 143930.882 | 143878.412 | 0.99963545 | 143923.022 | 0.99994539 > ArraysMismatch.Int.differentSubrangeMatches | 16 | 110820.026 | 130943.67 | 1.181588515 | 131076.904 | 1.182790771 > ArraysMismatch.Int.differentSubrangeMatches | 32 | 111706.868 | 121119.544 | 1.084262285 | 122049.921 | 1.092591021 > ArraysMismatch.Int.differentSubrangeMatches | 64 | 93916.026 | 101624.789 | 1.082081444 | 100103.617 | 1.065884293 > ArraysMismatch.Int.differentSubrangeMatches | 90 | 67478.955 | 83517.957 | 1.237688951 | 83549.562 | 1.238157319 > ArraysMismatch.Int.differentSubrangeMatches | 800 | 24920.868 | 25100.838 | 1.007221659 | 25376.679 | 1.018290334 > ArraysMismatch.Int.matches | 16 | 138004.078 | 142579.711 | 1.033155781 | 143465.516 | 1.039574468 > ArraysMismatch.Int.matches | 32 | 111790.949 | 119018.169 | 1.06464942 | 119864.971 | 1.072224291 > ArraysMismatch.Int.matches | 64 | 86997.004 
| 88476.088 | 1.017001551 | 87755.688 | 1.008720806 > ArraysMismatch.Int.matches | 90 | 69366.581 | 71427.315 | 1.029707879 | 71203.035 | 1.026474622 > ArraysMismatch.Int.matches | 800 | 15119.02 | 15529.095 | 1.02712312 | 15828.336 | 1.046915475 > ArraysMismatch.Int.mismatchEnd | 16 | 139862.143 | 135639.435 | 0.96980807 | 135661.244 | 0.969964002 > ArraysMismatch.Int.mismatchEnd | 32 | 114870.328 | 115455.901 | 1.005097687 | 114992.965 | 1.001067613 > ArraysMismatch.Int.mismatchEnd | 64 | 85291.637 | 85115.665 | 0.99793682 | 85179.114 | 0.998680726 > ArraysMismatch.Int.mismatchEnd | 90 | 73049.868 | 78798.949 | 1.078700772 | 73365.106 | 1.004315381 > ArraysMismatch.Int.mismatchEnd | 800 | 14597.509 | 12861.87 | 0.88110033 | 12845.178 | 0.879956847 > ArraysMismatch.Int.mismatchMid | 16 | 131615.489 | 134691.219 | 1.023369058 | 134503.225 | 1.0219407 > ArraysMismatch.Int.mismatchMid | 32 | 119291.19 | 121970.431 | 1.022459672 | 120647.357 | 1.011368543 > ArraysMismatch.Int.mismatchMid | 64 | 100133.019 | 99827.03 | 0.996944175 | 98327.743 | 0.981971222 > ArraysMismatch.Int.mismatchMid | 90 | 93062.689 | 95269.725 | 1.023715584 | 95457.632 | 1.025734728 > ArraysMismatch.Int.mismatchMid | 800 | 24614.985 | 20853.102 | 0.847171022 | 20857.528 | 0.847350831 > ArraysMismatch.Int.mismatchStart | 16 | 140229.222 | 147607.561 | 1.052616273 | 146278.15 | 1.043136002 > ArraysMismatch.Int.mismatchStart | 32 | 140354.53 | 147448.421 | 1.050542658 | 146287.931 | 1.042274382 > ArraysMismatch.Int.mismatchStart | 64 | 140256.12 | 147353.466 | 1.050602754 | 146094.059 | 1.041623417 > ArraysMismatch.Int.mismatchStart | 90 | 135753.229 | 151205.439 | 1.113825727 | 152070.776 | 1.120200065 > ArraysMismatch.Int.mismatchStart | 800 | 151565.887 | 145991.819 | 0.963223466 | 152020.842 | 1.003001698 > ArraysMismatch.Long.differentSubrangeMatches | 16 | 125569.009 | 121469.175 | 0.967349953 | 121319.155 | 0.966155232 > ArraysMismatch.Long.differentSubrangeMatches | 32 | 100126.557 | 
103303.047 | 1.03172475 | 101476.788 | 1.013485243 > ArraysMismatch.Long.differentSubrangeMatches | 64 | 80870.342 | 82334.336 | 1.018102978 | 82395.962 | 1.018865012 > ArraysMismatch.Long.differentSubrangeMatches | 90 | 70673.831 | 72440.193 | 1.024993155 | 72067.497 | 1.019719689 > ArraysMismatch.Long.differentSubrangeMatches | 800 | 15224.864 | 15077.429 | 0.99031617 | 15163.827 | 0.995990966 > ArraysMismatch.Long.matches | 16 | 119857.871 | 123784.673 | 1.032762154 | 122968.267 | 1.025950703 > ArraysMismatch.Long.matches | 32 | 88284.162 | 90825.719 | 1.028788369 | 91303.549 | 1.034200778 > ArraysMismatch.Long.matches | 64 | 62827.102 | 63614.876 | 1.012538761 | 64469.82 | 1.026146646 > ArraysMismatch.Long.matches | 90 | 49351.299 | 51199.947 | 1.037458953 | 51103.813 | 1.035511 > ArraysMismatch.Long.matches | 800 | 8822.867 | 8512.064 | 0.964773015 | 8848.35 | 1.00288829 > ArraysMismatch.Long.mismatchEnd | 16 | 124902.804 | 128237.911 | 1.026701618 | 128410.897 | 1.028086583 > ArraysMismatch.Long.mismatchEnd | 32 | 86728.545 | 90519.608 | 1.043711825 | 88782.445 | 1.023681938 > ArraysMismatch.Long.mismatchEnd | 64 | 64431.36 | 62735.702 | 0.973682722 | 64766.52 | 1.005201815 > ArraysMismatch.Long.mismatchEnd | 90 | 47764.996 | 47635.982 | 0.997298984 | 47562.461 | 0.995759761 > ArraysMismatch.Long.mismatchEnd | 800 | 8124.901 | 7194.444 | 0.88548082 | 7197.163 | 0.88581547 > ArraysMismatch.Long.mismatchMid | 16 | 122857.442 | 121708.317 | 0.99064668 | 121071.994 | 0.985467319 > ArraysMismatch.Long.mismatchMid | 32 | 99406.603 | 99376.972 | 0.999701921 | 97379.046 | 0.979603397 > ArraysMismatch.Long.mismatchMid | 64 | 78596.148 | 76559.205 | 0.974083425 | 76538.811 | 0.973823946 > ArraysMismatch.Long.mismatchMid | 90 | 74253.699 | 73267.252 | 0.98671518 | 74874.856 | 1.008365334 > ArraysMismatch.Long.mismatchMid | 800 | 12739.526 | 12773.563 | 1.002671763 | 15215.721 | 1.194371046 > ArraysMismatch.Long.mismatchStart | 16 | 143429.003 | 147610.51 | 1.029153846 | 
146953.182 | 1.024570895 > ArraysMismatch.Long.mismatchStart | 32 | 149771.413 | 149898.955 | 1.000851578 | 147743.864 | 0.986462377 > ArraysMismatch.Long.mismatchStart | 64 | 149812.094 | 147738.977 | 0.986161885 | 147818.236 | 0.986690941 > ArraysMismatch.Long.mismatchStart | 90 | 149834.855 | 147878.978 | 0.986946448 | 149768.864 | 0.999559575 > ArraysMismatch.Long.mismatchStart | 800 | 150266.332 | 147175.353 | 0.979429996 | 153305.049 | 1.020222208 > ArraysMismatch.Short.differentSubrangeMatches | 16 | 124956.808 | 152398.079 | 1.21960605 | 146222.898 | 1.170187526 > ArraysMismatch.Short.differentSubrangeMatches | 32 | 118644.114 | 158832.405 | 1.338729749 | 146589.485 | 1.235539464 > ArraysMismatch.Short.differentSubrangeMatches | 64 | 111036.197 | 106078.375 | 0.955349497 | 146122.18 | 1.315986894 > ArraysMismatch.Short.differentSubrangeMatches | 90 | 79114.347 | 90244.347 | 1.140682448 | 91059.171 | 1.150981768 > ArraysMismatch.Short.differentSubrangeMatches | 800 | 44794.065 | 46302.944 | 1.033684797 | 46086.671 | 1.028856635 > ArraysMismatch.Short.matches | 16 | 150201.123 | 193264.21 | 1.28670283 | 185129.029 | 1.232540911 > ArraysMismatch.Short.matches | 32 | 137672.122 | 126543.04 | 0.919162414 | 187187.586 | 1.359662242 > ArraysMismatch.Short.matches | 64 | 113952.11 | 110124.025 | 0.966406195 | 109228.551 | 0.958547858 > ArraysMismatch.Short.matches | 90 | 89491.351 | 91045.251 | 1.017363689 | 90362.175 | 1.009730817 > ArraysMismatch.Short.matches | 800 | 25941.449 | 25887.28 | 0.997911875 | 25191.983 | 0.971109324 > ArraysMismatch.Short.mismatchEnd | 16 | 142494.648 | 189203.368 | 1.327792802 | 176318.454 | 1.237368957 > ArraysMismatch.Short.mismatchEnd | 32 | 139928.97 | 119098.052 | 0.851132199 | 178840.438 | 1.278080143 > ArraysMismatch.Short.mismatchEnd | 64 | 115583.3 | 104264.811 | 0.902075049 | 102376.369 | 0.885736685 > ArraysMismatch.Short.mismatchEnd | 90 | 86641.922 | 87669.462 | 1.011859617 | 87745.796 | 1.012740645 > 
ArraysMismatch.Short.mismatchEnd | 800 | 23741.295 | 22911.558 | 0.965050895 | 22937.297 | 0.96613504 > ArraysMismatch.Short.mismatchMid | 16 | 148684.747 | 189160.851 | 1.272227682 | 178776.065 | 1.202383355 > ArraysMismatch.Short.mismatchMid | 32 | 133281.625 | 118690.88 | 0.890526957 | 178478.46 | 1.339107773 > ArraysMismatch.Short.mismatchMid | 64 | 122399.072 | 110333.504 | 0.901424351 | 111504.705 | 0.910993059 > ArraysMismatch.Short.mismatchMid | 90 | 119317.633 | 110483.29 | 0.925959451 | 111346.724 | 0.933195884 > ArraysMismatch.Short.mismatchMid | 800 | 50742.831 | 43058.305 | 0.848559376 | 47917.118 | 0.94431306 > ArraysMismatch.Short.mismatchStart | 16 | 148861.935 | 191984.933 | 1.289684519 | 178706.176 | 1.200482689 > ArraysMismatch.Short.mismatchStart | 32 | 148701.043 | 126690.118 | 0.851978678 | 178702.06 | 1.201753911 > ArraysMismatch.Short.mismatchStart | 64 | 148560.877 | 126747.337 | 0.853167668 | 126657.473 | 0.852562771 > ArraysMismatch.Short.mismatchStart | 90 | 149824.411 | 126605.818 | 0.845027971 | 125719.231 | 0.839110464 > ArraysMismatch.Short.mismatchStart | 800 | 152583.036 | 126437.329 | 0.828646043 | 126698.741 | 0.830359287 Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: 8266951: Review comments resolution. 
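The fast-path/slow-path split described in the pull request summary can be sketched in plain Java. This is only a scalar model under stated assumptions: the class `PartialInlineSketch`, the constant `INLINE_LIMIT`, the `<=` boundary and the `-1`-on-equal return convention are hypothetical choices for the example, the per-lane loop stands in for a single masked vector compare, and the public `Arrays.mismatch` call stands in for the stub call. It is not the C2 code generated by the patch.

```java
import java.util.Arrays;

// Hypothetical scalar model of partial in-lining for a mismatch operation.
final class PartialInlineSketch {
    // Assumed threshold, mirroring the 32-byte default mentioned for the flag.
    static final int INLINE_LIMIT = 32;

    /** Fast path: compare up to INLINE_LIMIT bytes under a mask derived from the length. */
    static int inlineMismatch(byte[] a, byte[] b, int length) {
        long mask = (1L << length) - 1;      // low `length` bits set, length in [0, 32]
        long notEqual = 0;
        for (int lane = 0; lane < INLINE_LIMIT; lane++) {
            boolean active = ((mask >>> lane) & 1L) != 0;
            // Inactive lanes are never read, like a masked vector load.
            if (active && a[lane] != b[lane]) {
                notEqual |= 1L << lane;
            }
        }
        return notEqual == 0 ? -1 : Long.numberOfTrailingZeros(notEqual);
    }

    /** Dispatch: small lengths take the inlined path, larger ones the out-of-line routine. */
    static int mismatch(byte[] a, byte[] b, int length) {
        if (length <= INLINE_LIMIT) {
            return inlineMismatch(a, b, length);
        }
        return Arrays.mismatch(a, 0, length, b, 0, length); // stands in for the stub call
    }

    public static void main(String[] args) {
        byte[] x = {1, 2, 3, 4};
        byte[] y = {1, 2, 9, 4};
        System.out.println("first mismatch index: " + mismatch(x, y, 4));
    }
}
```

The point of the dispatch is that the length check is cheap relative to a call, so short comparisons avoid the call overhead entirely, while longer ones still go through the out-of-line routine.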
-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3999/files
  - new: https://git.openjdk.java.net/jdk/pull/3999/files/41079e8e..851662e4

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3999&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3999&range=00-01

  Stats: 272 lines in 4 files changed: 122 ins; 31 del; 119 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3999.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3999/head:pull/3999

PR: https://git.openjdk.java.net/jdk/pull/3999

From github.com+4146708+a74nh at openjdk.java.net Fri May 14 11:37:03 2021
From: github.com+4146708+a74nh at openjdk.java.net (Alan Hayward)
Date: Fri, 14 May 2021 11:37:03 GMT
Subject: RFR: 8267098: AArch64: C1 StubFrames end confusingly
Message-ID:

For many of the stub frames, a leave/ret is generated after the stub has already branched or returned. This is confusing. For these cases, replace the superfluous code with a should_not_reach_here.

For handle exception, instead of storing the return from the exception handler on the stack, it can be moved directly into lr, replacing a store and load with a single move. (If/when PAC support is implemented, then this store would also have to be signed).
-------------

Commit messages:
 - AArch64: Improve C1 Stub Frame code

Changes: https://git.openjdk.java.net/jdk/pull/4030/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4030&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8267098
  Stats: 52 lines in 1 file changed: 14 ins; 16 del; 22 mod
  Patch: https://git.openjdk.java.net/jdk/pull/4030.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/4030/head:pull/4030

PR: https://git.openjdk.java.net/jdk/pull/4030

From jbhateja at openjdk.java.net Fri May 14 11:37:08 2021
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Fri, 14 May 2021 11:37:08 GMT
Subject: RFR: 8266951: Partial in-lining for vectorized mismatch operation using AVX512 masked instructions
In-Reply-To:
References: <0YtRuwnVZ-Ejs-22d0JDJeFzXiZ17XNuBT1o5Ma4ZkI=.9dd9e952-d452-4175-8ff5-8f41e990a555@github.com>
Message-ID:

On Thu, 13 May 2021 16:18:06 GMT, Paul Sandoz wrote:

>> ArraySupport.vectorizedMismatch is a leaf-level comparison routine which gets called by various public Java APIs (Arrays.equals, Arrays.mismatch). The Hotspot C2 compiler intrinsifies the vectorizedMismatch routine and emits a call to a stub routine which uses vector instructions to compare the inputs.
>>
>> For small compare operations whose size fits in one vector register, i.e. < 32 bytes or <= 64 bytes, this patch employs a partial in-lining technique to emit the fast path code at the call site, which does the vector comparison under the influence of a predicate register/mask computed as a function of the comparison length.
>>
>> If the length of the comparison is greater than the vector register size, then the slow path comprising a stub call is emitted.
>>
>> This prevents the call overhead associated with the stub call, which is significant compared to the actual comparison operation for small sized comparisons.
>>
>> Partial in-lining works under the influence of a run time flag -XX:UsePartialInlineSize=32/64 (default 32 bytes).
>>
>> Following are performance numbers for an existing JMH benchmark (test/micro/org/openjdk/bench/java/util//ArrayMismatch.java):
>>
>> Machine : Cascade Lake server (Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz)
>>
>> JMH Benchmark | Size | BaseLine (ops/ms) | PI32 (ops/ms) | Gain (PI32/Baseline) | PI64 (ops/ms) | Gain (PI64/Baseline)
>> -- | -- | -- | -- | -- | -- | --
>> ArraysMismatch.Byte.differentSubrangeMatches | 16 | 129196.612 | 165376.715 | 1.2800391 | 157553.42 | 1.219485694 >> ArraysMismatch.Byte.differentSubrangeMatches | 32 | 125583.404 | 163645.759 | 1.303084275 | 157645.879 | 1.255308217 >> ArraysMismatch.Byte.differentSubrangeMatches | 64 | 121969.731 | 170648.152 | 1.399102471 | 157993.449 | 1.295349655 >> ArraysMismatch.Byte.differentSubrangeMatches | 90 | 91819.571 | 96154.479 | 1.047211155 | 157983.324 | 1.720584427 >> ArraysMismatch.Byte.differentSubrangeMatches | 800 | 65236.047 | 67243.131 | 1.030766487 | 67759.48 | 1.038681574 >> ArraysMismatch.Byte.matches | 16 | 151805.68 | 203802.717 | 1.342523659 | 188334.618 | 1.240629586 >> ArraysMismatch.Byte.matches | 32 | 151624.747 | 203731.315 | 1.343654773 | 185719.086 | 1.224859989 >> ArraysMismatch.Byte.matches | 64 | 138350.648 | 124158.139 | 0.897416389 | 188935.388 | 1.365627055 >> ArraysMismatch.Byte.matches | 90 | 102366.983 | 101474.688 | 0.991283371 | 100674.414 | 0.983465675 >> ArraysMismatch.Byte.matches | 800 | 46319.352 | 49585.514 | 1.070513983 | 49594.262 | 1.070702846 >> ArraysMismatch.Byte.mismatchEnd | 16 | 162382.057 | 191602.366 | 1.179947893 | 182425.362 | 1.123433003 >> ArraysMismatch.Byte.mismatchEnd | 32 | 146656.702 | 193510.637 | 1.319480354 | 182571.741 | 1.244891904 >> ArraysMismatch.Byte.mismatchEnd | 64 | 140799.385 | 122505.816 | 0.870073516 | 182360.435 | 1.295179201 >> ArraysMismatch.Byte.mismatchEnd | 90 | 117439.002 | 107296.27 | 0.913634041 | 108081.174 | 0.920317545 >> ArraysMismatch.Byte.mismatchEnd | 800 | 47542.975 | 
47456.106 | 0.998172832 | 47289.082 | 0.994659716 >> ArraysMismatch.Byte.mismatchMid | 16 | 143112.591 | 189653.41 | 1.325204223 | 182411.81 | 1.274603504 >> ArraysMismatch.Byte.mismatchMid | 32 | 151759.608 | 193712.64 | 1.276443993 | 182689.18 | 1.203806351 >> ArraysMismatch.Byte.mismatchMid | 64 | 140756.035 | 122017.013 | 0.866868785 | 182508.473 | 1.296629825 >> ArraysMismatch.Byte.mismatchMid | 90 | 134230.235 | 122213.804 | 0.910478954 | 122566.133 | 0.913103765 >> ArraysMismatch.Byte.mismatchMid | 800 | 75512.985 | 64861.716 | 0.858947849 | 71607.794 | 0.94828451 >> ArraysMismatch.Byte.mismatchStart | 16 | 160628.501 | 193722.299 | 1.206026937 | 183190.972 | 1.140463684 >> ArraysMismatch.Byte.mismatchStart | 32 | 151629.56 | 193633.36 | 1.277015906 | 183230.666 | 1.20840993 >> ArraysMismatch.Byte.mismatchStart | 64 | 143345.272 | 130754.305 | 0.91216336 | 181837.864 | 1.268530601 >> ArraysMismatch.Byte.mismatchStart | 90 | 151557.205 | 130724.926 | 0.86254511 | 130962.682 | 0.864113864 >> ArraysMismatch.Byte.mismatchStart | 800 | 149416.06 | 130847.301 | 0.875724477 | 130952.683 | 0.876429769 >> ArraysMismatch.Char.differentSubrangeMatches | 16 | 124936.905 | 152375.103 | 1.219616438 | 146062.997 | 1.169094088 >> ArraysMismatch.Char.differentSubrangeMatches | 32 | 118878.291 | 158770.285 | 1.33557005 | 146561.488 | 1.232870079 >> ArraysMismatch.Char.differentSubrangeMatches | 64 | 110296.975 | 104885.041 | 0.95093307 | 146102.313 | 1.324626655 >> ArraysMismatch.Char.differentSubrangeMatches | 90 | 88056.395 | 90133.489 | 1.023588224 | 87883.169 | 0.998032783 >> ArraysMismatch.Char.differentSubrangeMatches | 800 | 41319.787 | 46257.464 | 1.119499091 | 46090.56 | 1.115459767 >> ArraysMismatch.Char.matches | 16 | 150428.182 | 197311.356 | 1.311664832 | 187199.805 | 1.24444637 >> ArraysMismatch.Char.matches | 32 | 132718.181 | 126373.231 | 0.952192307 | 187008.811 | 1.409067014 >> ArraysMismatch.Char.matches | 64 | 111659.84 | 107182.982 | 0.959906283 | 
109772.951 | 0.983101453 >> ArraysMismatch.Char.matches | 90 | 86184.209 | 91977.05 | 1.067214645 | 90389.147 | 1.048790121 >> ArraysMismatch.Char.matches | 800 | 26332.084 | 25284.001 | 0.960197491 | 25855.38 | 0.981896458 >> ArraysMismatch.Char.mismatchEnd | 16 | 148547.251 | 189151.018 | 1.273339067 | 179675.328 | 1.209550004 >> ArraysMismatch.Char.mismatchEnd | 32 | 138219.785 | 119017.203 | 0.861072118 | 178701.685 | 1.292880647 >> ArraysMismatch.Char.mismatchEnd | 64 | 110435.452 | 103940.023 | 0.94118348 | 102078.889 | 0.924330794 >> ArraysMismatch.Char.mismatchEnd | 90 | 89375.63 | 87698.736 | 0.981237682 | 88037.787 | 0.985031233 >> ArraysMismatch.Char.mismatchEnd | 800 | 23632.584 | 22963.757 | 0.971698948 | 20497.605 | 0.867345061 >> ArraysMismatch.Char.mismatchMid | 16 | 148666.26 | 189258.721 | 1.273044207 | 178820.938 | 1.202834712 >> ArraysMismatch.Char.mismatchMid | 32 | 131949.59 | 119320.489 | 0.904288441 | 178579.245 | 1.35338992 >> ArraysMismatch.Char.mismatchMid | 64 | 122148.315 | 111033.597 | 0.909006375 | 109455.953 | 0.896090568 >> ArraysMismatch.Char.mismatchMid | 90 | 125032.714 | 109837.581 | 0.878470742 | 110283.097 | 0.882033937 >> ArraysMismatch.Char.mismatchMid | 800 | 42255.059 | 48153.688 | 1.139595806 | 43087.476 | 1.019699819 >> ArraysMismatch.Char.mismatchStart | 16 | 148493.976 | 189247.176 | 1.274443456 | 178915.503 | 1.204867078 >> ArraysMismatch.Char.mismatchStart | 32 | 148724.462 | 126724.721 | 0.852077186 | 178887.041 | 1.202808459 >> ArraysMismatch.Char.mismatchStart | 64 | 148635.338 | 126716.274 | 0.852531274 | 126747.94 | 0.852744318 >> ArraysMismatch.Char.mismatchStart | 90 | 140359.351 | 126708.588 | 0.902744186 | 125618.245 | 0.894975961 >> ArraysMismatch.Char.mismatchStart | 800 | 144649.46 | 125727.381 | 0.86918666 | 126664.011 | 0.875661831 >> ArraysMismatch.Double.differentSubrangeMatches | 16 | 116255.827 | 116156.952 | 0.999149505 | 116557.568 | 1.002595491 >> ArraysMismatch.Double.differentSubrangeMatches | 
32 | 91940.498 | 97299.205 | 1.058284511 | 97466.224 | 1.06010111 >> ArraysMismatch.Double.differentSubrangeMatches | 64 | 78205.807 | 78189.378 | 0.999789926 | 78133.649 | 0.999077332 >> ArraysMismatch.Double.differentSubrangeMatches | 90 | 61330.454 | 68798.235 | 1.121763015 | 68524.188 | 1.117294648 >> ArraysMismatch.Double.differentSubrangeMatches | 800 | 14996.315 | 14979.647 | 0.998888527 | 15072.825 | 1.00510192 >> ArraysMismatch.Double.matches | 16 | 119342.024 | 120322.671 | 1.008217114 | 119531.315 | 1.001586122 >> ArraysMismatch.Double.matches | 32 | 88179.448 | 89069.505 | 1.010093701 | 88141.626 | 0.999571079 >> ArraysMismatch.Double.matches | 64 | 62622.253 | 62433.512 | 0.996986039 | 63041.774 | 1.006699232 >> ArraysMismatch.Double.matches | 90 | 49579.305 | 50632.739 | 1.021247454 | 46548.486 | 0.938869272 >> ArraysMismatch.Double.matches | 800 | 8850.013 | 8505.296 | 0.961048984 | 8490.327 | 0.959357574 >> ArraysMismatch.Double.mismatchEnd | 16 | 116594.224 | 119025.382 | 1.020851445 | 116310.567 | 0.997567144 >> ArraysMismatch.Double.mismatchEnd | 32 | 86183.542 | 86814.706 | 1.007323486 | 86258.696 | 1.000872023 >> ArraysMismatch.Double.mismatchEnd | 64 | 62695.058 | 62794.552 | 1.001586951 | 62769 | 1.001179391 >> ArraysMismatch.Double.mismatchEnd | 90 | 46899.021 | 47692.984 | 1.016929202 | 47598.715 | 1.01491916 >> ArraysMismatch.Double.mismatchEnd | 800 | 8132.64 | 8141.465 | 1.001085133 | 7176.583 | 0.882441987 >> ArraysMismatch.Double.mismatchMid | 16 | 110505.284 | 113732.521 | 1.029204368 | 113249.451 | 1.024832903 >> ArraysMismatch.Double.mismatchMid | 32 | 94259.439 | 93242.776 | 0.989214205 | 94420.206 | 1.00170558 >> ArraysMismatch.Double.mismatchMid | 64 | 76392.603 | 76344.962 | 0.999376366 | 76369.689 | 0.999700049 >> ArraysMismatch.Double.mismatchMid | 90 | 71578.538 | 71637.235 | 1.000820036 | 71582.34 | 1.000053116 >> ArraysMismatch.Double.mismatchMid | 800 | 14993.414 | 12701.251 | 0.84712201 | 14998.937 | 1.000368362 >> 
ArraysMismatch.Double.mismatchStart | 16 | 141188.616 | 141430.91 | 1.001716102 | 141517.873 | 1.002332036 >> ArraysMismatch.Double.mismatchStart | 32 | 141489.906 | 139633.297 | 0.986878152 | 141729.555 | 1.001693753 >> ArraysMismatch.Double.mismatchStart | 64 | 141502.44 | 139656.902 | 0.986957554 | 141488.272 | 0.999899875 >> ArraysMismatch.Double.mismatchStart | 90 | 141782.57 | 141508.142 | 0.998064445 | 141579.135 | 0.998565162 >> ArraysMismatch.Double.mismatchStart | 800 | 144565.191 | 139525.413 | 0.965138371 | 144607.95 | 1.000295777 >> ArraysMismatch.Float.differentSubrangeMatches | 16 | 120041.868 | 119986.512 | 0.999538861 | 120009.683 | 0.999731885 >> ArraysMismatch.Float.differentSubrangeMatches | 32 | 111402.873 | 111414.633 | 1.000105563 | 111442.964 | 1.000359874 >> ArraysMismatch.Float.differentSubrangeMatches | 64 | 85388.728 | 93884.13 | 1.099490907 | 95120.892 | 1.113974809 >> ArraysMismatch.Float.differentSubrangeMatches | 90 | 67617.865 | 75865.226 | 1.121970148 | 76179.814 | 1.126622587 >> ArraysMismatch.Float.differentSubrangeMatches | 800 | 24994.376 | 25011.775 | 1.000696117 | 24944.2 | 0.997992508 >> ArraysMismatch.Float.matches | 16 | 133159.39 | 137937.688 | 1.035884048 | 139461.652 | 1.047328709 >> ArraysMismatch.Float.matches | 32 | 111959.987 | 115420.6 | 1.030909373 | 117002.141 | 1.045035321 >> ArraysMismatch.Float.matches | 64 | 86892.65 | 87395.62 | 1.005788407 | 87345.458 | 1.00521112 >> ArraysMismatch.Float.matches | 90 | 67690.279 | 69156.772 | 1.02166475 | 69082.962 | 1.020574343 >> ArraysMismatch.Float.matches | 800 | 14894.94 | 15341.034 | 1.029949365 | 15779.117 | 1.059360897 >> ArraysMismatch.Float.mismatchEnd | 16 | 128854.048 | 128925.913 | 1.000557724 | 128985.299 | 1.001018602 >> ArraysMismatch.Float.mismatchEnd | 32 | 99825.842 | 104613.873 | 1.047963843 | 103876.271 | 1.040574955 >> ArraysMismatch.Float.mismatchEnd | 64 | 80190.706 | 84665.053 | 1.055796329 | 84582.712 | 1.054769514 >> 
ArraysMismatch.Float.mismatchEnd | 90 | 71406.594 | 76730.083 | 1.074551784 | 76596.258 | 1.072677658 >> ArraysMismatch.Float.mismatchEnd | 800 | 14348.159 | 14306.535 | 0.997099001 | 14360.603 | 1.000867289 >> ArraysMismatch.Float.mismatchMid | 16 | 123753.791 | 124291.601 | 1.004345806 | 123649.378 | 0.999156284 >> ArraysMismatch.Float.mismatchMid | 32 | 109105.215 | 111447.183 | 1.021465225 | 111494.37 | 1.021897716 >> ArraysMismatch.Float.mismatchMid | 64 | 93600.363 | 93741.993 | 1.001513135 | 93658.042 | 1.000616226 >> ArraysMismatch.Float.mismatchMid | 90 | 89991.128 | 89712.471 | 0.996903506 | 90031.763 | 1.000451545 >> ArraysMismatch.Float.mismatchMid | 800 | 23974.331 | 24301.075 | 1.01362891 | 24354.29 | 1.015848576 >> ArraysMismatch.Float.mismatchStart | 16 | 140889.393 | 140535.617 | 0.997488981 | 140222.656 | 0.995267657 >> ArraysMismatch.Float.mismatchStart | 32 | 140871.915 | 140318.765 | 0.996073383 | 140242.783 | 0.995534014 >> ArraysMismatch.Float.mismatchStart | 64 | 141197.313 | 140413.639 | 0.994449795 | 140792.879 | 0.997135682 >> ArraysMismatch.Float.mismatchStart | 90 | 139663.079 | 139775.065 | 1.00080183 | 143880.133 | 1.03019448 >> ArraysMismatch.Float.mismatchStart | 800 | 143930.882 | 143878.412 | 0.99963545 | 143923.022 | 0.99994539 >> ArraysMismatch.Int.differentSubrangeMatches | 16 | 110820.026 | 130943.67 | 1.181588515 | 131076.904 | 1.182790771 >> ArraysMismatch.Int.differentSubrangeMatches | 32 | 111706.868 | 121119.544 | 1.084262285 | 122049.921 | 1.092591021 >> ArraysMismatch.Int.differentSubrangeMatches | 64 | 93916.026 | 101624.789 | 1.082081444 | 100103.617 | 1.065884293 >> ArraysMismatch.Int.differentSubrangeMatches | 90 | 67478.955 | 83517.957 | 1.237688951 | 83549.562 | 1.238157319 >> ArraysMismatch.Int.differentSubrangeMatches | 800 | 24920.868 | 25100.838 | 1.007221659 | 25376.679 | 1.018290334 >> ArraysMismatch.Int.matches | 16 | 138004.078 | 142579.711 | 1.033155781 | 143465.516 | 1.039574468 >> 
ArraysMismatch.Int.matches | 32 | 111790.949 | 119018.169 | 1.06464942 | 119864.971 | 1.072224291 >> ArraysMismatch.Int.matches | 64 | 86997.004 | 88476.088 | 1.017001551 | 87755.688 | 1.008720806 >> ArraysMismatch.Int.matches | 90 | 69366.581 | 71427.315 | 1.029707879 | 71203.035 | 1.026474622 >> ArraysMismatch.Int.matches | 800 | 15119.02 | 15529.095 | 1.02712312 | 15828.336 | 1.046915475 >> ArraysMismatch.Int.mismatchEnd | 16 | 139862.143 | 135639.435 | 0.96980807 | 135661.244 | 0.969964002 >> ArraysMismatch.Int.mismatchEnd | 32 | 114870.328 | 115455.901 | 1.005097687 | 114992.965 | 1.001067613 >> ArraysMismatch.Int.mismatchEnd | 64 | 85291.637 | 85115.665 | 0.99793682 | 85179.114 | 0.998680726 >> ArraysMismatch.Int.mismatchEnd | 90 | 73049.868 | 78798.949 | 1.078700772 | 73365.106 | 1.004315381 >> ArraysMismatch.Int.mismatchEnd | 800 | 14597.509 | 12861.87 | 0.88110033 | 12845.178 | 0.879956847 >> ArraysMismatch.Int.mismatchMid | 16 | 131615.489 | 134691.219 | 1.023369058 | 134503.225 | 1.0219407 >> ArraysMismatch.Int.mismatchMid | 32 | 119291.19 | 121970.431 | 1.022459672 | 120647.357 | 1.011368543 >> ArraysMismatch.Int.mismatchMid | 64 | 100133.019 | 99827.03 | 0.996944175 | 98327.743 | 0.981971222 >> ArraysMismatch.Int.mismatchMid | 90 | 93062.689 | 95269.725 | 1.023715584 | 95457.632 | 1.025734728 >> ArraysMismatch.Int.mismatchMid | 800 | 24614.985 | 20853.102 | 0.847171022 | 20857.528 | 0.847350831 >> ArraysMismatch.Int.mismatchStart | 16 | 140229.222 | 147607.561 | 1.052616273 | 146278.15 | 1.043136002 >> ArraysMismatch.Int.mismatchStart | 32 | 140354.53 | 147448.421 | 1.050542658 | 146287.931 | 1.042274382 >> ArraysMismatch.Int.mismatchStart | 64 | 140256.12 | 147353.466 | 1.050602754 | 146094.059 | 1.041623417 >> ArraysMismatch.Int.mismatchStart | 90 | 135753.229 | 151205.439 | 1.113825727 | 152070.776 | 1.120200065 >> ArraysMismatch.Int.mismatchStart | 800 | 151565.887 | 145991.819 | 0.963223466 | 152020.842 | 1.003001698 >> 
ArraysMismatch.Long.differentSubrangeMatches | 16 | 125569.009 | 121469.175 | 0.967349953 | 121319.155 | 0.966155232 >> ArraysMismatch.Long.differentSubrangeMatches | 32 | 100126.557 | 103303.047 | 1.03172475 | 101476.788 | 1.013485243 >> ArraysMismatch.Long.differentSubrangeMatches | 64 | 80870.342 | 82334.336 | 1.018102978 | 82395.962 | 1.018865012 >> ArraysMismatch.Long.differentSubrangeMatches | 90 | 70673.831 | 72440.193 | 1.024993155 | 72067.497 | 1.019719689 >> ArraysMismatch.Long.differentSubrangeMatches | 800 | 15224.864 | 15077.429 | 0.99031617 | 15163.827 | 0.995990966 >> ArraysMismatch.Long.matches | 16 | 119857.871 | 123784.673 | 1.032762154 | 122968.267 | 1.025950703 >> ArraysMismatch.Long.matches | 32 | 88284.162 | 90825.719 | 1.028788369 | 91303.549 | 1.034200778 >> ArraysMismatch.Long.matches | 64 | 62827.102 | 63614.876 | 1.012538761 | 64469.82 | 1.026146646 >> ArraysMismatch.Long.matches | 90 | 49351.299 | 51199.947 | 1.037458953 | 51103.813 | 1.035511 >> ArraysMismatch.Long.matches | 800 | 8822.867 | 8512.064 | 0.964773015 | 8848.35 | 1.00288829 >> ArraysMismatch.Long.mismatchEnd | 16 | 124902.804 | 128237.911 | 1.026701618 | 128410.897 | 1.028086583 >> ArraysMismatch.Long.mismatchEnd | 32 | 86728.545 | 90519.608 | 1.043711825 | 88782.445 | 1.023681938 >> ArraysMismatch.Long.mismatchEnd | 64 | 64431.36 | 62735.702 | 0.973682722 | 64766.52 | 1.005201815 >> ArraysMismatch.Long.mismatchEnd | 90 | 47764.996 | 47635.982 | 0.997298984 | 47562.461 | 0.995759761 >> ArraysMismatch.Long.mismatchEnd | 800 | 8124.901 | 7194.444 | 0.88548082 | 7197.163 | 0.88581547 >> ArraysMismatch.Long.mismatchMid | 16 | 122857.442 | 121708.317 | 0.99064668 | 121071.994 | 0.985467319 >> ArraysMismatch.Long.mismatchMid | 32 | 99406.603 | 99376.972 | 0.999701921 | 97379.046 | 0.979603397 >> ArraysMismatch.Long.mismatchMid | 64 | 78596.148 | 76559.205 | 0.974083425 | 76538.811 | 0.973823946 >> ArraysMismatch.Long.mismatchMid | 90 | 74253.699 | 73267.252 | 0.98671518 | 
74874.856 | 1.008365334 >> ArraysMismatch.Long.mismatchMid | 800 | 12739.526 | 12773.563 | 1.002671763 | 15215.721 | 1.194371046 >> ArraysMismatch.Long.mismatchStart | 16 | 143429.003 | 147610.51 | 1.029153846 | 146953.182 | 1.024570895 >> ArraysMismatch.Long.mismatchStart | 32 | 149771.413 | 149898.955 | 1.000851578 | 147743.864 | 0.986462377 >> ArraysMismatch.Long.mismatchStart | 64 | 149812.094 | 147738.977 | 0.986161885 | 147818.236 | 0.986690941 >> ArraysMismatch.Long.mismatchStart | 90 | 149834.855 | 147878.978 | 0.986946448 | 149768.864 | 0.999559575 >> ArraysMismatch.Long.mismatchStart | 800 | 150266.332 | 147175.353 | 0.979429996 | 153305.049 | 1.020222208 >> ArraysMismatch.Short.differentSubrangeMatches | 16 | 124956.808 | 152398.079 | 1.21960605 | 146222.898 | 1.170187526 >> ArraysMismatch.Short.differentSubrangeMatches | 32 | 118644.114 | 158832.405 | 1.338729749 | 146589.485 | 1.235539464 >> ArraysMismatch.Short.differentSubrangeMatches | 64 | 111036.197 | 106078.375 | 0.955349497 | 146122.18 | 1.315986894 >> ArraysMismatch.Short.differentSubrangeMatches | 90 | 79114.347 | 90244.347 | 1.140682448 | 91059.171 | 1.150981768 >> ArraysMismatch.Short.differentSubrangeMatches | 800 | 44794.065 | 46302.944 | 1.033684797 | 46086.671 | 1.028856635 >> ArraysMismatch.Short.matches | 16 | 150201.123 | 193264.21 | 1.28670283 | 185129.029 | 1.232540911 >> ArraysMismatch.Short.matches | 32 | 137672.122 | 126543.04 | 0.919162414 | 187187.586 | 1.359662242 >> ArraysMismatch.Short.matches | 64 | 113952.11 | 110124.025 | 0.966406195 | 109228.551 | 0.958547858 >> ArraysMismatch.Short.matches | 90 | 89491.351 | 91045.251 | 1.017363689 | 90362.175 | 1.009730817 >> ArraysMismatch.Short.matches | 800 | 25941.449 | 25887.28 | 0.997911875 | 25191.983 | 0.971109324 >> ArraysMismatch.Short.mismatchEnd | 16 | 142494.648 | 189203.368 | 1.327792802 | 176318.454 | 1.237368957 >> ArraysMismatch.Short.mismatchEnd | 32 | 139928.97 | 119098.052 | 0.851132199 | 178840.438 | 1.278080143 >> 
ArraysMismatch.Short.mismatchEnd | 64 | 115583.3 | 104264.811 | 0.902075049 | 102376.369 | 0.885736685 >> ArraysMismatch.Short.mismatchEnd | 90 | 86641.922 | 87669.462 | 1.011859617 | 87745.796 | 1.012740645 >> ArraysMismatch.Short.mismatchEnd | 800 | 23741.295 | 22911.558 | 0.965050895 | 22937.297 | 0.96613504 >> ArraysMismatch.Short.mismatchMid | 16 | 148684.747 | 189160.851 | 1.272227682 | 178776.065 | 1.202383355 >> ArraysMismatch.Short.mismatchMid | 32 | 133281.625 | 118690.88 | 0.890526957 | 178478.46 | 1.339107773 >> ArraysMismatch.Short.mismatchMid | 64 | 122399.072 | 110333.504 | 0.901424351 | 111504.705 | 0.910993059 >> ArraysMismatch.Short.mismatchMid | 90 | 119317.633 | 110483.29 | 0.925959451 | 111346.724 | 0.933195884 >> ArraysMismatch.Short.mismatchMid | 800 | 50742.831 | 43058.305 | 0.848559376 | 47917.118 | 0.94431306 >> ArraysMismatch.Short.mismatchStart | 16 | 148861.935 | 191984.933 | 1.289684519 | 178706.176 | 1.200482689 >> ArraysMismatch.Short.mismatchStart | 32 | 148701.043 | 126690.118 | 0.851978678 | 178702.06 | 1.201753911 >> ArraysMismatch.Short.mismatchStart | 64 | 148560.877 | 126747.337 | 0.853167668 | 126657.473 | 0.852562771 >> ArraysMismatch.Short.mismatchStart | 90 | 149824.411 | 126605.818 | 0.845027971 | 125719.231 | 0.839110464 >> ArraysMismatch.Short.mismatchStart | 800 | 152583.036 | 126437.329 | 0.828646043 | 126698.741 | 0.830359287 > > Thanks for the explanations on why partial inlining can be beneficial. Ideally it would be great if the only changes we made to the Java code were to the threshold values. 
>
> For example:
>
>     public static int mismatch(byte[] a,
>                                byte[] b,
>                                int length) {
>         // ISSUE: defer to index receiving methods if performance is good
>         // assert length <= a.length
>         // assert length <= b.length
>
>         int i = 0;
>         if (length > BYTE_THRESHOLD) {
>             if (a[0] != b[0])
>                 return 0;
>             i = vectorizedMismatch(
>                     a, Unsafe.ARRAY_BYTE_BASE_OFFSET,
>                     b, Unsafe.ARRAY_BYTE_BASE_OFFSET,
>                     length, LOG2_ARRAY_BYTE_INDEX_SCALE);
>             if (i >= 0)
>                 return i;
>             // Align to tail
>             i = length - ~i;
>             // assert i >= 0 && i <= 7;
>         }
>         // Tail < 8 bytes
>         for (; i < length; i++) {
>             if (a[i] != b[i])
>                 return i;
>         }
>         return -1;
>     }
>
>
> Where `BYTE_THRESHOLD` is initialized to 7 or 0, based on querying some HotSpot runtime property. When `BYTE_THRESHOLD == 0` I hope the `length > BYTE_THRESHOLD` check is strength reduced in many cases.
>
> That does leave the `i >= 0` check of the result from `vectorizedMismatch`; perhaps that also has some minor impact? However, since you are doing partial inlining and you know that your `vectorizedMismatch` intrinsic never returns a negative value, maybe you could elide that check?
>
> A quick experiment would be to apply your HotSpot changes and use the existing Java code, replacing the constant threshold values with 0. Then we can carefully look at the code gen and perf results.

Hi @PaulSandoz,
I have reinstated the tail handling in Java to avoid any impact on other targets. Updated performance numbers still show gains for small comparison sizes up to -XX:UsePartialInlineSize.
Thus the patch now does not change the existing Java implementation of vectorizedMismatch.
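For readers following the thread, here is a rough sketch of the partial in-lining idea discussed above, written as plain Java rather than as the C2 code the patch actually emits. The class name, the `PARTIAL_INLINE_SIZE` constant, and the helper methods are all hypothetical, and the return convention is simplified to "index of first mismatch, or -1", which is not the contract of the real `vectorizedMismatch`. The scalar loop in `maskedCompare` only models what a single masked vector compare would do.

```java
// Illustrative model only: plain Java standing in for a masked vector compare.
class PartialInlineSketch {
    // Hypothetical stand-in for the value of -XX:UsePartialInlineSize (32 or 64).
    static final int PARTIAL_INLINE_SIZE = 32;

    // Returns the index of the first mismatching byte among the first
    // `length` bytes, or -1 if they are all equal (simplified convention).
    static int mismatch(byte[] a, byte[] b, int length) {
        if (length <= PARTIAL_INLINE_SIZE) {
            return maskedCompare(a, b, length); // fast path, models code emitted at the call site
        }
        return stubCompare(a, b, length);       // slow path, models the stub call
    }

    // Models one masked vector compare over at most 64 byte lanes.
    static int maskedCompare(byte[] a, byte[] b, int length) {
        // Predicate with the low `length` bits set; lanes >= length are inactive.
        long active = (length >= 64) ? -1L : (1L << length) - 1;
        long notEqual = 0L;
        for (int lane = 0; lane < 64; lane++) {
            boolean laneActive = ((active >>> lane) & 1L) != 0;
            // Inactive lanes are never loaded, so short inputs are safe.
            if (laneActive && a[lane] != b[lane]) {
                notEqual |= 1L << lane;
            }
        }
        return notEqual == 0 ? -1 : Long.numberOfTrailingZeros(notEqual);
    }

    // Models the out-of-line stub with a simple scalar loop.
    static int stubCompare(byte[] a, byte[] b, int length) {
        for (int i = 0; i < length; i++) {
            if (a[i] != b[i]) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] x = {1, 2, 3, 4, 5};
        byte[] y = {1, 2, 9, 4, 5};
        System.out.println(mismatch(x, y, 5)); // first difference is at index 2
        System.out.println(mismatch(x, x, 5)); // identical inputs give -1
    }
}
```

The point of the sketch is that the mask is a function of the comparison length only, so the fast path needs no tail loop of its own; whether that pays off for a given size is what the benchmark table above measures.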
-------------

PR: https://git.openjdk.java.net/jdk/pull/3999

From jbhateja at openjdk.java.net Fri May 14 11:37:09 2021
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Fri, 14 May 2021 11:37:09 GMT
Subject: RFR: 8266951: Partial in-lining for vectorized mismatch operation using AVX512 masked instructions [v2]
In-Reply-To:
References: <0YtRuwnVZ-Ejs-22d0JDJeFzXiZ17XNuBT1o5Ma4ZkI=.9dd9e952-d452-4175-8ff5-8f41e990a555@github.com>
Message-ID:

On Wed, 12 May 2021 18:36:34 GMT, Paul Sandoz wrote:

>> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   8266951: Review comments resolution.
>
> src/hotspot/share/opto/c2_globals.hpp line 85:
>
>> 83:           range(0, max_jint) \
>> 84:           \
>> 85:   product(intx, UsePartialInlineSize, -1, DIAGNOSTIC, \
>
> Unsure if the name change requires a CSR. Members of HotSpot can advise.
>
> Also, please check for any tests that might use this flag.

-XX:UsePartialInlineSize is a diagnostic option and not a product option. Thus a CSR may not be relevant in this case.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3999

From aph at openjdk.java.net Fri May 14 13:07:07 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Fri, 14 May 2021 13:07:07 GMT
Subject: RFR: 8267098: AArch64: C1 StubFrames end confusingly
In-Reply-To:
References:
Message-ID:

On Fri, 14 May 2021 11:28:45 GMT, Alan Hayward wrote:

> For many of the stub frames, a leave/ret is generated after the stub has
> already branched or returned. This is confusing. For these cases, replace
> the superfluous code with a should_not_reach_here
>
> For handle exception, instead of storing return from the exception
> handler on the stack, it can be moved directly into lr, replacing a store and
> load with a single move. (If/when PAC support is implemented, then this store
> would also have to be signed).

src/hotspot/cpu/aarch64/c1_Runtime1_aarch64.cpp line 442:

> 440:     // handler regardless of whether handler existed in the nmethod.
> 441:     // Move it out of the way to the return register.
> 442:     __ mov(lr, r0);

I don't think that leaving LR live here is a good idea. Storing into the stack frame is fine.

src/hotspot/cpu/aarch64/c1_Runtime1_aarch64.cpp line 607:

> 605: const bool must_gc_arguments = true;
> 606: const bool dont_gc_arguments = false;
> 607: const bool does_not_return = true;

I wonder if an `enum` would read better here.

Something like

    enum may_return_t {
      does_not_return, may_return
    };

    class StubFrame {

      StubFrame(StubAssembler* sasm, const char* name, bool must_gc_arguments, may_return_t may_return);

    };

-------------

PR: https://git.openjdk.java.net/jdk/pull/4030

From github.com+4146708+a74nh at openjdk.java.net Fri May 14 13:18:44 2021
From: github.com+4146708+a74nh at openjdk.java.net (Alan Hayward)
Date: Fri, 14 May 2021 13:18:44 GMT
Subject: RFR: 8267098: AArch64: C1 StubFrames end confusingly
In-Reply-To:
References:
Message-ID:

On Fri, 14 May 2021 12:56:16 GMT, Andrew Haley wrote:

>> For many of the stub frames, a leave/ret is generated after the stub has
>> already branched or returned. This is confusing. For these cases, replace
>> the superfluous code with a should_not_reach_here
>>
>> For handle exception, instead of storing return from the exception
>> handler on the stack, it can be moved directly into lr, replacing a store and
>> load with a single move. (If/when PAC support is implemented, then this store
>> would also have to be signed).
>
> src/hotspot/cpu/aarch64/c1_Runtime1_aarch64.cpp line 442:
>
>> 440:     // handler regardless of whether handler existed in the nmethod.
>> 441:     // Move it out of the way to the return register.
>> 442:     __ mov(lr, r0);
>
> I don't think that leaving LR live here is a good idea. Storing into the stack frame is fine.

My motivation here was that if/when PAC is enabled, that store will have to sign the value before storing it, then authenticate the value on loading it again. That won't be the fastest, and seemed a waste.
Agreed that it makes the code slightly more awkward.

> src/hotspot/cpu/aarch64/c1_Runtime1_aarch64.cpp line 607:
>
>> 605: const bool must_gc_arguments = true;
>> 606: const bool dont_gc_arguments = false;
>> 607: const bool does_not_return = true;
>
> I wonder if an `enum` would read better here.
>
> Something like
>
>
>     enum may_return_t {
>       does_not_return, may_return
>     };
>
>     class StubFrame {
>
>       StubFrame(StubAssembler* sasm, const char* name, bool must_gc_arguments, may_return_t may_return);
>
>     };

Ok. I was keeping to the existing style of dont_gc_arguments/must_gc_arguments - but I could change those too so they match.

-------------

PR: https://git.openjdk.java.net/jdk/pull/4030

From vlivanov at openjdk.java.net Fri May 14 13:19:42 2021
From: vlivanov at openjdk.java.net (Vladimir Ivanov)
Date: Fri, 14 May 2021 13:19:42 GMT
Subject: RFR: 8256973: Intrinsic creation for VectorMask query (lastTrue, firstTrue, trueCount) APIs [v2]
In-Reply-To:
References: <73lFD51hzmiF_KrQyPyE5c7lbf-Bp6V5vptzGo7JioY=.f34509d0-04c1-4c6d-878f-baa433b315a7@github.com>
Message-ID:

On Fri, 14 May 2021 08:11:25 GMT, Jatin Bhateja wrote:

>> This patch intrinsifies the following mask query APIs using an optimal instruction sequence for the X86 target.
>> 1) VectorMask.firstTrue.
>> 2) VectorMask.lastTrue.
>> 3) VectorMask.trueCount.
>>
>> The current implementations of the above APIs iterate over the underlying boolean array encapsulated in a mask instance to ascertain the count/position index of true bits.
>> X86 AVX2 and AVX512 targets offer direct instructions to transfer the masks held in the byte vector to a GP or an opmask register, thereby accelerating further querying.
>>
>> Intrinsification is not performed for vector species containing fewer than two vector lanes.
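
As a side note on the quoted description: once a mask has been moved into a general-purpose register as a bit pattern, the three queries reduce to single bit-manipulation operations. The sketch below models that in plain Java, under the assumption that lane i maps to bit i and that there are at most 64 lanes. The class and method names are hypothetical and this is not the code the patch generates. The empty-mask results follow the documented `VectorMask` behaviour (`firstTrue` gives the lane count, `lastTrue` gives -1).

```java
// Illustrative model only: mask queries as bit operations on a long.
class MaskQuerySketch {
    // Packs a boolean mask into a long, lane i -> bit i (assumes at most 64 lanes).
    // This models the "move mask to a GP register" step.
    static long toBits(boolean[] mask) {
        long bits = 0L;
        for (int i = 0; i < mask.length; i++) {
            if (mask[i]) {
                bits |= 1L << i;
            }
        }
        return bits;
    }

    // Number of set lanes: a population count.
    static int trueCount(long bits) {
        return Long.bitCount(bits);
    }

    // Index of the lowest set lane, or `lanes` when no lane is set.
    static int firstTrue(long bits, int lanes) {
        return bits == 0 ? lanes : Long.numberOfTrailingZeros(bits);
    }

    // Index of the highest set lane, or -1 when no lane is set
    // (numberOfLeadingZeros(0) is 64, so the expression yields -1).
    static int lastTrue(long bits) {
        return 63 - Long.numberOfLeadingZeros(bits);
    }

    public static void main(String[] args) {
        boolean[] mask = {false, true, true, false};
        long bits = toBits(mask);
        System.out.println(trueCount(bits));              // two lanes are set
        System.out.println(firstTrue(bits, mask.length)); // lowest set lane is 1
        System.out.println(lastTrue(bits));               // highest set lane is 2
    }
}
```

This also suggests why the baseline depends on the ALGO case in the table that follows while the intrinsic is largely flat: a loop over the boolean array does work proportional to the position of the set lanes, whereas the bit operations do not.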
>> >> Please find below the performance number for benchmark included in the patch: >> Machine: Cascade Lake server (Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz 28C) >> >> >> VectorMask.trueCount | VECTOR SIZE | ALGO | BASELINE AVX3 | WITH OPT AVX3 | GAIN >> -- | -- | -- | -- | -- | -- >> MaskQueryOperationsBenchmark.testFirstTrueByte | 128 | 1 | 338396.436 | 362711.622 | 1.071854143 >> MaskQueryOperationsBenchmark.testFirstTrueByte | 128 | 2 | 205477.472 | 362668.035 | 1.765001445 >> MaskQueryOperationsBenchmark.testFirstTrueByte | 128 | 3 | 185613.377 | 362518.206 | 1.953082326 >> MaskQueryOperationsBenchmark.testFirstTrueByte | 256 | 1 | 338522.114 | 328751.231 | 0.971136648 >> MaskQueryOperationsBenchmark.testFirstTrueByte | 256 | 2 | 148825.341 | 328783.35 | 2.209189294 >> MaskQueryOperationsBenchmark.testFirstTrueByte | 256 | 3 | 200854.856 | 328784.24 | 1.636924526 >> MaskQueryOperationsBenchmark.testFirstTrueByte | 512 | 1 | 338551.089 | 319908.361 | 0.944933782 >> MaskQueryOperationsBenchmark.testFirstTrueByte | 512 | 2 | 116338.756 | 320026.839 | 2.750818816 >> MaskQueryOperationsBenchmark.testFirstTrueByte | 512 | 3 | 200871.692 | 320008.208 | 1.593097588 >> MaskQueryOperationsBenchmark.testFirstTrueInt | 128 | 1 | 338489.157 | 190221.57 | 0.561972418 >> MaskQueryOperationsBenchmark.testFirstTrueInt | 128 | 2 | 205140.903 | 362387.766 | 1.766531007 >> MaskQueryOperationsBenchmark.testFirstTrueInt | 128 | 3 | 185508.994 | 362566.265 | 1.95444036 >> MaskQueryOperationsBenchmark.testFirstTrueInt | 256 | 1 | 338403.999 | 328829.751 | 0.971707639 >> MaskQueryOperationsBenchmark.testFirstTrueInt | 256 | 2 | 148988.857 | 328835.479 | 2.207114583 >> MaskQueryOperationsBenchmark.testFirstTrueInt | 256 | 3 | 200815.907 | 328778.266 | 1.637212265 >> MaskQueryOperationsBenchmark.testFirstTrueInt | 512 | 1 | 338462.403 | 328796.84 | 0.971442728 >> MaskQueryOperationsBenchmark.testFirstTrueInt | 512 | 2 | 116355.623 | 328811.386 | 2.825917455 >> 
MaskQueryOperationsBenchmark.testFirstTrueInt | 512 | 3 | 200856.08 | 328773.859 | 1.636862867 >> MaskQueryOperationsBenchmark.testFirstTrueLong | 128 | 1 | 338451.783 | 204432.394 | 0.60402221 >> MaskQueryOperationsBenchmark.testFirstTrueLong | 128 | 2 | 204443.049 | 155670.633 | 0.761437641 >> MaskQueryOperationsBenchmark.testFirstTrueLong | 128 | 3 | 207254.769 | 155672.842 | 0.751118263 >> MaskQueryOperationsBenchmark.testFirstTrueLong | 256 | 1 | 338520.255 | 328789.176 | 0.971254072 >> MaskQueryOperationsBenchmark.testFirstTrueLong | 256 | 2 | 205883.123 | 328742.103 | 1.596741385 >> MaskQueryOperationsBenchmark.testFirstTrueLong | 256 | 3 | 185519.176 | 328733.537 | 1.771965271 >> MaskQueryOperationsBenchmark.testFirstTrueLong | 512 | 1 | 338605.11 | 328694.935 | 0.970732353 >> MaskQueryOperationsBenchmark.testFirstTrueLong | 512 | 2 | 148444.7 | 328352.346 | 2.211950619 >> MaskQueryOperationsBenchmark.testFirstTrueLong | 512 | 3 | 200884.874 | 328814.376 | 1.636829939 >> MaskQueryOperationsBenchmark.testFirstTrueShort | 128 | 1 | 338529.326 | 362293.877 | 1.070199387 >> MaskQueryOperationsBenchmark.testFirstTrueShort | 128 | 2 | 204676.583 | 362428.992 | 1.770739899 >> MaskQueryOperationsBenchmark.testFirstTrueShort | 128 | 3 | 185495.663 | 362422.835 | 1.953807594 >> MaskQueryOperationsBenchmark.testFirstTrueShort | 256 | 1 | 338533.82 | 328635.479 | 0.970761146 >> MaskQueryOperationsBenchmark.testFirstTrueShort | 256 | 2 | 148822.446 | 328803.55 | 2.209368001 >> MaskQueryOperationsBenchmark.testFirstTrueShort | 256 | 3 | 200752.028 | 328805.974 | 1.637871245 >> MaskQueryOperationsBenchmark.testFirstTrueShort | 512 | 1 | 338464.548 | 320054.91 | 0.945608371 >> MaskQueryOperationsBenchmark.testFirstTrueShort | 512 | 2 | 116329.063 | 328763.508 | 2.826151088 >> MaskQueryOperationsBenchmark.testFirstTrueShort | 512 | 3 | 199971.049 | 328819.066 | 1.644333355 >> MaskQueryOperationsBenchmark.testLastTrueByte | 128 | 1 | 325618.244 | 337629.441 | 1.036887359 >> 
MaskQueryOperationsBenchmark.testLastTrueByte | 128 | 2 | 197655.729 | 337544.012 | 1.707737052 >> MaskQueryOperationsBenchmark.testLastTrueByte | 128 | 3 | 325600.645 | 337256.796 | 1.035798919 >> MaskQueryOperationsBenchmark.testLastTrueByte | 256 | 1 | 325677.144 | 308312.588 | 0.946681687 >> MaskQueryOperationsBenchmark.testLastTrueByte | 256 | 2 | 138177.514 | 308293.997 | 2.231144476 >> MaskQueryOperationsBenchmark.testLastTrueByte | 256 | 3 | 201281.142 | 308353.239 | 1.531952949 >> MaskQueryOperationsBenchmark.testLastTrueByte | 512 | 1 | 325499.635 | 305103.491 | 0.937338965 >> MaskQueryOperationsBenchmark.testLastTrueByte | 512 | 2 | 98267.327 | 304803.64 | 3.101780106 >> MaskQueryOperationsBenchmark.testLastTrueByte | 512 | 3 | 201072.661 | 304969.972 | 1.516715253 >> MaskQueryOperationsBenchmark.testLastTrueInt | 128 | 1 | 325286.171 | 337337.209 | 1.037047496 >> MaskQueryOperationsBenchmark.testLastTrueInt | 128 | 2 | 197351.915 | 331432.723 | 1.679399579 >> MaskQueryOperationsBenchmark.testLastTrueInt | 128 | 3 | 325173.097 | 337518.586 | 1.037965899 >> MaskQueryOperationsBenchmark.testLastTrueInt | 256 | 1 | 325199.786 | 308436.805 | 0.948453284 >> MaskQueryOperationsBenchmark.testLastTrueInt | 256 | 2 | 138200.527 | 308405.442 | 2.231579348 >> MaskQueryOperationsBenchmark.testLastTrueInt | 256 | 3 | 201240.625 | 308234.527 | 1.531671485 >> MaskQueryOperationsBenchmark.testLastTrueInt | 512 | 1 | 325590.639 | 308381.757 | 0.947145649 >> MaskQueryOperationsBenchmark.testLastTrueInt | 512 | 2 | 98334.197 | 308440.373 | 3.13665421 >> MaskQueryOperationsBenchmark.testLastTrueInt | 512 | 3 | 200832.953 | 308431.355 | 1.535760693 >> MaskQueryOperationsBenchmark.testLastTrueLong | 128 | 1 | 325564.887 | 193981.861 | 0.595831641 >> MaskQueryOperationsBenchmark.testLastTrueLong | 128 | 2 | 214005.351 | 153667.869 | 0.718056199 >> MaskQueryOperationsBenchmark.testLastTrueLong | 128 | 3 | 214061.493 | 156337.24 | 0.730337988 >> 
MaskQueryOperationsBenchmark.testLastTrueLong | 256 | 1 | 325601.502 | 308291.032 | 0.946835411 >> MaskQueryOperationsBenchmark.testLastTrueLong | 256 | 2 | 197911.182 | 308292.149 | 1.557729815 >> MaskQueryOperationsBenchmark.testLastTrueLong | 256 | 3 | 325608.187 | 308405.393 | 0.947167195 >> MaskQueryOperationsBenchmark.testLastTrueLong | 512 | 1 | 325734.897 | 308321.619 | 0.946541564 >> MaskQueryOperationsBenchmark.testLastTrueLong | 512 | 2 | 137974.465 | 308131.475 | 2.233250008 >> MaskQueryOperationsBenchmark.testLastTrueLong | 512 | 3 | 205479.182 | 308311.636 | 1.500451934 >> MaskQueryOperationsBenchmark.testLastTrueShort | 128 | 1 | 325681.411 | 337663.377 | 1.036790451 >> MaskQueryOperationsBenchmark.testLastTrueShort | 128 | 2 | 198127.51 | 337287.453 | 1.702375672 >> MaskQueryOperationsBenchmark.testLastTrueShort | 128 | 3 | 325519.01 | 337453.387 | 1.036662612 >> MaskQueryOperationsBenchmark.testLastTrueShort | 256 | 1 | 325647.378 | 308266.5 | 0.946626691 >> MaskQueryOperationsBenchmark.testLastTrueShort | 256 | 2 | 138287.837 | 308402.656 | 2.230150263 >> MaskQueryOperationsBenchmark.testLastTrueShort | 256 | 3 | 205375.864 | 308418.101 | 1.501725154 >> MaskQueryOperationsBenchmark.testLastTrueShort | 512 | 1 | 325548.631 | 308137.064 | 0.946516233 >> MaskQueryOperationsBenchmark.testLastTrueShort | 512 | 2 | 98424.074 | 308145.17 | 3.130790644 >> MaskQueryOperationsBenchmark.testLastTrueShort | 512 | 3 | 205381.622 | 308345.763 | 1.50133084 >> MaskQueryOperationsBenchmark.testTrueCountByte | 128 | 1 | 197488.249 | 340490.471 | 1.724104967 >> MaskQueryOperationsBenchmark.testTrueCountByte | 128 | 2 | 191307.785 | 354400.26 | 1.852513529 >> MaskQueryOperationsBenchmark.testTrueCountByte | 128 | 3 | 181206.7 | 354512.75 | 1.956399791 >> MaskQueryOperationsBenchmark.testTrueCountByte | 256 | 1 | 144485.784 | 328347.7 | 2.272525995 >> MaskQueryOperationsBenchmark.testTrueCountByte | 256 | 2 | 136709.938 | 328318.229 | 2.401568122 >> 
MaskQueryOperationsBenchmark.testTrueCountByte | 256 | 3 | 141501.903 | 328274.337 | 2.319928779 >> MaskQueryOperationsBenchmark.testTrueCountByte | 512 | 1 | 108395.25 | 318599.11 | 2.939234976 >> MaskQueryOperationsBenchmark.testTrueCountByte | 512 | 2 | 98731.287 | 318651.791 | 3.22746518 >> MaskQueryOperationsBenchmark.testTrueCountByte | 512 | 3 | 106344.335 | 318657.098 | 2.99646519 >> MaskQueryOperationsBenchmark.testTrueCountInt | 128 | 1 | 124691.716 | 354457.62 | 2.842671762 >> MaskQueryOperationsBenchmark.testTrueCountInt | 128 | 2 | 191325.138 | 354360.523 | 1.852137815 >> MaskQueryOperationsBenchmark.testTrueCountInt | 128 | 3 | 181480.334 | 353746.697 | 1.949228818 >> MaskQueryOperationsBenchmark.testTrueCountInt | 256 | 1 | 144513.076 | 328404.916 | 2.27249274 >> MaskQueryOperationsBenchmark.testTrueCountInt | 256 | 2 | 136710.717 | 328516.92 | 2.403007805 >> MaskQueryOperationsBenchmark.testTrueCountInt | 256 | 3 | 141631.832 | 328432.841 | 2.318919669 >> MaskQueryOperationsBenchmark.testTrueCountInt | 512 | 1 | 108479.473 | 328405.877 | 3.027355019 >> MaskQueryOperationsBenchmark.testTrueCountInt | 512 | 2 | 98747.682 | 328300.378 | 3.324638831 >> MaskQueryOperationsBenchmark.testTrueCountInt | 512 | 3 | 106378.04 | 328384.537 | 3.086957957 >> MaskQueryOperationsBenchmark.testTrueCountLong | 128 | 1 | 213646.579 | 159098.437 | 0.74468048 >> MaskQueryOperationsBenchmark.testTrueCountLong | 128 | 2 | 212671.379 | 162528.924 | 0.764225655 >> MaskQueryOperationsBenchmark.testTrueCountLong | 128 | 3 | 212649.052 | 162530.898 | 0.764315178 >> MaskQueryOperationsBenchmark.testTrueCountLong | 256 | 1 | 197350.819 | 328365.924 | 1.663869072 >> MaskQueryOperationsBenchmark.testTrueCountLong | 256 | 2 | 191473.127 | 328501.883 | 1.715655289 >> MaskQueryOperationsBenchmark.testTrueCountLong | 256 | 3 | 185529.513 | 328428.64 | 1.770223156 >> MaskQueryOperationsBenchmark.testTrueCountLong | 512 | 1 | 144516.188 | 328334.76 | 2.27195835 >> 
MaskQueryOperationsBenchmark.testTrueCountLong | 512 | 2 | 136752.367 | 328505.571 | 2.402192943
>> MaskQueryOperationsBenchmark.testTrueCountLong | 512 | 3 | 141445.742 | 328392.887 | 2.321688036
>> MaskQueryOperationsBenchmark.testTrueCountShort | 128 | 1 | 197863.202 | 354533.342 | 1.791810394
>> MaskQueryOperationsBenchmark.testTrueCountShort | 128 | 2 | 191802.914 | 354377.939 | 1.84761499
>> MaskQueryOperationsBenchmark.testTrueCountShort | 128 | 3 | 181773.298 | 354374.525 | 1.949541153
>> MaskQueryOperationsBenchmark.testTrueCountShort | 256 | 1 | 144414.679 | 328435.088 | 2.27425003
>> MaskQueryOperationsBenchmark.testTrueCountShort | 256 | 2 | 136923.991 | 328267.898 | 2.397446171
>> MaskQueryOperationsBenchmark.testTrueCountShort | 256 | 3 | 141545.957 | 328308.681 | 2.319449371
>> MaskQueryOperationsBenchmark.testTrueCountShort | 512 | 1 | 108420.143 | 328282.998 | 3.027878297
>> MaskQueryOperationsBenchmark.testTrueCountShort | 512 | 2 | 98736.441 | 328420.616 | 3.326235103
>> MaskQueryOperationsBenchmark.testTrueCountShort | 512 | 3 | 106432.386 | 328245.585 | 3.084076166
>>
>> ALGO (1=bestcase, 2=worstcase, 3=avgcase)
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
>   8256973: Review comments resolution.

src/hotspot/cpu/x86/assembler_x86.cpp line 9221:

> 9219:
> 9220: void Assembler::evpmovb2m(KRegister dst, XMMRegister src, int vector_len) {
> 9221:   assert(VM_Version::supports_avx512bw(), "");

Should it be `VM_Version::supports_avx512vlbw()`? VPMOVB2M requires AVX512VL for 128-/256-bit cases.

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 3726:

> 3724:
> 3725: #ifdef _LP64
> 3726: void C2_MacroAssembler::vector_mask_oper(int opc, Register dst, XMMRegister mask, XMMRegister xtmp,

What about stressing that it requires AVX512BW & VL extensions? For example, by putting an assert and adding `_evex` suffix to the name.
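
For readers mapping the benchmark names above to semantics: `trueCount` and `lastTrue` (and the analogous first-set-lane query) each reduce a vector mask to one integer. The following is only a scalar sketch of those semantics, assuming the mask has already been packed into the low bits of a `long` with all higher bits zero. The class and method names are illustrative and are not taken from the patch; the instruction sequences the patch actually emits are not shown here.

```java
// Scalar model of vector-mask queries. All names are illustrative
// assumptions, not code from the pull request.
final class MaskQuerySketch {
    // Number of set lanes in the packed mask.
    static int trueCount(long maskBits) {
        return Long.bitCount(maskBits);
    }

    // Index of the highest set lane, or -1 when no lane is set
    // (numberOfLeadingZeros(0) is 64, so 63 - 64 yields -1).
    static int lastTrue(long maskBits) {
        return 63 - Long.numberOfLeadingZeros(maskBits);
    }

    // Index of the lowest set lane, or laneCount when no lane is set.
    static int firstTrue(long maskBits, int laneCount) {
        return maskBits == 0 ? laneCount : Long.numberOfTrailingZeros(maskBits);
    }

    public static void main(String[] args) {
        long mask = 0b0010_1100L; // lanes 2, 3 and 5 are set
        System.out.println(trueCount(mask));     // prints 3
        System.out.println(lastTrue(mask));      // prints 5
        System.out.println(firstTrue(mask, 8));  // prints 2
    }
}
```

Once the mask is in a general-purpose register, each query is a single bit-counting operation, which is why the cost of getting the mask out of the vector register tends to dominate.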
src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 3727:

> 3725: #ifdef _LP64
> 3726: void C2_MacroAssembler::vector_mask_oper(int opc, Register dst, XMMRegister mask, XMMRegister xtmp,
> 3727:                                          Register tmp, KRegister ktmp, int masklen, int vlen) {

s/vlen/vlen_enc/

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 3749:

> 3747:
> 3748: void C2_MacroAssembler::vector_mask_oper(int opc, Register dst, XMMRegister mask, XMMRegister xtmp,
> 3749:                                          XMMRegister xtmp1, Register tmp, int masklen, int vlen) {

s/vlen/vlen_enc/

src/hotspot/cpu/x86/x86.ad line 8066:

> 8064:
> 8065: instruct vmask_true_count_evex(rRegI dst, vec mask, rRegL tmp, kReg ktmp, vec xtmp) %{
> 8066:   predicate(VM_Version::supports_avx512bw());

Same here: `VM_Version::supports_avx512vlbw()`?

src/hotspot/cpu/x86/x86.ad line 8078:

> 8076:     int opcode = this->ideal_Opcode();
> 8077:     int mask_len = mask_node->bottom_type()->is_vect()->length();
> 8078:     __ vector_mask_oper(opcode, $dst$$Register, $mask$$XMMRegister, $xtmp$$XMMRegister,

`oper` looks misleading to me here: it usually means `operand` in Mach-related code. Either `vector_mask_operation()` or `vector_mask_op()` is a better alternative IMO.

src/hotspot/cpu/x86/x86.ad line 8085:

> 8083:
> 8084: instruct vmask_true_count_avx(rRegI dst, vec mask, rRegL tmp, vec xtmp, vec xtmp1) %{
> 8085:   predicate(!VM_Version::supports_avx512bw());

`VM_Version::supports_avx512vlbw()`

src/hotspot/share/opto/vectornode.cpp line 1297:

> 1295: }
> 1296:
> 1297: Node* VectorMaskOpNode::Ideal(PhaseGVN* phase, bool can_reshape) {

It doesn't make much sense to me. Why don't you simply require the input to be in canonical shape from the very beginning by unconditionally wrapping it into `VectorStoreMask` during construction?

src/hotspot/share/opto/vectornode.hpp line 858:

> 856: class VectorMaskOpNode : public TypeNode {
> 857:  public:
> 858:   VectorMaskOpNode(Node* mask, const Type* ty, const Type* ety, int mopc):

`ty`/`ety` caught my eye.
It doesn't match anything in vectornode.hpp and may confuse readers. Any reason not to use `vt`? Also, any particular reason to cache full-blown type instead of capturing just the `BasicType`?

src/java.base/share/classes/jdk/internal/vm/vector/VectorSupport.java line 469:

> 467:     public static
> 468:
> 469:     int maskOp(int oper, Class maskClass, Class elemClass, int length, M m,

I second Paul here: `maskOp` case is already covered by `reductionCoerced`.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3916

From aph at openjdk.java.net Fri May 14 13:53:48 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Fri, 14 May 2021 13:53:48 GMT
Subject: RFR: 8267098: AArch64: C1 StubFrames end confusingly
In-Reply-To:
References:
Message-ID:

On Fri, 14 May 2021 13:15:45 GMT, Alan Hayward wrote:

>> src/hotspot/cpu/aarch64/c1_Runtime1_aarch64.cpp line 607:
>>
>>> 605: const bool must_gc_arguments = true;
>>> 606: const bool dont_gc_arguments = false;
>>> 607: const bool does_not_return = true;
>>
>> I wonder if an `enum` would read better here.
>>
>> Something like
>>
>>
>> enum may_return_t {
>>   does_not_return, may_return
>> };
>>
>> class StubFrame {
>>
>>   StubFrame(StubAssembler* sasm, const char* name, bool must_gc_arguments, may_return_t may_return);
>>
>> };
>
> Ok. I was keeping to the existing style of dont_gc_arguments/must_gc_arguments - but I could change those too so they match.

This code is a legacy carried over from x86. It's not necessary to "match". Changing others would be unnecessary churn, and risk breakage.

There's an urge from some contributors: when I suggest doing something in an easy-to-understand and clean way, people want to change everything else to match. This urge can be resisted, and IMVHO should be in this case. Churn is, in itself, bad.

And this case is special, I think, because `does_not_return` uses the "don't do" anti-pattern, where the `true` case was `does_not_return`.
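
The readability argument for the `enum` can be seen in a small Java analogue of the C++ suggestion. This is only a sketch: `StubFrameSketch`, `MayReturn`, `returnsToCaller` and the stub name are hypothetical names chosen for illustration and do not appear in HotSpot. The existing boolean parameter is deliberately left alone, in line with the point about churn.

```java
// Java analogue of the suggested C++ enum parameter. Every name here is
// a hypothetical illustration, not HotSpot code.
final class StubFrameSketch {
    // A two-valued enum names both cases, so a call site never has to
    // decode what a bare 'true' means for a negatively phrased flag.
    enum MayReturn { DOES_NOT_RETURN, MAY_RETURN }

    private final String name;
    private final boolean mustGcArguments; // left as a boolean on purpose
    private final MayReturn mayReturn;

    StubFrameSketch(String name, boolean mustGcArguments, MayReturn mayReturn) {
        this.name = name;
        this.mustGcArguments = mustGcArguments;
        this.mayReturn = mayReturn;
    }

    // True when control can come back to the caller of the stub.
    boolean returnsToCaller() {
        return mayReturn == MayReturn.MAY_RETURN;
    }

    String describe() {
        return name + " gcArgs=" + mustGcArguments + " " + mayReturn;
    }

    public static void main(String[] args) {
        StubFrameSketch frame =
            new StubFrameSketch("example_stub", false, MayReturn.DOES_NOT_RETURN);
        System.out.println(frame.describe());
        System.out.println(frame.returnsToCaller());
    }
}
```

At the call site, `MayReturn.DOES_NOT_RETURN` documents itself, whereas a trailing `true` would require looking up the parameter name.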
-------------

PR: https://git.openjdk.java.net/jdk/pull/4030

From github.com+4146708+a74nh at openjdk.java.net Fri May 14 15:24:37 2021
From: github.com+4146708+a74nh at openjdk.java.net (Alan Hayward)
Date: Fri, 14 May 2021 15:24:37 GMT
Subject: RFR: 8267098: AArch64: C1 StubFrames end confusingly
In-Reply-To:
References:
Message-ID:

On Fri, 14 May 2021 13:50:40 GMT, Andrew Haley wrote:

>> Ok. I was keeping to the existing style of dont_gc_arguments/must_gc_arguments - but I could change those too so they match.
>
> This code is a legacy carried over from x86. It's not necessary to "match". Changing others would be unnecessary churn, and risk breakage.
>
> There's an urge from some contributors: when I suggest doing something in an easy-to-understand and clean way, people want to change everything else to match. This urge can be resisted, and IMVHO should be in this case. Churn is, in itself, bad.
>
> And this case is special, I think, because `does_not_return` uses the "don't do" anti-pattern, where the `true` case was `does_not_return`.

That's fine, I'll leave that part as is. Knowing the motivation for this is useful, as every project is different.

-------------

PR: https://git.openjdk.java.net/jdk/pull/4030

From psandoz at openjdk.java.net Fri May 14 15:29:38 2021
From: psandoz at openjdk.java.net (Paul Sandoz)
Date: Fri, 14 May 2021 15:29:38 GMT
Subject: RFR: 8266951: Partial in-lining for vectorized mismatch operation using AVX512 masked instructions
In-Reply-To:
References: <0YtRuwnVZ-Ejs-22d0JDJeFzXiZ17XNuBT1o5Ma4ZkI=.9dd9e952-d452-4175-8ff5-8f41e990a555@github.com>
Message-ID:

On Fri, 14 May 2021 11:26:29 GMT, Jatin Bhateja wrote:

>> Thanks for the explanations on why partial inlining can be beneficial. Ideally it would be great if the only changes we made to the Java code were to the threshold values.
>>
>> For example:
>>
>>     public static int mismatch(byte[] a,
>>                                byte[] b,
>>                                int length) {
>>         // ISSUE: defer to index receiving methods if performance is good
>>         // assert length <= a.length
>>         // assert length <= b.length
>>
>>         int i = 0;
>>         if (length > BYTE_THRESHOLD) {
>>             if (a[0] != b[0])
>>                 return 0;
>>             i = vectorizedMismatch(
>>                     a, Unsafe.ARRAY_BYTE_BASE_OFFSET,
>>                     b, Unsafe.ARRAY_BYTE_BASE_OFFSET,
>>                     length, LOG2_ARRAY_BYTE_INDEX_SCALE);
>>             if (i >= 0)
>>                 return i;
>>             // Align to tail
>>             i = length - ~i;
>>             // assert i >= 0 && i <= 7;
>>         }
>>         // Tail < 8 bytes
>>         for (; i < length; i++) {
>>             if (a[i] != b[i])
>>                 return i;
>>         }
>>         return -1;
>>     }
>>
>>
>> Where `BYTE_THRESHOLD` is initialized to 7 or 0, based on querying some HotSpot runtime property. When `BYTE_THRESHOLD == 0` I hope the `length > BYTE_THRESHOLD` check is strength reduced in many cases.
>>
>> That does leave the `i >= 0` check of the result from `vectorizedMismatch`, perhaps that also has some minor impact? However, maybe since you are doing partial inlining and you know that your `vectorizedMismatch` intrinsic never returns a negative value you could elide that check?
>>
>> A quick experiment would be to apply your HotSpot changes and use the existing Java code, replacing the constant threshold values with 0. Then we can carefully look at the code gen and perf results.
>
> Hi @PaulSandoz , I have reinstated the tail handling in Java to avoid any impact on other targets. Updated performance numbers still show gains for small comparison sizes up to -XX:UsePartialInlineSize. Thus the patch now does not change the existing Java implementation of VectorizedMismatch.

@jatin-bhateja that's good. Did performance numbers change after reverting the Java changes?

Do you think it is worth experimenting by setting the threshold to zero when partial inlining is supported? Maybe partial inlining will help for, say, mismatching on arrays with a length of 7 or fewer bytes e.g.
we could test quickly with mismatching for `byte`.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3999

From vlivanov at openjdk.java.net Fri May 14 15:45:35 2021
From: vlivanov at openjdk.java.net (Vladimir Ivanov)
Date: Fri, 14 May 2021 15:45:35 GMT
Subject: RFR: 8266962: Add arch supporting check for "Op_VectorLoadConst" before creating the node
In-Reply-To:
References:
Message-ID:

On Fri, 14 May 2021 06:04:45 GMT, Xiaohong Gong wrote:

> When creating the vector shuffle, the `"VectorLoadConstNode"` will be created to get an initial index vector. Before creating it, the compiler should check whether the current platform supports this opcode in case the jvm crashes with `"bad ad file"`. The compiler should finish the intrinsification and go back to the default Java implementation if the backend doesn't support it.
>
> Tested tier1 and jdk::tier3.

The fix makes perfect sense, but I'm curious why we have `VectorLoadConst` in the first place. It exposes JVM support for the iota vector constant materialization, but it's not clear to me what benefits it brings compared to feeding the intrinsic with the vector materialized on JDK side.

-------------

Marked as reviewed by vlivanov (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/4023

From kvn at openjdk.java.net Fri May 14 16:14:57 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Fri, 14 May 2021 16:14:57 GMT
Subject: RFR: 8265711: C1: Intrinsify Class.getModifier method [v3]
In-Reply-To:
References:
Message-ID:

On Wed, 28 Apr 2021 06:43:19 GMT, Yi Yang wrote:

>> It's a relatively common case to get modifiers from a constant Class instance, i.e. ThirdPartyClass.class.getModifiers(). Currently, the C1 Canonicalizer misses the opportunity of replacing Class.getModifiers intrinsic calls with compile-time constants.
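
To make the quoted case concrete, here is a minimal sketch of a call on a class literal, using `String.class` as a stand-in for `ThirdPartyClass.class`. The receiver is known at compile time, so the result is a fixed bit set that a JIT could in principle replace with a constant. The class and method names are illustrative, and the sketch only shows the Java-level call shape; it does not demonstrate what any particular compiler does with it.

```java
import java.lang.reflect.Modifier;

// Call shape discussed in the quoted description: getModifiers() on a
// class literal. Names here are illustrative assumptions.
final class ConstantModifiersSketch {
    // The receiver is the constant String.class, so the returned bits
    // are the same on every call.
    static int stringModifiers() {
        return String.class.getModifiers();
    }

    public static void main(String[] args) {
        int mods = stringModifiers();
        System.out.println(Modifier.toString(mods)); // prints "public final"
    }
}
```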
>
> Yi Yang has updated the pull request incrementally with one additional commit since the last revision:
>
>   rename; redundant reloading

test/hotspot/jtreg/compiler/c1/CanonicalizeGetModifiers.java line 31:

> 29:  * @requires vm.compiler1.enabled
> 30:  * @library /test/lib
> 31:  * @run main/othervm -XX:TieredStopAtLevel=1 -Xbatch

I would suggest adding two additional `@run` commands to make sure the test passes in all modes:
- default: without `-XX:TieredStopAtLevel`
- C2 only: with `-XX:-TieredCompilation`

-------------

PR: https://git.openjdk.java.net/jdk/pull/3616

From vlivanov at openjdk.java.net Fri May 14 16:32:46 2021
From: vlivanov at openjdk.java.net (Vladimir Ivanov)
Date: Fri, 14 May 2021 16:32:46 GMT
Subject: RFR: 8266950: Remove vestigial support for non-strict floating-point execution
In-Reply-To:
References:
Message-ID:

On Wed, 12 May 2021 05:33:14 GMT, David Holmes wrote:

> As part of JEP 306, the vestiges of HotSpot support for non-strict floating-point execution can be removed. All methods implicitly have strictfp semantics so the explicit checks for is_strict() can be replaced by true and the code reformulated accordingly.
>
> There are still some names that include "strict" that could potentially be renamed to remove it, but the fact we have to have strict fp semantics is still important on some platforms, so the names help reinforce that IMO.
>
> Testing: tiers 1-3
>
> Thanks,
> David

Overall, it looks very good. Thanks for taking care of the compiler part, David.

I think it makes sense to remove lir_div_strictfp and lir_mul_strictfp in C1 as well:
https://github.com/openjdk/jdk/pull/4027

Feel free to incorporate the patch into the current PR if you agree with the change. (Passed hs-tier1 - hs-tier4 testing and x86_32 build.) Otherwise, I'll handle it as a separate PR.

-------------

Marked as reviewed by vlivanov (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/3991

From aph at openjdk.java.net Fri May 14 17:03:56 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Fri, 14 May 2021 17:03:56 GMT
Subject: RFR: 8267098: AArch64: C1 StubFrames end confusingly
In-Reply-To:
References:
Message-ID:

On Fri, 14 May 2021 15:21:50 GMT, Alan Hayward wrote:

>> This code is a legacy carried over from x86. It's not necessary to "match". Changing others would be unnecessary churn, and risk breakage.
>>
>> There's an urge from some contributors: when I suggest doing something in an easy-to-understand and clean way, people want to change everything else to match. This urge can be resisted, and IMVHO should be in this case. Churn is, in itself, bad.
>>
>> And this case is special, I think, because `does_not_return` uses the "don't do" anti-pattern, where the `true` case was `does_not_return`.
>
> That's fine, I'll leave that part as is. Knowing the motivation for this is useful, as every project is different.

Sure, thanks. The commonality between this port and x86 means that in many cases we have taken x86 patches and applied them to this port, with a few tweaks. Of course the ports diverge over time, but even after almost ten years it still sometimes works.
-------------

PR: https://git.openjdk.java.net/jdk/pull/4030

From whuang at openjdk.java.net Fri May 14 17:19:42 2021
From: whuang at openjdk.java.net (Wang Huang)
Date: Fri, 14 May 2021 17:19:42 GMT
Subject: Integrated: 8263006: Add optimization for Max(*)Node and Min(*)Node
In-Reply-To:
References:
Message-ID:

On Thu, 15 Apr 2021 11:32:36 GMT, Wang Huang wrote:

> * I optimize `max` and `min` by using these identities
>   - op(max(a,b), min(a,b)) === op(a,b)
>     - if op is commutative
>   - examples:
>     - max(a,b) + min(a,b) === a + b // op = add
>     - max(a,b) * min(a,b) === a * b // op = mul
>     - max(max(a,b), min(a,b)) === max(a,b) // op = max()
>     - min(max(a,b), min(a,b)) === min(a,b) // op = min()
> * Test case
> ```java
> /*
>  * Copyright (c) 2021, Huawei Technologies Co. Ltd. All rights reserved.
>  * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
>  *
>  * This code is free software; you can redistribute it and/or modify it
>  * under the terms of the GNU General Public License version 2 only, as
>  * published by the Free Software Foundation.
>  *
>  * This code is distributed in the hope that it will be useful, but WITHOUT
>  * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>  * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
>  * version 2 for more details (a copy is included in the LICENSE file that
>  * accompanied this code).
>  *
>  * You should have received a copy of the GNU General Public License version
>  * 2 along with this work; if not, write to the Free Software Foundation,
>  * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
>  *
>  * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA
>  * or visit www.oracle.com if you need additional information or have any
>  * questions.
>  */
> package org.sample;
>
> import org.openjdk.jmh.annotations.Benchmark;
> import org.openjdk.jmh.annotations.*;
>
> import java.util.Random;
> import java.util.concurrent.TimeUnit;
> import org.openjdk.jmh.infra.Blackhole;
>
> @BenchmarkMode({Mode.AverageTime})
> @OutputTimeUnit(TimeUnit.MICROSECONDS)
> public class MyBenchmark {
>
>     static int length = 100000;
>     static double[] data1 = new double[length];
>     static double[] data2 = new double[length];
>     static Random random = new Random();
>
>     static {
>         for(int i = 0; i < length; ++i) {
>             data1[i] = random.nextDouble();
>             data2[i] = random.nextDouble();
>         }
>     }
>
>     @Benchmark
>     public void testAdd(Blackhole bh) {
>         double sum = 0;
>         for (int i = 0; i < length; i++) {
>             sum += Math.max(data1[i], data2[i]) + Math.min(data1[i], data2[i]);
>         }
>         bh.consume(sum);
>     }
>
>     @Benchmark
>     public void testMax(Blackhole bh) {
>         double sum = 0;
>         for (int i = 0; i < length; i++) {
>             sum += Math.max(Math.max(data1[i], data2[i]), Math.min(data1[i], data2[i]));
>         }
>         bh.consume(sum);
>     }
>
>     @Benchmark
>     public void testMin(Blackhole bh) {
>         double sum = 0;
>         for (int i = 0; i < length; i++) {
>             sum += Math.min(Math.max(data1[i], data2[i]), Math.min(data1[i], data2[i]));
>         }
>         bh.consume(sum);
>     }
>
>     @Benchmark
>     public void testMul(Blackhole bh) {
>         double sum = 0;
>         for (int i = 0; i < length; i++) {
>             sum += (Math.max(data1[i], data2[i]) * Math.min(data1[i], data2[i]));
>         }
>         bh.consume(sum);
>     }
> }
> ```
>
> * The result is listed here (aarch64):
>
> before:
>
> | Benchmark | Mode | Samples | Score | Score error | Units |
> |---|---|---|---|---|---|
> | o.s.MyBenchmark.testAdd | avgt | 10 | 556.048 | 32.368 | us/op |
> | o.s.MyBenchmark.testMax | avgt | 10 | 543.065 | 54.221 | us/op |
> | o.s.MyBenchmark.testMin | avgt | 10 | 570.731 | 37.630 | us/op |
> | o.s.MyBenchmark.testMul | avgt | 10 | 531.906 | 20.518 | us/op |
>
> after:
>
> | Benchmark | Mode | Samples | Score | Score error | Units |
> |---|---|---|---|---|---|
> |
o.s.MyBenchmark.testAdd | avgt | 10 | 319.350 | 9.248 | us/op |
> | o.s.MyBenchmark.testMax | avgt | 10 | 356.138 | 10.736 | us/op |
> | o.s.MyBenchmark.testMin | avgt | 10 | 323.731 | 16.621 | us/op |
> | o.s.MyBenchmark.testMul | avgt | 10 | 338.458 | 23.755 | us/op |
>
> * I have tested `NaN`, `INFINITY` and `-INFINITY` and got the same result (before/after)

This pull request has now been integrated.

Changeset: 599d07c0
Author: Wang Huang
Committer: Vladimir Kozlov
URL: https://git.openjdk.java.net/jdk/commit/599d07c0db9c85e4dae35d1c54a63407d32eaedd
Stats: 477 lines in 6 files changed: 467 ins; 4 del; 6 mod

8263006: Add optimization for Max(*)Node and Min(*)Node

Co-authored-by: Wang Huang
Co-authored-by: Wu Yan
Reviewed-by: kvn

-------------

PR: https://git.openjdk.java.net/jdk/pull/3513

From kvn at openjdk.java.net Fri May 14 17:51:18 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Fri, 14 May 2021 17:51:18 GMT
Subject: RFR: 8267112: JVMCI compiler modules should be kept upgradable
Message-ID:

The [JDK-8264806](https://bugs.openjdk.java.net/browse/JDK-8264806) changes removed sources and also removed the JVMCI compiler from the list of upgradable modules. JVMCI compiler modules should be upgradable in the JDK to work with GraalVM.

Make these modules upgradable again and empty by leaving only a reference to the JVMCI (jdk.internal.vm.ci) module. It does not restore sources - only `module-info.java` files are kept.

Note, we continue discussion about [JDK-8265091](https://bugs.openjdk.java.net/browse/JDK-8265091): "Use Module API to export JVMCI packages at runtime" to see if we can remove these `module-info.java` files.

Changes were proposed by @dougxc after testing [JDK-8264806](https://bugs.openjdk.java.net/browse/JDK-8264806) changes with GraalVM.
I restored related code in some tests for them to pass.

Testing: full tier1-tier3.
-------------

Commit messages:
 - Fix tests
 - 8267112: Graal modules should be kept upgradable

Changes: https://git.openjdk.java.net/jdk/pull/4014/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4014&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8267112
  Stats: 83 lines in 9 files changed: 34 ins; 42 del; 7 mod
  Patch: https://git.openjdk.java.net/jdk/pull/4014.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/4014/head:pull/4014

PR: https://git.openjdk.java.net/jdk/pull/4014

From dnsimon at openjdk.java.net Fri May 14 17:51:18 2021
From: dnsimon at openjdk.java.net (Doug Simon)
Date: Fri, 14 May 2021 17:51:18 GMT
Subject: RFR: 8267112: JVMCI compiler modules should be kept upgradable
In-Reply-To:
References:
Message-ID:

On Thu, 13 May 2021 16:37:38 GMT, Vladimir Kozlov wrote:

> [JDK-8264806](https://bugs.openjdk.java.net/browse/JDK-8264806) changes removed sources and also removed JVMCI compiler from list of upgradable modules. JVMCI compiler modules should be upgradable in JDK to work with GraalVM.
>
> Make these modules upgradable again and empty by leaving only reference to JVMCI (jdk.internal.vm.ci) module. It does not restore sources - only `module-info.java` files are kept.
>
> Note, we continue discussion about [JDK-8265091](https://bugs.openjdk.java.net/browse/JDK-8265091): "Use Module API to export JVMCI packages at runtime" to see if we can remove these `module-info.java` files.
>
> Changes were proposed by @dougxc after testing [JDK-8264806](https://bugs.openjdk.java.net/browse/JDK-8264806) changes with GraalVM.
> I restored related code in some tests for them to pass.
>
> Testing: full tier1-tier3.

Looks good. My only suggestion would have been to avoid mentioning "Graal" in the PR title and description. This is really about making `jdk.internal.vm.compiler` a placeholder module for any JVMCI based compiler, of which Graal is one example.
-------------

PR: https://git.openjdk.java.net/jdk/pull/4014

From jbhateja at openjdk.java.net Fri May 14 18:02:37 2021
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Fri, 14 May 2021 18:02:37 GMT
Subject: RFR: 8266951: Partial in-lining for vectorized mismatch operation using AVX512 masked instructions
In-Reply-To:
References: <0YtRuwnVZ-Ejs-22d0JDJeFzXiZ17XNuBT1o5Ma4ZkI=.9dd9e952-d452-4175-8ff5-8f41e990a555@github.com>
Message-ID:

On Fri, 14 May 2021 11:26:29 GMT, Jatin Bhateja wrote:

>> Thanks for the explanations on why partial inlining can be beneficial. Ideally it would be great if the only changes we made to the Java code were to the threshold values.
>>
>> For example:
>>
>>     public static int mismatch(byte[] a,
>>                                byte[] b,
>>                                int length) {
>>         // ISSUE: defer to index receiving methods if performance is good
>>         // assert length <= a.length
>>         // assert length <= b.length
>>
>>         int i = 0;
>>         if (length > BYTE_THRESHOLD) {
>>             if (a[0] != b[0])
>>                 return 0;
>>             i = vectorizedMismatch(
>>                     a, Unsafe.ARRAY_BYTE_BASE_OFFSET,
>>                     b, Unsafe.ARRAY_BYTE_BASE_OFFSET,
>>                     length, LOG2_ARRAY_BYTE_INDEX_SCALE);
>>             if (i >= 0)
>>                 return i;
>>             // Align to tail
>>             i = length - ~i;
>>             // assert i >= 0 && i <= 7;
>>         }
>>         // Tail < 8 bytes
>>         for (; i < length; i++) {
>>             if (a[i] != b[i])
>>                 return i;
>>         }
>>         return -1;
>>     }
>>
>>
>> Where `BYTE_THRESHOLD` is initialized to 7 or 0, based on querying some HotSpot runtime property. When `BYTE_THRESHOLD == 0` I hope the `length > BYTE_THRESHOLD` check is strength reduced in many cases.
>>
>> That does leave the `i >= 0` check of the result from `vectorizedMismatch`, perhaps that also has some minor impact? However, maybe since you are doing partial inlining and you know that your `vectorizedMismatch` intrinsic never returns a negative value you could elide that check?
>>
>> A quick experiment would be to apply your HotSpot changes and use the existing Java code, replacing the constant threshold values with 0.
The we can carefully look at the code gen and perf results. > > Hi @PaulSandoz , I have reinstated the tail handling in java to avoid any impact on other targets. Update performance numbers still show gains for small comparison sized upto -XX:UsePartialInlineSize. Thus patch now does not changes existing java implementation of VectorizedMismatch. > @jatin-bhateja that's good. Did performance numbers change after reverting the Java changes? > Yes, there is around 5% variation. Benchmark | Array Length | Baseline (ops/ms) | PI32 (ops/ms) | Gain | PI64 (ops/ms) | Gain -- | -- | -- | -- | -- | -- | -- ArraysMismatch.Byte.differentSubrangeMatches | 16 | 128575.325 | 144855.224 | 1.126617599 | 135549.783 | 1.054244141 ArraysMismatch.Byte.differentSubrangeMatches | 32 | 125000.325 | 140417.725 | 1.123338879 | 134738.852 | 1.077908013 ArraysMismatch.Byte.differentSubrangeMatches | 64 | 122121.658 | 141980.289 | 1.162613506 | 135276.983 | 1.107723112 ArraysMismatch.Byte.differentSubrangeMatches | 90 | 89949.502 | 82139.615 | 0.913174761 | 124708.875 | 1.386432078 ArraysMismatch.Byte.differentSubrangeMatches | 800 | 59648.979 | 64744.783 | 1.085429861 | 66158.56 | 1.109131474 ArraysMismatch.Byte.matches | 16 | 162496.975 | 178905.62 | 1.100978157 | 168902.064 | 1.039416666 ArraysMismatch.Byte.matches | 32 | 149555.964 | 178809.802 | 1.195604623 | 168173.504 | 1.12448544 ArraysMismatch.Byte.matches | 64 | 138302.297 | 120508.305 | 0.871339866 | 168874.573 | 1.221054 ArraysMismatch.Byte.matches | 90 | 102398.54 | 97189.139 | 0.949126218 | 99180.606 | 0.968574415 ArraysMismatch.Byte.matches | 800 | 50774.834 | 46342.805 | 0.912712093 | 46519.1 | 0.916184187 ArraysMismatch.Byte.mismatchEnd | 16 | 150496.869 | 193526.12 | 1.285914593 | 183319.104 | 1.218092477 ArraysMismatch.Byte.mismatchEnd | 32 | 151782.112 | 193680.387 | 1.276042245 | 183753.905 | 1.210642694 ArraysMismatch.Byte.mismatchEnd | 64 | 140578.852 | 122157.726 | 0.868962324 | 183616.047 | 1.30614274 
ArraysMismatch.Byte.mismatchEnd | 90 | 117184.096 | 104009.41 | 0.887572747 | 104198.932 | 0.889190048 ArraysMismatch.Byte.mismatchEnd | 800 | 47585.021 | 47742.853 | 1.003316842 | 47694.384 | 1.002298265 ArraysMismatch.Byte.mismatchMid | 16 | 162373.338 | 198711.462 | 1.223793662 | 183324.738 | 1.129032268 ArraysMismatch.Byte.mismatchMid | 32 | 151647.714 | 193679.657 | 1.277168326 | 183030.74 | 1.206946911 ArraysMismatch.Byte.mismatchMid | 64 | 141058.854 | 121500.09 | 0.861343238 | 183278.428 | 1.299304672 ArraysMismatch.Byte.mismatchMid | 90 | 140839.572 | 122118.754 | 0.867077003 | 121896.873 | 0.865501586 ArraysMismatch.Byte.mismatchMid | 800 | 65953.822 | 65120.08 | 0.987358701 | 70891.518 | 1.074865957 ArraysMismatch.Byte.mismatchStart | 16 | 162334.573 | 193670.485 | 1.193032891 | 183351.662 | 1.129467732 ArraysMismatch.Byte.mismatchStart | 32 | 151668.759 | 198425.832 | 1.30828414 | 181890.834 | 1.19926368 ArraysMismatch.Byte.mismatchStart | 64 | 151644.763 | 128241.079 | 0.845667707 | 183270.344 | 1.208550433 ArraysMismatch.Byte.mismatchStart | 90 | 151565.239 | 128628.061 | 0.848664653 | 129050.748 | 0.851453466 ArraysMismatch.Byte.mismatchStart | 800 | 149279.868 | 129676.597 | 0.86868108 | 129644.754 | 0.86846777 ArraysMismatch.Char.differentSubrangeMatches | 16 | 125066.795 | 134560.133 | 1.075906143 | 128522.659 | 1.027632146 ArraysMismatch.Char.differentSubrangeMatches | 32 | 118622.375 | 135174.281 | 1.139534434 | 129247.294 | 1.089569265 ArraysMismatch.Char.differentSubrangeMatches | 64 | 110989.736 | 101324.562 | 0.912918308 | 127740.841 | 1.150924812 ArraysMismatch.Char.differentSubrangeMatches | 90 | 88158.505 | 86103.35 | 0.976687955 | 84913.129 | 0.963187035 ArraysMismatch.Char.differentSubrangeMatches | 800 | 44786.047 | 44888.007 | 1.002276602 | 45266.381 | 1.010725081 ArraysMismatch.Char.matches | 16 | 150449.365 | 180300.26 | 1.198411572 | 167831.799 | 1.115536772 ArraysMismatch.Char.matches | 32 | 137508.243 | 121613.2 | 0.884406617 | 
168529.38 | 1.225594745 ArraysMismatch.Char.matches | 64 | 111238.281 | 104451.169 | 0.938985824 | 104456.106 | 0.939030207 ArraysMismatch.Char.matches | 90 | 89576.98 | 82706.461 | 0.923300395 | 82094.852 | 0.916472647 ArraysMismatch.Char.matches | 800 | 25890.552 | 25076.19 | 0.968545978 | 25175.427 | 0.97237892 ArraysMismatch.Char.mismatchEnd | 16 | 148744.669 | 193735.827 | 1.302472407 | 182679.165 | 1.228139242 ArraysMismatch.Char.mismatchEnd | 32 | 139790.505 | 120976.307 | 0.865411474 | 182651.879 | 1.306611483 ArraysMismatch.Char.mismatchEnd | 64 | 115203.826 | 105308.171 | 0.91410307 | 104283.277 | 0.905206716 ArraysMismatch.Char.mismatchEnd | 90 | 85344.044 | 88961.211 | 1.042383356 | 88986.943 | 1.042684865 ArraysMismatch.Char.mismatchEnd | 800 | 21198.514 | 22762.467 | 1.073776539 | 20532.903 | 0.968601054 ArraysMismatch.Char.mismatchMid | 16 | 148694.307 | 193547.037 | 1.301643895 | 182739.927 | 1.228963843 ArraysMismatch.Char.mismatchMid | 32 | 131819.638 | 120542.455 | 0.914449902 | 182522.226 | 1.384636074 ArraysMismatch.Char.mismatchMid | 64 | 122303.688 | 112374.874 | 0.91881836 | 112962.46 | 0.923622679 ArraysMismatch.Char.mismatchMid | 90 | 119193.595 | 110435.962 | 0.926525977 | 112841.238 | 0.946705551 ArraysMismatch.Char.mismatchMid | 800 | 50811.151 | 48327.149 | 0.951113054 | 43349.205 | 0.853143535 ArraysMismatch.Char.mismatchStart | 16 | 148954.747 | 196332.899 | 1.318070776 | 182895.579 | 1.227860022 ArraysMismatch.Char.mismatchStart | 32 | 140350.687 | 128903.712 | 0.918440193 | 182417.843 | 1.299728893 ArraysMismatch.Char.mismatchStart | 64 | 148661.774 | 128942.646 | 0.86735576 | 128923.292 | 0.867225572 ArraysMismatch.Char.mismatchStart | 90 | 149813.17 | 128907.497 | 0.860455039 | 128962.22 | 0.860820314 ArraysMismatch.Char.mismatchStart | 800 | 152547.918 | 128763.275 | 0.844084119 | 128908.197 | 0.845034129 ArraysMismatch.Double.differentSubrangeMatches | 16 | 108995.329 | 116181.518 | 1.065931165 | 115518.042 | 1.059843968 
ArraysMismatch.Double.differentSubrangeMatches | 32 | 92067.783 | 97014.234 | 1.053726188 | 96970.016 | 1.053245911 ArraysMismatch.Double.differentSubrangeMatches | 64 | 78196.352 | 78152.45 | 0.999438567 | 78147.974 | 0.999381327 ArraysMismatch.Double.differentSubrangeMatches | 90 | 61344.251 | 68694.658 | 1.119822263 | 68766.112 | 1.120987067 ArraysMismatch.Double.differentSubrangeMatches | 800 | 14944.82 | 15219.863 | 1.018403902 | 15218.313 | 1.018300187 ArraysMismatch.Double.matches | 16 | 119314.736 | 120298.817 | 1.008247774 | 120356.273 | 1.008729324 ArraysMismatch.Double.matches | 32 | 84908.095 | 88858.203 | 1.04652216 | 88826.453 | 1.046148227 ArraysMismatch.Double.matches | 64 | 52849.331 | 63069.589 | 1.193384813 | 63076.972 | 1.193524512 ArraysMismatch.Double.matches | 90 | 50500.323 | 50214.691 | 0.994343957 | 50508.649 | 1.00016487 ArraysMismatch.Double.matches | 800 | 8825.189 | 8843.06 | 1.002024999 | 8848.091 | 1.002595072 ArraysMismatch.Double.mismatchEnd | 16 | 116518.598 | 119160.345 | 1.022672321 | 119417.269 | 1.024877325 ArraysMismatch.Double.mismatchEnd | 32 | 86686.542 | 86737.967 | 1.000593229 | 86245.95 | 0.994917412 ArraysMismatch.Double.mismatchEnd | 64 | 62844.082 | 62865.51 | 1.000340971 | 62824.22 | 0.999683948 ArraysMismatch.Double.mismatchEnd | 90 | 46811.941 | 47682.209 | 1.018590727 | 47683.524 | 1.018618818 ArraysMismatch.Double.mismatchEnd | 800 | 8123.098 | 8154.869 | 1.003911193 | 8122.968 | 0.999983996 ArraysMismatch.Double.mismatchMid | 16 | 113774.491 | 112407.289 | 0.987983229 | 113347.865 | 0.996250249 ArraysMismatch.Double.mismatchMid | 32 | 93878.771 | 93191.093 | 0.99267483 | 93148.916 | 0.992225559 ArraysMismatch.Double.mismatchMid | 64 | 73891.531 | 75278.408 | 1.018769093 | 76375.521 | 1.033616708 ArraysMismatch.Double.mismatchMid | 90 | 71601.957 | 71609.917 | 1.00011117 | 71185.73 | 0.994186933 ArraysMismatch.Double.mismatchMid | 800 | 12585.323 | 12687.35 | 1.008106824 | 12684.958 | 1.007916761 
ArraysMismatch.Double.mismatchStart | 16 | 141340.973 | 139820.577 | 0.989243063 | 141687.838 | 1.002454101 ArraysMismatch.Double.mismatchStart | 32 | 141778.446 | 139839.172 | 0.9863218 | 141793.558 | 1.000106589 ArraysMismatch.Double.mismatchStart | 64 | 141447.19 | 139777.342 | 0.988194548 | 141735.624 | 1.002039164 ArraysMismatch.Double.mismatchStart | 90 | 141672.66 | 139952.848 | 0.987860664 | 141435.432 | 0.99832552 ArraysMismatch.Double.mismatchStart | 800 | 129247.638 | 139490.678 | 1.079251274 | 144563.768 | 1.118502204 ArraysMismatch.Float.differentSubrangeMatches | 16 | 110910.016 | 120022.108 | 1.082157521 | 119996.294 | 1.081924774 ArraysMismatch.Float.differentSubrangeMatches | 32 | 111486.226 | 111770.346 | 1.002548476 | 110773.578 | 0.993607748 ArraysMismatch.Float.differentSubrangeMatches | 64 | 85109.196 | 93874.459 | 1.102988436 | 93800.272 | 1.102116768 ArraysMismatch.Float.differentSubrangeMatches | 90 | 76165.301 | 76150.992 | 0.999812132 | 76218.427 | 1.000697509 ArraysMismatch.Float.differentSubrangeMatches | 800 | 24344.012 | 24945.968 | 1.024727066 | 24975.996 | 1.025960552 ArraysMismatch.Float.matches | 16 | 133246.909 | 139879.808 | 1.049779008 | 139300.513 | 1.045431478 ArraysMismatch.Float.matches | 32 | 115453.05 | 116866.329 | 1.012241158 | 116930.525 | 1.012797193 ArraysMismatch.Float.matches | 64 | 86868.384 | 87499.944 | 1.007270309 | 87378.551 | 1.005872873 ArraysMismatch.Float.matches | 90 | 69382.063 | 69320.95 | 0.999119182 | 69270.638 | 0.998394037 ArraysMismatch.Float.matches | 800 | 15797.436 | 15305.406 | 0.968853806 | 15368.778 | 0.972865343 ArraysMismatch.Float.mismatchEnd | 16 | 128972.145 | 127558.472 | 0.989038928 | 126943.523 | 0.984270852 ArraysMismatch.Float.mismatchEnd | 32 | 99500.703 | 106165.752 | 1.066984944 | 104714.992 | 1.052404544 ArraysMismatch.Float.mismatchEnd | 64 | 85579.522 | 84530.586 | 0.987743143 | 84477.75 | 0.987125752 ArraysMismatch.Float.mismatchEnd | 90 | 71330.733 | 76663.363 | 1.074759221 
| 71542.67 | 1.002971188 ArraysMismatch.Float.mismatchEnd | 800 | 12684.13 | 12712.423 | 1.002230583 | 14291.866 | 1.126751776 ArraysMismatch.Float.mismatchMid | 16 | 119900.83 | 124084.55 | 1.03489317 | 124324.212 | 1.036892005 ArraysMismatch.Float.mismatchMid | 32 | 112489.957 | 111460.099 | 0.990844889 | 112307.057 | 0.998374077 ArraysMismatch.Float.mismatchMid | 64 | 93700.093 | 93598.863 | 0.998919638 | 93964.963 | 1.002826785 ArraysMismatch.Float.mismatchMid | 90 | 89995.813 | 92882.423 | 1.032074937 | 90128.506 | 1.001474435 ArraysMismatch.Float.mismatchMid | 800 | 20683.982 | 20964.562 | 1.013565086 | 20912.721 | 1.011058751 ArraysMismatch.Float.mismatchStart | 16 | 140865.868 | 140832.899 | 0.999765955 | 140179.035 | 0.995124206 ArraysMismatch.Float.mismatchStart | 32 | 140963.807 | 141210.069 | 1.001746987 | 141155.621 | 1.001360732 ArraysMismatch.Float.mismatchStart | 64 | 128036.089 | 140209.323 | 1.095076584 | 141165.264 | 1.102542768 ArraysMismatch.Float.mismatchStart | 90 | 139812.729 | 143867.129 | 1.02899879 | 109229.788 | 0.781257821 ArraysMismatch.Float.mismatchStart | 800 | 143800.67 | 139801.737 | 0.972191138 | 143862.688 | 1.000431278 ArraysMismatch.Int.differentSubrangeMatches | 16 | 119963.836 | 119300.373 | 0.994469475 | 119968.86 | 1.000041879 ArraysMismatch.Int.differentSubrangeMatches | 32 | 110957.277 | 110224.844 | 0.993398964 | 111157.285 | 1.001802568 ArraysMismatch.Int.differentSubrangeMatches | 64 | 85378.666 | 93789.643 | 1.098513802 | 93743.202 | 1.097969861 ArraysMismatch.Int.differentSubrangeMatches | 90 | 76146.387 | 76201.059 | 1.000717985 | 76057.16 | 0.998828218 ArraysMismatch.Int.differentSubrangeMatches | 800 | 24761.361 | 24891.597 | 1.005259646 | 24716.676 | 0.998195374 ArraysMismatch.Int.matches | 16 | 137433.609 | 139858.37 | 1.017643144 | 138042.695 | 1.004431856 ArraysMismatch.Int.matches | 32 | 113947.437 | 117383.592 | 1.030155615 | 114140.779 | 1.001696765 ArraysMismatch.Int.matches | 64 | 83458.037 | 87402.272 | 
1.047260098 | 87558.197 | 1.049128402 ArraysMismatch.Int.matches | 90 | 69359.801 | 69129.876 | 0.99668504 | 69202.217 | 0.997728021 ArraysMismatch.Int.matches | 800 | 15151.507 | 15245.003 | 1.006170739 | 15778.822 | 1.041402812 ArraysMismatch.Int.mismatchEnd | 16 | 137635.65 | 136617.544 | 0.99260289 | 136584.159 | 0.99236033 ArraysMismatch.Int.mismatchEnd | 32 | 114877.262 | 115193.044 | 1.002748864 | 115280.958 | 1.003514151 ArraysMismatch.Int.mismatchEnd | 64 | 86416.926 | 85360.499 | 0.987775231 | 85492.741 | 0.989305509 ArraysMismatch.Int.mismatchEnd | 90 | 73080.648 | 79063.102 | 1.081860987 | 73012.794 | 0.999071519 ArraysMismatch.Int.mismatchEnd | 800 | 14493.463 | 12788.852 | 0.882387598 | 14485.628 | 0.999459411 ArraysMismatch.Int.mismatchMid | 16 | 131819.961 | 135110.592 | 1.024963071 | 133456.021 | 1.012411322 ArraysMismatch.Int.mismatchMid | 32 | 121601.267 | 121323.539 | 0.997716076 | 121033.479 | 0.995330739 ArraysMismatch.Int.mismatchMid | 64 | 96305.313 | 98513.588 | 1.022929939 | 99524.464 | 1.033426515 ArraysMismatch.Int.mismatchMid | 90 | 93047.732 | 95701.604 | 1.028521619 | 95748.806 | 1.029028907 ArraysMismatch.Int.mismatchMid | 800 | 24763.584 | 24529.368 | 0.990541918 | 24566.322 | 0.99203419 ArraysMismatch.Int.mismatchStart | 16 | 140288.976 | 149381.016 | 1.064809369 | 148432.735 | 1.058049886 ArraysMismatch.Int.mismatchStart | 32 | 140299.038 | 149933.984 | 1.068674355 | 148666.253 | 1.059638435 ArraysMismatch.Int.mismatchStart | 64 | 140390.509 | 148516.304 | 1.057879945 | 149125.82 | 1.062221521 ArraysMismatch.Int.mismatchStart | 90 | 135683.215 | 149479.127 | 1.101677367 | 152544.074 | 1.124266358 ArraysMismatch.Int.mismatchStart | 800 | 152579.586 | 152534.551 | 0.999704843 | 149863.75 | 0.982200528 ArraysMismatch.Long.differentSubrangeMatches | 16 | 125375.624 | 123127.977 | 0.982072695 | 125570.502 | 1.001554353 ArraysMismatch.Long.differentSubrangeMatches | 32 | 100353.427 | 104527.284 | 1.041591574 | 100339.027 | 0.999856507 
ArraysMismatch.Long.differentSubrangeMatches | 64 | 79732.381 | 80799.459 | 1.013383245 | 79755.01 | 1.000283812 ArraysMismatch.Long.differentSubrangeMatches | 90 | 70378.676 | 71253.509 | 1.01243037 | 70502.564 | 1.001760306 ArraysMismatch.Long.differentSubrangeMatches | 800 | 15229.105 | 15139.187 | 0.994095648 | 15172.463 | 0.996280674 ArraysMismatch.Long.matches | 16 | 119081.321 | 119487.306 | 1.003409309 | 119758.812 | 1.005689314 ArraysMismatch.Long.matches | 32 | 88599.37 | 88638.351 | 1.000439969 | 88576.011 | 0.999736353 ArraysMismatch.Long.matches | 64 | 58898.468 | 53095.514 | 0.901475298 | 62427.169 | 1.059911592 ArraysMismatch.Long.matches | 90 | 50386.116 | 50338.305 | 0.999051108 | 50562.903 | 1.003508645 ArraysMismatch.Long.matches | 800 | 8820.281 | 8529.311 | 0.967011255 | 8852.332 | 1.003633784 ArraysMismatch.Long.mismatchEnd | 16 | 125007.971 | 128210.129 | 1.025615631 | 127881.579 | 1.022987398 ArraysMismatch.Long.mismatchEnd | 32 | 83076.909 | 88860.258 | 1.069614398 | 90371.34 | 1.087803351 ArraysMismatch.Long.mismatchEnd | 64 | 64514.133 | 64481.669 | 0.999496792 | 64403.391 | 0.998283446 ArraysMismatch.Long.mismatchEnd | 90 | 46519.966 | 47637.256 | 1.024017429 | 47623.133 | 1.023713839 ArraysMismatch.Long.mismatchEnd | 800 | 8141.139 | 7196.482 | 0.883965008 | 7205.709 | 0.885098387 ArraysMismatch.Long.mismatchMid | 16 | 122535.55 | 122468.245 | 0.999450731 | 121420.652 | 0.990901432 ArraysMismatch.Long.mismatchMid | 32 | 97246.708 | 99410.056 | 1.022245977 | 99415.553 | 1.022302503 ArraysMismatch.Long.mismatchMid | 64 | 78567.11 | 76615.257 | 0.975156869 | 76589.915 | 0.974834317 ArraysMismatch.Long.mismatchMid | 90 | 72329.55 | 74842.274 | 1.034739937 | 73309.119 | 1.013543137 ArraysMismatch.Long.mismatchMid | 800 | 15203.544 | 12744.161 | 0.838236203 | 15195.833 | 0.999492816 ArraysMismatch.Long.mismatchStart | 16 | 149836.786 | 149828.664 | 0.999945794 | 149875.077 | 1.000255551 ArraysMismatch.Long.mismatchStart | 32 | 149794.513 | 
147781.365 | 0.986560603 | 147783.146 | 0.986572492 ArraysMismatch.Long.mismatchStart | 64 | 150066.12 | 147763.313 | 0.984654718 | 147705.026 | 0.984266309 ArraysMismatch.Long.mismatchStart | 90 | 149769.076 | 149930.834 | 1.001080049 | 147765.191 | 0.986620169 ArraysMismatch.Long.mismatchStart | 800 | 153228.897 | 153291.369 | 1.000407704 | 147317.909 | 0.961423804 ArraysMismatch.Short.differentSubrangeMatches | 16 | 124165.057 | 135373.754 | 1.090272556 | 128483.318 | 1.034778392 ArraysMismatch.Short.differentSubrangeMatches | 32 | 108467.591 | 139665.229 | 1.287621747 | 129280.478 | 1.191881158 ArraysMismatch.Short.differentSubrangeMatches | 64 | 101789.936 | 101026.787 | 0.992502707 | 128695.583 | 1.264325218 ArraysMismatch.Short.differentSubrangeMatches | 90 | 79113.558 | 86119.991 | 1.088561723 | 84812.205 | 1.072031231 ArraysMismatch.Short.differentSubrangeMatches | 800 | 44717.401 | 44941.053 | 1.005001453 | 45136.069 | 1.00936253 ArraysMismatch.Short.matches | 16 | 150437.283 | 180578.877 | 1.200359867 | 167914.258 | 1.116174492 ArraysMismatch.Short.matches | 32 | 132788.255 | 121258.183 | 0.913169489 | 168523.439 | 1.269114042 ArraysMismatch.Short.matches | 64 | 111256.816 | 104483.87 | 0.939123316 | 104146.178 | 0.936088069 ArraysMismatch.Short.matches | 90 | 89629.331 | 82256.3 | 0.917738636 | 82254.873 | 0.917722715 ArraysMismatch.Short.matches | 800 | 25875.154 | 25149.698 | 0.97196322 | 25567.552 | 0.988112071 ArraysMismatch.Short.mismatchEnd | 16 | 148537.517 | 193718.849 | 1.304174547 | 182495.003 | 1.228612183 ArraysMismatch.Short.mismatchEnd | 32 | 139814.385 | 120797.789 | 0.863986842 | 182574.092 | 1.305831957 ArraysMismatch.Short.mismatchEnd | 64 | 114830.689 | 105372.717 | 0.917635502 | 102450.982 | 0.892191651 ArraysMismatch.Short.mismatchEnd | 90 | 89778.028 | 88480.375 | 0.985545985 | 88414.905 | 0.984816742 ArraysMismatch.Short.mismatchEnd | 800 | 21231.162 | 22720.746 | 1.070160267 | 22762.858 | 1.072143767 
ArraysMismatch.Short.mismatchMid | 16 | 148684.681 | 192816.173 | 1.296812635 | 182828.042 | 1.229636038
ArraysMismatch.Short.mismatchMid | 32 | 133351.494 | 121165.274 | 0.908615797 | 182578.799 | 1.369154507
ArraysMismatch.Short.mismatchMid | 64 | 122310.455 | 111705.87 | 0.913297804 | 112499.229 | 0.919784241
ArraysMismatch.Short.mismatchMid | 90 | 125078.209 | 110292.292 | 0.881786627 | 112900.776 | 0.902641451
ArraysMismatch.Short.mismatchMid | 800 | 50870.426 | 43223.723 | 0.849682741 | 43290.85 | 0.851002309
ArraysMismatch.Short.mismatchStart | 16 | 148721.59 | 192829.361 | 1.296579474 | 182664.555 | 1.22823159
ArraysMismatch.Short.mismatchStart | 32 | 148455.699 | 128768.914 | 0.867389496 | 183430.126 | 1.235588308
ArraysMismatch.Short.mismatchStart | 64 | 148574.311 | 127928.852 | 0.861042876 | 128884.409 | 0.867474385
ArraysMismatch.Short.mismatchStart | 90 | 149658.37 | 128839.609 | 0.860891436 | 128868.164 | 0.861082237
ArraysMismatch.Short.mismatchStart | 800 | 152433.566 | 127994.372 | 0.839673147 | 127981.695 | 0.839589982

> Do you think it is worth experimenting by setting the threshold to zero when partial inlining is supported? Maybe partial inlining will help for, say, mismatching on arrays with a length of 7 or less bytes e.g. we could test quickly with mismatching for `byte`.

Generally the results look fine in most cases, though there is some penalty. Could you kindly elaborate on how we can do a target-specific check on the Java side? I could not locate a relevant public Java API or JDK-internal API for this.
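The small-array case raised in the quoted question (7 or fewer bytes) can be exercised with a sketch like the one below. It assumes only the public `java.util.Arrays.mismatch(byte[], byte[])` API; the class name `SmallMismatchSketch`, the helper `firstMismatch`, and the array contents are hypothetical and are not taken from the ArraysMismatch benchmark or from the patch.

```java
import java.util.Arrays;

// Illustrative only: class and method names are assumptions, not part of the patch.
class SmallMismatchSketch {

    // Returns the index of the first differing byte, or -1 if the arrays are equal.
    // If one array is a proper prefix of the other, the shorter length is returned.
    static int firstMismatch(byte[] a, byte[] b) {
        return Arrays.mismatch(a, b);
    }

    public static void main(String[] args) {
        byte[] a = {1, 2, 3, 4, 5, 6, 7};   // 7 bytes, the size mentioned in the question
        byte[] b = a.clone();

        System.out.println(firstMismatch(a, b)); // prints -1 (no mismatch)

        b[6] = 0;                                // force a mismatch at the end
        System.out.println(firstMismatch(a, b)); // prints 6
    }
}
```

Whether such a call goes through the intrinsic, the partial-inlining path, or the Java tail loop depends on the VM flags and the target, which this sketch does not try to detect.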
-------------

PR: https://git.openjdk.java.net/jdk/pull/3999

From rraghavan at openjdk.java.net Fri May 14 18:02:08 2021
From: rraghavan at openjdk.java.net (Rahul Raghavan)
Date: Fri, 14 May 2021 18:02:08 GMT
Subject: RFR: 8263252: Improve fold_compares c2 optimizations
Message-ID: <5nBWLT-3wKd8_-5XaJ2sdaKPtQ0lbri2rfVaLbI796g=.9c0ef1f4-c77f-4955-ba39-2d2ec180a605@github.com>

8263252: Improve fold_compares c2 optimizations

-------------

Commit messages:
 - Improve fold_compares c2 optimizations

Changes: https://git.openjdk.java.net/jdk/pull/4035/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4035&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8263252
  Stats: 224 lines in 2 files changed: 161 ins; 23 del; 40 mod
  Patch: https://git.openjdk.java.net/jdk/pull/4035.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/4035/head:pull/4035

PR: https://git.openjdk.java.net/jdk/pull/4035

From mchung at openjdk.java.net Fri May 14 18:09:35 2021
From: mchung at openjdk.java.net (Mandy Chung)
Date: Fri, 14 May 2021 18:09:35 GMT
Subject: RFR: 8267112: JVMCI compiler modules should be kept upgradable
In-Reply-To:
References:
Message-ID:

On Thu, 13 May 2021 16:37:38 GMT, Vladimir Kozlov wrote:

> [JDK-8264806](https://bugs.openjdk.java.net/browse/JDK-8264806) changes removed sources and also removed JVMCI compiler from list of upgradable modules. JVMCI compiler modules should be upgradable in JDK to work with GraalVM.
>
> Make these modules upgradable again and empty by leaving only reference to JVMCI (jdk.internal.vm.ci) module. It does not restore sources - only `module-info.java` files are kept.
>
> Note, we continue discussion about [JDK-8265091](https://bugs.openjdk.java.net/browse/JDK-8265091): "Use Module API to export JVMCI packages at runtime" to see if we can remove these `module-info.java` files.
>
> Changes were proposed by @dougxc after testing [JDK-8264806](https://bugs.openjdk.java.net/browse/JDK-8264806) changes with GraalVM.
> I restored related code in some tests for them to pass.
>
> Testing: full tier1-tier3.

Marked as reviewed by mchung (Reviewer).

-------------

PR: https://git.openjdk.java.net/jdk/pull/4014

From rraghavan at openjdk.java.net Fri May 14 18:28:37 2021
From: rraghavan at openjdk.java.net (Rahul Raghavan)
Date: Fri, 14 May 2021 18:28:37 GMT
Subject: RFR: 8263252: Improve fold_compares c2 optimizations
In-Reply-To: <5nBWLT-3wKd8_-5XaJ2sdaKPtQ0lbri2rfVaLbI796g=.9c0ef1f4-c77f-4955-ba39-2d2ec180a605@github.com>
References: <5nBWLT-3wKd8_-5XaJ2sdaKPtQ0lbri2rfVaLbI796g=.9c0ef1f4-c77f-4955-ba39-2d2ec180a605@github.com>
Message-ID: <7-OYeGmKcrn0y2pd7ecG5DjPrcitQIS_LZd6ydDRGyg=.52fbdaa9-7234-497c-a49d-334b7f14e7fa@github.com>

On Fri, 14 May 2021 17:53:20 GMT, Rahul Raghavan wrote:

> Started from last Roland reviewed changes in JDK-8238812
> https://openjdk.github.io/cr/?repo=jdk&pr=2758&range=01

Please note this is not ready for open review yet.
Converted to draft PR.

-------------

PR: https://git.openjdk.java.net/jdk/pull/4035

From sviswanathan at openjdk.java.net Fri May 14 18:37:25 2021
From: sviswanathan at openjdk.java.net (Sandhya Viswanathan)
Date: Fri, 14 May 2021 18:37:25 GMT
Subject: RFR: 8266332: Adler32 intrinsic for x86 64-bit platforms [v7]
In-Reply-To:
References:
Message-ID: <9xgNS8A7za3iluvvsT_7BTrGCncCWQfXnP8pE-yATE4=.4f453320-d8b4-43f8-92e9-455ea97cbe23@github.com>

On Fri, 7 May 2021 16:42:30 GMT, Xubo Zhang wrote:

>> Implement Adler32 intrinsic for x86 64-bit platform using vector instructions.
>>
>> For the following benchmark:
>> http://cr.openjdk.java.net/~pli/rfr/8216259/TestAdler32.java
>>
>> The optimization shows ~5x improvement.
>>
>> Base:
>> Benchmark                          (count)  Mode  Cnt   Score   Error  Units
>> TestAdler32Perf.testAdler32Update       64  avgt   25   0.084 ± 0.001  us/op
>> TestAdler32Perf.testAdler32Update      128  avgt   25   0.104 ± 0.001  us/op
>> TestAdler32Perf.testAdler32Update      256  avgt   25   0.146 ± 0.002  us/op
>> TestAdler32Perf.testAdler32Update      512  avgt   25   0.226 ± 0.002  us/op
>> TestAdler32Perf.testAdler32Update     1024  avgt   25   0.390 ± 0.005  us/op
>> TestAdler32Perf.testAdler32Update     2048  avgt   25   0.714 ± 0.007  us/op
>> TestAdler32Perf.testAdler32Update     4096  avgt   25   1.359 ± 0.014  us/op
>> TestAdler32Perf.testAdler32Update     8192  avgt   25   2.751 ± 0.023  us/op
>> TestAdler32Perf.testAdler32Update    16384  avgt   25   5.494 ± 0.077  us/op
>> TestAdler32Perf.testAdler32Update    32768  avgt   25  11.058 ± 0.160  us/op
>> TestAdler32Perf.testAdler32Update    65536  avgt   25  22.198 ± 0.319  us/op
>>
>>
>> With patch:
>> Benchmark                          (count)  Mode  Cnt   Score   Error  Units
>> TestAdler32Perf.testAdler32Update       64  avgt   25   0.020 ± 0.001  us/op
>> TestAdler32Perf.testAdler32Update      128  avgt   25   0.025 ± 0.001  us/op
>> TestAdler32Perf.testAdler32Update      256  avgt   25   0.031 ± 0.001  us/op
>> TestAdler32Perf.testAdler32Update      512  avgt   25   0.048 ± 0.001  us/op
>> TestAdler32Perf.testAdler32Update     1024  avgt   25   0.078 ± 0.001  us/op
>> TestAdler32Perf.testAdler32Update     2048  avgt   25   0.139 ± 0.002  us/op
>> TestAdler32Perf.testAdler32Update     4096  avgt   25   0.262 ± 0.004  us/op
>> TestAdler32Perf.testAdler32Update     8192  avgt   25   0.524 ± 0.010  us/op
>> TestAdler32Perf.testAdler32Update    16384  avgt   25   1.017 ± 0.022  us/op
>> TestAdler32Perf.testAdler32Update    32768  avgt   25   2.058 ± 0.052  us/op
>> TestAdler32Perf.testAdler32Update    65536  avgt   25   3.994 ± 0.013  us/op
>
> Xubo Zhang has updated the pull request incrementally with one additional commit since the last revision:
>
>   Add @run case

@vnkozlov Could you please review and approve this PR if it looks ok to you?
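For context on what the intrinsic computes, the checksum can be sketched in scalar Java and compared against the public `java.util.zip.Adler32` class. This is a minimal reference sketch, not the vectorized stub from the PR; the class name `Adler32Sketch` and the helpers `scalarAdler32` and `jdkAdler32` are hypothetical, and the input string is only an illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Adler32;

// Illustrative only: names are assumptions, and this is the textbook scalar loop,
// not the AVX-based code proposed in the pull request.
class Adler32Sketch {
    static final int MOD = 65521; // largest prime below 2^16, as defined by Adler-32

    // Straightforward byte-at-a-time Adler-32.
    static long scalarAdler32(byte[] data) {
        int a = 1;
        int b = 0;
        for (byte x : data) {
            a = (a + (x & 0xff)) % MOD; // running sum of bytes
            b = (b + a) % MOD;          // running sum of the running sums
        }
        return ((long) b << 16) | a;
    }

    // The JDK implementation, which is where an intrinsic would take effect.
    static long jdkAdler32(byte[] data) {
        Adler32 adler = new Adler32();
        adler.update(data, 0, data.length);
        return adler.getValue();
    }

    public static void main(String[] args) {
        byte[] data = "Wikipedia".getBytes(StandardCharsets.US_ASCII);
        System.out.println(Long.toHexString(scalarAdler32(data)));     // prints 11e60398
        System.out.println(scalarAdler32(data) == jdkAdler32(data));   // prints true
    }
}
```

The two modulo operations per byte in the scalar loop are the kind of per-element work a vectorized implementation would try to batch, though how the PR does so is not shown here.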
-------------

PR: https://git.openjdk.java.net/jdk/pull/3806

From jbhateja at openjdk.java.net Fri May 14 18:58:38 2021
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Fri, 14 May 2021 18:58:38 GMT
Subject: RFR: 8266951: Partial in-lining for vectorized mismatch operation using AVX512 masked instructions
In-Reply-To:
References: <0YtRuwnVZ-Ejs-22d0JDJeFzXiZ17XNuBT1o5Ma4ZkI=.9dd9e952-d452-4175-8ff5-8f41e990a555@github.com>
Message-ID:

On Fri, 14 May 2021 15:26:47 GMT, Paul Sandoz wrote:

>> Hi @PaulSandoz, I have reinstated the tail handling in Java to avoid any impact on other targets. Updated performance numbers still show gains for small comparison sizes up to -XX:UsePartialInlineSize. Thus the patch now does not change the existing Java implementation of VectorizedMismatch.
>
> @jatin-bhateja that's good. Did performance numbers change after reverting the Java changes?
>
> Do you think it is worth experimenting by setting the threshold to zero when partial inlining is supported? Maybe partial inlining will help for, say, mismatching on arrays with a length of 7 or less bytes e.g. we could test quickly with mismatching for `byte`.

Hi @PaulSandoz, after removal of the Java-side changes, I still see good gains for small sizes, but there is a considerable penalty.
I will set the threshold to 0 and recompute the numbers. I seek your input on adding a target-specific THRESHOLD: I could not locate any direct public Java API or internal JDK API that could be used to fetch target information.
-------------

PR: https://git.openjdk.java.net/jdk/pull/3999

From jbhateja at openjdk.java.net Fri May 14 19:32:10 2021
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Fri, 14 May 2021 19:32:10 GMT
Subject: RFR: 8256973: Intrinsic creation for VectorMask query (lastTrue, firstTrue, trueCount) APIs [v2]
In-Reply-To:
References: <73lFD51hzmiF_KrQyPyE5c7lbf-Bp6V5vptzGo7JioY=.f34509d0-04c1-4c6d-878f-baa433b315a7@github.com>
Message-ID:

On Fri, 14 May 2021 12:48:58 GMT, Vladimir Ivanov wrote:

>> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   8256973: Review comments resolution.
>
> src/hotspot/cpu/x86/x86.ad line 8085:
>
>> 8083:
>> 8084: instruct vmask_true_count_avx(rRegI dst, vec mask, rRegL tmp, vec xtmp, vec xtmp1) %{
>> 8085: predicate(!VM_Version::supports_avx512bw());
>
> `VM_Version::supports_avx512vlbw()`

Handled in match_rule_supported_vector.

> src/hotspot/share/opto/vectornode.cpp line 1297:
>
>> 1295: }
>> 1296:
>> 1297: Node* VectorMaskOpNode::Ideal(PhaseGVN* phase, bool can_reshape) {
>
> It doesn't make much sense to me. Why don't you simply require the input to be in canonical shape from the very beginning by unconditionally wrapping it into `VectorStoreMask` during construction?

Done

> src/hotspot/share/opto/vectornode.hpp line 858:
>
>> 856: class VectorMaskOpNode : public TypeNode {
>> 857: public:
>> 858: VectorMaskOpNode(Node* mask, const Type* ty, const Type* ety, int mopc):
>
> `ty`/`ety` caught my eye. It doesn't match anything in vectornode.hpp and may confuse readers.
> Any reason not to use `vt`?
>
> Also, any particular reason to cache full-blown type instead of capturing just the `BasicType`?

In this case all the mask operations produce an integer value. Thus I did not use `vt`, and I have removed `ety` since it does not have any direct use currently.

> src/java.base/share/classes/jdk/internal/vm/vector/VectorSupport.java line 469:
>
>> 467: public static
>> 468:
>> 469: int maskOp(int oper, Class maskClass, Class elemClass, int length, M m,
>
> I second Paul here: `maskOp` case is already covered by `reductionCoerced`.

As discussed above, mixing it with `reductionCoerced` would require changes to the original entry point (its type parameter has Vector as the lower bound). We may also need to bypass some irrelevant portions of inline_vector_reduction(). For the time being, to keep things clean, I added a different entry point for all mask operations.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3916

From psandoz at openjdk.java.net Fri May 14 19:44:31 2021
From: psandoz at openjdk.java.net (Paul Sandoz)
Date: Fri, 14 May 2021 19:44:31 GMT
Subject: RFR: 8266951: Partial in-lining for vectorized mismatch operation using AVX512 masked instructions
In-Reply-To:
References: <0YtRuwnVZ-Ejs-22d0JDJeFzXiZ17XNuBT1o5Ma4ZkI=.9dd9e952-d452-4175-8ff5-8f41e990a555@github.com>
Message-ID:

On Fri, 14 May 2021 18:55:44 GMT, Jatin Bhateja wrote:

>> @jatin-bhateja that's good. Did performance numbers change after reverting the Java changes?
>>
>> Do you think it is worth experimenting by setting the threshold to zero when partial inlining is supported? Maybe partial inlining will help for, say, mismatching on arrays with a length of 7 or less bytes e.g. we could test quickly with mismatching for `byte`.
>
> Hi @PaulSandoz, after removal of the Java-side changes, I still see good gains for small sizes, but there is a considerable penalty.
> I will set the threshold to 0 and recompute the numbers. I seek your input on adding a target-specific THRESHOLD: I could not locate any direct public Java API or internal JDK API that could be used to fetch target information.

@jatin-bhateja glad the variation is small.
If the subsequent results, without and with a zero threshold, show increased benefits for lengths below the current threshold, I am sure we can find a way to surface some detail.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3999

From kvn at openjdk.java.net Fri May 14 19:44:51 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Fri, 14 May 2021 19:44:51 GMT
Subject: RFR: 8267112: JVMCI compiler modules should be kept upgradable
In-Reply-To:
References:
Message-ID:

On Thu, 13 May 2021 16:37:38 GMT, Vladimir Kozlov wrote:

> [JDK-8264806](https://bugs.openjdk.java.net/browse/JDK-8264806) changes removed sources and also removed JVMCI compiler from list of upgradable modules. JVMCI compiler modules should be upgradable in JDK to work with GraalVM.
>
> Make these modules upgradable again and empty by leaving only reference to JVMCI (jdk.internal.vm.ci) module. It does not restore sources - only `module-info.java` files are kept.
>
> Note, we continue discussion about [JDK-8265091](https://bugs.openjdk.java.net/browse/JDK-8265091): "Use Module API to export JVMCI packages at runtime" to see if we can remove these `module-info.java` files.
>
> Changes were proposed by @dougxc after testing [JDK-8264806](https://bugs.openjdk.java.net/browse/JDK-8264806) changes with GraalVM.
> I restored related code in some tests for them to pass.
>
> Testing: full tier1-tier3.

Thank you, Mandy.
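-------------

PR: https://git.openjdk.java.net/jdk/pull/4014

From jbhateja at openjdk.java.net Fri May 14 19:59:22 2021
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Fri, 14 May 2021 19:59:22 GMT
Subject: RFR: 8256973: Intrinsic creation for VectorMask query (lastTrue, firstTrue, trueCount) APIs [v3]
In-Reply-To: <73lFD51hzmiF_KrQyPyE5c7lbf-Bp6V5vptzGo7JioY=.f34509d0-04c1-4c6d-878f-baa433b315a7@github.com>
References: <73lFD51hzmiF_KrQyPyE5c7lbf-Bp6V5vptzGo7JioY=.f34509d0-04c1-4c6d-878f-baa433b315a7@github.com>
Message-ID: <-fzRz23TCkf88P0zdw_6j12p1Nh7FbS3UijtmlGjeMI=.63d49d65-e943-473b-a6fc-5d48b2f235d6@github.com>

> This patch intrinsifies the following mask query APIs using an optimal instruction sequence for the X86 target.
> 1) VectorMask.firstTrue.
> 2) VectorMask.lastTrue.
> 3) VectorMask.trueCount.
>
> The current implementations of the above APIs iterate over the underlying boolean array encapsulated in a mask instance to ascertain the count or position index of true bits.
> X86 AVX2 and AVX512 targets offer direct instructions to transfer the mask held in a byte vector to a GP or an opmask register, thereby accelerating further querying.
>
> Intrinsification is not performed for vector species containing fewer than two vector lanes.
>
> Please find below the performance numbers for the benchmark included in the patch:
> Machine: Cascade Lake server (Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz 28C)
>
>
> BENCHMARK | VECTOR SIZE | ALGO | BASELINE AVX3 | WITH OPT AVX3 | GAIN
> -- | -- | -- | -- | -- | --
> MaskQueryOperationsBenchmark.testFirstTrueByte | 128 | 1 | 338396.436 | 362711.622 | 1.071854143
> MaskQueryOperationsBenchmark.testFirstTrueByte | 128 | 2 | 205477.472 | 362668.035 | 1.765001445
> MaskQueryOperationsBenchmark.testFirstTrueByte | 128 | 3 | 185613.377 | 362518.206 | 1.953082326
> MaskQueryOperationsBenchmark.testFirstTrueByte | 256 | 1 | 338522.114 | 328751.231 | 0.971136648
> MaskQueryOperationsBenchmark.testFirstTrueByte | 256 | 2 | 148825.341 | 328783.35 | 2.209189294
> MaskQueryOperationsBenchmark.testFirstTrueByte | 256 | 3 | 200854.856 | 328784.24 | 1.636924526
> MaskQueryOperationsBenchmark.testFirstTrueByte | 512 | 1 | 338551.089 | 319908.361 | 0.944933782
> MaskQueryOperationsBenchmark.testFirstTrueByte | 512 | 2 | 116338.756 | 320026.839 | 2.750818816
> MaskQueryOperationsBenchmark.testFirstTrueByte | 512 | 3 | 200871.692 | 320008.208 | 1.593097588
> MaskQueryOperationsBenchmark.testFirstTrueInt | 128 | 1 | 338489.157 | 190221.57 | 0.561972418
> MaskQueryOperationsBenchmark.testFirstTrueInt | 128 | 2 | 205140.903 | 362387.766 | 1.766531007
> MaskQueryOperationsBenchmark.testFirstTrueInt | 128 | 3 | 185508.994 | 362566.265 | 1.95444036
> MaskQueryOperationsBenchmark.testFirstTrueInt | 256 | 1 | 338403.999 | 328829.751 | 0.971707639
> MaskQueryOperationsBenchmark.testFirstTrueInt | 256 | 2 | 148988.857 | 328835.479 | 2.207114583
> MaskQueryOperationsBenchmark.testFirstTrueInt | 256 | 3 | 200815.907 | 328778.266 | 1.637212265
> MaskQueryOperationsBenchmark.testFirstTrueInt | 512 | 1 | 338462.403 | 328796.84 | 0.971442728
> MaskQueryOperationsBenchmark.testFirstTrueInt | 512 | 2 | 116355.623 | 328811.386 | 2.825917455
> MaskQueryOperationsBenchmark.testFirstTrueInt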
> > Please find below the performance number for benchmark included in the patch: > Machine: Cascade Lake server (Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz 28C) > > > VectorMask.trueCount | VECTOR SIZE | ALGO | BASELINE AVX3 | WITH OPT AVX3 | GAIN > -- | -- | -- | -- | -- | -- > MaskQueryOperationsBenchmark.testFirstTrueByte | 128 | 1 | 338396.436 | 362711.622 | 1.071854143 > MaskQueryOperationsBenchmark.testFirstTrueByte | 128 | 2 | 205477.472 | 362668.035 | 1.765001445 > MaskQueryOperationsBenchmark.testFirstTrueByte | 128 | 3 | 185613.377 | 362518.206 | 1.953082326 > MaskQueryOperationsBenchmark.testFirstTrueByte | 256 | 1 | 338522.114 | 328751.231 | 0.971136648 > MaskQueryOperationsBenchmark.testFirstTrueByte | 256 | 2 | 148825.341 | 328783.35 | 2.209189294 > MaskQueryOperationsBenchmark.testFirstTrueByte | 256 | 3 | 200854.856 | 328784.24 | 1.636924526 > MaskQueryOperationsBenchmark.testFirstTrueByte | 512 | 1 | 338551.089 | 319908.361 | 0.944933782 > MaskQueryOperationsBenchmark.testFirstTrueByte | 512 | 2 | 116338.756 | 320026.839 | 2.750818816 > MaskQueryOperationsBenchmark.testFirstTrueByte | 512 | 3 | 200871.692 | 320008.208 | 1.593097588 > MaskQueryOperationsBenchmark.testFirstTrueInt | 128 | 1 | 338489.157 | 190221.57 | 0.561972418 > MaskQueryOperationsBenchmark.testFirstTrueInt | 128 | 2 | 205140.903 | 362387.766 | 1.766531007 > MaskQueryOperationsBenchmark.testFirstTrueInt | 128 | 3 | 185508.994 | 362566.265 | 1.95444036 > MaskQueryOperationsBenchmark.testFirstTrueInt | 256 | 1 | 338403.999 | 328829.751 | 0.971707639 > MaskQueryOperationsBenchmark.testFirstTrueInt | 256 | 2 | 148988.857 | 328835.479 | 2.207114583 > MaskQueryOperationsBenchmark.testFirstTrueInt | 256 | 3 | 200815.907 | 328778.266 | 1.637212265 > MaskQueryOperationsBenchmark.testFirstTrueInt | 512 | 1 | 338462.403 | 328796.84 | 0.971442728 > MaskQueryOperationsBenchmark.testFirstTrueInt | 512 | 2 | 116355.623 | 328811.386 | 2.825917455 > MaskQueryOperationsBenchmark.testFirstTrueInt 
| 512 | 3 | 200856.08 | 328773.859 | 1.636862867 > MaskQueryOperationsBenchmark.testFirstTrueLong | 128 | 1 | 338451.783 | 204432.394 | 0.60402221 > MaskQueryOperationsBenchmark.testFirstTrueLong | 128 | 2 | 204443.049 | 155670.633 | 0.761437641 > MaskQueryOperationsBenchmark.testFirstTrueLong | 128 | 3 | 207254.769 | 155672.842 | 0.751118263 > MaskQueryOperationsBenchmark.testFirstTrueLong | 256 | 1 | 338520.255 | 328789.176 | 0.971254072 > MaskQueryOperationsBenchmark.testFirstTrueLong | 256 | 2 | 205883.123 | 328742.103 | 1.596741385 > MaskQueryOperationsBenchmark.testFirstTrueLong | 256 | 3 | 185519.176 | 328733.537 | 1.771965271 > MaskQueryOperationsBenchmark.testFirstTrueLong | 512 | 1 | 338605.11 | 328694.935 | 0.970732353 > MaskQueryOperationsBenchmark.testFirstTrueLong | 512 | 2 | 148444.7 | 328352.346 | 2.211950619 > MaskQueryOperationsBenchmark.testFirstTrueLong | 512 | 3 | 200884.874 | 328814.376 | 1.636829939 > MaskQueryOperationsBenchmark.testFirstTrueShort | 128 | 1 | 338529.326 | 362293.877 | 1.070199387 > MaskQueryOperationsBenchmark.testFirstTrueShort | 128 | 2 | 204676.583 | 362428.992 | 1.770739899 > MaskQueryOperationsBenchmark.testFirstTrueShort | 128 | 3 | 185495.663 | 362422.835 | 1.953807594 > MaskQueryOperationsBenchmark.testFirstTrueShort | 256 | 1 | 338533.82 | 328635.479 | 0.970761146 > MaskQueryOperationsBenchmark.testFirstTrueShort | 256 | 2 | 148822.446 | 328803.55 | 2.209368001 > MaskQueryOperationsBenchmark.testFirstTrueShort | 256 | 3 | 200752.028 | 328805.974 | 1.637871245 > MaskQueryOperationsBenchmark.testFirstTrueShort | 512 | 1 | 338464.548 | 320054.91 | 0.945608371 > MaskQueryOperationsBenchmark.testFirstTrueShort | 512 | 2 | 116329.063 | 328763.508 | 2.826151088 > MaskQueryOperationsBenchmark.testFirstTrueShort | 512 | 3 | 199971.049 | 328819.066 | 1.644333355 > MaskQueryOperationsBenchmark.testLastTrueByte | 128 | 1 | 325618.244 | 337629.441 | 1.036887359 > MaskQueryOperationsBenchmark.testLastTrueByte | 128 | 2 | 
197655.729 | 337544.012 | 1.707737052 > MaskQueryOperationsBenchmark.testLastTrueByte | 128 | 3 | 325600.645 | 337256.796 | 1.035798919 > MaskQueryOperationsBenchmark.testLastTrueByte | 256 | 1 | 325677.144 | 308312.588 | 0.946681687 > MaskQueryOperationsBenchmark.testLastTrueByte | 256 | 2 | 138177.514 | 308293.997 | 2.231144476 > MaskQueryOperationsBenchmark.testLastTrueByte | 256 | 3 | 201281.142 | 308353.239 | 1.531952949 > MaskQueryOperationsBenchmark.testLastTrueByte | 512 | 1 | 325499.635 | 305103.491 | 0.937338965 > MaskQueryOperationsBenchmark.testLastTrueByte | 512 | 2 | 98267.327 | 304803.64 | 3.101780106 > MaskQueryOperationsBenchmark.testLastTrueByte | 512 | 3 | 201072.661 | 304969.972 | 1.516715253 > MaskQueryOperationsBenchmark.testLastTrueInt | 128 | 1 | 325286.171 | 337337.209 | 1.037047496 > MaskQueryOperationsBenchmark.testLastTrueInt | 128 | 2 | 197351.915 | 331432.723 | 1.679399579 > MaskQueryOperationsBenchmark.testLastTrueInt | 128 | 3 | 325173.097 | 337518.586 | 1.037965899 > MaskQueryOperationsBenchmark.testLastTrueInt | 256 | 1 | 325199.786 | 308436.805 | 0.948453284 > MaskQueryOperationsBenchmark.testLastTrueInt | 256 | 2 | 138200.527 | 308405.442 | 2.231579348 > MaskQueryOperationsBenchmark.testLastTrueInt | 256 | 3 | 201240.625 | 308234.527 | 1.531671485 > MaskQueryOperationsBenchmark.testLastTrueInt | 512 | 1 | 325590.639 | 308381.757 | 0.947145649 > MaskQueryOperationsBenchmark.testLastTrueInt | 512 | 2 | 98334.197 | 308440.373 | 3.13665421 > MaskQueryOperationsBenchmark.testLastTrueInt | 512 | 3 | 200832.953 | 308431.355 | 1.535760693 > MaskQueryOperationsBenchmark.testLastTrueLong | 128 | 1 | 325564.887 | 193981.861 | 0.595831641 > MaskQueryOperationsBenchmark.testLastTrueLong | 128 | 2 | 214005.351 | 153667.869 | 0.718056199 > MaskQueryOperationsBenchmark.testLastTrueLong | 128 | 3 | 214061.493 | 156337.24 | 0.730337988 > MaskQueryOperationsBenchmark.testLastTrueLong | 256 | 1 | 325601.502 | 308291.032 | 0.946835411 > 
MaskQueryOperationsBenchmark.testLastTrueLong | 256 | 2 | 197911.182 | 308292.149 | 1.557729815 > MaskQueryOperationsBenchmark.testLastTrueLong | 256 | 3 | 325608.187 | 308405.393 | 0.947167195 > MaskQueryOperationsBenchmark.testLastTrueLong | 512 | 1 | 325734.897 | 308321.619 | 0.946541564 > MaskQueryOperationsBenchmark.testLastTrueLong | 512 | 2 | 137974.465 | 308131.475 | 2.233250008 > MaskQueryOperationsBenchmark.testLastTrueLong | 512 | 3 | 205479.182 | 308311.636 | 1.500451934 > MaskQueryOperationsBenchmark.testLastTrueShort | 128 | 1 | 325681.411 | 337663.377 | 1.036790451 > MaskQueryOperationsBenchmark.testLastTrueShort | 128 | 2 | 198127.51 | 337287.453 | 1.702375672 > MaskQueryOperationsBenchmark.testLastTrueShort | 128 | 3 | 325519.01 | 337453.387 | 1.036662612 > MaskQueryOperationsBenchmark.testLastTrueShort | 256 | 1 | 325647.378 | 308266.5 | 0.946626691 > MaskQueryOperationsBenchmark.testLastTrueShort | 256 | 2 | 138287.837 | 308402.656 | 2.230150263 > MaskQueryOperationsBenchmark.testLastTrueShort | 256 | 3 | 205375.864 | 308418.101 | 1.501725154 > MaskQueryOperationsBenchmark.testLastTrueShort | 512 | 1 | 325548.631 | 308137.064 | 0.946516233 > MaskQueryOperationsBenchmark.testLastTrueShort | 512 | 2 | 98424.074 | 308145.17 | 3.130790644 > MaskQueryOperationsBenchmark.testLastTrueShort | 512 | 3 | 205381.622 | 308345.763 | 1.50133084 > MaskQueryOperationsBenchmark.testTrueCountByte | 128 | 1 | 197488.249 | 340490.471 | 1.724104967 > MaskQueryOperationsBenchmark.testTrueCountByte | 128 | 2 | 191307.785 | 354400.26 | 1.852513529 > MaskQueryOperationsBenchmark.testTrueCountByte | 128 | 3 | 181206.7 | 354512.75 | 1.956399791 > MaskQueryOperationsBenchmark.testTrueCountByte | 256 | 1 | 144485.784 | 328347.7 | 2.272525995 > MaskQueryOperationsBenchmark.testTrueCountByte | 256 | 2 | 136709.938 | 328318.229 | 2.401568122 > MaskQueryOperationsBenchmark.testTrueCountByte | 256 | 3 | 141501.903 | 328274.337 | 2.319928779 > 
MaskQueryOperationsBenchmark.testTrueCountByte | 512 | 1 | 108395.25 | 318599.11 | 2.939234976 > MaskQueryOperationsBenchmark.testTrueCountByte | 512 | 2 | 98731.287 | 318651.791 | 3.22746518 > MaskQueryOperationsBenchmark.testTrueCountByte | 512 | 3 | 106344.335 | 318657.098 | 2.99646519 > MaskQueryOperationsBenchmark.testTrueCountInt | 128 | 1 | 124691.716 | 354457.62 | 2.842671762 > MaskQueryOperationsBenchmark.testTrueCountInt | 128 | 2 | 191325.138 | 354360.523 | 1.852137815 > MaskQueryOperationsBenchmark.testTrueCountInt | 128 | 3 | 181480.334 | 353746.697 | 1.949228818 > MaskQueryOperationsBenchmark.testTrueCountInt | 256 | 1 | 144513.076 | 328404.916 | 2.27249274 > MaskQueryOperationsBenchmark.testTrueCountInt | 256 | 2 | 136710.717 | 328516.92 | 2.403007805 > MaskQueryOperationsBenchmark.testTrueCountInt | 256 | 3 | 141631.832 | 328432.841 | 2.318919669 > MaskQueryOperationsBenchmark.testTrueCountInt | 512 | 1 | 108479.473 | 328405.877 | 3.027355019 > MaskQueryOperationsBenchmark.testTrueCountInt | 512 | 2 | 98747.682 | 328300.378 | 3.324638831 > MaskQueryOperationsBenchmark.testTrueCountInt | 512 | 3 | 106378.04 | 328384.537 | 3.086957957 > MaskQueryOperationsBenchmark.testTrueCountLong | 128 | 1 | 213646.579 | 159098.437 | 0.74468048 > MaskQueryOperationsBenchmark.testTrueCountLong | 128 | 2 | 212671.379 | 162528.924 | 0.764225655 > MaskQueryOperationsBenchmark.testTrueCountLong | 128 | 3 | 212649.052 | 162530.898 | 0.764315178 > MaskQueryOperationsBenchmark.testTrueCountLong | 256 | 1 | 197350.819 | 328365.924 | 1.663869072 > MaskQueryOperationsBenchmark.testTrueCountLong | 256 | 2 | 191473.127 | 328501.883 | 1.715655289 > MaskQueryOperationsBenchmark.testTrueCountLong | 256 | 3 | 185529.513 | 328428.64 | 1.770223156 > MaskQueryOperationsBenchmark.testTrueCountLong | 512 | 1 | 144516.188 | 328334.76 | 2.27195835 > MaskQueryOperationsBenchmark.testTrueCountLong | 512 | 2 | 136752.367 | 328505.571 | 2.402192943 > 
MaskQueryOperationsBenchmark.testTrueCountLong | 512 | 3 | 141445.742 | 328392.887 | 2.321688036 > MaskQueryOperationsBenchmark.testTrueCountShort | 128 | 1 | 197863.202 | 354533.342 | 1.791810394 > MaskQueryOperationsBenchmark.testTrueCountShort | 128 | 2 | 191802.914 | 354377.939 | 1.84761499 > MaskQueryOperationsBenchmark.testTrueCountShort | 128 | 3 | 181773.298 | 354374.525 | 1.949541153 > MaskQueryOperationsBenchmark.testTrueCountShort | 256 | 1 | 144414.679 | 328435.088 | 2.27425003 > MaskQueryOperationsBenchmark.testTrueCountShort | 256 | 2 | 136923.991 | 328267.898 | 2.397446171 > MaskQueryOperationsBenchmark.testTrueCountShort | 256 | 3 | 141545.957 | 328308.681 | 2.319449371 > MaskQueryOperationsBenchmark.testTrueCountShort | 512 | 1 | 108420.143 | 328282.998 | 3.027878297 > MaskQueryOperationsBenchmark.testTrueCountShort | 512 | 2 | 98736.441 | 328420.616 | 3.326235103 > MaskQueryOperationsBenchmark.testTrueCountShort | 512 | 3 | 106432.386 | 328245.585 | 3.084076166 > > ALGO (1=bestcase, 2=worstcast,3=avgcase) Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: 8256973: Review comments resolution. 
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/3916/files - new: https://git.openjdk.java.net/jdk/pull/3916/files/15e3ffd3..691d082c Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3916&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3916&range=01-02 Stats: 64 lines in 7 files changed: 6 ins; 19 del; 39 mod Patch: https://git.openjdk.java.net/jdk/pull/3916.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3916/head:pull/3916 PR: https://git.openjdk.java.net/jdk/pull/3916 From kvn at openjdk.java.net Fri May 14 20:20:38 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 14 May 2021 20:20:38 GMT Subject: RFR: 8266332: Adler32 intrinsic for x86 64-bit platforms [v7] In-Reply-To: References: Message-ID: <6U5EDcQli0NUlrhdOBtzkujogfwN_39V3bPs_7oIUX0=.e51f26d8-b503-4680-8d6e-34311a1b300f@github.com> On Fri, 7 May 2021 16:42:30 GMT, Xubo Zhang wrote: >> Implement Adler32 intrinsic for x86 64-bit platform using vector instructions. >> >> For the following benchmark: >> http://cr.openjdk.java.net/~pli/rfr/8216259/TestAdler32.java >> >> The optimization shows ~5x improvement. >> >> Base: >> Benchmark (count) Mode Cnt Score Error Units >> TestAdler32Perf.testAdler32Update 64 avgt 25 0.084 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 128 avgt 25 0.104 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 256 avgt 25 0.146 ± 0.002 us/op >> TestAdler32Perf.testAdler32Update 512 avgt 25 0.226 ± 0.002 us/op >> TestAdler32Perf.testAdler32Update 1024 avgt 25 0.390 ± 0.005 us/op >> TestAdler32Perf.testAdler32Update 2048 avgt 25 0.714 ± 0.007 us/op >> TestAdler32Perf.testAdler32Update 4096 avgt 25 1.359 ± 0.014 us/op >> TestAdler32Perf.testAdler32Update 8192 avgt 25 2.751 ± 0.023 us/op >> TestAdler32Perf.testAdler32Update 16384 avgt 25 5.494 ± 0.077 us/op >> TestAdler32Perf.testAdler32Update 32768 avgt 25 11.058 ± 0.160 us/op >> TestAdler32Perf.testAdler32Update 65536 avgt 25 22.198 ± 
0.319 us/op >> >> >> With patch: >> Benchmark (count) Mode Cnt Score Error Units >> TestAdler32Perf.testAdler32Update 64 avgt 25 0.020 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 128 avgt 25 0.025 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 256 avgt 25 0.031 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 512 avgt 25 0.048 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 1024 avgt 25 0.078 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 2048 avgt 25 0.139 ± 0.002 us/op >> TestAdler32Perf.testAdler32Update 4096 avgt 25 0.262 ± 0.004 us/op >> TestAdler32Perf.testAdler32Update 8192 avgt 25 0.524 ± 0.010 us/op >> TestAdler32Perf.testAdler32Update 16384 avgt 25 1.017 ± 0.022 us/op >> TestAdler32Perf.testAdler32Update 32768 avgt 25 2.058 ± 0.052 us/op >> TestAdler32Perf.testAdler32Update 65536 avgt 25 3.994 ± 0.013 us/op > > Xubo Zhang has updated the pull request incrementally with one additional commit since the last revision: > > Add @run case Ping @pfustc about permission to add his JMH micro or write your own based on examples in `test/micro/org/openjdk/bench/java/util/` src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 6799: > 6797: } > 6798: > 6799: if (VM_Version::supports_avx2() && UseAdler32Intrinsics) { Based on code in `vm_version_x86.cpp` `UseAdler32Intrinsics` is `true` only with `avx2`. So you don't need to check it here. src/hotspot/cpu/x86/vm_version_x86.cpp line 902: > 900: > 901: #ifdef _LP64 > 902: if (supports_avx2() && UseAdler32Intrinsics) { Check is incorrect. Should check only `if (supports_avx2()) ` test/hotspot/jtreg/compiler/intrinsics/zip/TestAdler32.java line 32: > 30: * > 31: * @run main/othervm/timeout=600 -Xbatch compiler.intrinsics.zip.TestAdler32 -m > 32: * @run main/othervm/timeout=600 -XX:+UnlockDiagnosticVMOptions -XX:+UseAdler32Intrinsics compiler.intrinsics.zip.TestAdler32 -m Enabling the intrinsic unconditionally is incorrect - the test will fail (gives a warning) on machines which do not support avx2. 
Do not modify this test. To test the improvement, a JMH benchmark should be used. ------------- Changes requested by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/3806 From vlivanov at openjdk.java.net Fri May 14 21:26:39 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Fri, 14 May 2021 21:26:39 GMT Subject: RFR: 8256973: Intrinsic creation for VectorMask query (lastTrue, firstTrue, trueCount) APIs [v3] In-Reply-To: <-fzRz23TCkf88P0zdw_6j12p1Nh7FbS3UijtmlGjeMI=.63d49d65-e943-473b-a6fc-5d48b2f235d6@github.com> References: <73lFD51hzmiF_KrQyPyE5c7lbf-Bp6V5vptzGo7JioY=.f34509d0-04c1-4c6d-878f-baa433b315a7@github.com> <-fzRz23TCkf88P0zdw_6j12p1Nh7FbS3UijtmlGjeMI=.63d49d65-e943-473b-a6fc-5d48b2f235d6@github.com> Message-ID: On Fri, 14 May 2021 19:59:22 GMT, Jatin Bhateja wrote: >> This patch intrinsifies the following mask query APIs using an optimal instruction sequence for the X86 target. >> 1) VectorMask.firstTrue. >> 2) VectorMask.lastTrue. >> 3) VectorMask.trueCount. >> >> Current implementations of the above APIs iterate over the underlying boolean array encapsulated in a mask instance to ascertain the count/position index of true bits. >> X86 AVX2 and AVX512 targets offer direct instructions to populate the masks held in the byte vector to a GP or an opmask register, thereby accelerating further querying. >> >> Intrinsification is not performed for vector species containing fewer than two vector lanes. 
>> >> Please find below the performance number for benchmark included in the patch: >> Machine: Cascade Lake server (Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz 28C) >> >> >> VectorMask.trueCount | VECTOR SIZE | ALGO | BASELINE AVX3 | WITH OPT AVX3 | GAIN >> -- | -- | -- | -- | -- | -- >> MaskQueryOperationsBenchmark.testFirstTrueByte | 128 | 1 | 338396.436 | 362711.622 | 1.071854143 >> MaskQueryOperationsBenchmark.testFirstTrueByte | 128 | 2 | 205477.472 | 362668.035 | 1.765001445 >> MaskQueryOperationsBenchmark.testFirstTrueByte | 128 | 3 | 185613.377 | 362518.206 | 1.953082326 >> MaskQueryOperationsBenchmark.testFirstTrueByte | 256 | 1 | 338522.114 | 328751.231 | 0.971136648 >> MaskQueryOperationsBenchmark.testFirstTrueByte | 256 | 2 | 148825.341 | 328783.35 | 2.209189294 >> MaskQueryOperationsBenchmark.testFirstTrueByte | 256 | 3 | 200854.856 | 328784.24 | 1.636924526 >> MaskQueryOperationsBenchmark.testFirstTrueByte | 512 | 1 | 338551.089 | 319908.361 | 0.944933782 >> MaskQueryOperationsBenchmark.testFirstTrueByte | 512 | 2 | 116338.756 | 320026.839 | 2.750818816 >> MaskQueryOperationsBenchmark.testFirstTrueByte | 512 | 3 | 200871.692 | 320008.208 | 1.593097588 >> MaskQueryOperationsBenchmark.testFirstTrueInt | 128 | 1 | 338489.157 | 190221.57 | 0.561972418 >> MaskQueryOperationsBenchmark.testFirstTrueInt | 128 | 2 | 205140.903 | 362387.766 | 1.766531007 >> MaskQueryOperationsBenchmark.testFirstTrueInt | 128 | 3 | 185508.994 | 362566.265 | 1.95444036 >> MaskQueryOperationsBenchmark.testFirstTrueInt | 256 | 1 | 338403.999 | 328829.751 | 0.971707639 >> MaskQueryOperationsBenchmark.testFirstTrueInt | 256 | 2 | 148988.857 | 328835.479 | 2.207114583 >> MaskQueryOperationsBenchmark.testFirstTrueInt | 256 | 3 | 200815.907 | 328778.266 | 1.637212265 >> MaskQueryOperationsBenchmark.testFirstTrueInt | 512 | 1 | 338462.403 | 328796.84 | 0.971442728 >> MaskQueryOperationsBenchmark.testFirstTrueInt | 512 | 2 | 116355.623 | 328811.386 | 2.825917455 >> 
MaskQueryOperationsBenchmark.testFirstTrueInt | 512 | 3 | 200856.08 | 328773.859 | 1.636862867 >> MaskQueryOperationsBenchmark.testFirstTrueLong | 128 | 1 | 338451.783 | 204432.394 | 0.60402221 >> MaskQueryOperationsBenchmark.testFirstTrueLong | 128 | 2 | 204443.049 | 155670.633 | 0.761437641 >> MaskQueryOperationsBenchmark.testFirstTrueLong | 128 | 3 | 207254.769 | 155672.842 | 0.751118263 >> MaskQueryOperationsBenchmark.testFirstTrueLong | 256 | 1 | 338520.255 | 328789.176 | 0.971254072 >> MaskQueryOperationsBenchmark.testFirstTrueLong | 256 | 2 | 205883.123 | 328742.103 | 1.596741385 >> MaskQueryOperationsBenchmark.testFirstTrueLong | 256 | 3 | 185519.176 | 328733.537 | 1.771965271 >> MaskQueryOperationsBenchmark.testFirstTrueLong | 512 | 1 | 338605.11 | 328694.935 | 0.970732353 >> MaskQueryOperationsBenchmark.testFirstTrueLong | 512 | 2 | 148444.7 | 328352.346 | 2.211950619 >> MaskQueryOperationsBenchmark.testFirstTrueLong | 512 | 3 | 200884.874 | 328814.376 | 1.636829939 >> MaskQueryOperationsBenchmark.testFirstTrueShort | 128 | 1 | 338529.326 | 362293.877 | 1.070199387 >> MaskQueryOperationsBenchmark.testFirstTrueShort | 128 | 2 | 204676.583 | 362428.992 | 1.770739899 >> MaskQueryOperationsBenchmark.testFirstTrueShort | 128 | 3 | 185495.663 | 362422.835 | 1.953807594 >> MaskQueryOperationsBenchmark.testFirstTrueShort | 256 | 1 | 338533.82 | 328635.479 | 0.970761146 >> MaskQueryOperationsBenchmark.testFirstTrueShort | 256 | 2 | 148822.446 | 328803.55 | 2.209368001 >> MaskQueryOperationsBenchmark.testFirstTrueShort | 256 | 3 | 200752.028 | 328805.974 | 1.637871245 >> MaskQueryOperationsBenchmark.testFirstTrueShort | 512 | 1 | 338464.548 | 320054.91 | 0.945608371 >> MaskQueryOperationsBenchmark.testFirstTrueShort | 512 | 2 | 116329.063 | 328763.508 | 2.826151088 >> MaskQueryOperationsBenchmark.testFirstTrueShort | 512 | 3 | 199971.049 | 328819.066 | 1.644333355 >> MaskQueryOperationsBenchmark.testLastTrueByte | 128 | 1 | 325618.244 | 337629.441 | 1.036887359 >> 
MaskQueryOperationsBenchmark.testLastTrueByte | 128 | 2 | 197655.729 | 337544.012 | 1.707737052 >> MaskQueryOperationsBenchmark.testLastTrueByte | 128 | 3 | 325600.645 | 337256.796 | 1.035798919 >> MaskQueryOperationsBenchmark.testLastTrueByte | 256 | 1 | 325677.144 | 308312.588 | 0.946681687 >> MaskQueryOperationsBenchmark.testLastTrueByte | 256 | 2 | 138177.514 | 308293.997 | 2.231144476 >> MaskQueryOperationsBenchmark.testLastTrueByte | 256 | 3 | 201281.142 | 308353.239 | 1.531952949 >> MaskQueryOperationsBenchmark.testLastTrueByte | 512 | 1 | 325499.635 | 305103.491 | 0.937338965 >> MaskQueryOperationsBenchmark.testLastTrueByte | 512 | 2 | 98267.327 | 304803.64 | 3.101780106 >> MaskQueryOperationsBenchmark.testLastTrueByte | 512 | 3 | 201072.661 | 304969.972 | 1.516715253 >> MaskQueryOperationsBenchmark.testLastTrueInt | 128 | 1 | 325286.171 | 337337.209 | 1.037047496 >> MaskQueryOperationsBenchmark.testLastTrueInt | 128 | 2 | 197351.915 | 331432.723 | 1.679399579 >> MaskQueryOperationsBenchmark.testLastTrueInt | 128 | 3 | 325173.097 | 337518.586 | 1.037965899 >> MaskQueryOperationsBenchmark.testLastTrueInt | 256 | 1 | 325199.786 | 308436.805 | 0.948453284 >> MaskQueryOperationsBenchmark.testLastTrueInt | 256 | 2 | 138200.527 | 308405.442 | 2.231579348 >> MaskQueryOperationsBenchmark.testLastTrueInt | 256 | 3 | 201240.625 | 308234.527 | 1.531671485 >> MaskQueryOperationsBenchmark.testLastTrueInt | 512 | 1 | 325590.639 | 308381.757 | 0.947145649 >> MaskQueryOperationsBenchmark.testLastTrueInt | 512 | 2 | 98334.197 | 308440.373 | 3.13665421 >> MaskQueryOperationsBenchmark.testLastTrueInt | 512 | 3 | 200832.953 | 308431.355 | 1.535760693 >> MaskQueryOperationsBenchmark.testLastTrueLong | 128 | 1 | 325564.887 | 193981.861 | 0.595831641 >> MaskQueryOperationsBenchmark.testLastTrueLong | 128 | 2 | 214005.351 | 153667.869 | 0.718056199 >> MaskQueryOperationsBenchmark.testLastTrueLong | 128 | 3 | 214061.493 | 156337.24 | 0.730337988 >> 
MaskQueryOperationsBenchmark.testLastTrueLong | 256 | 1 | 325601.502 | 308291.032 | 0.946835411 >> MaskQueryOperationsBenchmark.testLastTrueLong | 256 | 2 | 197911.182 | 308292.149 | 1.557729815 >> MaskQueryOperationsBenchmark.testLastTrueLong | 256 | 3 | 325608.187 | 308405.393 | 0.947167195 >> MaskQueryOperationsBenchmark.testLastTrueLong | 512 | 1 | 325734.897 | 308321.619 | 0.946541564 >> MaskQueryOperationsBenchmark.testLastTrueLong | 512 | 2 | 137974.465 | 308131.475 | 2.233250008 >> MaskQueryOperationsBenchmark.testLastTrueLong | 512 | 3 | 205479.182 | 308311.636 | 1.500451934 >> MaskQueryOperationsBenchmark.testLastTrueShort | 128 | 1 | 325681.411 | 337663.377 | 1.036790451 >> MaskQueryOperationsBenchmark.testLastTrueShort | 128 | 2 | 198127.51 | 337287.453 | 1.702375672 >> MaskQueryOperationsBenchmark.testLastTrueShort | 128 | 3 | 325519.01 | 337453.387 | 1.036662612 >> MaskQueryOperationsBenchmark.testLastTrueShort | 256 | 1 | 325647.378 | 308266.5 | 0.946626691 >> MaskQueryOperationsBenchmark.testLastTrueShort | 256 | 2 | 138287.837 | 308402.656 | 2.230150263 >> MaskQueryOperationsBenchmark.testLastTrueShort | 256 | 3 | 205375.864 | 308418.101 | 1.501725154 >> MaskQueryOperationsBenchmark.testLastTrueShort | 512 | 1 | 325548.631 | 308137.064 | 0.946516233 >> MaskQueryOperationsBenchmark.testLastTrueShort | 512 | 2 | 98424.074 | 308145.17 | 3.130790644 >> MaskQueryOperationsBenchmark.testLastTrueShort | 512 | 3 | 205381.622 | 308345.763 | 1.50133084 >> MaskQueryOperationsBenchmark.testTrueCountByte | 128 | 1 | 197488.249 | 340490.471 | 1.724104967 >> MaskQueryOperationsBenchmark.testTrueCountByte | 128 | 2 | 191307.785 | 354400.26 | 1.852513529 >> MaskQueryOperationsBenchmark.testTrueCountByte | 128 | 3 | 181206.7 | 354512.75 | 1.956399791 >> MaskQueryOperationsBenchmark.testTrueCountByte | 256 | 1 | 144485.784 | 328347.7 | 2.272525995 >> MaskQueryOperationsBenchmark.testTrueCountByte | 256 | 2 | 136709.938 | 328318.229 | 2.401568122 >> 
MaskQueryOperationsBenchmark.testTrueCountByte | 256 | 3 | 141501.903 | 328274.337 | 2.319928779 >> MaskQueryOperationsBenchmark.testTrueCountByte | 512 | 1 | 108395.25 | 318599.11 | 2.939234976 >> MaskQueryOperationsBenchmark.testTrueCountByte | 512 | 2 | 98731.287 | 318651.791 | 3.22746518 >> MaskQueryOperationsBenchmark.testTrueCountByte | 512 | 3 | 106344.335 | 318657.098 | 2.99646519 >> MaskQueryOperationsBenchmark.testTrueCountInt | 128 | 1 | 124691.716 | 354457.62 | 2.842671762 >> MaskQueryOperationsBenchmark.testTrueCountInt | 128 | 2 | 191325.138 | 354360.523 | 1.852137815 >> MaskQueryOperationsBenchmark.testTrueCountInt | 128 | 3 | 181480.334 | 353746.697 | 1.949228818 >> MaskQueryOperationsBenchmark.testTrueCountInt | 256 | 1 | 144513.076 | 328404.916 | 2.27249274 >> MaskQueryOperationsBenchmark.testTrueCountInt | 256 | 2 | 136710.717 | 328516.92 | 2.403007805 >> MaskQueryOperationsBenchmark.testTrueCountInt | 256 | 3 | 141631.832 | 328432.841 | 2.318919669 >> MaskQueryOperationsBenchmark.testTrueCountInt | 512 | 1 | 108479.473 | 328405.877 | 3.027355019 >> MaskQueryOperationsBenchmark.testTrueCountInt | 512 | 2 | 98747.682 | 328300.378 | 3.324638831 >> MaskQueryOperationsBenchmark.testTrueCountInt | 512 | 3 | 106378.04 | 328384.537 | 3.086957957 >> MaskQueryOperationsBenchmark.testTrueCountLong | 128 | 1 | 213646.579 | 159098.437 | 0.74468048 >> MaskQueryOperationsBenchmark.testTrueCountLong | 128 | 2 | 212671.379 | 162528.924 | 0.764225655 >> MaskQueryOperationsBenchmark.testTrueCountLong | 128 | 3 | 212649.052 | 162530.898 | 0.764315178 >> MaskQueryOperationsBenchmark.testTrueCountLong | 256 | 1 | 197350.819 | 328365.924 | 1.663869072 >> MaskQueryOperationsBenchmark.testTrueCountLong | 256 | 2 | 191473.127 | 328501.883 | 1.715655289 >> MaskQueryOperationsBenchmark.testTrueCountLong | 256 | 3 | 185529.513 | 328428.64 | 1.770223156 >> MaskQueryOperationsBenchmark.testTrueCountLong | 512 | 1 | 144516.188 | 328334.76 | 2.27195835 >> 
MaskQueryOperationsBenchmark.testTrueCountLong | 512 | 2 | 136752.367 | 328505.571 | 2.402192943 >> MaskQueryOperationsBenchmark.testTrueCountLong | 512 | 3 | 141445.742 | 328392.887 | 2.321688036 >> MaskQueryOperationsBenchmark.testTrueCountShort | 128 | 1 | 197863.202 | 354533.342 | 1.791810394 >> MaskQueryOperationsBenchmark.testTrueCountShort | 128 | 2 | 191802.914 | 354377.939 | 1.84761499 >> MaskQueryOperationsBenchmark.testTrueCountShort | 128 | 3 | 181773.298 | 354374.525 | 1.949541153 >> MaskQueryOperationsBenchmark.testTrueCountShort | 256 | 1 | 144414.679 | 328435.088 | 2.27425003 >> MaskQueryOperationsBenchmark.testTrueCountShort | 256 | 2 | 136923.991 | 328267.898 | 2.397446171 >> MaskQueryOperationsBenchmark.testTrueCountShort | 256 | 3 | 141545.957 | 328308.681 | 2.319449371 >> MaskQueryOperationsBenchmark.testTrueCountShort | 512 | 1 | 108420.143 | 328282.998 | 3.027878297 >> MaskQueryOperationsBenchmark.testTrueCountShort | 512 | 2 | 98736.441 | 328420.616 | 3.326235103 >> MaskQueryOperationsBenchmark.testTrueCountShort | 512 | 3 | 106432.386 | 328245.585 | 3.084076166 >> >> ALGO (1=bestcase, 2=worstcast,3=avgcase) > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > 8256973: Review comments resolution. src/hotspot/cpu/x86/x86.ad line 1660: > 1658: case Op_RotateLeftV: > 1659: case Op_MacroLogicV: > 1660: case Op_VectorMaskLastTrue: But don't you support 128-/256-bit cases w/ AVX/AVX2 instructions? This check effectively requires AVX512 as a baseline. 
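
For readers following the thread, the three mask queries under review can be sketched in plain Java. This is only an illustration under stated assumptions, not code from the patch: the class name `MaskQuerySketch` and its methods are hypothetical, the boolean-array loops mirror the scalar path described in the PR summary, and the `long`-based variants merely show why having the mask in a general-purpose register makes the queries cheap (population count, trailing and leading zero counts). The return conventions follow the `VectorMask` API: `firstTrue` yields the lane count when no lane is set and `lastTrue` yields -1.

```java
public class MaskQuerySketch {
    // Scalar reference: count the set lanes by walking the boolean array.
    static int trueCount(boolean[] lanes) {
        int count = 0;
        for (boolean lane : lanes) {
            if (lane) {
                count++;
            }
        }
        return count;
    }

    // Index of the first set lane, or lanes.length when no lane is set.
    static int firstTrue(boolean[] lanes) {
        for (int i = 0; i < lanes.length; i++) {
            if (lanes[i]) {
                return i;
            }
        }
        return lanes.length;
    }

    // Index of the last set lane, or -1 when no lane is set.
    static int lastTrue(boolean[] lanes) {
        for (int i = lanes.length - 1; i >= 0; i--) {
            if (lanes[i]) {
                return i;
            }
        }
        return -1;
    }

    // Hypothetical helper: pack lane i into bit i (assumes at most 64 lanes).
    static long toBits(boolean[] lanes) {
        long bits = 0L;
        for (int i = 0; i < lanes.length; i++) {
            if (lanes[i]) {
                bits |= 1L << i;
            }
        }
        return bits;
    }

    // The same queries once the mask is a bit pattern in a register.
    static int trueCountBits(long bits) {
        return Long.bitCount(bits);
    }

    static int firstTrueBits(long bits, int length) {
        return bits == 0 ? length : Long.numberOfTrailingZeros(bits);
    }

    static int lastTrueBits(long bits) {
        // numberOfLeadingZeros(0) is 64, so an empty mask gives -1.
        return 63 - Long.numberOfLeadingZeros(bits);
    }
}
```

The bit-pattern variants need no loop, which is consistent with the larger gains the table reports for the worst-case ALGO setting, although the actual instruction selection is decided by the patch and the matcher rules discussed above.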
------------- PR: https://git.openjdk.java.net/jdk/pull/3916 From vlivanov at openjdk.java.net Fri May 14 21:26:40 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Fri, 14 May 2021 21:26:40 GMT Subject: RFR: 8256973: Intrinsic creation for VectorMask query (lastTrue, firstTrue, trueCount) APIs [v2] In-Reply-To: References: <73lFD51hzmiF_KrQyPyE5c7lbf-Bp6V5vptzGo7JioY=.f34509d0-04c1-4c6d-878f-baa433b315a7@github.com> Message-ID: <9Tv567jC1GsqY3tq4-_HqAxPzHbl9ckDsfIFN3bMvMw=.b73f9786-417d-4260-a569-28ad7ed504ab@github.com> On Fri, 14 May 2021 19:27:22 GMT, Jatin Bhateja wrote: >> src/hotspot/cpu/x86/x86.ad line 8085: >> >>> 8083: >>> 8084: instruct vmask_true_count_avx(rRegI dst, vec mask, rRegL tmp, vec xtmp, vec xtmp1) %{ >>> 8085: predicate(!VM_Version::supports_avx512bw()); >> >> `VM_Version::supports_avx512vlbw()` > > Handled in match_rule_supported_vector. I think you still need to adjust the predicate to be able to correctly split between AVX512BW+VL and AVX512F/AVX/AVX2 configurations. >> src/java.base/share/classes/jdk/internal/vm/vector/VectorSupport.java line 469: >> >>> 467: public static >>> 468: >>> 469: int maskOp(int oper, Class maskClass, Class elemClass, int length, M m, >> >> I second Paul here: `maskOp` case is already covered by `reductionCoerced`. > > As discussed above mixing it with reduction coerced will require changes in original entry point (type parameter have Vector as the lower bound) , also we may need to bypass some irrelevant portions in inline_vector_reduction() , for the time being to keep the things clean added a different entry point for all masked operations. Ok, fair enough. We can revisit that later and merge them if needed. 
Some suggestions to consider to align it with `reductionCoerced`: * reflect in the name that it's effectively a reduction, but on masks (`maskReductionCoerced`?); * return type can be generalized to `long`; * bound on M: ``; * no need to introduce a special interface, `Function` just works: `VectorMaskOp` -> `Function`; ------------- PR: https://git.openjdk.java.net/jdk/pull/3916 From github.com+58006833+xbzhang99 at openjdk.java.net Fri May 14 21:47:32 2021 From: github.com+58006833+xbzhang99 at openjdk.java.net (Xubo Zhang) Date: Fri, 14 May 2021 21:47:32 GMT Subject: RFR: 8266332: Adler32 intrinsic for x86 64-bit platforms [v8] In-Reply-To: References: Message-ID: <62je7XSswP2cqpraFbnxN65flI6m5fGA0as9JVsYbAY=.df9d3002-98d3-4093-aa29-b7f37b1bbab5@github.com> > Implement Adler32 intrinsic for x86 64-bit platform using vector instructions. > > For the following benchmark: > http://cr.openjdk.java.net/~pli/rfr/8216259/TestAdler32.java > > The optimization shows ~5x improvement. > > Base: > Benchmark (count) Mode Cnt Score Error Units > TestAdler32Perf.testAdler32Update 64 avgt 25 0.084 ± 0.001 us/op > TestAdler32Perf.testAdler32Update 128 avgt 25 0.104 ± 0.001 us/op > TestAdler32Perf.testAdler32Update 256 avgt 25 0.146 ± 0.002 us/op > TestAdler32Perf.testAdler32Update 512 avgt 25 0.226 ± 0.002 us/op > TestAdler32Perf.testAdler32Update 1024 avgt 25 0.390 ± 0.005 us/op > TestAdler32Perf.testAdler32Update 2048 avgt 25 0.714 ± 0.007 us/op > TestAdler32Perf.testAdler32Update 4096 avgt 25 1.359 ± 0.014 us/op > TestAdler32Perf.testAdler32Update 8192 avgt 25 2.751 ± 0.023 us/op > TestAdler32Perf.testAdler32Update 16384 avgt 25 5.494 ± 0.077 us/op > TestAdler32Perf.testAdler32Update 32768 avgt 25 11.058 ± 0.160 us/op > TestAdler32Perf.testAdler32Update 65536 avgt 25 22.198 ± 0.319 us/op > > > With patch: > Benchmark (count) Mode Cnt Score Error Units > TestAdler32Perf.testAdler32Update 64 avgt 25 0.020 ± 0.001 us/op > TestAdler32Perf.testAdler32Update 128 avgt 25 0.025 ± 
0.001 us/op > TestAdler32Perf.testAdler32Update 256 avgt 25 0.031 ± 0.001 us/op > TestAdler32Perf.testAdler32Update 512 avgt 25 0.048 ± 0.001 us/op > TestAdler32Perf.testAdler32Update 1024 avgt 25 0.078 ± 0.001 us/op > TestAdler32Perf.testAdler32Update 2048 avgt 25 0.139 ± 0.002 us/op > TestAdler32Perf.testAdler32Update 4096 avgt 25 0.262 ± 0.004 us/op > TestAdler32Perf.testAdler32Update 8192 avgt 25 0.524 ± 0.010 us/op > TestAdler32Perf.testAdler32Update 16384 avgt 25 1.017 ± 0.022 us/op > TestAdler32Perf.testAdler32Update 32768 avgt 25 2.058 ± 0.052 us/op > TestAdler32Perf.testAdler32Update 65536 avgt 25 3.994 ± 0.013 us/op Xubo Zhang has updated the pull request incrementally with one additional commit since the last revision: removed checkings for UseAdler32Intrinsics ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/3806/files - new: https://git.openjdk.java.net/jdk/pull/3806/files/3851c602..7290944b Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3806&range=07 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3806&range=06-07 Stats: 5 lines in 3 files changed: 0 ins; 3 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/3806.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3806/head:pull/3806 PR: https://git.openjdk.java.net/jdk/pull/3806 From github.com+58006833+xbzhang99 at openjdk.java.net Fri May 14 22:57:00 2021 From: github.com+58006833+xbzhang99 at openjdk.java.net (Xubo Zhang) Date: Fri, 14 May 2021 22:57:00 GMT Subject: RFR: 8266332: Adler32 intrinsic for x86 64-bit platforms [v8] In-Reply-To: <62je7XSswP2cqpraFbnxN65flI6m5fGA0as9JVsYbAY=.df9d3002-98d3-4093-aa29-b7f37b1bbab5@github.com> References: <62je7XSswP2cqpraFbnxN65flI6m5fGA0as9JVsYbAY=.df9d3002-98d3-4093-aa29-b7f37b1bbab5@github.com> Message-ID: On Fri, 14 May 2021 21:47:32 GMT, Xubo Zhang wrote: >> Implement Adler32 intrinsic for x86 64-bit platform using vector instructions. 
>> >> For the following benchmark: >> http://cr.openjdk.java.net/~pli/rfr/8216259/TestAdler32.java >> >> The optimization shows ~5x improvement. >> >> Base: >> Benchmark (count) Mode Cnt Score Error Units >> TestAdler32Perf.testAdler32Update 64 avgt 25 0.084 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 128 avgt 25 0.104 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 256 avgt 25 0.146 ± 0.002 us/op >> TestAdler32Perf.testAdler32Update 512 avgt 25 0.226 ± 0.002 us/op >> TestAdler32Perf.testAdler32Update 1024 avgt 25 0.390 ± 0.005 us/op >> TestAdler32Perf.testAdler32Update 2048 avgt 25 0.714 ± 0.007 us/op >> TestAdler32Perf.testAdler32Update 4096 avgt 25 1.359 ± 0.014 us/op >> TestAdler32Perf.testAdler32Update 8192 avgt 25 2.751 ± 0.023 us/op >> TestAdler32Perf.testAdler32Update 16384 avgt 25 5.494 ± 0.077 us/op >> TestAdler32Perf.testAdler32Update 32768 avgt 25 11.058 ± 0.160 us/op >> TestAdler32Perf.testAdler32Update 65536 avgt 25 22.198 ± 0.319 us/op >> >> >> With patch: >> Benchmark (count) Mode Cnt Score Error Units >> TestAdler32Perf.testAdler32Update 64 avgt 25 0.020 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 128 avgt 25 0.025 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 256 avgt 25 0.031 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 512 avgt 25 0.048 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 1024 avgt 25 0.078 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 2048 avgt 25 0.139 ± 0.002 us/op >> TestAdler32Perf.testAdler32Update 4096 avgt 25 0.262 ± 0.004 us/op >> TestAdler32Perf.testAdler32Update 8192 avgt 25 0.524 ± 0.010 us/op >> TestAdler32Perf.testAdler32Update 16384 avgt 25 1.017 ± 0.022 us/op >> TestAdler32Perf.testAdler32Update 32768 avgt 25 2.058 ± 0.052 us/op >> TestAdler32Perf.testAdler32Update 65536 avgt 25 3.994 ± 
0.013 us/op > > Xubo Zhang has updated the pull request incrementally with one additional commit since the last revision: > > removed checkings for UseAdler32Intrinsics @pfustc Hi Pengfei, I implemented Adler32 intrinsic for x86. Can I add your JMH micro http://cr.openjdk.java.net/~pli/rfr/8216259/TestAdler32.java into openjdk test/micro/org/openjdk/bench/java/util/? ------------- PR: https://git.openjdk.java.net/jdk/pull/3806 From sviswanathan at openjdk.java.net Sat May 15 00:18:17 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Sat, 15 May 2021 00:18:17 GMT Subject: RFR: 8267190: Optimize Vector API test operations Message-ID: Vector API test operations (IS_DEFAULT, IS_FINITE, IS_INFINITE, IS_NAN and IS_NEGATIVE) are computed in three steps: 1) reinterpret the floating point vectors as integral vectors (int/long) 2) perform the test in the integer domain to get an int/long mask 3) reinterpret the int/long mask as a float/double mask Step 3) currently is very slow. It can be optimized by modifying the Java code to utilize the existing reinterpret intrinsic. For the VectorTestPerf attached to the JBS for JDK-8267190, the performance improves as follows: Base: Benchmark (size) Mode Cnt Score Error Units VectorTestPerf.IS_DEFAULT 1024 thrpt 5 223.156 ± 90.452 ops/ms VectorTestPerf.IS_FINITE 1024 thrpt 5 223.841 ± 91.685 ops/ms VectorTestPerf.IS_INFINITE 1024 thrpt 5 224.561 ± 83.890 ops/ms VectorTestPerf.IS_NAN 1024 thrpt 5 223.777 ± 70.629 ops/ms VectorTestPerf.IS_NEGATIVE 1024 thrpt 5 218.392 ± 79.806 ops/ms With patch: Benchmark (size) Mode Cnt Score Error Units VectorTestPerf.IS_DEFAULT 1024 thrpt 5 8812.357 ± 40.477 ops/ms VectorTestPerf.IS_FINITE 1024 thrpt 5 7425.739 ± 296.622 ops/ms VectorTestPerf.IS_INFINITE 1024 thrpt 5 8932.730 ± 269.988 ops/ms VectorTestPerf.IS_NAN 1024 thrpt 5 8574.872 ± 498.649 ops/ms VectorTestPerf.IS_NEGATIVE 1024 thrpt 5 8838.400 ± 
11.849 ops/ms Best Regards, Sandhya ------------- Commit messages: - 8267190: Optimize Vector API test operations Changes: https://git.openjdk.java.net/jdk/pull/4039/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4039&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8267190 Stats: 809 lines in 32 files changed: 714 ins; 0 del; 95 mod Patch: https://git.openjdk.java.net/jdk/pull/4039.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4039/head:pull/4039 PR: https://git.openjdk.java.net/jdk/pull/4039 From sviswanathan at openjdk.java.net Sat May 15 01:48:35 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Sat, 15 May 2021 01:48:35 GMT Subject: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v3] In-Reply-To: References: Message-ID: <2gAn7RV3gzdAOuAZNRcINNLi4NKBwYJW-4NdjcRgpCE=.5e3c0cc3-c89b-486d-8749-cfa630e59f04@github.com> > This PR contains Short Vector Math Library support related changes for [JEP-414 Vector API (Second Incubator)](https://openjdk.java.net/jeps/414), in preparation for when targeted. > > Intel Short Vector Math Library (SVML) based intrinsics in native x86 assembly provide optimized implementation for Vector API transcendental and trigonometric methods. > These methods are built into a separate library instead of being part of libjvm.so or jvm.dll. > > The following changes are made: > The source for these methods is placed in the jdk.incubator.vector module under src/jdk.incubator.vector/linux/native/libsvml and src/jdk.incubator.vector/windows/native/libsvml. > The assembly source files are named as "*.S" and include files are named as "*.S.inc". > The corresponding build script is placed at make/modules/jdk.incubator.vector/Lib.gmk. > Changes are made to build system to support dependency tracking for assembly files with includes. > The built native libraries (libsvml.so/svml.dll) are placed in bin directory of JDK on Windows and lib directory of JDK on Linux. 
> The C2 JIT uses the dll_load and dll_lookup to get the addresses of optimized methods from this library. > > Build system changes and module library build scripts are contributed by Magnus (magnus.ihse.bursie at oracle.com). > > Looking forward to your review and feedback. > > Performance: > Micro benchmark Base Optimized Unit Gain(Optimized/Base) > Double128Vector.ACOS 45.91 87.34 ops/ms 1.90 > Double128Vector.ASIN 45.06 92.36 ops/ms 2.05 > Double128Vector.ATAN 19.92 118.36 ops/ms 5.94 > Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79 > Double128Vector.CBRT 45.77 208.36 ops/ms 4.55 > Double128Vector.COS 49.94 245.89 ops/ms 4.92 > Double128Vector.COSH 26.91 126.00 ops/ms 4.68 > Double128Vector.EXP 71.64 379.65 ops/ms 5.30 > Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18 > Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44 > Double128Vector.LOG 61.95 279.84 ops/ms 4.52 > Double128Vector.LOG10 59.34 239.05 ops/ms 4.03 > Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79 > Double128Vector.SIN 49.36 240.79 ops/ms 4.88 > Double128Vector.SINH 26.59 103.75 ops/ms 3.90 > Double128Vector.TAN 41.05 152.39 ops/ms 3.71 > Double128Vector.TANH 45.29 169.53 ops/ms 3.74 > Double256Vector.ACOS 54.21 106.39 ops/ms 1.96 > Double256Vector.ASIN 53.60 107.99 ops/ms 2.01 > Double256Vector.ATAN 21.53 189.11 ops/ms 8.78 > Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44 > Double256Vector.CBRT 56.45 397.13 ops/ms 7.04 > Double256Vector.COS 58.26 389.77 ops/ms 6.69 > Double256Vector.COSH 29.44 151.11 ops/ms 5.13 > Double256Vector.EXP 86.67 564.68 ops/ms 6.52 > Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80 > Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62 > Double256Vector.LOG 71.52 394.90 ops/ms 5.52 > Double256Vector.LOG10 65.43 362.32 ops/ms 5.54 > Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05 > Double256Vector.SIN 57.06 380.98 ops/ms 6.68 > Double256Vector.SINH 29.40 117.37 ops/ms 3.99 > Double256Vector.TAN 44.90 279.90 ops/ms 6.23 > Double256Vector.TANH 54.08 274.71 ops/ms 5.08 > 
Double512Vector.ACOS 55.65 687.54 ops/ms 12.35 > Double512Vector.ASIN 57.31 777.72 ops/ms 13.57 > Double512Vector.ATAN 21.42 729.21 ops/ms 34.04 > Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32 > Double512Vector.CBRT 56.78 834.38 ops/ms 14.69 > Double512Vector.COS 59.88 837.04 ops/ms 13.98 > Double512Vector.COSH 30.34 172.76 ops/ms 5.70 > Double512Vector.EXP 99.66 1608.12 ops/ms 16.14 > Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34 > Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34 > Double512Vector.LOG 74.84 996.00 ops/ms 13.31 > Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72 > Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34 > Double512Vector.POW 37.42 384.13 ops/ms 10.26 > Double512Vector.SIN 59.74 728.45 ops/ms 12.19 > Double512Vector.SINH 29.47 143.38 ops/ms 4.87 > Double512Vector.TAN 46.20 587.21 ops/ms 12.71 > Double512Vector.TANH 57.36 495.42 ops/ms 8.64 > Double64Vector.ACOS 24.04 73.67 ops/ms 3.06 > Double64Vector.ASIN 23.78 75.11 ops/ms 3.16 > Double64Vector.ATAN 14.14 62.81 ops/ms 4.44 > Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28 > Double64Vector.CBRT 16.47 107.50 ops/ms 6.53 > Double64Vector.COS 23.42 152.01 ops/ms 6.49 > Double64Vector.COSH 17.34 113.34 ops/ms 6.54 > Double64Vector.EXP 27.08 203.53 ops/ms 7.52 > Double64Vector.EXPM1 18.77 96.73 ops/ms 5.15 > Double64Vector.HYPOT 18.54 103.62 ops/ms 5.59 > Double64Vector.LOG 26.75 142.63 ops/ms 5.33 > Double64Vector.LOG10 25.85 139.71 ops/ms 5.40 > Double64Vector.LOG1P 13.26 97.94 ops/ms 7.38 > Double64Vector.SIN 23.28 146.91 ops/ms 6.31 > Double64Vector.SINH 17.62 88.59 ops/ms 5.03 > Double64Vector.TAN 21.00 86.43 ops/ms 4.12 > Double64Vector.TANH 23.75 111.35 ops/ms 4.69 > Float128Vector.ACOS 57.52 110.65 ops/ms 1.92 > Float128Vector.ASIN 57.15 117.95 ops/ms 2.06 > Float128Vector.ATAN 22.52 318.74 ops/ms 14.15 > Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42 > Float128Vector.CBRT 29.72 443.74 ops/ms 14.93 > Float128Vector.COS 42.82 803.02 ops/ms 18.75 > Float128Vector.COSH 31.44 118.34 ops/ms 
3.76 > Float128Vector.EXP 72.43 855.33 ops/ms 11.81 > Float128Vector.EXPM1 37.82 127.85 ops/ms 3.38 > Float128Vector.HYPOT 53.20 591.68 ops/ms 11.12 > Float128Vector.LOG 52.95 877.94 ops/ms 16.58 > Float128Vector.LOG10 49.26 603.72 ops/ms 12.26 > Float128Vector.LOG1P 20.89 430.59 ops/ms 20.61 > Float128Vector.SIN 43.38 745.31 ops/ms 17.18 > Float128Vector.SINH 31.11 112.91 ops/ms 3.63 > Float128Vector.TAN 37.25 332.13 ops/ms 8.92 > Float128Vector.TANH 57.63 453.77 ops/ms 7.87 > Float256Vector.ACOS 65.23 123.73 ops/ms 1.90 > Float256Vector.ASIN 63.41 132.86 ops/ms 2.10 > Float256Vector.ATAN 23.51 649.02 ops/ms 27.61 > Float256Vector.ATAN2 18.19 455.95 ops/ms 25.07 > Float256Vector.CBRT 45.99 594.81 ops/ms 12.93 > Float256Vector.COS 43.75 926.69 ops/ms 21.18 > Float256Vector.COSH 33.52 130.46 ops/ms 3.89 > Float256Vector.EXP 75.70 1366.72 ops/ms 18.05 > Float256Vector.EXPM1 39.00 149.72 ops/ms 3.84 > Float256Vector.HYPOT 52.91 1023.18 ops/ms 19.34 > Float256Vector.LOG 53.31 1545.77 ops/ms 29.00 > Float256Vector.LOG10 50.31 863.80 ops/ms 17.17 > Float256Vector.LOG1P 21.51 616.59 ops/ms 28.66 > Float256Vector.SIN 44.07 911.04 ops/ms 20.67 > Float256Vector.SINH 33.16 122.50 ops/ms 3.69 > Float256Vector.TAN 37.85 497.75 ops/ms 13.15 > Float256Vector.TANH 64.27 537.20 ops/ms 8.36 > Float512Vector.ACOS 67.33 1718.00 ops/ms 25.52 > Float512Vector.ASIN 66.12 1780.85 ops/ms 26.93 > Float512Vector.ATAN 22.63 1780.31 ops/ms 78.69 > Float512Vector.ATAN2 17.52 1113.93 ops/ms 63.57 > Float512Vector.CBRT 54.78 2087.58 ops/ms 38.11 > Float512Vector.COS 40.92 1567.93 ops/ms 38.32 > Float512Vector.COSH 33.42 138.36 ops/ms 4.14 > Float512Vector.EXP 70.51 3835.97 ops/ms 54.41 > Float512Vector.EXPM1 38.06 279.80 ops/ms 7.35 > Float512Vector.HYPOT 50.99 3287.55 ops/ms 64.47 > Float512Vector.LOG 49.61 3156.99 ops/ms 63.64 > Float512Vector.LOG10 46.94 2489.16 ops/ms 53.02 > Float512Vector.LOG1P 20.66 1689.86 ops/ms 81.81 > Float512Vector.POW 32.73 1015.85 ops/ms 31.04 > Float512Vector.SIN 
41.17 1587.71 ops/ms 38.56 > Float512Vector.SINH 33.05 129.39 ops/ms 3.91 > Float512Vector.TAN 35.60 1336.11 ops/ms 37.53 > Float512Vector.TANH 65.77 2295.28 ops/ms 34.90 > Float64Vector.ACOS 48.41 89.34 ops/ms 1.85 > Float64Vector.ASIN 47.30 95.72 ops/ms 2.02 > Float64Vector.ATAN 20.62 49.45 ops/ms 2.40 > Float64Vector.ATAN2 15.95 112.35 ops/ms 7.04 > Float64Vector.CBRT 24.03 134.57 ops/ms 5.60 > Float64Vector.COS 44.28 394.33 ops/ms 8.91 > Float64Vector.COSH 28.35 95.27 ops/ms 3.36 > Float64Vector.EXP 65.80 486.37 ops/ms 7.39 > Float64Vector.EXPM1 34.61 85.99 ops/ms 2.48 > Float64Vector.HYPOT 50.40 147.82 ops/ms 2.93 > Float64Vector.LOG 51.93 163.25 ops/ms 3.14 > Float64Vector.LOG10 49.53 147.98 ops/ms 2.99 > Float64Vector.LOG1P 19.20 206.81 ops/ms 10.77 > Float64Vector.SIN 44.41 382.09 ops/ms 8.60 > Float64Vector.SINH 28.20 90.68 ops/ms 3.22 > Float64Vector.TAN 36.29 160.89 ops/ms 4.43 > Float64Vector.TANH 47.65 214.04 ops/ms 4.49 Sandhya Viswanathan has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains seven commits: - Merge master - Merge master - remove whitespace - Merge master - Small fix - cleanup - x86 short vector math optimization for Vector API ------------- Changes: https://git.openjdk.java.net/jdk/pull/3638/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=02 Stats: 417101 lines in 120 files changed: 416935 ins; 123 del; 43 mod Patch: https://git.openjdk.java.net/jdk/pull/3638.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3638/head:pull/3638 PR: https://git.openjdk.java.net/jdk/pull/3638 From sviswanathan at openjdk.java.net Sat May 15 02:06:29 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Sat, 15 May 2021 02:06:29 GMT Subject: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v4] In-Reply-To: References: Message-ID: > This PR contains Short Vector Math Library support related changes for [JEP-414 Vector API (Second Incubator)](https://openjdk.java.net/jeps/414), in preparation for when targeted. > > Intel Short Vector Math Library (SVML) based intrinsics in native x86 assembly provide optimized implementation for Vector API transcendental and trigonometric methods. > These methods are built into a separate library instead of being part of libjvm.so or jvm.dll. > > The following changes are made: > The source for these methods is placed in the jdk.incubator.vector module under src/jdk.incubator.vector/linux/native/libsvml and src/jdk.incubator.vector/windows/native/libsvml. > The assembly source files are named as "*.S" and include files are named as "*.S.inc". > The corresponding build script is placed at make/modules/jdk.incubator.vector/Lib.gmk. > Changes are made to build system to support dependency tracking for assembly files with includes. > The built native libraries (libsvml.so/svml.dll) are placed in bin directory of JDK on Windows and lib directory of JDK on Linux.
> The C2 JIT uses the dll_load and dll_lookup to get the addresses of optimized methods from this library. > > Build system changes and module library build scripts are contributed by Magnus (magnus.ihse.bursie at oracle.com). > > Looking forward to your review and feedback. > > Performance: > Micro benchmark Base Optimized Unit Gain(Optimized/Base) > Double128Vector.ACOS 45.91 87.34 ops/ms 1.90 > Double128Vector.ASIN 45.06 92.36 ops/ms 2.05 > Double128Vector.ATAN 19.92 118.36 ops/ms 5.94 > Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79 > Double128Vector.CBRT 45.77 208.36 ops/ms 4.55 > Double128Vector.COS 49.94 245.89 ops/ms 4.92 > Double128Vector.COSH 26.91 126.00 ops/ms 4.68 > Double128Vector.EXP 71.64 379.65 ops/ms 5.30 > Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18 > Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44 > Double128Vector.LOG 61.95 279.84 ops/ms 4.52 > Double128Vector.LOG10 59.34 239.05 ops/ms 4.03 > Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79 > Double128Vector.SIN 49.36 240.79 ops/ms 4.88 > Double128Vector.SINH 26.59 103.75 ops/ms 3.90 > Double128Vector.TAN 41.05 152.39 ops/ms 3.71 > Double128Vector.TANH 45.29 169.53 ops/ms 3.74 > Double256Vector.ACOS 54.21 106.39 ops/ms 1.96 > Double256Vector.ASIN 53.60 107.99 ops/ms 2.01 > Double256Vector.ATAN 21.53 189.11 ops/ms 8.78 > Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44 > Double256Vector.CBRT 56.45 397.13 ops/ms 7.04 > Double256Vector.COS 58.26 389.77 ops/ms 6.69 > Double256Vector.COSH 29.44 151.11 ops/ms 5.13 > Double256Vector.EXP 86.67 564.68 ops/ms 6.52 > Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80 > Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62 > Double256Vector.LOG 71.52 394.90 ops/ms 5.52 > Double256Vector.LOG10 65.43 362.32 ops/ms 5.54 > Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05 > Double256Vector.SIN 57.06 380.98 ops/ms 6.68 > Double256Vector.SINH 29.40 117.37 ops/ms 3.99 > Double256Vector.TAN 44.90 279.90 ops/ms 6.23 > Double256Vector.TANH 54.08 274.71 ops/ms 5.08 > 
Double512Vector.ACOS 55.65 687.54 ops/ms 12.35 > Double512Vector.ASIN 57.31 777.72 ops/ms 13.57 > Double512Vector.ATAN 21.42 729.21 ops/ms 34.04 > Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32 > Double512Vector.CBRT 56.78 834.38 ops/ms 14.69 > Double512Vector.COS 59.88 837.04 ops/ms 13.98 > Double512Vector.COSH 30.34 172.76 ops/ms 5.70 > Double512Vector.EXP 99.66 1608.12 ops/ms 16.14 > Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34 > Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34 > Double512Vector.LOG 74.84 996.00 ops/ms 13.31 > Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72 > Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34 > Double512Vector.POW 37.42 384.13 ops/ms 10.26 > Double512Vector.SIN 59.74 728.45 ops/ms 12.19 > Double512Vector.SINH 29.47 143.38 ops/ms 4.87 > Double512Vector.TAN 46.20 587.21 ops/ms 12.71 > Double512Vector.TANH 57.36 495.42 ops/ms 8.64 > Double64Vector.ACOS 24.04 73.67 ops/ms 3.06 > Double64Vector.ASIN 23.78 75.11 ops/ms 3.16 > Double64Vector.ATAN 14.14 62.81 ops/ms 4.44 > Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28 > Double64Vector.CBRT 16.47 107.50 ops/ms 6.53 > Double64Vector.COS 23.42 152.01 ops/ms 6.49 > Double64Vector.COSH 17.34 113.34 ops/ms 6.54 > Double64Vector.EXP 27.08 203.53 ops/ms 7.52 > Double64Vector.EXPM1 18.77 96.73 ops/ms 5.15 > Double64Vector.HYPOT 18.54 103.62 ops/ms 5.59 > Double64Vector.LOG 26.75 142.63 ops/ms 5.33 > Double64Vector.LOG10 25.85 139.71 ops/ms 5.40 > Double64Vector.LOG1P 13.26 97.94 ops/ms 7.38 > Double64Vector.SIN 23.28 146.91 ops/ms 6.31 > Double64Vector.SINH 17.62 88.59 ops/ms 5.03 > Double64Vector.TAN 21.00 86.43 ops/ms 4.12 > Double64Vector.TANH 23.75 111.35 ops/ms 4.69 > Float128Vector.ACOS 57.52 110.65 ops/ms 1.92 > Float128Vector.ASIN 57.15 117.95 ops/ms 2.06 > Float128Vector.ATAN 22.52 318.74 ops/ms 14.15 > Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42 > Float128Vector.CBRT 29.72 443.74 ops/ms 14.93 > Float128Vector.COS 42.82 803.02 ops/ms 18.75 > Float128Vector.COSH 31.44 118.34 ops/ms 
3.76 > Float128Vector.EXP 72.43 855.33 ops/ms 11.81 > Float128Vector.EXPM1 37.82 127.85 ops/ms 3.38 > Float128Vector.HYPOT 53.20 591.68 ops/ms 11.12 > Float128Vector.LOG 52.95 877.94 ops/ms 16.58 > Float128Vector.LOG10 49.26 603.72 ops/ms 12.26 > Float128Vector.LOG1P 20.89 430.59 ops/ms 20.61 > Float128Vector.SIN 43.38 745.31 ops/ms 17.18 > Float128Vector.SINH 31.11 112.91 ops/ms 3.63 > Float128Vector.TAN 37.25 332.13 ops/ms 8.92 > Float128Vector.TANH 57.63 453.77 ops/ms 7.87 > Float256Vector.ACOS 65.23 123.73 ops/ms 1.90 > Float256Vector.ASIN 63.41 132.86 ops/ms 2.10 > Float256Vector.ATAN 23.51 649.02 ops/ms 27.61 > Float256Vector.ATAN2 18.19 455.95 ops/ms 25.07 > Float256Vector.CBRT 45.99 594.81 ops/ms 12.93 > Float256Vector.COS 43.75 926.69 ops/ms 21.18 > Float256Vector.COSH 33.52 130.46 ops/ms 3.89 > Float256Vector.EXP 75.70 1366.72 ops/ms 18.05 > Float256Vector.EXPM1 39.00 149.72 ops/ms 3.84 > Float256Vector.HYPOT 52.91 1023.18 ops/ms 19.34 > Float256Vector.LOG 53.31 1545.77 ops/ms 29.00 > Float256Vector.LOG10 50.31 863.80 ops/ms 17.17 > Float256Vector.LOG1P 21.51 616.59 ops/ms 28.66 > Float256Vector.SIN 44.07 911.04 ops/ms 20.67 > Float256Vector.SINH 33.16 122.50 ops/ms 3.69 > Float256Vector.TAN 37.85 497.75 ops/ms 13.15 > Float256Vector.TANH 64.27 537.20 ops/ms 8.36 > Float512Vector.ACOS 67.33 1718.00 ops/ms 25.52 > Float512Vector.ASIN 66.12 1780.85 ops/ms 26.93 > Float512Vector.ATAN 22.63 1780.31 ops/ms 78.69 > Float512Vector.ATAN2 17.52 1113.93 ops/ms 63.57 > Float512Vector.CBRT 54.78 2087.58 ops/ms 38.11 > Float512Vector.COS 40.92 1567.93 ops/ms 38.32 > Float512Vector.COSH 33.42 138.36 ops/ms 4.14 > Float512Vector.EXP 70.51 3835.97 ops/ms 54.41 > Float512Vector.EXPM1 38.06 279.80 ops/ms 7.35 > Float512Vector.HYPOT 50.99 3287.55 ops/ms 64.47 > Float512Vector.LOG 49.61 3156.99 ops/ms 63.64 > Float512Vector.LOG10 46.94 2489.16 ops/ms 53.02 > Float512Vector.LOG1P 20.66 1689.86 ops/ms 81.81 > Float512Vector.POW 32.73 1015.85 ops/ms 31.04 > Float512Vector.SIN 
41.17 1587.71 ops/ms 38.56 > Float512Vector.SINH 33.05 129.39 ops/ms 3.91 > Float512Vector.TAN 35.60 1336.11 ops/ms 37.53 > Float512Vector.TANH 65.77 2295.28 ops/ms 34.90 > Float64Vector.ACOS 48.41 89.34 ops/ms 1.85 > Float64Vector.ASIN 47.30 95.72 ops/ms 2.02 > Float64Vector.ATAN 20.62 49.45 ops/ms 2.40 > Float64Vector.ATAN2 15.95 112.35 ops/ms 7.04 > Float64Vector.CBRT 24.03 134.57 ops/ms 5.60 > Float64Vector.COS 44.28 394.33 ops/ms 8.91 > Float64Vector.COSH 28.35 95.27 ops/ms 3.36 > Float64Vector.EXP 65.80 486.37 ops/ms 7.39 > Float64Vector.EXPM1 34.61 85.99 ops/ms 2.48 > Float64Vector.HYPOT 50.40 147.82 ops/ms 2.93 > Float64Vector.LOG 51.93 163.25 ops/ms 3.14 > Float64Vector.LOG10 49.53 147.98 ops/ms 2.99 > Float64Vector.LOG1P 19.20 206.81 ops/ms 10.77 > Float64Vector.SIN 44.41 382.09 ops/ms 8.60 > Float64Vector.SINH 28.20 90.68 ops/ms 3.22 > Float64Vector.TAN 36.29 160.89 ops/ms 4.43 > Float64Vector.TANH 47.65 214.04 ops/ms 4.49 Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: Add missing Lib.gmk ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/3638/files - new: https://git.openjdk.java.net/jdk/pull/3638/files/6e105f51..01a549e4 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=02-03 Stats: 42 lines in 1 file changed: 42 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/3638.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3638/head:pull/3638 PR: https://git.openjdk.java.net/jdk/pull/3638 From jbhateja at openjdk.java.net Sat May 15 02:30:40 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Sat, 15 May 2021 02:30:40 GMT Subject: RFR: 8256973: Intrinsic creation for VectorMask query (lastTrue, firstTrue, trueCount) APIs [v2] In-Reply-To: <9Tv567jC1GsqY3tq4-_HqAxPzHbl9ckDsfIFN3bMvMw=.b73f9786-417d-4260-a569-28ad7ed504ab@github.com> References: 
<73lFD51hzmiF_KrQyPyE5c7lbf-Bp6V5vptzGo7JioY=.f34509d0-04c1-4c6d-878f-baa433b315a7@github.com> <9Tv567jC1GsqY3tq4-_HqAxPzHbl9ckDsfIFN3bMvMw=.b73f9786-417d-4260-a569-28ad7ed504ab@github.com> Message-ID: On Fri, 14 May 2021 21:23:31 GMT, Vladimir Ivanov wrote: > Ok, fair enough. We can revisit that later and merge them if needed. > Some suggestions to consider to align it with `reductionCoerced`: > > * reflect in the name that it's effectively a reduction, but on masks (`maskReductionCoerced`?); > * return type can be generalized to `long`; Hi @iwanowww, Can you kindly elaborate why should the return type be long here ? We will need to again downcast it to integer since these APIs return an integer value. > * bound on M: ``; > * no need to introduce a special interface, `Function` just works: `VectorMaskOp` -> `Function`; ------------- PR: https://git.openjdk.java.net/jdk/pull/3916 From aph at redhat.com Sat May 15 10:29:06 2021 From: aph at redhat.com (Andrew Haley) Date: Sat, 15 May 2021 11:29:06 +0100 Subject: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics In-Reply-To: References: Message-ID: <72216fcc-67e7-c700-8fee-2d8c752a0f0c@redhat.com> On 4/22/21 11:27 PM, Sandhya Viswanathan wrote: > Intel Short Vector Math Library (SVML) based intrinsics in native x86 assembly provide optimized implementation for Vector API transcendental and trigonometric methods. > These methods are built into a separate library instead of being part of libjvm.so or jvm.dll. Is this really acceptable code quality for OpenJDK? No comments, no explanation of the derivation of algorithms, no explanation or proofs of accuracy. There doesn't even seem to be any source code, just compiler output. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From hshi at openjdk.java.net Sun May 16 10:37:04 2021 From: hshi at openjdk.java.net (Hui Shi) Date: Sun, 16 May 2021 10:37:04 GMT Subject: RFR: 8266528: Optimize C2 VerifyIterativeGVN execution time In-Reply-To: References: Message-ID: On Wed, 5 May 2021 07:30:46 GMT, Hui Shi wrote: > Optimization for VerifyIterativeGVN; the motivation is that running with -XX:+VerifyIterativeGVN is extremely slow. > > In a simple test "-Xcomp -XX:+VerifyIterativeGVN -XX:-TieredCompilation -version", time is reduced from 8.67s to 2.4s. > In the extreme case hotspot/test/jtreg/compiler/escapeAnalysis/Test6689060.java, time is reduced from 20000s to 92s. > Detailed data is in the JBS description. > > Optimizations include: > 1. Optimize redundant verifications in PhaseIterGVN::verify_step. Nodes might be verified multiple times. > Redundant verifications between full pass and _verify_window single node process. > Redundant verifications between different nodes in _verify_window > > 2. Optimize def-use edge checking: > Skip multiple checks for same x->n input edges. > Skip redundant check in inner loop when counting how many x in n's input edges, skip current index. > > 3. Optimize field access > Replace "n->in(j)" with "n->_in[j]", skipping the unnecessary assert when invoking Node::in(int index). > > Optimizations #2 and #3 decrease execution time with no other overhead. > Optimization #1 adds 3 fields to class Node in debug builds; they can be squeezed into an "int/long" if needed. > > jint _igvn_verify_depth_cur; > jint _igvn_verify_depth_prev; > julong _igvn_verify_epoch; close this.
optimize VerifyIterativeGVN with better solution ------------- PR: https://git.openjdk.java.net/jdk/pull/3872 From hshi at openjdk.java.net Sun May 16 10:37:04 2021 From: hshi at openjdk.java.net (Hui Shi) Date: Sun, 16 May 2021 10:37:04 GMT Subject: Withdrawn: 8266528: Optimize C2 VerifyIterativeGVN execution time In-Reply-To: References: Message-ID: On Wed, 5 May 2021 07:30:46 GMT, Hui Shi wrote: > Optimization for VerifyIterativeGVN; the motivation is that running with -XX:+VerifyIterativeGVN is extremely slow. > > In a simple test "-Xcomp -XX:+VerifyIterativeGVN -XX:-TieredCompilation -version", time is reduced from 8.67s to 2.4s. > In the extreme case hotspot/test/jtreg/compiler/escapeAnalysis/Test6689060.java, time is reduced from 20000s to 92s. > Detailed data is in the JBS description. > > Optimizations include: > 1. Optimize redundant verifications in PhaseIterGVN::verify_step. Nodes might be verified multiple times. > Redundant verifications between full pass and _verify_window single node process. > Redundant verifications between different nodes in _verify_window > > 2. Optimize def-use edge checking: > Skip multiple checks for same x->n input edges. > Skip redundant check in inner loop when counting how many x in n's input edges, skip current index. > > 3. Optimize field access > Replace "n->in(j)" with "n->_in[j]", skipping the unnecessary assert when invoking Node::in(int index). > > Optimizations #2 and #3 decrease execution time with no other overhead. > Optimization #1 adds 3 fields to class Node in debug builds; they can be squeezed into an "int/long" if needed. > > jint _igvn_verify_depth_cur; > jint _igvn_verify_depth_prev; > julong _igvn_verify_epoch; This pull request has been closed without being integrated.
------------- PR: https://git.openjdk.java.net/jdk/pull/3872 From yyang at openjdk.java.net Mon May 17 02:20:37 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Mon, 17 May 2021 02:20:37 GMT Subject: RFR: 8265711: C1: Intrinsify Class.getModifier method [v4] In-Reply-To: References: Message-ID: > It's relatively a common case to get modifiers from a constant Class instance, i.e. ThirdPartyClass.class.getModifiers(). Currently, C1 Canonicalizer missed the opportunity of replacing Class.getModifiers intrinsic calls with compile-time constants. Yi Yang has updated the pull request incrementally with one additional commit since the last revision: caonicalize in interpreter/c1/c2 modes ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/3616/files - new: https://git.openjdk.java.net/jdk/pull/3616/files/5a0716c8..3ba69f6a Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3616&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3616&range=02-03 Stats: 23 lines in 1 file changed: 21 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/3616.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3616/head:pull/3616 PR: https://git.openjdk.java.net/jdk/pull/3616 From yyang at openjdk.java.net Mon May 17 02:20:38 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Mon, 17 May 2021 02:20:38 GMT Subject: RFR: 8265711: C1: Intrinsify Class.getModifier method [v3] In-Reply-To: References: Message-ID: On Fri, 14 May 2021 16:11:22 GMT, Vladimir Kozlov wrote: >> Yi Yang has updated the pull request incrementally with one additional commit since the last revision: >> >> rename; redundant reloading > > test/hotspot/jtreg/compiler/c1/CanonicalizeGetModifiers.java line 31: > >> 29: * @requires vm.compiler1.enabled >> 30: * @library /test/lib >> 31: * @run main/othervm -XX:TieredStopAtLevel=1 -Xbatch > > I would suggest to add 2 additional `@run` command to make sure test passed in all modes: > - default: without 
`-XX:TieredStopAtLevel` > - C2 only: with `-XX:-TieredCompilation` Thank you Vladimir for taking time to look at this. Good suggestion! I've added. ------------- PR: https://git.openjdk.java.net/jdk/pull/3616 From ngasson at openjdk.java.net Mon May 17 02:41:47 2021 From: ngasson at openjdk.java.net (Nick Gasson) Date: Mon, 17 May 2021 02:41:47 GMT Subject: RFR: 8267098: AArch64: C1 StubFrames end confusingly In-Reply-To: References: Message-ID: On Fri, 14 May 2021 17:01:02 GMT, Andrew Haley wrote: >> That's fine, I'll leave that part as it. Knowing the motivation for this is useful, as every project is different. > > Sure, thanks. The commonality between this port and x86 means that in many cases we have taken x86 patches and applied them to this port, with a few tweaks. Of course the ports diverge over time, but even after almost ten years it still sometimes works. > > There's an urge from some contributors: when I suggest doing something in an easy-to-understand and clean way, people want to change everything else to match. This urge can be resisted, and IMVHO should be in this case. Churn is, in itself, bad. Consistency is at least somewhat important though, right? This code is read much more often than it's modified, and in the case of the platform ports, often by people with less experience of OpenJDK as a whole. It seems worth spending a little time to do cleanups when modifying adjacent code, if it makes it easier to understand. > > And this case, is special, I think, because `does_not_return` uses the "don't do" anti-pattern, where the `true` case was `does_not_return`. `dont_gc_arguments` is equally or more confusing because its value is false and then it gets passed to an argument `must_gc_arguments` whose sense is inverted. 
I don't see what's wrong with:
```c++
StubFrame f(sasm, "blah", /* must_gc_arguments */ false, /* does_not_return */ true);
```
------------- PR: https://git.openjdk.java.net/jdk/pull/4030 From xgong at openjdk.java.net Mon May 17 03:21:54 2021 From: xgong at openjdk.java.net (Xiaohong Gong) Date: Mon, 17 May 2021 03:21:54 GMT Subject: RFR: 8266962: Add arch supporting check for "Op_VectorLoadConst" before creating the node In-Reply-To: References: Message-ID: On Fri, 14 May 2021 15:42:53 GMT, Vladimir Ivanov wrote: > The fix makes perfect sense, but I'm curious why do we have `VectorLoadConst` in the first place. > > It exposes JVM support for the iota vector constant materialization, but it's not clear to me what benefits it brings compared to feeding the intrinsic with the vector materialized on JDK side. Thanks for looking at this PR @iwanowww ! As far as I know the `VectorLoadConst` is used here to get the initial shuffle iota of the vector. I'm not clear on what you mean by `iota vector constant materialization`. Could you please elaborate on it? Thanks so much! ------------- PR: https://git.openjdk.java.net/jdk/pull/4023 From pli at openjdk.java.net Mon May 17 03:54:47 2021 From: pli at openjdk.java.net (Pengfei Li) Date: Mon, 17 May 2021 03:54:47 GMT Subject: RFR: 8266332: Adler32 intrinsic for x86 64-bit platforms [v8] In-Reply-To: References: <62je7XSswP2cqpraFbnxN65flI6m5fGA0as9JVsYbAY=.df9d3002-98d3-4093-aa29-b7f37b1bbab5@github.com> Message-ID: <7heFf4AJDvsDLlBqyQ-c7PiH2R0tbxp2eqYg0iz1GX4=.2b64476f-60e6-40b5-bd41-a2aeb121f9e3@github.com> On Fri, 14 May 2021 22:53:30 GMT, Xubo Zhang wrote: > @pfustc Hi Pengfei, > I implemented Adler32 intrinsic for x86. Can I add your JMH micro http://cr.openjdk.java.net/~pli/rfr/8216259/TestAdler32.java into openjdk test/micro/org/openjdk/bench/java/util/? Sorry I missed your previous mention. That jmh case was written by me. I didn't add any copyright header in the source file.
So I think anyone can use it or contribute it to the jdk source for free. ------------- PR: https://git.openjdk.java.net/jdk/pull/3806 From hshi at openjdk.java.net Mon May 17 05:31:32 2021 From: hshi at openjdk.java.net (Hui Shi) Date: Mon, 17 May 2021 05:31:32 GMT Subject: RFR: 8266528: Optimize C2 VerifyIterativeGVN execution time Message-ID: Please help review this enhancement for VerifyIterativeGVN, which reduces execution time by about 3x to 200x when VerifyIterativeGVN is on. In a simple test "-Xcomp -XX:+VerifyIterativeGVN -XX:-TieredCompilation -version", time is reduced from 8.67s to 2.4s. In the extreme case hotspot/test/jtreg/compiler/escapeAnalysis/Test6689060.java, time is reduced from 20000s to 95s. Tested with "-Xbatch -XX:+VerifyIterativeGVN -XX:-TieredCompilation", tier1/2/3 with fastdebug, with no regressions. 1. Remove the node_arena()->contains check when verifying nodes. _verify_window is reset before every PhaseIterGVN::optimize. Searching from root or from nodes in _verify_window will not meet nodes whose _idx is not unique (PhaseIterGVN::optimize will not trigger in the middle of PhaseRenumberLive). With an assertion added in Node::verify that every node is in the current node_arena(), tier1/2/3 pass (with -Xbatch -XX:+VerifyIterativeGVN -XX:-TieredCompilation) and no assertion failure happens. 2. Combine verification for nodes in _verify_window into one worklist, skipping redundant nodes in _verify_window. 3. Optimize duplicate checking for the same input nodes, skipping if the current input index is not its first occurrence. 4. Optimize field access: replace "n->in(j)" with "n->_in[j]"; likewise for the outcnt calculation for input node x.
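Item 3 in the list above can be illustrated with a small, self-contained sketch. This is not the HotSpot code (the real change is in C++): `DefUseCheck`, `GraphNode`, `addInput`, `verify` and the `comparisons` counter are hypothetical names invented for illustration. The sketch assumes a graph in which every input edge from n to x is mirrored by one entry for n in x's output list, and it checks that the two counts agree while visiting each distinct input of n only once.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical miniature of a def-use graph; names are illustrative, not HotSpot's.
final class DefUseCheck {
    static final class GraphNode {
        final List<GraphNode> inputs = new ArrayList<>();   // defs; the same node may repeat
        final List<GraphNode> outputs = new ArrayList<>();  // uses; one entry per edge

        void addInput(GraphNode x) {
            inputs.add(x);
            x.outputs.add(this);
        }
    }

    /** Number of distinct inputs actually compared by the last verify() call. */
    static int comparisons;

    /** Returns true if, for every input x of n, x lists n as a user once per edge. */
    static boolean verify(GraphNode n) {
        comparisons = 0;
        int cnt = n.inputs.size();
        for (int j = 0; j < cnt; j++) {
            GraphNode x = n.inputs.get(j);
            // Skip x unless j is its first occurrence: the pair (n, x) was already checked.
            if (firstIndexOf(n, x) != j) {
                continue;
            }
            // Count the edges n -> x, starting after j because index j itself is known to match.
            int uses = 1;
            for (int i = j + 1; i < cnt; i++) {
                if (n.inputs.get(i) == x) uses++;
            }
            // Count how many times x records n as a user.
            int outs = 0;
            for (GraphNode user : x.outputs) {
                if (user == n) outs++;
            }
            comparisons++;
            if (uses != outs) return false;
        }
        return true;
    }

    private static int firstIndexOf(GraphNode n, GraphNode x) {
        for (int i = 0; i < n.inputs.size(); i++) {
            if (n.inputs.get(i) == x) return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        GraphNode a = new GraphNode();
        GraphNode b = new GraphNode();
        GraphNode n = new GraphNode();
        n.addInput(a);
        n.addInput(b);
        n.addInput(a);  // duplicate input: checked once, not twice
        System.out.println(verify(n) + " after " + comparisons + " comparisons");
    }
}
```

With inputs (a, b, a), the sketch compares two distinct inputs rather than three, which is the kind of saving the description refers to; how much this matters in practice depends on how often nodes repeat an input.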
------------- Commit messages: - 8266528: Optimize C2 VerifyIterativeGVN execution time Changes: https://git.openjdk.java.net/jdk/pull/4045/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4045&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8266528 Stats: 44 lines in 4 files changed: 20 ins; 10 del; 14 mod Patch: https://git.openjdk.java.net/jdk/pull/4045.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4045/head:pull/4045 PR: https://git.openjdk.java.net/jdk/pull/4045 From github.com+58006833+xbzhang99 at openjdk.java.net Mon May 17 05:47:39 2021 From: github.com+58006833+xbzhang99 at openjdk.java.net (Xubo Zhang) Date: Mon, 17 May 2021 05:47:39 GMT Subject: RFR: 8266332: Adler32 intrinsic for x86 64-bit platforms [v9] In-Reply-To: References: Message-ID: > Implement Adler32 intrinsic for x86 64-bit platform using vector instructions. > > For the following benchmark: > http://cr.openjdk.java.net/~pli/rfr/8216259/TestAdler32.java > > The optimization shows ~5x improvement. >
> Base:
> Benchmark (count) Mode Cnt Score Error Units
> TestAdler32Perf.testAdler32Update 64 avgt 25 0.084 ± 0.001 us/op
> TestAdler32Perf.testAdler32Update 128 avgt 25 0.104 ± 0.001 us/op
> TestAdler32Perf.testAdler32Update 256 avgt 25 0.146 ± 0.002 us/op
> TestAdler32Perf.testAdler32Update 512 avgt 25 0.226 ± 0.002 us/op
> TestAdler32Perf.testAdler32Update 1024 avgt 25 0.390 ± 0.005 us/op
> TestAdler32Perf.testAdler32Update 2048 avgt 25 0.714 ± 0.007 us/op
> TestAdler32Perf.testAdler32Update 4096 avgt 25 1.359 ± 0.014 us/op
> TestAdler32Perf.testAdler32Update 8192 avgt 25 2.751 ± 0.023 us/op
> TestAdler32Perf.testAdler32Update 16384 avgt 25 5.494 ± 0.077 us/op
> TestAdler32Perf.testAdler32Update 32768 avgt 25 11.058 ± 0.160 us/op
> TestAdler32Perf.testAdler32Update 65536 avgt 25 22.198 ± 0.319 us/op
>
> With patch:
> Benchmark (count) Mode Cnt Score Error Units
> TestAdler32Perf.testAdler32Update 64 avgt 25 0.020 ± 0.001 us/op
> TestAdler32Perf.testAdler32Update 128 avgt 25 0.025 ± 0.001 us/op
> TestAdler32Perf.testAdler32Update 256 avgt 25 0.031 ± 0.001 us/op
> TestAdler32Perf.testAdler32Update 512 avgt 25 0.048 ± 0.001 us/op
> TestAdler32Perf.testAdler32Update 1024 avgt 25 0.078 ± 0.001 us/op
> TestAdler32Perf.testAdler32Update 2048 avgt 25 0.139 ± 0.002 us/op
> TestAdler32Perf.testAdler32Update 4096 avgt 25 0.262 ± 0.004 us/op
> TestAdler32Perf.testAdler32Update 8192 avgt 25 0.524 ± 0.010 us/op
> TestAdler32Perf.testAdler32Update 16384 avgt 25 1.017 ± 0.022 us/op
> TestAdler32Perf.testAdler32Update 32768 avgt 25 2.058 ± 0.052 us/op
> TestAdler32Perf.testAdler32Update 65536 avgt 25 3.994 ± 0.013 us/op
Xubo Zhang has updated the pull request incrementally with one additional commit since the last revision: Add jmh test for Adler32 ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/3806/files - new: https://git.openjdk.java.net/jdk/pull/3806/files/7290944b..c8e2ab05 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3806&range=08 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3806&range=07-08 Stats: 62 lines in 1 file changed: 62 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/3806.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3806/head:pull/3806 PR: https://git.openjdk.java.net/jdk/pull/3806 From david.holmes at oracle.com Mon May 17 06:18:10 2021 From: david.holmes at oracle.com (David Holmes) Date: Mon, 17 May 2021 16:18:10 +1000 Subject: RFR: 8266950: Remove vestigial support for non-strict floating-point execution In-Reply-To: References: Message-ID: Hi Vladimir, Thanks for the review! On 15/05/2021 2:32 am, Vladimir Ivanov wrote: > On Wed, 12 May 2021 05:33:14 GMT, David Holmes wrote: > >> As part of JEP 306, the vestiges of HotSpot support for non-strict floating-point execution can be removed.
All methods implicitly have strictfp semantics so the explicit checks for is_strict() can be replaced by true and the code reformulated accordingly.
>>
>> There are still some names that include "strict" that could potentially be renamed to remove it, but the fact we have to have strict fp semantics is still important on some platforms, so the names help reinforce that IMO.
>>
>> Testing: tiers 1-3
>>
>> Thanks,
>> David
>
> Overall, it looks very good.
> Thanks for taking care of compiler part, David.
>
> I think it makes sense to remove lir_div_strictfp and lir_mul_strictfp in C1 as well:
> https://github.com/openjdk/jdk/pull/4027
>
> Feel free to incorporate the patch into the current PR if you agree with the change.
> (Passed hs-tier1 - hs-tier4 testing and x86_32 build.)
>
> Otherwise, I'll handle it as a separate PR.

That is the kind of change I was unsure should be made - if the semantics are always strict does it make sense to keep the version with strict in their name instead? But if the compiler team prefer to get rid of them I'm happy to pull in your patch.

Thanks!
David

> -------------
>
> Marked as reviewed by vlivanov (Reviewer).
>
> PR: https://git.openjdk.java.net/jdk/pull/3991
>

From ddong at openjdk.java.net Mon May 17 07:20:18 2021
From: ddong at openjdk.java.net (Denghui Dong)
Date: Mon, 17 May 2021 07:20:18 GMT
Subject: RFR: 8265129: Add intrinsic support for JVM.getClassId [v6]
In-Reply-To:
References:
Message-ID:

> 8265129: Add intrinsic support for JVM.getClassId

Denghui Dong has updated the pull request incrementally with one additional commit since the last revision:

  fix crash problem

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3470/files
  - new: https://git.openjdk.java.net/jdk/pull/3470/files/73c1cc38..9fd0550b

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3470&range=05
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3470&range=04-05

  Stats: 3 lines in 2 files changed: 1 ins; 0 del; 2 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3470.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3470/head:pull/3470

PR: https://git.openjdk.java.net/jdk/pull/3470

From ddong at openjdk.java.net Mon May 17 07:20:20 2021
From: ddong at openjdk.java.net (Denghui Dong)
Date: Mon, 17 May 2021 07:20:20 GMT
Subject: RFR: 8265129: Add intrinsic support for JVM.getClassId [v5]
In-Reply-To: <5ax-PzsNOkbq_9_TZZcksCmyDyyAk9YTNS_8qlQQq7Y=.241ab2e2-5580-405b-b93a-71f44f517a50@github.com>
References: <5ax-PzsNOkbq_9_TZZcksCmyDyyAk9YTNS_8qlQQq7Y=.241ab2e2-5580-405b-b93a-71f44f517a50@github.com>
Message-ID:

On Wed, 12 May 2021 07:26:19 GMT, Tobias Hartmann wrote:

> Hi Denghui,
>
> I've attached the corresponding hs_err and replay files to the bug. Hope that helps!
>
> Tobias

Hi Tobias,

Thank you.
Although I still cannot reproduce the crash problem, after learning the implementation of escape analysis, I think it's harmless to add "trace_id_load_barrier" to the expected CallLeaf list, please feel free to correct me if I'm wrong.
Denghui

-------------

PR: https://git.openjdk.java.net/jdk/pull/3470

From roland at openjdk.java.net Mon May 17 08:12:46 2021
From: roland at openjdk.java.net (Roland Westrelin)
Date: Mon, 17 May 2021 08:12:46 GMT
Subject: RFR: 8252372: Check if cloning is required to move loads out of loops in PhaseIdealLoop::split_if_with_blocks_post() [v3]
In-Reply-To:
References:
Message-ID:

> Sinking data nodes out of a loop when all uses are out of a loop has
> several issues that this attempts to fix.
>
> 1- Only non control uses are considered which makes little sense (why
> not sink if the data node is an argument to a call or a returned
> value?)
>
> 2- Sinking of Loads is broken because of the handling of
> anti-dependence: the get_late_ctrl(n, n_ctrl) call returns a control
> in the loop because it takes all uses into account.
>
> 3- For data nodes for which a control edge can't be set, commoning of
> clones back in the loop is prevented with:
> _igvn._worklist.yank(x);
> which gives no guarantee
>
> This patch tries to address all issues:
>
> 1- it looks at all uses, not only non control uses
>
> 2- anti-dependences are computed for each use independently
>
> 3- Cast nodes are used to pin clones out of loop
>
>
> 2- requires refactoring of the PhaseIdealLoop::get_late_ctrl()
> logic. While working on this, I noticed a bug in anti-dependence
> analysis: when the use is a cfg node, the code sometimes looks at uses
> of the memory state of the cfg. The logic uses the use of the cfg
> which is a projection of adr_type identical to the cfg. It should
> instead look at the use of the memory projection.
>
> The existing logic for sinking loads calls clear_dom_lca_tags() for
> every load which seems like quite a waste. I added a
> _dom_lca_tags_round variable that's or'ed with the tag_node's _idx. By
> incrementing _dom_lca_tags_round, new tags that don't conflict with
> existing ones are produced and there's no need for
> clear_dom_lca_tags().
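
The round-counter idea in the last paragraph above can be illustrated outside HotSpot. The following is a minimal Java sketch, not the code from the patch: the class `RoundTags` and its methods are hypothetical names, and it keeps the round in a separate array rather than or'ing it into a node index as the patch describes. It only shows why bumping a round number makes earlier tags stale without clearing them one by one.

```java
import java.util.Arrays;

/** Hypothetical illustration (not HotSpot code): round-stamped tags avoid a per-query clear. */
public final class RoundTags {
    private final int[] tagRound; // round in which each slot was last tagged
    private final int[] tagValue; // tag stored for the slot in that round
    private int round = 1;        // 0 is reserved for "never tagged"

    public RoundTags(int slots) {
        tagRound = new int[slots];
        tagValue = new int[slots];
    }

    /** Starts a new query in O(1): tags from earlier rounds become stale automatically. */
    public void nextRound() {
        round++;
        if (round == 0) {             // counter wrapped around: do one real clear
            Arrays.fill(tagRound, 0);
            round = 1;
        }
    }

    /** Records a tag for the slot in the current round. */
    public void tag(int slot, int value) {
        tagRound[slot] = round;
        tagValue[slot] = value;
    }

    /** Returns the tag set in the current round, or -1 if the slot is stale or untagged. */
    public int tagOf(int slot) {
        return tagRound[slot] == round ? tagValue[slot] : -1;
    }
}
```

A caller would invoke `nextRound()` once per load being analysed, in place of a full clear, and treat a `-1` result from `tagOf` as "no tag".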
>
> For anti-dependence analysis to return a correct result, early control
> of the load is needed. The only way to get it at this stage, AFAICT,
> is to compute it by following the load's input until a pinned node is
> reached.
>
> The existing logic pins cloned nodes next to their use. The logic I
> propose pins them right out of the loop. This could possibly avoid
> some redundant clones. It also makes some special handling for corner
> cases with loop strip mining useless.
>
> For 3-, I added extra Cast nodes for float types. If a chain of data
> nodes are sunk, the new logic tries to keep a single Cast for the
> entire chain rather than one Cast per node.

Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision:

 - Tobias' review
 - Merge branch 'master' into JDK-8252372
 - CastVV
 - Merge branch 'master' into JDK-8252372
 - extra comments
 - fix

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3689/files
  - new: https://git.openjdk.java.net/jdk/pull/3689/files/1d90bb1a..19a01e93

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3689&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3689&range=01-02

  Stats: 532053 lines in 4759 files changed: 32092 ins; 487232 del; 12729 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3689.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3689/head:pull/3689

PR: https://git.openjdk.java.net/jdk/pull/3689

From roland at openjdk.java.net Mon May 17 08:12:46 2021
From: roland at openjdk.java.net (Roland Westrelin)
Date: Mon, 17 May 2021 08:12:46 GMT
Subject: RFR: 8252372: Check if cloning is required to move loads out of loops in PhaseIdealLoop::split_if_with_blocks_post() [v2]
In-Reply-To:
References:
Message-ID:

On Tue, 11 May 2021 09:07:53 GMT, Tobias Hartmann wrote:

> This is hard to review but looks reasonable to me. Performance and correctness testing also looks good.

Thanks for reviewing it.

> src/hotspot/share/opto/loopopts.cpp line 1137:
>
>> 1135: //------------------------------place_near_use---------------------------------
>> 1136: // Place some computation next to use but not inside inner loops.
>> 1137: Node* PhaseIdealLoop::place_near_use(Node* useblock, IdealLoopTree* loop) const {
>
> Maybe the name and comment should be adjusted since we no longer place it next to the use but right outside of the loop.

Right. Updated.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3689

From jbhateja at openjdk.java.net Mon May 17 08:39:22 2021
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Mon, 17 May 2021 08:39:22 GMT
Subject: RFR: 8256973: Intrinsic creation for VectorMask query (lastTrue, firstTrue, trueCount) APIs [v4]
In-Reply-To: <73lFD51hzmiF_KrQyPyE5c7lbf-Bp6V5vptzGo7JioY=.f34509d0-04c1-4c6d-878f-baa433b315a7@github.com>
References: <73lFD51hzmiF_KrQyPyE5c7lbf-Bp6V5vptzGo7JioY=.f34509d0-04c1-4c6d-878f-baa433b315a7@github.com>
Message-ID:

> This patch intrinsifies the following mask query APIs using an optimal instruction sequence for the X86 target.
> 1) VectorMask.firstTrue.
> 2) VectorMask.lastTrue.
> 3) VectorMask.trueCount.
>
> The current implementations of the above APIs iterate over the underlying boolean array encapsulated in a mask instance to ascertain the count/position index of true bits.
> X86 AVX2 and AVX512 targets offer direct instructions to populate the masks held in the byte vector to a GP or an opmask register, thereby accelerating further querying.
>
> Intrinsification is not performed for vector species containing fewer than two vector lanes.
>
> Please find below the performance numbers for the benchmark included in the patch:
> Machine: Cascade Lake server (Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz 28C)
>
>
> BENCHMARK | VECTOR SIZE | ALGO | BASELINE AVX3 | WITH OPT AVX3 | GAIN
> -- | -- | -- | -- | -- | --
> MaskQueryOperationsBenchmark.testFirstTrueByte | 128 | 1 | 338396.436 | 362711.622 | 1.071854143
> MaskQueryOperationsBenchmark.testFirstTrueByte | 128 | 2 | 205477.472 | 362668.035 | 1.765001445
> MaskQueryOperationsBenchmark.testFirstTrueByte | 128 | 3 | 185613.377 | 362518.206 | 1.953082326
> MaskQueryOperationsBenchmark.testFirstTrueByte | 256 | 1 | 338522.114 | 328751.231 | 0.971136648
> MaskQueryOperationsBenchmark.testFirstTrueByte | 256 | 2 | 148825.341 | 328783.35 | 2.209189294
> MaskQueryOperationsBenchmark.testFirstTrueByte | 256 | 3 | 200854.856 | 328784.24 | 1.636924526
> MaskQueryOperationsBenchmark.testFirstTrueByte | 512 | 1 | 338551.089 | 319908.361 | 0.944933782
> MaskQueryOperationsBenchmark.testFirstTrueByte | 512 | 2 | 116338.756 | 320026.839 | 2.750818816
> MaskQueryOperationsBenchmark.testFirstTrueByte | 512 | 3 | 200871.692 | 320008.208 | 1.593097588
> MaskQueryOperationsBenchmark.testFirstTrueInt | 128 | 1 | 338489.157 | 190221.57 | 0.561972418
> MaskQueryOperationsBenchmark.testFirstTrueInt | 128 | 2 | 205140.903 | 362387.766 | 1.766531007
> MaskQueryOperationsBenchmark.testFirstTrueInt | 128 | 3 | 185508.994 | 362566.265 | 1.95444036
> MaskQueryOperationsBenchmark.testFirstTrueInt | 256 | 1 | 338403.999 | 328829.751 | 0.971707639
> MaskQueryOperationsBenchmark.testFirstTrueInt | 256 | 2 | 148988.857 | 328835.479 | 2.207114583
> MaskQueryOperationsBenchmark.testFirstTrueInt | 256 | 3 | 200815.907 | 328778.266 | 1.637212265
> MaskQueryOperationsBenchmark.testFirstTrueInt | 512 | 1 | 338462.403 | 328796.84 | 0.971442728
> MaskQueryOperationsBenchmark.testFirstTrueInt | 512 | 2 | 116355.623 | 328811.386 | 2.825917455
> MaskQueryOperationsBenchmark.testFirstTrueInt | 512 | 3 | 200856.08 | 328773.859 | 1.636862867
> MaskQueryOperationsBenchmark.testFirstTrueLong | 128 | 1 | 338451.783 | 204432.394 | 0.60402221
> MaskQueryOperationsBenchmark.testFirstTrueLong | 128 | 2 | 204443.049 | 155670.633 | 0.761437641
> MaskQueryOperationsBenchmark.testFirstTrueLong | 128 | 3 | 207254.769 | 155672.842 | 0.751118263
> MaskQueryOperationsBenchmark.testFirstTrueLong | 256 | 1 | 338520.255 | 328789.176 | 0.971254072
> MaskQueryOperationsBenchmark.testFirstTrueLong | 256 | 2 | 205883.123 | 328742.103 | 1.596741385
> MaskQueryOperationsBenchmark.testFirstTrueLong | 256 | 3 | 185519.176 | 328733.537 | 1.771965271
> MaskQueryOperationsBenchmark.testFirstTrueLong | 512 | 1 | 338605.11 | 328694.935 | 0.970732353
> MaskQueryOperationsBenchmark.testFirstTrueLong | 512 | 2 | 148444.7 | 328352.346 | 2.211950619
> MaskQueryOperationsBenchmark.testFirstTrueLong | 512 | 3 | 200884.874 | 328814.376 | 1.636829939
> MaskQueryOperationsBenchmark.testFirstTrueShort | 128 | 1 | 338529.326 | 362293.877 | 1.070199387
> MaskQueryOperationsBenchmark.testFirstTrueShort | 128 | 2 | 204676.583 | 362428.992 | 1.770739899
> MaskQueryOperationsBenchmark.testFirstTrueShort | 128 | 3 | 185495.663 | 362422.835 | 1.953807594
> MaskQueryOperationsBenchmark.testFirstTrueShort | 256 | 1 | 338533.82 | 328635.479 | 0.970761146
> MaskQueryOperationsBenchmark.testFirstTrueShort | 256 | 2 | 148822.446 | 328803.55 | 2.209368001
> MaskQueryOperationsBenchmark.testFirstTrueShort | 256 | 3 | 200752.028 | 328805.974 | 1.637871245
> MaskQueryOperationsBenchmark.testFirstTrueShort | 512 | 1 | 338464.548 | 320054.91 | 0.945608371
> MaskQueryOperationsBenchmark.testFirstTrueShort | 512 | 2 | 116329.063 | 328763.508 | 2.826151088
> MaskQueryOperationsBenchmark.testFirstTrueShort | 512 | 3 | 199971.049 | 328819.066 | 1.644333355
> MaskQueryOperationsBenchmark.testLastTrueByte | 128 | 1 | 325618.244 | 337629.441 | 1.036887359
> MaskQueryOperationsBenchmark.testLastTrueByte | 128 | 2 | 197655.729 | 337544.012 | 1.707737052
> MaskQueryOperationsBenchmark.testLastTrueByte | 128 | 3 | 325600.645 | 337256.796 | 1.035798919
> MaskQueryOperationsBenchmark.testLastTrueByte | 256 | 1 | 325677.144 | 308312.588 | 0.946681687
> MaskQueryOperationsBenchmark.testLastTrueByte | 256 | 2 | 138177.514 | 308293.997 | 2.231144476
> MaskQueryOperationsBenchmark.testLastTrueByte | 256 | 3 | 201281.142 | 308353.239 | 1.531952949
> MaskQueryOperationsBenchmark.testLastTrueByte | 512 | 1 | 325499.635 | 305103.491 | 0.937338965
> MaskQueryOperationsBenchmark.testLastTrueByte | 512 | 2 | 98267.327 | 304803.64 | 3.101780106
> MaskQueryOperationsBenchmark.testLastTrueByte | 512 | 3 | 201072.661 | 304969.972 | 1.516715253
> MaskQueryOperationsBenchmark.testLastTrueInt | 128 | 1 | 325286.171 | 337337.209 | 1.037047496
> MaskQueryOperationsBenchmark.testLastTrueInt | 128 | 2 | 197351.915 | 331432.723 | 1.679399579
> MaskQueryOperationsBenchmark.testLastTrueInt | 128 | 3 | 325173.097 | 337518.586 | 1.037965899
> MaskQueryOperationsBenchmark.testLastTrueInt | 256 | 1 | 325199.786 | 308436.805 | 0.948453284
> MaskQueryOperationsBenchmark.testLastTrueInt | 256 | 2 | 138200.527 | 308405.442 | 2.231579348
> MaskQueryOperationsBenchmark.testLastTrueInt | 256 | 3 | 201240.625 | 308234.527 | 1.531671485
> MaskQueryOperationsBenchmark.testLastTrueInt | 512 | 1 | 325590.639 | 308381.757 | 0.947145649
> MaskQueryOperationsBenchmark.testLastTrueInt | 512 | 2 | 98334.197 | 308440.373 | 3.13665421
> MaskQueryOperationsBenchmark.testLastTrueInt | 512 | 3 | 200832.953 | 308431.355 | 1.535760693
> MaskQueryOperationsBenchmark.testLastTrueLong | 128 | 1 | 325564.887 | 193981.861 | 0.595831641
> MaskQueryOperationsBenchmark.testLastTrueLong | 128 | 2 | 214005.351 | 153667.869 | 0.718056199
> MaskQueryOperationsBenchmark.testLastTrueLong | 128 | 3 | 214061.493 | 156337.24 | 0.730337988
> MaskQueryOperationsBenchmark.testLastTrueLong | 256 | 1 | 325601.502 | 308291.032 | 0.946835411
> MaskQueryOperationsBenchmark.testLastTrueLong | 256 | 2 | 197911.182 | 308292.149 | 1.557729815
> MaskQueryOperationsBenchmark.testLastTrueLong | 256 | 3 | 325608.187 | 308405.393 | 0.947167195
> MaskQueryOperationsBenchmark.testLastTrueLong | 512 | 1 | 325734.897 | 308321.619 | 0.946541564
> MaskQueryOperationsBenchmark.testLastTrueLong | 512 | 2 | 137974.465 | 308131.475 | 2.233250008
> MaskQueryOperationsBenchmark.testLastTrueLong | 512 | 3 | 205479.182 | 308311.636 | 1.500451934
> MaskQueryOperationsBenchmark.testLastTrueShort | 128 | 1 | 325681.411 | 337663.377 | 1.036790451
> MaskQueryOperationsBenchmark.testLastTrueShort | 128 | 2 | 198127.51 | 337287.453 | 1.702375672
> MaskQueryOperationsBenchmark.testLastTrueShort | 128 | 3 | 325519.01 | 337453.387 | 1.036662612
> MaskQueryOperationsBenchmark.testLastTrueShort | 256 | 1 | 325647.378 | 308266.5 | 0.946626691
> MaskQueryOperationsBenchmark.testLastTrueShort | 256 | 2 | 138287.837 | 308402.656 | 2.230150263
> MaskQueryOperationsBenchmark.testLastTrueShort | 256 | 3 | 205375.864 | 308418.101 | 1.501725154
> MaskQueryOperationsBenchmark.testLastTrueShort | 512 | 1 | 325548.631 | 308137.064 | 0.946516233
> MaskQueryOperationsBenchmark.testLastTrueShort | 512 | 2 | 98424.074 | 308145.17 | 3.130790644
> MaskQueryOperationsBenchmark.testLastTrueShort | 512 | 3 | 205381.622 | 308345.763 | 1.50133084
> MaskQueryOperationsBenchmark.testTrueCountByte | 128 | 1 | 197488.249 | 340490.471 | 1.724104967
> MaskQueryOperationsBenchmark.testTrueCountByte | 128 | 2 | 191307.785 | 354400.26 | 1.852513529
> MaskQueryOperationsBenchmark.testTrueCountByte | 128 | 3 | 181206.7 | 354512.75 | 1.956399791
> MaskQueryOperationsBenchmark.testTrueCountByte | 256 | 1 | 144485.784 | 328347.7 | 2.272525995
> MaskQueryOperationsBenchmark.testTrueCountByte | 256 | 2 | 136709.938 | 328318.229 | 2.401568122
> MaskQueryOperationsBenchmark.testTrueCountByte | 256 | 3 | 141501.903 | 328274.337 | 2.319928779
> MaskQueryOperationsBenchmark.testTrueCountByte | 512 | 1 | 108395.25 | 318599.11 | 2.939234976
> MaskQueryOperationsBenchmark.testTrueCountByte | 512 | 2 | 98731.287 | 318651.791 | 3.22746518
> MaskQueryOperationsBenchmark.testTrueCountByte | 512 | 3 | 106344.335 | 318657.098 | 2.99646519
> MaskQueryOperationsBenchmark.testTrueCountInt | 128 | 1 | 124691.716 | 354457.62 | 2.842671762
> MaskQueryOperationsBenchmark.testTrueCountInt | 128 | 2 | 191325.138 | 354360.523 | 1.852137815
> MaskQueryOperationsBenchmark.testTrueCountInt | 128 | 3 | 181480.334 | 353746.697 | 1.949228818
> MaskQueryOperationsBenchmark.testTrueCountInt | 256 | 1 | 144513.076 | 328404.916 | 2.27249274
> MaskQueryOperationsBenchmark.testTrueCountInt | 256 | 2 | 136710.717 | 328516.92 | 2.403007805
> MaskQueryOperationsBenchmark.testTrueCountInt | 256 | 3 | 141631.832 | 328432.841 | 2.318919669
> MaskQueryOperationsBenchmark.testTrueCountInt | 512 | 1 | 108479.473 | 328405.877 | 3.027355019
> MaskQueryOperationsBenchmark.testTrueCountInt | 512 | 2 | 98747.682 | 328300.378 | 3.324638831
> MaskQueryOperationsBenchmark.testTrueCountInt | 512 | 3 | 106378.04 | 328384.537 | 3.086957957
> MaskQueryOperationsBenchmark.testTrueCountLong | 128 | 1 | 213646.579 | 159098.437 | 0.74468048
> MaskQueryOperationsBenchmark.testTrueCountLong | 128 | 2 | 212671.379 | 162528.924 | 0.764225655
> MaskQueryOperationsBenchmark.testTrueCountLong | 128 | 3 | 212649.052 | 162530.898 | 0.764315178
> MaskQueryOperationsBenchmark.testTrueCountLong | 256 | 1 | 197350.819 | 328365.924 | 1.663869072
> MaskQueryOperationsBenchmark.testTrueCountLong | 256 | 2 | 191473.127 | 328501.883 | 1.715655289
> MaskQueryOperationsBenchmark.testTrueCountLong | 256 | 3 | 185529.513 | 328428.64 | 1.770223156
> MaskQueryOperationsBenchmark.testTrueCountLong | 512 | 1 | 144516.188 | 328334.76 | 2.27195835
> MaskQueryOperationsBenchmark.testTrueCountLong | 512 | 2 | 136752.367 | 328505.571 | 2.402192943
> MaskQueryOperationsBenchmark.testTrueCountLong | 512 | 3 | 141445.742 | 328392.887 | 2.321688036
> MaskQueryOperationsBenchmark.testTrueCountShort | 128 | 1 | 197863.202 | 354533.342 | 1.791810394
> MaskQueryOperationsBenchmark.testTrueCountShort | 128 | 2 | 191802.914 | 354377.939 | 1.84761499
> MaskQueryOperationsBenchmark.testTrueCountShort | 128 | 3 | 181773.298 | 354374.525 | 1.949541153
> MaskQueryOperationsBenchmark.testTrueCountShort | 256 | 1 | 144414.679 | 328435.088 | 2.27425003
> MaskQueryOperationsBenchmark.testTrueCountShort | 256 | 2 | 136923.991 | 328267.898 | 2.397446171
> MaskQueryOperationsBenchmark.testTrueCountShort | 256 | 3 | 141545.957 | 328308.681 | 2.319449371
> MaskQueryOperationsBenchmark.testTrueCountShort | 512 | 1 | 108420.143 | 328282.998 | 3.027878297
> MaskQueryOperationsBenchmark.testTrueCountShort | 512 | 2 | 98736.441 | 328420.616 | 3.326235103
> MaskQueryOperationsBenchmark.testTrueCountShort | 512 | 3 | 106432.386 | 328245.585 | 3.084076166
>
> ALGO (1=bestcase, 2=worstcase, 3=avgcase)

Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:

  8256973: Review comments resolution.
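
For readers unfamiliar with the three queries, their semantics can be sketched in plain Java. This is an illustrative sketch only, not the intrinsic from the patch: the class `MaskQueries` and its methods are hypothetical, masks are modelled as `boolean[]` with at most 64 lanes, and the bit-based variants merely show why the queries reduce to single bit operations once a mask has been moved into a general-purpose register. The return conventions for an empty mask (lane count for firstTrue, -1 for lastTrue) follow the documented `VectorMask` behaviour.

```java
/** Hypothetical scalar model of the mask queries; illustrative only, not the patch's code. */
public final class MaskQueries {
    private MaskQueries() {}

    // Loop-based versions, mirroring the "iterate over the boolean array" approach.
    public static int trueCount(boolean[] mask) {
        int count = 0;
        for (boolean lane : mask) {
            if (lane) count++;
        }
        return count;
    }

    /** Index of the first set lane, or mask.length if no lane is set. */
    public static int firstTrue(boolean[] mask) {
        for (int i = 0; i < mask.length; i++) {
            if (mask[i]) return i;
        }
        return mask.length;
    }

    /** Index of the last set lane, or -1 if no lane is set. */
    public static int lastTrue(boolean[] mask) {
        for (int i = mask.length - 1; i >= 0; i--) {
            if (mask[i]) return i;
        }
        return -1;
    }

    /** Packs the mask into a long, lane i in bit i (assumes at most 64 lanes). */
    public static long toBits(boolean[] mask) {
        long bits = 0L;
        for (int i = 0; i < mask.length; i++) {
            if (mask[i]) bits |= 1L << i;
        }
        return bits;
    }

    // Bit-based versions: one bit-manipulation operation each once the mask is packed.
    public static int trueCountBits(long bits) {
        return Long.bitCount(bits);
    }

    public static int firstTrueBits(long bits, int lanes) {
        return bits == 0 ? lanes : Long.numberOfTrailingZeros(bits);
    }

    public static int lastTrueBits(long bits) {
        return 63 - Long.numberOfLeadingZeros(bits); // yields -1 when bits == 0
    }
}
```

The loop-based and bit-based versions agree for any mask of up to 64 lanes; the cost of the loops grows with the lane count while the bit operations do not.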
-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3916/files
  - new: https://git.openjdk.java.net/jdk/pull/3916/files/691d082c..95811bc3

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3916&range=03
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3916&range=02-03

  Stats: 862 lines in 67 files changed: 593 ins; 51 del; 218 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3916.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3916/head:pull/3916

PR: https://git.openjdk.java.net/jdk/pull/3916

From jbhateja at openjdk.java.net Mon May 17 08:47:49 2021
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Mon, 17 May 2021 08:47:49 GMT
Subject: RFR: 8256973: Intrinsic creation for VectorMask query (lastTrue, firstTrue, trueCount) APIs [v4]
In-Reply-To:
References: <73lFD51hzmiF_KrQyPyE5c7lbf-Bp6V5vptzGo7JioY=.f34509d0-04c1-4c6d-878f-baa433b315a7@github.com>
Message-ID:

On Mon, 17 May 2021 08:39:22 GMT, Jatin Bhateja wrote:

> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
>   8256973: Review comments resolution.

Hi @iwanowww, your comments have been addressed.

> I think you still need to adjust the predicate to be able to correctly split between AVX512BW+VL and AVX512F/AVX/AVX2
> configurations.

There are now two patterns: one that supports AVX512VLBW (handling mask lengths from 2 to 64) and another for non-AVX512VLBW targets (handling mask lengths from 2 to 32). Byte512Vector mandates the presence of AVX512BW, as enforced by Matcher::match_rule_supported_vector(), so I removed the special code sequence for 512-bit vectors in the absence of the AVX512BW feature.
> reflect in the name that it's effectively a reduction, but on masks (maskReductionCoerced?);

DONE

> bound on M: ;

DONE

-------------

PR: https://git.openjdk.java.net/jdk/pull/3916

From thartmann at openjdk.java.net Mon May 17 10:08:05 2021
From: thartmann at openjdk.java.net (Tobias Hartmann)
Date: Mon, 17 May 2021 10:08:05 GMT
Subject: RFR: 8265129: Add intrinsic support for JVM.getClassId [v6]
In-Reply-To:
References:
Message-ID:

On Mon, 17 May 2021 07:20:18 GMT, Denghui Dong wrote:

>> 8265129: Add intrinsic support for JVM.getClassId
>
> Denghui Dong has updated the pull request incrementally with one additional commit since the last revision:
>
>   fix crash problem

Hi Denghui,

yes, that makes sense.

Best regards,
Tobias

-------------

PR: https://git.openjdk.java.net/jdk/pull/3470

From dongbo at openjdk.java.net Mon May 17 10:53:05 2021
From: dongbo at openjdk.java.net (Dong Bo)
Date: Mon, 17 May 2021 10:53:05 GMT
Subject: RFR: 8264973: AArch64: Optimize vector max/min/add reduction of two integers with NEON pairwise instructions [v2]
In-Reply-To: <1ZfSsFKnXwnkqtxIGeyZyb6L9yFghYh-sTZUWTY3A5U=.1a518678-3fd9-4dcb-be04-20c0060bfe91@github.com>
References: <1ZfSsFKnXwnkqtxIGeyZyb6L9yFghYh-sTZUWTY3A5U=.1a518678-3fd9-4dcb-be04-20c0060bfe91@github.com>
Message-ID:

On Mon, 26 Apr 2021 11:16:00 GMT, Dong Bo wrote:

>> On aarch64, current implementations of vector reduce_add2I, reduce_max2I, reduce_min2I can be optimized with NEON pairwise instructions:
>>
>>
>>     ## reduce_add2I, before
>>     mov w10, v19.s[0]
>>     mov w2, v19.s[1]
>>     add w10, w0, w10
>>     add w10, w10, w2
>>     ## reduce_add2I, optimized
>>     addp v23.2s, v24.2s, v24.2s
>>     mov w10, v23.s[0]
>>     add w10, w10, w2
>>
>>     ## reduce_max2I, before
>>     dup v16.2d, v23.d[0]
>>     sminv s16, v16.4s
>>     mov w10, v16.s[0]
>>     cmp w10, w0
>>     csel w10, w10, w0, lt
>>     ## reduce_max2I, optimized
>>     sminp v16.2s, v23.2s, v23.2s
>>     mov w10, v16.s[0]
>>     cmp w10, w0
>>     csel w10, w10, w0, lt
>>
>>
>> I don't expect this to change anything of SuperWord, vectorizing reductions of two integers is disabled by [1].
>> This is useful for VectorAPI, tested benchmarks in [2], performance can improve ~51% and ~8% for `Int64Vector.ADD` and `Int64Vector.MAX` respectively.
>>
>>
>>     Benchmark                   (size)   Mode  Cnt     Score     Error   Units
>>     # optimized
>>     Int64Vector.ADDLanes          1024  thrpt   10  2492.123 ±  23.561  ops/ms
>>     Int64Vector.ADDMaskedLanes    1024  thrpt   10  1825.882 ±   5.261  ops/ms
>>     Int64Vector.MAXLanes          1024  thrpt   10  1921.028 ±   3.253  ops/ms
>>     Int64Vector.MAXMaskedLanes    1024  thrpt   10  1588.575 ±   3.903  ops/ms
>>     Int64Vector.MINLanes          1024  thrpt   10  1923.913 ±   2.117  ops/ms
>>     Int64Vector.MINMaskedLanes    1024  thrpt   10  1596.875 ±   2.163  ops/ms
>>     # default
>>     Int64Vector.ADDLanes          1024  thrpt   10  1644.223 ±   1.885  ops/ms
>>     Int64Vector.ADDMaskedLanes    1024  thrpt   10  1491.502 ±  26.436  ops/ms
>>     Int64Vector.MAXLanes          1024  thrpt   10  1784.066 ±   3.816  ops/ms
>>     Int64Vector.MAXMaskedLanes    1024  thrpt   10  1494.750 ±   3.451  ops/ms
>>     Int64Vector.MINLanes          1024  thrpt   10  1785.266 ±   8.893  ops/ms
>>     Int64Vector.MINMaskedLanes    1024  thrpt   10  1499.233 ±   3.498  ops/ms
>>
>>
>> Verified correctness with tests `test/jdk/jdk/incubator/vector/`. Also tested linux-aarch64-server-fastdebug tier1-3.
>>
>> [1] https://github.com/openjdk/jdk/blob/3bf4c904fbbd87d4db18db22c1be384616483eed/src/hotspot/share/opto/superword.cpp#L2004
>> [2] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/jdk/jdk/incubator/vector/benchmark/src/main/java/benchmark/jdk/incubator/vector/Int64Vector.java
>
> Dong Bo has updated the pull request incrementally with one additional commit since the last revision:
>
>   add assembler tests for smaxp/sminp

> _Mailing list message from [Andrew Haley](mailto:aph at redhat.com) on [hotspot-compiler-dev](mailto:hotspot-compiler-dev at mail.openjdk.java.net):_
>
> On 5/10/21 6:55 AM, Dong Bo wrote:
>
> > PING? Any comments/suggestions are appreciated.
> > Although this has been reviewed by Ningsheng, we still need help from reviewers here.
>
> I'm testing this now.

PING? Any suggestions?
Thanks.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3683

From cgo at openjdk.java.net Mon May 17 10:57:10 2021
From: cgo at openjdk.java.net (Christoph =?UTF-8?B?R8O2dHRzY2hrZXM=?=)
Date: Mon, 17 May 2021 10:57:10 GMT
Subject: RFR: 8267237: ARM32: bad AD file in matcher.cpp after 8266810
Message-ID:

It appears that [JDK-8266810](https://bugs.openjdk.java.net/browse/JDK-8266810) introduced a regression on aarch32. Many JTreg tests are failing with:

    # Internal Error (/var/jnode/openjdk-build-ws/workspace/openjdk-build/jdk/jdk-arm-linux-gnueabihf/jdk/src/hotspot/share/opto/matcher.cpp:1670), pid=15030, tid=15047
    # assert(false) failed: bad AD file

Testing: hotspot tier1 on ARMv7-A / linux

-------------

Commit messages:
 - Fixes convL2FSupported for aarch32

Changes: https://git.openjdk.java.net/jdk/pull/4053/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4053&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8267237
  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/4053.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/4053/head:pull/4053

PR: https://git.openjdk.java.net/jdk/pull/4053

From redestad at openjdk.java.net Mon May 17 11:07:44 2021
From: redestad at openjdk.java.net (Claes Redestad)
Date: Mon, 17 May 2021 11:07:44 GMT
Subject: RFR: 8267237: ARM32: bad AD file in matcher.cpp after 8266810
In-Reply-To:
References:
Message-ID:

On Mon, 17 May 2021 10:48:17 GMT, Christoph Göttschkes wrote:

> It appears that [JDK-8266810](https://bugs.openjdk.java.net/browse/JDK-8266810) introduced a regression on aarch32. Many JTreg tests are failing with:
>
>     # Internal Error (/var/jnode/openjdk-build-ws/workspace/openjdk-build/jdk/jdk-arm-linux-gnueabihf/jdk/src/hotspot/share/opto/matcher.cpp:1670), pid=15030, tid=15047
>     # assert(false) failed: bad AD file
>
>
> Testing: hotspot tier1 on ARMv7-A / linux

Looks good and trivial.
-------------

Marked as reviewed by redestad (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/4053

From jbhateja at openjdk.java.net Mon May 17 11:25:34 2021
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Mon, 17 May 2021 11:25:34 GMT
Subject: RFR: 8266951: Partial in-lining for vectorized mismatch operation using AVX512 masked instructions [v3]
In-Reply-To: <0YtRuwnVZ-Ejs-22d0JDJeFzXiZ17XNuBT1o5Ma4ZkI=.9dd9e952-d452-4175-8ff5-8f41e990a555@github.com>
References: <0YtRuwnVZ-Ejs-22d0JDJeFzXiZ17XNuBT1o5Ma4ZkI=.9dd9e952-d452-4175-8ff5-8f41e990a555@github.com>
Message-ID:

> ArraysSupport.vectorizedMismatch is a leaf-level comparison routine which gets called by various public Java APIs (Arrays.equals, Arrays.mismatch). The Hotspot C2 compiler intrinsifies the vectorizedMismatch routine and emits a call to a stub routine which uses vector instructions to compare the inputs.
>
> For small compare operations whose size fits in one vector register, i.e. < 32 bytes or <= 64 bytes, this patch employs a partial in-lining technique to emit the fast path code at the call site, which does the vector comparison under the influence of a predicate register/mask computed as a function of the comparison length.
>
> If the length of the comparison is greater than the vector register size, then the slow path comprising the stub call is emitted.
>
> This prevents the call overhead associated with the stub call, which is significant compared to the actual comparison operation for small-sized comparisons.
>
> Partial in-lining works under the influence of a run time flag -XX:UsePartialInlineSize=32/64 (default 32 bytes).
>
> Following are performance numbers for an existing JMH benchmark (test/micro/org/openjdk/bench/java/util/ArrayMismatch.java) :-
>
> Machine : Cascade Lake server (Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz)
>
> JMH Benchmark | Size | BaseLine (ops/ms) | PI32 (ops/ms) | Gain (PI32/Baseline) | PI64 (ops/ms) | Gain (PI64/Baseline)
> -- | -- | -- | -- | -- | -- | --
> ArraysMismatch.Byte.differentSubrangeMatches | 16 | 129196.612 | 165376.715 | 1.2800391 | 157553.42 | 1.219485694
> ArraysMismatch.Byte.differentSubrangeMatches | 32 | 125583.404 | 163645.759 | 1.303084275 | 157645.879 | 1.255308217
> ArraysMismatch.Byte.differentSubrangeMatches | 64 | 121969.731 | 170648.152 | 1.399102471 | 157993.449 | 1.295349655
> ArraysMismatch.Byte.differentSubrangeMatches | 90 | 91819.571 | 96154.479 | 1.047211155 | 157983.324 | 1.720584427
> ArraysMismatch.Byte.differentSubrangeMatches | 800 | 65236.047 | 67243.131 | 1.030766487 | 67759.48 | 1.038681574
> ArraysMismatch.Byte.matches | 16 | 151805.68 | 203802.717 | 1.342523659 | 188334.618 | 1.240629586
> ArraysMismatch.Byte.matches | 32 | 151624.747 | 203731.315 | 1.343654773 | 185719.086 | 1.224859989
> ArraysMismatch.Byte.matches | 64 | 138350.648 | 124158.139 | 0.897416389 | 188935.388 | 1.365627055
> ArraysMismatch.Byte.matches | 90 | 102366.983 | 101474.688 | 0.991283371 | 100674.414 | 0.983465675
> ArraysMismatch.Byte.matches | 800 | 46319.352 | 49585.514 | 1.070513983 | 49594.262 | 1.070702846
> ArraysMismatch.Byte.mismatchEnd | 16 | 162382.057 | 191602.366 | 1.179947893 | 182425.362 | 1.123433003
> ArraysMismatch.Byte.mismatchEnd | 32 | 146656.702 | 193510.637 | 1.319480354 | 182571.741 | 1.244891904
> ArraysMismatch.Byte.mismatchEnd | 64 | 140799.385 | 122505.816 | 0.870073516 | 182360.435 | 1.295179201
> ArraysMismatch.Byte.mismatchEnd | 90 | 117439.002 | 107296.27 | 0.913634041 | 108081.174 | 0.920317545
> ArraysMismatch.Byte.mismatchEnd | 800 | 47542.975 | 47456.106 | 0.998172832 | 47289.082 | 0.994659716
> ArraysMismatch.Byte.mismatchMid | 16 | 143112.591 | 189653.41 | 1.325204223 | 182411.81 | 1.274603504
> ArraysMismatch.Byte.mismatchMid | 32 | 151759.608 | 193712.64 | 1.276443993 | 182689.18 | 1.203806351
> ArraysMismatch.Byte.mismatchMid | 64 | 140756.035 | 122017.013 | 0.866868785 | 182508.473 | 1.296629825
> ArraysMismatch.Byte.mismatchMid | 90 | 134230.235 | 122213.804 | 0.910478954 | 122566.133 | 0.913103765
> ArraysMismatch.Byte.mismatchMid | 800 | 75512.985 | 64861.716 | 0.858947849 | 71607.794 | 0.94828451
> ArraysMismatch.Byte.mismatchStart | 16 | 160628.501 | 193722.299 | 1.206026937 | 183190.972 | 1.140463684
> ArraysMismatch.Byte.mismatchStart | 32 | 151629.56 | 193633.36 | 1.277015906 | 183230.666 | 1.20840993
> ArraysMismatch.Byte.mismatchStart | 64 | 143345.272 | 130754.305 | 0.91216336 | 181837.864 | 1.268530601
> ArraysMismatch.Byte.mismatchStart | 90 | 151557.205 | 130724.926 | 0.86254511 | 130962.682 | 0.864113864
> ArraysMismatch.Byte.mismatchStart | 800 | 149416.06 | 130847.301 | 0.875724477 | 130952.683 | 0.876429769
> ArraysMismatch.Char.differentSubrangeMatches | 16 | 124936.905 | 152375.103 | 1.219616438 | 146062.997 | 1.169094088
> ArraysMismatch.Char.differentSubrangeMatches | 32 | 118878.291 | 158770.285 | 1.33557005 | 146561.488 | 1.232870079
> ArraysMismatch.Char.differentSubrangeMatches | 64 | 110296.975 | 104885.041 | 0.95093307 | 146102.313 | 1.324626655
> ArraysMismatch.Char.differentSubrangeMatches | 90 | 88056.395 | 90133.489 | 1.023588224 | 87883.169 | 0.998032783
> ArraysMismatch.Char.differentSubrangeMatches | 800 | 41319.787 | 46257.464 | 1.119499091 | 46090.56 | 1.115459767
> ArraysMismatch.Char.matches | 16 | 150428.182 | 197311.356 | 1.311664832 | 187199.805 | 1.24444637
> ArraysMismatch.Char.matches | 32 | 132718.181 | 126373.231 | 0.952192307 | 187008.811 | 1.409067014
> ArraysMismatch.Char.matches | 64 | 111659.84 | 107182.982 | 0.959906283 | 109772.951 | 0.983101453
> ArraysMismatch.Char.matches | 90 | 86184.209 | 91977.05 | 1.067214645 | 90389.147 | 1.048790121
> ArraysMismatch.Char.matches | 800 | 26332.084 | 25284.001 | 0.960197491 | 25855.38 | 0.981896458
> ArraysMismatch.Char.mismatchEnd | 16 | 148547.251 | 189151.018 | 1.273339067 | 179675.328 | 1.209550004
> ArraysMismatch.Char.mismatchEnd | 32 | 138219.785 | 119017.203 | 0.861072118 | 178701.685 | 1.292880647
> ArraysMismatch.Char.mismatchEnd | 64 | 110435.452 | 103940.023 | 0.94118348 | 102078.889 | 0.924330794
> ArraysMismatch.Char.mismatchEnd | 90 | 89375.63 | 87698.736 | 0.981237682 | 88037.787 | 0.985031233
> ArraysMismatch.Char.mismatchEnd | 800 | 23632.584 | 22963.757 | 0.971698948 | 20497.605 | 0.867345061
> ArraysMismatch.Char.mismatchMid | 16 | 148666.26 | 189258.721 | 1.273044207 | 178820.938 | 1.202834712
> ArraysMismatch.Char.mismatchMid | 32 | 131949.59 | 119320.489 | 0.904288441 | 178579.245 | 1.35338992
> ArraysMismatch.Char.mismatchMid | 64 | 122148.315 | 111033.597 | 0.909006375 | 109455.953 | 0.896090568
> ArraysMismatch.Char.mismatchMid | 90 | 125032.714 | 109837.581 | 0.878470742 | 110283.097 | 0.882033937
> ArraysMismatch.Char.mismatchMid | 800 | 42255.059 | 48153.688 | 1.139595806 | 43087.476 | 1.019699819
> ArraysMismatch.Char.mismatchStart | 16 | 148493.976 | 189247.176 | 1.274443456 | 178915.503 | 1.204867078
> ArraysMismatch.Char.mismatchStart | 32 | 148724.462 | 126724.721 | 0.852077186 | 178887.041 | 1.202808459
> ArraysMismatch.Char.mismatchStart | 64 | 148635.338 | 126716.274 | 0.852531274 | 126747.94 | 0.852744318
> ArraysMismatch.Char.mismatchStart | 90 | 140359.351 | 126708.588 | 0.902744186 | 125618.245 | 0.894975961
> ArraysMismatch.Char.mismatchStart | 800 | 144649.46 | 125727.381 | 0.86918666 | 126664.011 | 0.875661831
> ArraysMismatch.Double.differentSubrangeMatches | 16 | 116255.827 | 116156.952 | 0.999149505 | 116557.568 | 1.002595491
> ArraysMismatch.Double.differentSubrangeMatches | 32 | 91940.498 | 97299.205 | 1.058284511 | 97466.224 | 1.06010111
> ArraysMismatch.Double.differentSubrangeMatches | 64 | 78205.807 | 78189.378 | 0.999789926 | 78133.649 | 0.999077332
> ArraysMismatch.Double.differentSubrangeMatches | 90 | 61330.454 | 68798.235 | 1.121763015 | 68524.188 | 1.117294648
> ArraysMismatch.Double.differentSubrangeMatches | 800 | 14996.315 | 14979.647 | 0.998888527 | 15072.825 | 1.00510192
> ArraysMismatch.Double.matches | 16 | 119342.024 | 120322.671 | 1.008217114 | 119531.315 | 1.001586122
> ArraysMismatch.Double.matches | 32 | 88179.448 | 89069.505 | 1.010093701 | 88141.626 | 0.999571079
> ArraysMismatch.Double.matches | 64 | 62622.253 | 62433.512 | 0.996986039 | 63041.774 | 1.006699232
> ArraysMismatch.Double.matches | 90 | 49579.305 | 50632.739 | 1.021247454 | 46548.486 | 0.938869272
> ArraysMismatch.Double.matches | 800 | 8850.013 | 8505.296 | 0.961048984 | 8490.327 | 0.959357574
> ArraysMismatch.Double.mismatchEnd | 16 | 116594.224 | 119025.382 | 1.020851445 | 116310.567 | 0.997567144
> ArraysMismatch.Double.mismatchEnd | 32 | 86183.542 | 86814.706 | 1.007323486 | 86258.696 | 1.000872023
> ArraysMismatch.Double.mismatchEnd | 64 | 62695.058 | 62794.552 | 1.001586951 | 62769 | 1.001179391
> ArraysMismatch.Double.mismatchEnd | 90 | 46899.021 | 47692.984 | 1.016929202 | 47598.715 | 1.01491916
> ArraysMismatch.Double.mismatchEnd | 800 | 8132.64 | 8141.465 | 1.001085133 | 7176.583 | 0.882441987
> ArraysMismatch.Double.mismatchMid | 16 | 110505.284 | 113732.521 | 1.029204368 | 113249.451 | 1.024832903
> ArraysMismatch.Double.mismatchMid | 32 | 94259.439 | 93242.776 | 0.989214205 | 94420.206 | 1.00170558
> ArraysMismatch.Double.mismatchMid | 64 | 76392.603 | 76344.962 | 0.999376366 | 76369.689 | 0.999700049
> ArraysMismatch.Double.mismatchMid | 90 | 71578.538 | 71637.235 | 1.000820036 | 71582.34 | 1.000053116
> ArraysMismatch.Double.mismatchMid | 800 | 14993.414 | 12701.251 | 0.84712201 | 14998.937 | 1.000368362
> ArraysMismatch.Double.mismatchStart | 16 | 141188.616 | 141430.91 | 1.001716102 | 141517.873 | 1.002332036
> ArraysMismatch.Double.mismatchStart | 32 | 141489.906 | 139633.297 | 0.986878152 | 141729.555 | 1.001693753
> ArraysMismatch.Double.mismatchStart | 64 | 141502.44 | 139656.902 | 0.986957554 | 141488.272 | 0.999899875
> ArraysMismatch.Double.mismatchStart | 90 | 141782.57 | 141508.142 | 0.998064445 | 141579.135 | 0.998565162
> ArraysMismatch.Double.mismatchStart | 800 | 144565.191 | 139525.413 | 0.965138371 | 144607.95 | 1.000295777
> ArraysMismatch.Float.differentSubrangeMatches | 16 | 120041.868 | 119986.512 | 0.999538861 | 120009.683 | 0.999731885
> ArraysMismatch.Float.differentSubrangeMatches | 32 | 111402.873 | 111414.633 | 1.000105563 | 111442.964 | 1.000359874
> ArraysMismatch.Float.differentSubrangeMatches | 64 | 85388.728 | 93884.13 | 1.099490907 | 95120.892 | 1.113974809
> ArraysMismatch.Float.differentSubrangeMatches | 90 | 67617.865 | 75865.226 | 1.121970148 | 76179.814 | 1.126622587
> ArraysMismatch.Float.differentSubrangeMatches | 800 | 24994.376 | 25011.775 | 1.000696117 | 24944.2 | 0.997992508
> ArraysMismatch.Float.matches | 16 | 133159.39 | 137937.688 | 1.035884048 | 139461.652 | 1.047328709
> ArraysMismatch.Float.matches | 32 | 111959.987 | 115420.6 | 1.030909373 | 117002.141 | 1.045035321
> ArraysMismatch.Float.matches | 64 | 86892.65 | 87395.62 | 1.005788407 | 87345.458 | 1.00521112
> ArraysMismatch.Float.matches | 90 | 67690.279 | 69156.772 | 1.02166475 | 69082.962 | 1.020574343
> ArraysMismatch.Float.matches | 800 | 14894.94 | 15341.034 | 1.029949365 | 15779.117 | 1.059360897
> ArraysMismatch.Float.mismatchEnd | 16 | 128854.048 | 128925.913 | 1.000557724 | 128985.299 | 1.001018602
> ArraysMismatch.Float.mismatchEnd | 32 | 99825.842 | 104613.873 | 1.047963843 | 103876.271 | 1.040574955
> ArraysMismatch.Float.mismatchEnd | 64 | 80190.706 | 84665.053 | 1.055796329 | 84582.712 | 1.054769514
> ArraysMismatch.Float.mismatchEnd | 90 | 71406.594 | 76730.083 | 1.074551784 | 76596.258 | 1.072677658
> ArraysMismatch.Float.mismatchEnd | 800 | 14348.159 | 14306.535 | 0.997099001 | 14360.603 | 1.000867289
> ArraysMismatch.Float.mismatchMid | 16 | 123753.791 | 124291.601 | 1.004345806 | 123649.378 | 0.999156284
> ArraysMismatch.Float.mismatchMid | 32 | 109105.215 | 111447.183 | 1.021465225 | 111494.37 | 1.021897716
> ArraysMismatch.Float.mismatchMid | 64 | 93600.363 | 93741.993 | 1.001513135 | 93658.042 | 1.000616226
> ArraysMismatch.Float.mismatchMid | 90 | 89991.128 | 89712.471 | 0.996903506 | 90031.763 | 1.000451545
> ArraysMismatch.Float.mismatchMid | 800 | 23974.331 | 24301.075 | 1.01362891 | 24354.29 | 1.015848576
> ArraysMismatch.Float.mismatchStart | 16 | 140889.393 | 140535.617 | 0.997488981 | 140222.656 | 0.995267657
> ArraysMismatch.Float.mismatchStart | 32 | 140871.915 | 140318.765 | 0.996073383 | 140242.783 | 0.995534014
> ArraysMismatch.Float.mismatchStart | 64 | 141197.313 | 140413.639 | 0.994449795 | 140792.879 | 0.997135682
> ArraysMismatch.Float.mismatchStart | 90 | 139663.079 | 139775.065 | 1.00080183 | 143880.133 | 1.03019448
> ArraysMismatch.Float.mismatchStart | 800 | 143930.882 | 143878.412 | 0.99963545 | 143923.022 | 0.99994539
> ArraysMismatch.Int.differentSubrangeMatches | 16 | 110820.026 | 130943.67 | 1.181588515 | 131076.904 | 1.182790771
> ArraysMismatch.Int.differentSubrangeMatches | 32 | 111706.868 | 121119.544 | 1.084262285 | 122049.921 | 1.092591021
> ArraysMismatch.Int.differentSubrangeMatches | 64 | 93916.026 | 101624.789 | 1.082081444 | 100103.617 | 1.065884293
> ArraysMismatch.Int.differentSubrangeMatches | 90 | 67478.955 | 83517.957 | 1.237688951 | 83549.562 | 1.238157319
> ArraysMismatch.Int.differentSubrangeMatches | 800 | 24920.868 | 25100.838 | 1.007221659 | 25376.679 | 1.018290334
> ArraysMismatch.Int.matches | 16 | 138004.078 | 142579.711 | 1.033155781 | 143465.516 | 1.039574468
> ArraysMismatch.Int.matches | 32 | 111790.949 | 119018.169 | 1.06464942 | 119864.971 | 1.072224291
> ArraysMismatch.Int.matches | 64 | 86997.004 | 88476.088 | 1.017001551 | 87755.688 | 1.008720806
> ArraysMismatch.Int.matches | 90 | 69366.581 | 71427.315 | 1.029707879 | 71203.035 | 1.026474622
> ArraysMismatch.Int.matches | 800 | 15119.02 | 15529.095 | 1.02712312 | 15828.336 | 1.046915475
> ArraysMismatch.Int.mismatchEnd | 16 | 139862.143 | 135639.435 | 0.96980807 | 135661.244 | 0.969964002
> ArraysMismatch.Int.mismatchEnd | 32 | 114870.328 | 115455.901 | 1.005097687 | 114992.965 | 1.001067613
> ArraysMismatch.Int.mismatchEnd | 64 | 85291.637 | 85115.665 | 0.99793682 | 85179.114 | 0.998680726
> ArraysMismatch.Int.mismatchEnd | 90 | 73049.868 | 78798.949 | 1.078700772 | 73365.106 | 1.004315381
> ArraysMismatch.Int.mismatchEnd | 800 | 14597.509 | 12861.87 | 0.88110033 | 12845.178 | 0.879956847
> ArraysMismatch.Int.mismatchMid | 16 | 131615.489 | 134691.219 | 1.023369058 | 134503.225 | 1.0219407
> ArraysMismatch.Int.mismatchMid | 32 | 119291.19 | 121970.431 | 1.022459672 | 120647.357 | 1.011368543
> ArraysMismatch.Int.mismatchMid | 64 | 100133.019 | 99827.03 | 0.996944175 | 98327.743 | 0.981971222
> ArraysMismatch.Int.mismatchMid | 90 | 93062.689 | 95269.725 | 1.023715584 | 95457.632 | 1.025734728
> ArraysMismatch.Int.mismatchMid | 800 | 24614.985 | 20853.102 | 0.847171022 | 20857.528 | 0.847350831
> ArraysMismatch.Int.mismatchStart | 16 | 140229.222 | 147607.561 | 1.052616273 | 146278.15 | 1.043136002
> ArraysMismatch.Int.mismatchStart | 32 | 140354.53 | 147448.421 | 1.050542658 | 146287.931 | 1.042274382
> ArraysMismatch.Int.mismatchStart | 64 | 140256.12 | 147353.466 | 1.050602754 | 146094.059 | 1.041623417
> ArraysMismatch.Int.mismatchStart | 90 | 135753.229 | 151205.439 | 1.113825727 | 152070.776 | 1.120200065
> ArraysMismatch.Int.mismatchStart | 800 | 151565.887 | 145991.819 | 0.963223466 | 152020.842 | 1.003001698
> ArraysMismatch.Long.differentSubrangeMatches | 16 | 125569.009 | 121469.175 | 0.967349953 | 121319.155 | 0.966155232
> ArraysMismatch.Long.differentSubrangeMatches | 32 | 100126.557 | 103303.047 | 1.03172475 | 101476.788 | 1.013485243
> ArraysMismatch.Long.differentSubrangeMatches | 64 | 80870.342 | 82334.336 | 1.018102978 | 82395.962 | 1.018865012
> ArraysMismatch.Long.differentSubrangeMatches | 90 | 70673.831 | 72440.193 | 1.024993155 | 72067.497 | 1.019719689
> ArraysMismatch.Long.differentSubrangeMatches | 800 | 15224.864 | 15077.429 | 0.99031617 | 15163.827 | 0.995990966
> ArraysMismatch.Long.matches | 16 | 119857.871 | 123784.673 | 1.032762154 | 122968.267 | 1.025950703
> ArraysMismatch.Long.matches | 32 | 88284.162 | 90825.719 | 1.028788369 | 91303.549 | 1.034200778
> ArraysMismatch.Long.matches | 64 | 62827.102 | 63614.876 | 1.012538761 | 64469.82 | 1.026146646
> ArraysMismatch.Long.matches | 90 | 49351.299 | 51199.947 | 1.037458953 | 51103.813 | 1.035511
> ArraysMismatch.Long.matches | 800 | 8822.867 | 8512.064 | 0.964773015 | 8848.35 | 1.00288829
> ArraysMismatch.Long.mismatchEnd | 16 | 124902.804 | 128237.911 | 1.026701618 | 128410.897 | 1.028086583
> ArraysMismatch.Long.mismatchEnd | 32 | 86728.545 | 90519.608 | 1.043711825 | 88782.445 | 1.023681938
> ArraysMismatch.Long.mismatchEnd | 64 | 64431.36 | 62735.702 | 0.973682722 | 64766.52 | 1.005201815
> ArraysMismatch.Long.mismatchEnd | 90 | 47764.996 | 47635.982 | 0.997298984 | 47562.461 | 0.995759761
> ArraysMismatch.Long.mismatchEnd | 800 | 8124.901 | 7194.444 | 0.88548082 | 7197.163 | 0.88581547
> ArraysMismatch.Long.mismatchMid | 16 | 122857.442 | 121708.317 | 0.99064668 | 121071.994 | 0.985467319
> ArraysMismatch.Long.mismatchMid | 32 | 99406.603 | 99376.972 | 0.999701921 | 97379.046 | 0.979603397
> ArraysMismatch.Long.mismatchMid | 64 | 78596.148 | 76559.205 | 0.974083425 | 76538.811 | 0.973823946
> ArraysMismatch.Long.mismatchMid | 90 | 74253.699 | 73267.252 | 0.98671518 | 74874.856 | 1.008365334
> ArraysMismatch.Long.mismatchMid | 800 | 12739.526 | 12773.563 | 1.002671763 | 15215.721 | 1.194371046
> ArraysMismatch.Long.mismatchStart | 16 | 143429.003 | 147610.51 | 1.029153846 | 146953.182 | 1.024570895
> ArraysMismatch.Long.mismatchStart | 32 | 149771.413 | 149898.955 | 1.000851578 | 147743.864 | 0.986462377
> ArraysMismatch.Long.mismatchStart | 64 | 149812.094 | 147738.977 | 0.986161885 | 147818.236 | 0.986690941
> ArraysMismatch.Long.mismatchStart | 90 | 149834.855 | 147878.978 | 0.986946448 | 149768.864 | 0.999559575
> ArraysMismatch.Long.mismatchStart | 800 | 150266.332 | 147175.353 | 0.979429996 | 153305.049 | 1.020222208
> ArraysMismatch.Short.differentSubrangeMatches | 16 | 124956.808 | 152398.079 | 1.21960605 | 146222.898 | 1.170187526
> ArraysMismatch.Short.differentSubrangeMatches | 32 | 118644.114 | 158832.405 | 1.338729749 | 146589.485 | 1.235539464
> ArraysMismatch.Short.differentSubrangeMatches | 64 | 111036.197 | 106078.375 | 0.955349497 | 146122.18 | 1.315986894
> ArraysMismatch.Short.differentSubrangeMatches | 90 | 79114.347 | 90244.347 | 1.140682448 | 91059.171 | 1.150981768
> ArraysMismatch.Short.differentSubrangeMatches | 800 | 44794.065 | 46302.944 | 1.033684797 | 46086.671 | 1.028856635
> ArraysMismatch.Short.matches | 16 | 150201.123 | 193264.21 | 1.28670283 | 185129.029 | 1.232540911
> ArraysMismatch.Short.matches | 32 | 137672.122 | 126543.04 | 0.919162414 | 187187.586 | 1.359662242
> ArraysMismatch.Short.matches | 64 | 113952.11 | 110124.025 | 0.966406195 | 109228.551 | 0.958547858
> ArraysMismatch.Short.matches | 90 | 89491.351 | 91045.251 | 1.017363689 | 90362.175 | 1.009730817
> ArraysMismatch.Short.matches | 800 | 25941.449 | 25887.28 | 0.997911875 | 25191.983 | 0.971109324
> ArraysMismatch.Short.mismatchEnd | 16 | 142494.648 | 189203.368 | 1.327792802 | 176318.454 | 1.237368957
> ArraysMismatch.Short.mismatchEnd | 32 | 139928.97 | 119098.052 | 0.851132199 | 178840.438 | 1.278080143
> ArraysMismatch.Short.mismatchEnd | 64 | 115583.3 | 104264.811 | 0.902075049 | 102376.369 | 0.885736685
> ArraysMismatch.Short.mismatchEnd | 90 | 86641.922 | 87669.462 | 1.011859617 | 87745.796 | 1.012740645
> ArraysMismatch.Short.mismatchEnd | 800 | 23741.295 | 22911.558 | 0.965050895 | 22937.297 | 0.96613504
> ArraysMismatch.Short.mismatchMid | 16 | 148684.747 | 189160.851 | 1.272227682 | 178776.065 | 1.202383355
> ArraysMismatch.Short.mismatchMid | 32 | 133281.625 | 118690.88 | 0.890526957 | 178478.46 | 1.339107773
> ArraysMismatch.Short.mismatchMid | 64 | 122399.072 | 110333.504 | 0.901424351 | 111504.705 | 0.910993059
> ArraysMismatch.Short.mismatchMid | 90 | 119317.633 | 110483.29 | 0.925959451 | 111346.724 | 0.933195884
> ArraysMismatch.Short.mismatchMid | 800 | 50742.831 | 43058.305 | 0.848559376 | 47917.118 | 0.94431306
> ArraysMismatch.Short.mismatchStart | 16 | 148861.935 | 191984.933 | 1.289684519 | 178706.176 | 1.200482689
> ArraysMismatch.Short.mismatchStart | 32 | 148701.043 | 126690.118 | 0.851978678 | 178702.06 | 1.201753911
> ArraysMismatch.Short.mismatchStart | 64 | 148560.877 | 126747.337 | 0.853167668 | 126657.473 | 0.852562771
> ArraysMismatch.Short.mismatchStart | 90 | 149824.411 | 126605.818 | 0.845027971 | 125719.231 | 0.839110464
> ArraysMismatch.Short.mismatchStart | 800 | 152583.036 | 126437.329 | 0.828646043 | 126698.741 | 0.830359287

Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:

  8266951: Enable partial in-lining if UsePartialInlineSize=64, adding a benchmark for small sized conversions of various primitive types.
-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3999/files
  - new: https://git.openjdk.java.net/jdk/pull/3999/files/851662e4..1070ab55

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3999&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3999&range=01-02

  Stats: 144 lines in 2 files changed: 142 ins; 1 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3999.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3999/head:pull/3999

PR: https://git.openjdk.java.net/jdk/pull/3999

From jbhateja at openjdk.java.net Mon May 17 11:44:38 2021
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Mon, 17 May 2021 11:44:38 GMT
Subject: RFR: 8266951: Partial in-lining for vectorized mismatch operation using AVX512 masked instructions
In-Reply-To:
References: <0YtRuwnVZ-Ejs-22d0JDJeFzXiZ17XNuBT1o5Ma4ZkI=.9dd9e952-d452-4175-8ff5-8f41e990a555@github.com>
Message-ID:

On Fri, 14 May 2021 19:23:40 GMT, Paul Sandoz wrote:

>> Hi @PaulSandoz , after removal of java side changes, I still see good gains for small sizes but there is considerable penalty.
>> Will set the threshold to 0, and re-compute the numbers, seek your inputs on adding target specific THRESHOLD. Could not locate any direct public java API or internal jdk API which could be used to fetch target information.
>
> @jatin-bhateja glad the variation is small.
> If the subsequent results without and with a zero threshold for lengths below the current threshold show increased benefits i am sure we can find a way to surface up some detail.

Hi @PaulSandoz

I compared the performance of the partial in-lining changes with THRESHOLD (the Java-side threshold in ArraysSupport.mismatch*) set to ZERO against the existing values. The motivation for setting THRESHOLD to ZERO was to compare the performance of the Java-side scalar compare loop with the newly introduced inline sequence.
An important point here is that both the scalar tail loop and the in-lined sequence are JITed and bypass the heavy vectorizedMismatch stub call.

Existing Java-side thresholds in the ArraysSupport.mismatch* routines, below which the scalar tail handles the comparison:

Byte : 7
Char/Short : 3
Integer/Float : 1
Long/Double : 0

Observations:
1) The scalar loop with the existing threshold performs better than the inline sequence for all the primitive types except byte.
2) For byte, the scalar loop performs better than the new in-lined sequence for comparison lengths <= 3. For lengths > 3 and <= 7 the new in-lined sequence gives good performance.

For all the other cases listed below, which were calling the vectorizedMismatch stub up till now, partial in-lining shows significant gains.

```
                           (UsePartialInlineSize = 32)            (UsePartialInlineSize=64)
(elem cnt/bytes)           AVX3 - YMM register size = 32 bytes    AVX3 - ZMM register size = 64 bytes
Byte      = 7 (7 bytes)    25 (25 bytes)                          57 (57 bytes)
Short     = 3 (6 bytes)    13 (26 bytes)                          29 (58 bytes)
Int/Float = 1 (4 bytes)    7 (28 bytes)                           15 (60 bytes)
Long      = 0 (0 bytes)    4 (32 bytes)                           8 (64 bytes)
```

Thus the only scope for tuning the existing thresholds is the byte primitive type when the compare length is > 3 and <= 7. The following is the performance variation for these lengths.
Benchmark | SIZE | BaseLine (ops/ms) | PI32 ops/ms (Threshold=0) | PI64 ops/ms (Threshold=0)
-- | -- | -- | -- | --
ArraysMismatchPartialInlining.testByteMatch | 3 | 209915.663 | 175700.411 | 167672.548
ArraysMismatchPartialInlining.testByteMatch | 4 | 157757.866 | 187887.81 | 178366.916
ArraysMismatchPartialInlining.testByteMatch | 5 | 181182.854 | 172835.708 | 154118.205
ArraysMismatchPartialInlining.testByteMatch | 6 | 146279.651 | 173526.975 | 151229.364
ArraysMismatchPartialInlining.testByteMatch | 7 | 139099.287 | 171715.691 | 127025.152
ArraysMismatchPartialInlining.testByteMatch | 15 | 127720.176 | 179272.779 | 161146.445

In general it looks like we can keep the existing thresholds for the time being and take advantage of partial in-lining for the other cases where the comparison fits within one vector.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3999

From hshi at openjdk.java.net Mon May 17 11:52:05 2021
From: hshi at openjdk.java.net (Hui Shi)
Date: Mon, 17 May 2021 11:52:05 GMT
Subject: RFR: 8267212: test/jdk/java/util/Collections/FindSubList.java debug build intermittent crash with "no reachable node should have no use"
Message-ID:

… crash with "no reachable node should have no use"

Please help review this fix.
StrIntrinsicNode::Ideal uses Node::set_req to replace the memory input. The old memory input might then have 0 uses, but it is not added to the PhaseGVN worklist.
Use set_req_X instead to ensure that an old memory input node with 0 outputs is added to the PhaseGVN worklist.
Found two other similar pieces of problematic code in LoadNode::Ideal.

Tier1/2/3 pass with release/fastdebug builds.
test/jdk/java/util/Collections/FindSubList.java doesn't fail in 100 runs (before the fix, 2/3 failures in 10 runs).
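
For readers less familiar with the worklist mechanics described above, here is a minimal toy model in Java. It is not HotSpot code: `WorklistSketch`, `Node`, `setReq` and `setReqX` are hypothetical stand-ins for the C2 node graph, `Node::set_req` and `Node::set_req_X`, and the use counting is deliberately simplified. It only illustrates the idea that a plain input replacement can leave the old input with zero uses without anyone being told, while the worklist-aware variant queues it for later cleanup.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy model only; these classes are hypothetical stand-ins, not HotSpot code. */
final class WorklistSketch {

    /** Minimal graph node that knows its inputs and counts its users. */
    static final class Node {
        final String name;
        final List<Node> inputs = new ArrayList<>();
        int uses = 0;

        Node(String name, Node... ins) {
            this.name = name;
            for (Node in : ins) {
                inputs.add(in);
                in.uses++;
            }
        }

        /** Plain replacement: the old input may end up with zero uses unnoticed. */
        void setReq(int i, Node replacement) {
            Node old = inputs.set(i, replacement);
            replacement.uses++;
            old.uses--;
        }

        /** Replacement that also queues the old input once it has no uses left. */
        void setReqX(int i, Node replacement, Deque<Node> worklist) {
            Node old = inputs.get(i);
            setReq(i, replacement);
            if (old.uses == 0) {
                worklist.push(old);
            }
        }
    }

    /** Replaces the memory input of a user node; returns how many dead nodes were queued. */
    static int queuedAfterReplace(boolean useWorklistVariant) {
        Deque<Node> worklist = new ArrayDeque<>();
        Node oldMem = new Node("oldMem");
        Node newMem = new Node("newMem");
        Node user = new Node("strIntrinsic", oldMem);
        if (useWorklistVariant) {
            user.setReqX(0, newMem, worklist);
        } else {
            user.setReq(0, newMem);
        }
        return worklist.size();
    }

    public static void main(String[] args) {
        System.out.println("setReq queued:  " + queuedAfterReplace(false));
        System.out.println("setReqX queued: " + queuedAfterReplace(true));
    }
}
```

In this sketch the plain variant queues nothing and the worklist-aware variant queues the orphaned `oldMem` node, which mirrors the situation the assert "no reachable node should have no use" is guarding against.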
-------------

Commit messages:
 - 8267212: test/jdk/java/util/Collections/FindSubList.java intermittent crash with "no reachable node should have no use"

Changes: https://git.openjdk.java.net/jdk/pull/4055/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4055&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8267212
  Stats: 3 lines in 2 files changed: 0 ins; 0 del; 3 mod
  Patch: https://git.openjdk.java.net/jdk/pull/4055.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/4055/head:pull/4055

PR: https://git.openjdk.java.net/jdk/pull/4055

From jbhateja at openjdk.java.net Mon May 17 12:06:33 2021
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Mon, 17 May 2021 12:06:33 GMT
Subject: RFR: 8266054: VectorAPI rotate operation optimization [v6]
In-Reply-To:
References:
Message-ID:

> Current VectorAPI Java side implementation expresses rotateLeft and rotateRight operation using following operations:-
>
>     vec1 = lanewise(VectorOperators.LSHL, n)
>     vec2 = lanewise(VectorOperators.LSHR, n)
>     res = lanewise(VectorOperators.OR, vec1 , vec2)
>
> This patch moves above handling from Java side to C2 compiler which facilitates dismantling the rotate operation if target ISA does not support a direct rotate instruction.
>
> AVX512 added vector rotate instructions vpro[rl][v][dq] which operate over long and integer type vectors. For other cases (i.e. sub-word type vectors or for targets which do not support direct rotate operations) an instruction sequence comprising vector SHIFT (LEFT/RIGHT) and vector OR is emitted.
>
> Please find below the performance data for included JMH benchmark.
> Machine: Cascade Lake Server (Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz)
>
>
> Benchmark | (TESTSIZE) | Shift | Baseline AVX3 (ops/ms) | Withopt AVX3 (ops/ms) | Gain % | Baseline AVX2 (ops/ms) | Withopt AVX2 (ops/ms) | Gain %
> -- | -- | -- | -- | -- | -- | -- | -- | --
> RotateBenchmark.testRotateLeftB | 128.00 | 7.00 | 17223.35 | 17094.69 | -0.75 | 17008.32 | 17488.06 | 2.82 > RotateBenchmark.testRotateLeftB | 128.00 | 7.00 | 8944.98 | 8811.34 | -1.49 | 8878.17 | 9218.68 | 3.84 > RotateBenchmark.testRotateLeftB | 128.00 | 15.00 | 17195.75 | 17137.32 | -0.34 | 16789.01 | 17780.34 | 5.90 > RotateBenchmark.testRotateLeftB | 128.00 | 15.00 | 9052.67 | 8838.60 | -2.36 | 8814.62 | 9206.01 | 4.44 > RotateBenchmark.testRotateLeftB | 128.00 | 31.00 | 17100.19 | 16950.64 | -0.87 | 16827.73 | 17720.37 | 5.30 > RotateBenchmark.testRotateLeftB | 128.00 | 31.00 | 9079.95 | 8471.26 | -6.70 | 8888.44 | 9167.68 | 3.14 > RotateBenchmark.testRotateLeftB | 256.00 | 7.00 | 21231.33 | 21513.08 | 1.33 | 21824.51 | 21479.48 | -1.58 > RotateBenchmark.testRotateLeftB | 256.00 | 7.00 | 11103.62 | 11180.16 | 0.69 | 11173.67 | 11529.22 | 3.18 > RotateBenchmark.testRotateLeftB | 256.00 | 15.00 | 21119.14 | 21552.04 | 2.05 | 21693.05 | 21915.37 | 1.02 > RotateBenchmark.testRotateLeftB | 256.00 | 15.00 | 11048.68 | 11094.20 | 0.41 | 11049.90 | 11439.07 | 3.52 > RotateBenchmark.testRotateLeftB | 256.00 | 31.00 | 21506.31 | 21391.41 | -0.53 | 21263.18 | 21986.29 | 3.40 > RotateBenchmark.testRotateLeftB | 256.00 | 31.00 | 11056.12 | 11232.78 | 1.60 | 10941.59 | 11397.09 | 4.16 > RotateBenchmark.testRotateLeftB | 512.00 | 7.00 | 17976.56 | 18180.85 | 1.14 | 1212.26 | 2533.34 | 108.98 > RotateBenchmark.testRotateLeftB | 512.00 | 15.00 | 17553.70 | 18219.07 | 3.79 | 1256.73 | 2537.41 | 101.91 > RotateBenchmark.testRotateLeftB | 512.00 | 31.00 | 17618.03 | 17738.15 | 0.68 | 1214.69 | 2533.83 | 108.60 > RotateBenchmark.testRotateLeftI | 128.00 | 7.00 | 7258.87 | 7468.88 | 2.89 | 7115.12 | 7117.26 | 0.03 > RotateBenchmark.testRotateLeftI | 128.00 | 7.00 | 3586.65 | 3950.85 | 10.15 | 3532.17 | 3595.80 | 1.80 > RotateBenchmark.testRotateLeftI | 128.00 | 7.00 | 1835.07 | 1999.68 | 8.97 | 1789.90 | 1819.93 | 1.68 > RotateBenchmark.testRotateLeftI | 128.00 | 15.00 | 7273.36 
| 7410.91 | 1.89 | 7198.60 | 6994.79 | -2.83 > RotateBenchmark.testRotateLeftI | 128.00 | 15.00 | 3674.98 | 3926.27 | 6.84 | 3549.90 | 3755.09 | 5.78 > RotateBenchmark.testRotateLeftI | 128.00 | 15.00 | 1840.94 | 1882.25 | 2.24 | 1801.56 | 1872.89 | 3.96 > RotateBenchmark.testRotateLeftI | 128.00 | 31.00 | 7457.11 | 7361.48 | -1.28 | 6975.33 | 7385.94 | 5.89 > RotateBenchmark.testRotateLeftI | 128.00 | 31.00 | 3570.74 | 3929.30 | 10.04 | 3635.37 | 3736.67 | 2.79 > RotateBenchmark.testRotateLeftI | 128.00 | 31.00 | 1902.32 | 1960.46 | 3.06 | 1812.32 | 1813.88 | 0.09 > RotateBenchmark.testRotateLeftI | 256.00 | 7.00 | 11174.24 | 12044.52 | 7.79 | 11509.87 | 11273.44 | -2.05 > RotateBenchmark.testRotateLeftI | 256.00 | 7.00 | 5981.47 | 6073.70 | 1.54 | 5593.66 | 5661.93 | 1.22 > RotateBenchmark.testRotateLeftI | 256.00 | 7.00 | 2932.49 | 3069.54 | 4.67 | 2950.86 | 2892.42 | -1.98 > RotateBenchmark.testRotateLeftI | 256.00 | 15.00 | 11764.11 | 12098.63 | 2.84 | 11069.52 | 11476.93 | 3.68 > RotateBenchmark.testRotateLeftI | 256.00 | 15.00 | 5855.20 | 6080.40 | 3.85 | 5919.11 | 5607.04 | -5.27 > RotateBenchmark.testRotateLeftI | 256.00 | 15.00 | 2989.05 | 3048.56 | 1.99 | 2902.63 | 2821.83 | -2.78 > RotateBenchmark.testRotateLeftI | 256.00 | 31.00 | 11652.84 | 11965.40 | 2.68 | 11525.62 | 11459.83 | -0.57 > RotateBenchmark.testRotateLeftI | 256.00 | 31.00 | 5851.82 | 6164.94 | 5.35 | 5882.60 | 5842.30 | -0.69 > RotateBenchmark.testRotateLeftI | 256.00 | 31.00 | 3015.99 | 3043.79 | 0.92 | 2963.71 | 2947.97 | -0.53 > RotateBenchmark.testRotateLeftI | 512.00 | 7.00 | 16029.15 | 16189.79 | 1.00 | 860.43 | 2339.32 | 171.88 > RotateBenchmark.testRotateLeftI | 512.00 | 7.00 | 8078.25 | 8081.84 | 0.04 | 427.39 | 1147.92 | 168.59 > RotateBenchmark.testRotateLeftI | 512.00 | 7.00 | 4021.49 | 4294.03 | 6.78 | 209.25 | 582.28 | 178.27 > RotateBenchmark.testRotateLeftI | 512.00 | 15.00 | 15912.98 | 16329.03 | 2.61 | 848.23 | 2296.78 | 170.77 > RotateBenchmark.testRotateLeftI | 512.00 
| 15.00 | 8054.10 | 8306.37 | 3.13 | 429.93 | 1146.90 | 166.77 > RotateBenchmark.testRotateLeftI | 512.00 | 15.00 | 4102.58 | 4071.08 | -0.77 | 217.86 | 582.20 | 167.24 > RotateBenchmark.testRotateLeftI | 512.00 | 31.00 | 16177.79 | 16287.85 | 0.68 | 857.84 | 2243.15 | 161.49 > RotateBenchmark.testRotateLeftI | 512.00 | 31.00 | 8187.47 | 8410.48 | 2.72 | 434.60 | 1128.20 | 159.60 > RotateBenchmark.testRotateLeftI | 512.00 | 31.00 | 4109.15 | 4233.80 | 3.03 | 208.71 | 572.43 | 174.27 > RotateBenchmark.testRotateLeftL | 128.00 | 7.00 | 3755.09 | 3930.29 | 4.67 | 3604.19 | 3598.47 | -0.16 > RotateBenchmark.testRotateLeftL | 128.00 | 7.00 | 1829.03 | 1957.39 | 7.02 | 1833.95 | 1808.38 | -1.39 > RotateBenchmark.testRotateLeftL | 128.00 | 7.00 | 915.35 | 970.55 | 6.03 | 916.25 | 899.08 | -1.87 > RotateBenchmark.testRotateLeftL | 128.00 | 15.00 | 3664.85 | 3812.26 | 4.02 | 3629.37 | 3579.23 | -1.38 > RotateBenchmark.testRotateLeftL | 128.00 | 15.00 | 1829.51 | 1877.76 | 2.64 | 1781.05 | 1807.57 | 1.49 > RotateBenchmark.testRotateLeftL | 128.00 | 15.00 | 913.37 | 953.42 | 4.38 | 912.26 | 908.73 | -0.39 > RotateBenchmark.testRotateLeftL | 128.00 | 31.00 | 3648.45 | 3899.20 | 6.87 | 3552.67 | 3581.04 | 0.80 > RotateBenchmark.testRotateLeftL | 128.00 | 31.00 | 1816.50 | 1959.68 | 7.88 | 1820.88 | 1819.71 | -0.06 > RotateBenchmark.testRotateLeftL | 128.00 | 31.00 | 901.05 | 955.13 | 6.00 | 913.74 | 907.90 | -0.64 > RotateBenchmark.testRotateLeftL | 256.00 | 7.00 | 5850.99 | 6108.64 | 4.40 | 5882.65 | 5755.21 | -2.17 > RotateBenchmark.testRotateLeftL | 256.00 | 7.00 | 2962.21 | 3060.47 | 3.32 | 2955.20 | 2909.18 | -1.56 > RotateBenchmark.testRotateLeftL | 256.00 | 7.00 | 1480.46 | 1534.72 | 3.66 | 1467.78 | 1430.60 | -2.53 > RotateBenchmark.testRotateLeftL | 256.00 | 15.00 | 5858.23 | 6047.51 | 3.23 | 5770.02 | 5773.19 | 0.05 > RotateBenchmark.testRotateLeftL | 256.00 | 15.00 | 2951.49 | 3096.53 | 4.91 | 2885.21 | 2899.31 | 0.49 > RotateBenchmark.testRotateLeftL | 256.00 | 
15.00 | 1486.26 | 1527.94 | 2.80 | 1441.93 | 1454.25 | 0.85 > RotateBenchmark.testRotateLeftL | 256.00 | 31.00 | 5873.21 | 6089.75 | 3.69 | 5767.58 | 5664.11 | -1.79 > RotateBenchmark.testRotateLeftL | 256.00 | 31.00 | 2969.67 | 3081.39 | 3.76 | 2878.50 | 2905.86 | 0.95 > RotateBenchmark.testRotateLeftL | 256.00 | 31.00 | 1452.21 | 1520.03 | 4.67 | 1430.30 | 1485.63 | 3.87 > RotateBenchmark.testRotateLeftL | 512.00 | 7.00 | 8088.65 | 8443.63 | 4.39 | 455.67 | 1226.33 | 169.13 > RotateBenchmark.testRotateLeftL | 512.00 | 7.00 | 4011.95 | 4120.25 | 2.70 | 229.77 | 619.87 | 169.77 > RotateBenchmark.testRotateLeftL | 512.00 | 7.00 | 2090.57 | 2109.53 | 0.91 | 115.21 | 310.36 | 169.37 > RotateBenchmark.testRotateLeftL | 512.00 | 15.00 | 8166.84 | 8557.28 | 4.78 | 457.67 | 1242.86 | 171.56 > RotateBenchmark.testRotateLeftL | 512.00 | 15.00 | 4137.02 | 4287.95 | 3.65 | 227.26 | 624.80 | 174.93 > RotateBenchmark.testRotateLeftL | 512.00 | 15.00 | 2095.01 | 2102.86 | 0.37 | 114.26 | 310.83 | 172.03 > RotateBenchmark.testRotateLeftL | 512.00 | 31.00 | 8082.68 | 8400.56 | 3.93 | 459.59 | 1230.07 | 167.64 > RotateBenchmark.testRotateLeftL | 512.00 | 31.00 | 4047.67 | 4147.58 | 2.47 | 229.01 | 606.38 | 164.78 > RotateBenchmark.testRotateLeftL | 512.00 | 31.00 | 2086.83 | 2126.72 | 1.91 | 111.93 | 305.66 | 173.08 > RotateBenchmark.testRotateLeftS | 128.00 | 7.00 | 13597.19 | 13255.09 | -2.52 | 13818.39 | 13242.40 | -4.17 > RotateBenchmark.testRotateLeftS | 128.00 | 7.00 | 7028.26 | 6826.59 | -2.87 | 6765.15 | 6907.87 | 2.11 > RotateBenchmark.testRotateLeftS | 128.00 | 7.00 | 3570.40 | 3468.01 | -2.87 | 3449.66 | 3533.50 | 2.43 > RotateBenchmark.testRotateLeftS | 128.00 | 15.00 | 13615.99 | 13464.40 | -1.11 | 13330.02 | 13870.57 | 4.06 > RotateBenchmark.testRotateLeftS | 128.00 | 15.00 | 7043.31 | 6763.34 | -3.97 | 6928.88 | 7063.57 | 1.94 > RotateBenchmark.testRotateLeftS | 128.00 | 15.00 | 3495.12 | 3537.62 | 1.22 | 3503.41 | 3457.67 | -1.31 > RotateBenchmark.testRotateLeftS | 
128.00 | 31.00 | 13591.66 | 13665.84 | 0.55 | 13773.27 | 13126.08 | -4.70 > RotateBenchmark.testRotateLeftS | 128.00 | 31.00 | 7027.08 | 7011.24 | -0.23 | 6974.98 | 6815.50 | -2.29 > RotateBenchmark.testRotateLeftS | 128.00 | 31.00 | 3568.28 | 3569.62 | 0.04 | 3580.67 | 3463.58 | -3.27 > RotateBenchmark.testRotateLeftS | 256.00 | 7.00 | 21154.03 | 21416.32 | 1.24 | 21187.01 | 21401.61 | 1.01 > RotateBenchmark.testRotateLeftS | 256.00 | 7.00 | 11194.24 | 10865.47 | -2.94 | 11063.19 | 10977.60 | -0.77 > RotateBenchmark.testRotateLeftS | 256.00 | 7.00 | 5797.80 | 5523.94 | -4.72 | 5654.63 | 5468.78 | -3.29 > RotateBenchmark.testRotateLeftS | 256.00 | 15.00 | 21333.89 | 21412.74 | 0.37 | 21610.94 | 20908.96 | -3.25 > RotateBenchmark.testRotateLeftS | 256.00 | 15.00 | 11327.07 | 11113.48 | -1.89 | 11148.25 | 10678.14 | -4.22 > RotateBenchmark.testRotateLeftS | 256.00 | 15.00 | 5810.69 | 5569.72 | -4.15 | 5663.26 | 5618.87 | -0.78 > RotateBenchmark.testRotateLeftS | 256.00 | 31.00 | 21753.20 | 21198.43 | -2.55 | 21567.90 | 21929.81 | 1.68 > RotateBenchmark.testRotateLeftS | 256.00 | 31.00 | 11517.08 | 11039.64 | -4.15 | 11103.08 | 10871.59 | -2.08 > RotateBenchmark.testRotateLeftS | 256.00 | 31.00 | 5897.16 | 5606.75 | -4.92 | 5459.87 | 5604.12 | 2.64 > RotateBenchmark.testRotateLeftS | 512.00 | 7.00 | 29748.53 | 28883.73 | -2.91 | 1549.02 | 3928.53 | 153.61 > RotateBenchmark.testRotateLeftS | 512.00 | 7.00 | 15197.09 | 15878.19 | 4.48 | 772.59 | 1924.35 | 149.08 > RotateBenchmark.testRotateLeftS | 512.00 | 7.00 | 8046.30 | 8081.19 | 0.43 | 388.11 | 990.28 | 155.16 > RotateBenchmark.testRotateLeftS | 512.00 | 15.00 | 30618.04 | 29419.19 | -3.92 | 1524.22 | 3915.97 | 156.92 > RotateBenchmark.testRotateLeftS | 512.00 | 15.00 | 15854.43 | 15846.37 | -0.05 | 766.09 | 1953.60 | 155.01 > RotateBenchmark.testRotateLeftS | 512.00 | 15.00 | 7814.77 | 7899.30 | 1.08 | 390.82 | 970.37 | 148.29 > RotateBenchmark.testRotateLeftS | 512.00 | 31.00 | 29596.82 | 28538.69 | -3.58 | 
1530.45 | 3906.91 | 155.28 > RotateBenchmark.testRotateLeftS | 512.00 | 31.00 | 15662.48 | 15849.25 | 1.19 | 778.08 | 1934.31 | 148.60 > RotateBenchmark.testRotateLeftS | 512.00 | 31.00 | 8121.14 | 7758.59 | -4.46 | 392.78 | 959.73 | 144.34 > RotateBenchmark.testRotateRightB | 128.00 | 7.00 | 17465.84 | 17069.34 | -2.27 | 16849.73 | 17842.08 | 5.89 > RotateBenchmark.testRotateRightB | 128.00 | 7.00 | 9049.19 | 8864.15 | -2.04 | 8786.67 | 9105.34 | 3.63 > RotateBenchmark.testRotateRightB | 128.00 | 15.00 | 17703.38 | 17070.98 | -3.57 | 16595.85 | 17784.68 | 7.16 > RotateBenchmark.testRotateRightB | 128.00 | 15.00 | 9007.68 | 8817.41 | -2.11 | 8704.49 | 9185.87 | 5.53 > RotateBenchmark.testRotateRightB | 128.00 | 31.00 | 17531.05 | 16983.40 | -3.12 | 16947.69 | 17655.40 | 4.18 > RotateBenchmark.testRotateRightB | 128.00 | 31.00 | 8986.30 | 8794.15 | -2.14 | 8816.62 | 9225.95 | 4.64 > RotateBenchmark.testRotateRightB | 256.00 | 7.00 | 21293.95 | 21506.74 | 1.00 | 21163.29 | 21854.03 | 3.26 > RotateBenchmark.testRotateRightB | 256.00 | 7.00 | 11258.47 | 11072.92 | -1.65 | 11118.12 | 11338.96 | 1.99 > RotateBenchmark.testRotateRightB | 256.00 | 15.00 | 21253.36 | 21292.37 | 0.18 | 21224.39 | 21763.88 | 2.54 > RotateBenchmark.testRotateRightB | 256.00 | 15.00 | 11064.80 | 11198.35 | 1.21 | 10960.98 | 11294.14 | 3.04 > RotateBenchmark.testRotateRightB | 256.00 | 31.00 | 21358.14 | 21346.21 | -0.06 | 21487.25 | 21854.42 | 1.71 > RotateBenchmark.testRotateRightB | 256.00 | 31.00 | 11045.61 | 11208.26 | 1.47 | 10907.03 | 11415.18 | 4.66 > RotateBenchmark.testRotateRightB | 512.00 | 7.00 | 17898.61 | 18307.54 | 2.28 | 1214.65 | 2546.64 | 109.66 > RotateBenchmark.testRotateRightB | 512.00 | 15.00 | 17909.25 | 18242.51 | 1.86 | 1215.05 | 2563.98 | 111.02 > RotateBenchmark.testRotateRightB | 512.00 | 31.00 | 17883.35 | 17928.44 | 0.25 | 1220.77 | 2543.30 | 108.34 > RotateBenchmark.testRotateRightI | 128.00 | 7.00 | 7139.97 | 7626.72 | 6.82 | 6994.86 | 7075.65 | 1.15 > 
RotateBenchmark.testRotateRightI | 128.00 | 7.00 | 3657.37 | 3898.34 | 6.59 | 3617.06 | 3576.12 | -1.13 > RotateBenchmark.testRotateRightI | 128.00 | 7.00 | 1804.26 | 1969.19 | 9.14 | 1796.62 | 1858.84 | 3.46 > RotateBenchmark.testRotateRightI | 128.00 | 15.00 | 7404.31 | 7760.09 | 4.80 | 7036.77 | 7401.52 | 5.18 > RotateBenchmark.testRotateRightI | 128.00 | 15.00 | 3600.52 | 3956.35 | 9.88 | 3595.28 | 3560.36 | -0.97 > RotateBenchmark.testRotateRightI | 128.00 | 15.00 | 1813.32 | 1966.41 | 8.44 | 1839.95 | 1852.53 | 0.68 > RotateBenchmark.testRotateRightI | 128.00 | 31.00 | 7118.48 | 7724.81 | 8.52 | 7151.56 | 7021.09 | -1.82 > RotateBenchmark.testRotateRightI | 128.00 | 31.00 | 3529.70 | 3881.63 | 9.97 | 3623.08 | 3601.01 | -0.61 > RotateBenchmark.testRotateRightI | 128.00 | 31.00 | 1823.61 | 1961.34 | 7.55 | 1786.86 | 1748.85 | -2.13 > RotateBenchmark.testRotateRightI | 256.00 | 7.00 | 11697.98 | 11835.25 | 1.17 | 11513.16 | 11184.87 | -2.85 > RotateBenchmark.testRotateRightI | 256.00 | 7.00 | 5890.11 | 6102.57 | 3.61 | 5658.79 | 5696.08 | 0.66 > RotateBenchmark.testRotateRightI | 256.00 | 7.00 | 2964.94 | 3070.26 | 3.55 | 2945.00 | 2962.08 | 0.58 > RotateBenchmark.testRotateRightI | 256.00 | 15.00 | 11562.51 | 12151.29 | 5.09 | 11404.17 | 11120.28 | -2.49 > RotateBenchmark.testRotateRightI | 256.00 | 15.00 | 5702.93 | 6130.57 | 7.50 | 5799.54 | 5779.08 | -0.35 > RotateBenchmark.testRotateRightI | 256.00 | 15.00 | 2861.96 | 3051.44 | 6.62 | 2943.99 | 2860.65 | -2.83 > RotateBenchmark.testRotateRightI | 256.00 | 31.00 | 11203.13 | 11710.59 | 4.53 | 11363.18 | 11112.16 | -2.21 > RotateBenchmark.testRotateRightI | 256.00 | 31.00 | 5893.97 | 6070.71 | 3.00 | 5776.67 | 5648.84 | -2.21 > RotateBenchmark.testRotateRightI | 256.00 | 31.00 | 2971.83 | 3046.76 | 2.52 | 2903.35 | 2833.88 | -2.39 > RotateBenchmark.testRotateRightI | 512.00 | 7.00 | 16064.71 | 15851.35 | -1.33 | 861.93 | 2256.88 | 161.84 > RotateBenchmark.testRotateRightI | 512.00 | 7.00 | 7916.80 | 8462.65 
| 6.89 | 430.23 | 1147.30 | 166.67 > RotateBenchmark.testRotateRightI | 512.00 | 7.00 | 4104.64 | 4068.28 | -0.89 | 216.30 | 572.86 | 164.84 > RotateBenchmark.testRotateRightI | 512.00 | 15.00 | 16133.09 | 16281.59 | 0.92 | 856.36 | 2229.58 | 160.35 > RotateBenchmark.testRotateRightI | 512.00 | 15.00 | 8127.26 | 8117.59 | -0.12 | 419.16 | 1176.42 | 180.66 > RotateBenchmark.testRotateRightI | 512.00 | 15.00 | 4080.11 | 4063.26 | -0.41 | 218.32 | 571.93 | 161.97 > RotateBenchmark.testRotateRightI | 512.00 | 31.00 | 15834.26 | 16314.64 | 3.03 | 865.96 | 2297.74 | 165.34 > RotateBenchmark.testRotateRightI | 512.00 | 31.00 | 7965.62 | 8270.48 | 3.83 | 428.55 | 1148.87 | 168.08 > RotateBenchmark.testRotateRightI | 512.00 | 31.00 | 4161.69 | 4034.76 | -3.05 | 215.63 | 570.19 | 164.43 > RotateBenchmark.testRotateRightL | 128.00 | 7.00 | 3556.70 | 3877.08 | 9.01 | 3596.46 | 3558.32 | -1.06 > RotateBenchmark.testRotateRightL | 128.00 | 7.00 | 1772.93 | 1993.86 | 12.46 | 1856.79 | 1783.22 | -3.96 > RotateBenchmark.testRotateRightL | 128.00 | 7.00 | 908.66 | 1000.37 | 10.09 | 944.79 | 922.91 | -2.32 > RotateBenchmark.testRotateRightL | 128.00 | 15.00 | 3742.44 | 3748.41 | 0.16 | 3788.07 | 3570.67 | -5.74 > RotateBenchmark.testRotateRightL | 128.00 | 15.00 | 1817.53 | 1985.69 | 9.25 | 1892.38 | 1833.16 | -3.13 > RotateBenchmark.testRotateRightL | 128.00 | 15.00 | 941.03 | 952.68 | 1.24 | 915.79 | 910.21 | -0.61 > RotateBenchmark.testRotateRightL | 128.00 | 31.00 | 3649.48 | 3896.56 | 6.77 | 3637.59 | 3557.53 | -2.20 > RotateBenchmark.testRotateRightL | 128.00 | 31.00 | 1840.12 | 1997.19 | 8.54 | 1821.47 | 1799.82 | -1.19 > RotateBenchmark.testRotateRightL | 128.00 | 31.00 | 901.33 | 995.67 | 10.47 | 909.20 | 902.73 | -0.71 > RotateBenchmark.testRotateRightL | 256.00 | 7.00 | 5789.93 | 5960.54 | 2.95 | 5758.14 | 5736.30 | -0.38 > RotateBenchmark.testRotateRightL | 256.00 | 7.00 | 2963.20 | 3063.30 | 3.38 | 2943.48 | 2833.84 | -3.72 > RotateBenchmark.testRotateRightL | 256.00 | 
7.00 | 1501.81 | 1510.23 | 0.56 | 1463.85 | 1462.26 | -0.11 > RotateBenchmark.testRotateRightL | 256.00 | 15.00 | 5870.05 | 5951.43 | 1.39 | 5794.74 | 5604.58 | -3.28 > RotateBenchmark.testRotateRightL | 256.00 | 15.00 | 2971.36 | 3047.00 | 2.55 | 2931.19 | 2907.30 | -0.82 > RotateBenchmark.testRotateRightL | 256.00 | 15.00 | 1473.97 | 1530.54 | 3.84 | 1473.45 | 1442.40 | -2.11 > RotateBenchmark.testRotateRightL | 256.00 | 31.00 | 5858.08 | 6080.49 | 3.80 | 5863.69 | 5549.85 | -5.35 > RotateBenchmark.testRotateRightL | 256.00 | 31.00 | 2916.24 | 3045.77 | 4.44 | 2981.59 | 2815.07 | -5.58 > RotateBenchmark.testRotateRightL | 256.00 | 31.00 | 1441.20 | 1531.56 | 6.27 | 1492.47 | 1473.25 | -1.29 > RotateBenchmark.testRotateRightL | 512.00 | 7.00 | 8147.24 | 8310.05 | 2.00 | 469.45 | 1235.21 | 163.12 > RotateBenchmark.testRotateRightL | 512.00 | 7.00 | 4142.95 | 4258.86 | 2.80 | 234.14 | 615.52 | 162.88 > RotateBenchmark.testRotateRightL | 512.00 | 7.00 | 2095.48 | 2087.20 | -0.40 | 113.55 | 311.19 | 174.05 > RotateBenchmark.testRotateRightL | 512.00 | 15.00 | 8222.94 | 8246.58 | 0.29 | 458.91 | 1244.32 | 171.15 > RotateBenchmark.testRotateRightL | 512.00 | 15.00 | 4160.04 | 4226.46 | 1.60 | 227.78 | 625.38 | 174.56 > RotateBenchmark.testRotateRightL | 512.00 | 15.00 | 2064.63 | 2162.44 | 4.74 | 113.27 | 314.15 | 177.36 > RotateBenchmark.testRotateRightL | 512.00 | 31.00 | 8157.94 | 8466.90 | 3.79 | 450.26 | 1221.90 | 171.37 > RotateBenchmark.testRotateRightL | 512.00 | 31.00 | 4039.74 | 4283.33 | 6.03 | 224.82 | 612.68 | 172.53 > RotateBenchmark.testRotateRightL | 512.00 | 31.00 | 2066.88 | 2147.51 | 3.90 | 110.97 | 303.43 | 173.42 > RotateBenchmark.testRotateRightS | 128.00 | 7.00 | 13548.39 | 13245.87 | -2.23 | 13490.93 | 13084.76 | -3.01 > RotateBenchmark.testRotateRightS | 128.00 | 7.00 | 7020.16 | 6768.85 | -3.58 | 6991.39 | 7044.32 | 0.76 > RotateBenchmark.testRotateRightS | 128.00 | 7.00 | 3550.50 | 3505.19 | -1.28 | 3507.12 | 3612.86 | 3.01 > 
RotateBenchmark.testRotateRightS | 128.00 | 15.00 | 13743.43 | 13325.44 | -3.04 | 13696.15 | 13255.80 | -3.22 > RotateBenchmark.testRotateRightS | 128.00 | 15.00 | 6856.02 | 6969.18 | 1.65 | 6886.29 | 6834.12 | -0.76 > RotateBenchmark.testRotateRightS | 128.00 | 15.00 | 3569.53 | 3492.76 | -2.15 | 3539.02 | 3470.02 | -1.95 > RotateBenchmark.testRotateRightS | 128.00 | 31.00 | 13704.18 | 13495.07 | -1.53 | 13649.14 | 13583.87 | -0.48 > RotateBenchmark.testRotateRightS | 128.00 | 31.00 | 7011.77 | 6953.93 | -0.82 | 6978.28 | 6740.30 | -3.41 > RotateBenchmark.testRotateRightS | 128.00 | 31.00 | 3591.62 | 3620.12 | 0.79 | 3502.04 | 3510.05 | 0.23 > RotateBenchmark.testRotateRightS | 256.00 | 7.00 | 21950.71 | 22113.60 | 0.74 | 21484.27 | 21596.64 | 0.52 > RotateBenchmark.testRotateRightS | 256.00 | 7.00 | 11616.88 | 11099.73 | -4.45 | 11188.29 | 10737.68 | -4.03 > RotateBenchmark.testRotateRightS | 256.00 | 7.00 | 5872.72 | 5579.12 | -5.00 | 5784.05 | 5454.57 | -5.70 > RotateBenchmark.testRotateRightS | 256.00 | 15.00 | 22017.83 | 20817.97 | -5.45 | 21934.65 | 21356.90 | -2.63 > RotateBenchmark.testRotateRightS | 256.00 | 15.00 | 11414.27 | 11044.86 | -3.24 | 11454.35 | 11140.34 | -2.74 > RotateBenchmark.testRotateRightS | 256.00 | 15.00 | 5786.64 | 5634.05 | -2.64 | 5724.93 | 5639.99 | -1.48 > RotateBenchmark.testRotateRightS | 256.00 | 31.00 | 21754.77 | 21466.01 | -1.33 | 21140.67 | 21970.03 | 3.92 > RotateBenchmark.testRotateRightS | 256.00 | 31.00 | 11676.46 | 11358.64 | -2.72 | 11204.90 | 11213.48 | 0.08 > RotateBenchmark.testRotateRightS | 256.00 | 31.00 | 5728.20 | 5772.49 | 0.77 | 5594.33 | 5544.25 | -0.90 > RotateBenchmark.testRotateRightS | 512.00 | 7.00 | 30247.03 | 30179.41 | -0.22 | 1538.75 | 3975.82 | 158.38 > RotateBenchmark.testRotateRightS | 512.00 | 7.00 | 15988.73 | 15621.42 | -2.30 | 776.04 | 1910.91 | 146.24 > RotateBenchmark.testRotateRightS | 512.00 | 7.00 | 8115.84 | 8025.28 | -1.12 | 389.12 | 984.46 | 152.99 > RotateBenchmark.testRotateRightS 
| 512.00 | 15.00 | 30110.91 | 30200.69 | 0.30 | 1532.49 | 3983.77 | 159.95
> RotateBenchmark.testRotateRightS | 512.00 | 15.00 | 15957.90 | 15690.73 | -1.67 | 774.90 | 1931.00 | 149.19
> RotateBenchmark.testRotateRightS | 512.00 | 15.00 | 8113.26 | 8037.93 | -0.93 | 391.90 | 965.53 | 146.37
> RotateBenchmark.testRotateRightS | 512.00 | 31.00 | 29816.97 | 29891.54 | 0.25 | 1538.12 | 3881.93 | 152.38
> RotateBenchmark.testRotateRightS | 512.00 | 31.00 | 15405.95 | 15619.17 | 1.38 | 762.49 | 1871.00 | 145.38
> RotateBenchmark.testRotateRightS | 512.00 | 31.00 | 7919.80 | 7957.35 | 0.47 | 393.63 | 972.49 | 147.06

Jatin Bhateja has updated the pull request incrementally with three additional commits since the last revision:

 - Merge branch 'JDK-8266054' of http://github.com/jatin-bhateja/jdk into JDK-8266054
 - 8266054: Code reorganization for efficient sharing of logic to check rotate operation support on a target platform.
 - 8266054: Removing redundant test templates.

-------------

Changes:
 - all: https://git.openjdk.java.net/jdk/pull/3720/files
 - new: https://git.openjdk.java.net/jdk/pull/3720/files/ef46c0a8..7969f19f

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3720&range=05
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3720&range=04-05

Stats: 76 lines in 3 files changed: 28 ins; 43 del; 5 mod
Patch: https://git.openjdk.java.net/jdk/pull/3720.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/3720/head:pull/3720

PR: https://git.openjdk.java.net/jdk/pull/3720

From coleenp at openjdk.java.net Mon May 17 12:55:41 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Mon, 17 May 2021 12:55:41 GMT
Subject: RFR: 8266973: Migrate to ClassHierarchyIterator when enumerating subclasses [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 13 May 2021 09:36:40 GMT, Vladimir Ivanov wrote:

>> Replace ad-hoc recursion when enumerating subclasses with `ClassHierarchyIterator`.
>>
>> Found 3 occurrences:
>> - `Dependencies::find_finalizable_subclass()`
>> - `reinitialize_vtable_of()`
>> - `VM_RedefineClasses::increment_class_counter()`
>>
>> Testing:
>> - [x] hs-tier1 - hs-tier4
>
> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision:
>
> JFR

good also!

-------------

Marked as reviewed by coleenp (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/3995

From vladimir.x.ivanov at oracle.com Mon May 17 13:00:03 2021
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Mon, 17 May 2021 16:00:03 +0300
Subject: RFR: 8256973: Intrinsic creation for VectorMask query (lastTrue, firstTrue, trueCount) APIs [v2]
In-Reply-To:
References: <73lFD51hzmiF_KrQyPyE5c7lbf-Bp6V5vptzGo7JioY=.f34509d0-04c1-4c6d-878f-baa433b315a7@github.com> <9Tv567jC1GsqY3tq4-_HqAxPzHbl9ckDsfIFN3bMvMw=.b73f9786-417d-4260-a569-28ad7ed504ab@github.com>
Message-ID: <322e34bb-90ad-a8cc-595b-93369a94e201@oracle.com>

>> Ok, fair enough. We can revisit that later and merge them if needed.
>> Some suggestions to consider to align it with `reductionCoerced`:
>>
>> * reflect in the name that it's effectively a reduction, but on masks (`maskReductionCoerced`?);
>> * return type can be generalized to `long`;
>
> Hi @iwanowww, Can you kindly elaborate why the return type should be long here?
> We will need to again downcast it to integer since these APIs return an integer value.

FTR downcasts are fine here.

In the context of JVM intrinsics the main question is what carrier type to pick. If you don't envision any future operations on masks to return 64-bit values, then it's fine to pick int. Otherwise, it's better to start with long, because when such an operation is introduced, the return type (and all use sites) will have to be adjusted anyway (instead of introducing yet another intrinsic method).
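The carrier-type argument above can be sketched in plain Java. This is an illustration under stated assumptions, not code from the PR: `maskReduction`, the operation codes, and the `boolean[]` mask representation are all hypothetical, and `TO_LONG` stands in for an imagined future operation that needs a 64-bit result.

```java
public class MaskCarrierSketch {
    // Hypothetical operation codes for this sketch; not the JDK's actual constants.
    static final int TRUE_COUNT = 0;
    static final int FIRST_TRUE = 1;
    static final int LAST_TRUE = 2;
    static final int TO_LONG = 3; // imagined future operation that needs 64 bits

    /** Single entry point with a long carrier type (stand-in for one intrinsic candidate). */
    static long maskReduction(int oper, boolean[] mask) {
        switch (oper) {
            case TRUE_COUNT: {
                int count = 0;
                for (boolean lane : mask) {
                    if (lane) count++;
                }
                return count;
            }
            case FIRST_TRUE: {
                for (int i = 0; i < mask.length; i++) {
                    if (mask[i]) return i;
                }
                return mask.length; // no lane set
            }
            case LAST_TRUE: {
                for (int i = mask.length - 1; i >= 0; i--) {
                    if (mask[i]) return i;
                }
                return -1; // no lane set
            }
            case TO_LONG: {
                long bits = 0L;
                for (int i = 0; i < mask.length && i < Long.SIZE; i++) {
                    if (mask[i]) bits |= 1L << i;
                }
                return bits;
            }
            default:
                throw new IllegalArgumentException("unknown operation: " + oper);
        }
    }

    // Existing int-returning queries simply downcast the long carrier value.
    static int trueCount(boolean[] mask) { return (int) maskReduction(TRUE_COUNT, mask); }
    static int firstTrue(boolean[] mask) { return (int) maskReduction(FIRST_TRUE, mask); }
    static int lastTrue(boolean[] mask)  { return (int) maskReduction(LAST_TRUE, mask); }

    // A 64-bit query reuses the same entry point without changing its signature.
    static long toLong(boolean[] mask) { return maskReduction(TO_LONG, mask); }

    public static void main(String[] args) {
        boolean[] mask = {true, false, true, false};
        System.out.println("trueCount = " + trueCount(mask));
        System.out.println("firstTrue = " + firstTrue(mask));
        System.out.println("lastTrue  = " + lastTrue(mask));
        System.out.println("toLong    = " + toLong(mask));
    }
}
```

With an `int` carrier, adding something like `toLong` would require either widening the entry point and touching every caller, or adding a second entry point, which is the trade-off the paragraph above describes.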

Best regards,
Vladimir Ivanov

>> * bound on M: ``;
>> * no need to introduce a special interface, `Function` just works: `VectorMaskOp` -> `Function`;
>
> -------------
>
> PR: https://git.openjdk.java.net/jdk/pull/3916
>

From erikj at openjdk.java.net Mon May 17 13:03:42 2021
From: erikj at openjdk.java.net (Erik Joelsson)
Date: Mon, 17 May 2021 13:03:42 GMT
Subject: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v4]
In-Reply-To:
References:
Message-ID:

On Sat, 15 May 2021 02:06:29 GMT, Sandhya Viswanathan wrote:

>> This PR contains Short Vector Math Library support related changes for [JEP-414 Vector API (Second Incubator)](https://openjdk.java.net/jeps/414), in preparation for when targeted.
>>
>> Intel Short Vector Math Library (SVML) based intrinsics in native x86 assembly provide optimized implementation for Vector API transcendental and trigonometric methods.
>> These methods are built into a separate library instead of being part of libjvm.so or jvm.dll.
>>
>> The following changes are made:
>> The source for these methods is placed in the jdk.incubator.vector module under src/jdk.incubator.vector/linux/native/libsvml and src/jdk.incubator.vector/windows/native/libsvml.
>> The assembly source files are named as "*.S" and include files are named as "*.S.inc".
>> The corresponding build script is placed at make/modules/jdk.incubator.vector/Lib.gmk.
>> Changes are made to build system to support dependency tracking for assembly files with includes.
>> The built native libraries (libsvml.so/svml.dll) are placed in bin directory of JDK on Windows and lib directory of JDK on Linux.
>> The C2 JIT uses the dll_load and dll_lookup to get the addresses of optimized methods from this library.
>>
>> Build system changes and module library build scripts are contributed by Magnus (magnus.ihse.bursie at oracle.com).
>>
>> Looking forward to your review and feedback.
>> >> Performance: >> Micro benchmark Base Optimized Unit Gain(Optimized/Base) >> Double128Vector.ACOS 45.91 87.34 ops/ms 1.90 >> Double128Vector.ASIN 45.06 92.36 ops/ms 2.05 >> Double128Vector.ATAN 19.92 118.36 ops/ms 5.94 >> Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79 >> Double128Vector.CBRT 45.77 208.36 ops/ms 4.55 >> Double128Vector.COS 49.94 245.89 ops/ms 4.92 >> Double128Vector.COSH 26.91 126.00 ops/ms 4.68 >> Double128Vector.EXP 71.64 379.65 ops/ms 5.30 >> Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18 >> Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44 >> Double128Vector.LOG 61.95 279.84 ops/ms 4.52 >> Double128Vector.LOG10 59.34 239.05 ops/ms 4.03 >> Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79 >> Double128Vector.SIN 49.36 240.79 ops/ms 4.88 >> Double128Vector.SINH 26.59 103.75 ops/ms 3.90 >> Double128Vector.TAN 41.05 152.39 ops/ms 3.71 >> Double128Vector.TANH 45.29 169.53 ops/ms 3.74 >> Double256Vector.ACOS 54.21 106.39 ops/ms 1.96 >> Double256Vector.ASIN 53.60 107.99 ops/ms 2.01 >> Double256Vector.ATAN 21.53 189.11 ops/ms 8.78 >> Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44 >> Double256Vector.CBRT 56.45 397.13 ops/ms 7.04 >> Double256Vector.COS 58.26 389.77 ops/ms 6.69 >> Double256Vector.COSH 29.44 151.11 ops/ms 5.13 >> Double256Vector.EXP 86.67 564.68 ops/ms 6.52 >> Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80 >> Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62 >> Double256Vector.LOG 71.52 394.90 ops/ms 5.52 >> Double256Vector.LOG10 65.43 362.32 ops/ms 5.54 >> Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05 >> Double256Vector.SIN 57.06 380.98 ops/ms 6.68 >> Double256Vector.SINH 29.40 117.37 ops/ms 3.99 >> Double256Vector.TAN 44.90 279.90 ops/ms 6.23 >> Double256Vector.TANH 54.08 274.71 ops/ms 5.08 >> Double512Vector.ACOS 55.65 687.54 ops/ms 12.35 >> Double512Vector.ASIN 57.31 777.72 ops/ms 13.57 >> Double512Vector.ATAN 21.42 729.21 ops/ms 34.04 >> Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32 >> Double512Vector.CBRT 56.78 834.38 ops/ms 14.69 >> 
Double512Vector.COS 59.88 837.04 ops/ms 13.98 >> Double512Vector.COSH 30.34 172.76 ops/ms 5.70 >> Double512Vector.EXP 99.66 1608.12 ops/ms 16.14 >> Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34 >> Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34 >> Double512Vector.LOG 74.84 996.00 ops/ms 13.31 >> Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72 >> Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34 >> Double512Vector.POW 37.42 384.13 ops/ms 10.26 >> Double512Vector.SIN 59.74 728.45 ops/ms 12.19 >> Double512Vector.SINH 29.47 143.38 ops/ms 4.87 >> Double512Vector.TAN 46.20 587.21 ops/ms 12.71 >> Double512Vector.TANH 57.36 495.42 ops/ms 8.64 >> Double64Vector.ACOS 24.04 73.67 ops/ms 3.06 >> Double64Vector.ASIN 23.78 75.11 ops/ms 3.16 >> Double64Vector.ATAN 14.14 62.81 ops/ms 4.44 >> Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28 >> Double64Vector.CBRT 16.47 107.50 ops/ms 6.53 >> Double64Vector.COS 23.42 152.01 ops/ms 6.49 >> Double64Vector.COSH 17.34 113.34 ops/ms 6.54 >> Double64Vector.EXP 27.08 203.53 ops/ms 7.52 >> Double64Vector.EXPM1 18.77 96.73 ops/ms 5.15 >> Double64Vector.HYPOT 18.54 103.62 ops/ms 5.59 >> Double64Vector.LOG 26.75 142.63 ops/ms 5.33 >> Double64Vector.LOG10 25.85 139.71 ops/ms 5.40 >> Double64Vector.LOG1P 13.26 97.94 ops/ms 7.38 >> Double64Vector.SIN 23.28 146.91 ops/ms 6.31 >> Double64Vector.SINH 17.62 88.59 ops/ms 5.03 >> Double64Vector.TAN 21.00 86.43 ops/ms 4.12 >> Double64Vector.TANH 23.75 111.35 ops/ms 4.69 >> Float128Vector.ACOS 57.52 110.65 ops/ms 1.92 >> Float128Vector.ASIN 57.15 117.95 ops/ms 2.06 >> Float128Vector.ATAN 22.52 318.74 ops/ms 14.15 >> Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42 >> Float128Vector.CBRT 29.72 443.74 ops/ms 14.93 >> Float128Vector.COS 42.82 803.02 ops/ms 18.75 >> Float128Vector.COSH 31.44 118.34 ops/ms 3.76 >> Float128Vector.EXP 72.43 855.33 ops/ms 11.81 >> Float128Vector.EXPM1 37.82 127.85 ops/ms 3.38 >> Float128Vector.HYPOT 53.20 591.68 ops/ms 11.12 >> Float128Vector.LOG 52.95 877.94 ops/ms 16.58 >> 
Float128Vector.LOG10 49.26 603.72 ops/ms 12.26 >> Float128Vector.LOG1P 20.89 430.59 ops/ms 20.61 >> Float128Vector.SIN 43.38 745.31 ops/ms 17.18 >> Float128Vector.SINH 31.11 112.91 ops/ms 3.63 >> Float128Vector.TAN 37.25 332.13 ops/ms 8.92 >> Float128Vector.TANH 57.63 453.77 ops/ms 7.87 >> Float256Vector.ACOS 65.23 123.73 ops/ms 1.90 >> Float256Vector.ASIN 63.41 132.86 ops/ms 2.10 >> Float256Vector.ATAN 23.51 649.02 ops/ms 27.61 >> Float256Vector.ATAN2 18.19 455.95 ops/ms 25.07 >> Float256Vector.CBRT 45.99 594.81 ops/ms 12.93 >> Float256Vector.COS 43.75 926.69 ops/ms 21.18 >> Float256Vector.COSH 33.52 130.46 ops/ms 3.89 >> Float256Vector.EXP 75.70 1366.72 ops/ms 18.05 >> Float256Vector.EXPM1 39.00 149.72 ops/ms 3.84 >> Float256Vector.HYPOT 52.91 1023.18 ops/ms 19.34 >> Float256Vector.LOG 53.31 1545.77 ops/ms 29.00 >> Float256Vector.LOG10 50.31 863.80 ops/ms 17.17 >> Float256Vector.LOG1P 21.51 616.59 ops/ms 28.66 >> Float256Vector.SIN 44.07 911.04 ops/ms 20.67 >> Float256Vector.SINH 33.16 122.50 ops/ms 3.69 >> Float256Vector.TAN 37.85 497.75 ops/ms 13.15 >> Float256Vector.TANH 64.27 537.20 ops/ms 8.36 >> Float512Vector.ACOS 67.33 1718.00 ops/ms 25.52 >> Float512Vector.ASIN 66.12 1780.85 ops/ms 26.93 >> Float512Vector.ATAN 22.63 1780.31 ops/ms 78.69 >> Float512Vector.ATAN2 17.52 1113.93 ops/ms 63.57 >> Float512Vector.CBRT 54.78 2087.58 ops/ms 38.11 >> Float512Vector.COS 40.92 1567.93 ops/ms 38.32 >> Float512Vector.COSH 33.42 138.36 ops/ms 4.14 >> Float512Vector.EXP 70.51 3835.97 ops/ms 54.41 >> Float512Vector.EXPM1 38.06 279.80 ops/ms 7.35 >> Float512Vector.HYPOT 50.99 3287.55 ops/ms 64.47 >> Float512Vector.LOG 49.61 3156.99 ops/ms 63.64 >> Float512Vector.LOG10 46.94 2489.16 ops/ms 53.02 >> Float512Vector.LOG1P 20.66 1689.86 ops/ms 81.81 >> Float512Vector.POW 32.73 1015.85 ops/ms 31.04 >> Float512Vector.SIN 41.17 1587.71 ops/ms 38.56 >> Float512Vector.SINH 33.05 129.39 ops/ms 3.91 >> Float512Vector.TAN 35.60 1336.11 ops/ms 37.53 >> Float512Vector.TANH 65.77 2295.28 
ops/ms 34.90
>> Float64Vector.ACOS 48.41 89.34 ops/ms 1.85
>> Float64Vector.ASIN 47.30 95.72 ops/ms 2.02
>> Float64Vector.ATAN 20.62 49.45 ops/ms 2.40
>> Float64Vector.ATAN2 15.95 112.35 ops/ms 7.04
>> Float64Vector.CBRT 24.03 134.57 ops/ms 5.60
>> Float64Vector.COS 44.28 394.33 ops/ms 8.91
>> Float64Vector.COSH 28.35 95.27 ops/ms 3.36
>> Float64Vector.EXP 65.80 486.37 ops/ms 7.39
>> Float64Vector.EXPM1 34.61 85.99 ops/ms 2.48
>> Float64Vector.HYPOT 50.40 147.82 ops/ms 2.93
>> Float64Vector.LOG 51.93 163.25 ops/ms 3.14
>> Float64Vector.LOG10 49.53 147.98 ops/ms 2.99
>> Float64Vector.LOG1P 19.20 206.81 ops/ms 10.77
>> Float64Vector.SIN 44.41 382.09 ops/ms 8.60
>> Float64Vector.SINH 28.20 90.68 ops/ms 3.22
>> Float64Vector.TAN 36.29 160.89 ops/ms 4.43
>> Float64Vector.TANH 47.65 214.04 ops/ms 4.49
>
> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision:
>
> Add missing Lib.gmk

Build changes look good.

-------------

Marked as reviewed by erikj (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/3638

From thartmann at openjdk.java.net Mon May 17 13:04:42 2021
From: thartmann at openjdk.java.net (Tobias Hartmann)
Date: Mon, 17 May 2021 13:04:42 GMT
Subject: RFR: 8267237: ARM32: bad AD file in matcher.cpp after 8266810
In-Reply-To:
References:
Message-ID:

On Mon, 17 May 2021 10:48:17 GMT, Christoph Göttschkes wrote:

> It appears that [JDK-8266810](https://bugs.openjdk.java.net/browse/JDK-8266810) introduced regression into aarch32. Many JTreg tests are failing with:
>
> # Internal Error (/var/jnode/openjdk-build-ws/workspace/openjdk-build/jdk/jdk-arm-linux-gnueabihf/jdk/src/hotspot/share/opto/matcher.cpp:1670), pid=15030, tid=15047
> # assert(false) failed: bad AD file
>
>
> Testing: hotspot tier1 on ARMv7-A / linux

Looks good.

-------------

Marked as reviewed by thartmann (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/4053 From thartmann at openjdk.java.net Mon May 17 13:05:14 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 17 May 2021 13:05:14 GMT Subject: RFR: 8266615: C2 incorrectly folds subtype checks involving an interface array Message-ID: C2 incorrectly folds the subtype checks in `TestInterfaceArraySubtypeCheck::test1/test2`. As a result, an unexpected `ClassCastException` is thrown at `checkcast` and `instanceof` returns a wrong result. The problem is in `Compile::static_subtype_check` where we incorrectly return `SSC_always_false` for the `MyInterface[] <: MyClassA[]` check because `MyClassA[]` is not a subtype of `MyInterface[]` (after checking that `MyInterface[]` is not a subtype of `MyClassA[]`). The fix is to check that `subelem` is not an interface. This is very old code and not a recent regression. Thanks, Tobias ------------- Commit messages: - 8266615: C2 incorrectly folds subtype checks involving an interface array Changes: https://git.openjdk.java.net/jdk/pull/4060/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4060&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8266615 Stats: 87 lines in 2 files changed: 85 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/4060.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4060/head:pull/4060 PR: https://git.openjdk.java.net/jdk/pull/4060 From hshi at openjdk.java.net Mon May 17 13:22:41 2021 From: hshi at openjdk.java.net (Hui Shi) Date: Mon, 17 May 2021 13:22:41 GMT Subject: RFR: 8267212: test/jdk/java/util/Collections/FindSubList.java intermittent crash with "no reachable node should have no use" In-Reply-To: References: Message-ID: On Mon, 17 May 2021 11:44:48 GMT, Hui Shi wrote: > ... crash with "no reachable node should have no use" > > Please help review this fix. > > StrIntrinsicNode::Ideal uses Node::set_req to replace the memory input; the old memory input might be left with 0 uses, but it is not added to the PhaseGVN worklist.
Use set_req_X to ensure that an old memory input node left with 0 outputs is added to the PhaseGVN worklist. > > Found two other similar problematic code sites in LoadNode::Ideal. > > Tier1/2/3 pass with release/fastdebug build. > test/jdk/java/util/Collections/FindSubList.java doesn't fail in 100 runs (before the fix, 2/3 failures in 10 runs). Pre-submit tests have passed: https://github.com/huishi-hs/jdk/actions/runs/849445331 The check shown here has still not completed; please ignore it. ------------- PR: https://git.openjdk.java.net/jdk/pull/4055 From jbhateja at openjdk.java.net Mon May 17 13:23:36 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Mon, 17 May 2021 13:23:36 GMT Subject: RFR: 8266054: VectorAPI rotate operation optimization [v6] In-Reply-To: References: Message-ID: On Mon, 17 May 2021 12:06:33 GMT, Jatin Bhateja wrote: >> The current VectorAPI Java-side implementation expresses the rotateLeft and rotateRight operations using the following operations: >> >> vec1 = lanewise(VectorOperators.LSHL, n) >> vec2 = lanewise(VectorOperators.LSHR, n) >> res = lanewise(VectorOperators.OR, vec1 , vec2) >> >> This patch moves the above handling from the Java side to the C2 compiler, which facilitates dismantling the rotate operation if the target ISA does not support a direct rotate instruction. >> >> AVX512 added vector rotate instructions vpro[rl][v][dq] which operate over long and integer type vectors. For other cases (i.e. sub-word type vectors, or targets which do not support direct rotate operations) an instruction sequence comprising vector SHIFT (LEFT/RIGHT) and vector OR is emitted. >> >> Please find below the performance data for the included JMH benchmark. >> Machine: Cascade Lake Server (Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz) >> >> >> Benchmark | (TESTSIZE) | Shift | Baseline AVX3 (ops/ms) | Withopt AVX3 (ops/ms) | Gain % | Baseline AVX2 (ops/ms) | Withopt AVX2 (ops/ms) | Gain % >> -- | -- | -- | -- | -- | -- | -- | -- | --
>> RotateBenchmark.testRotateLeftB | 128.00 | 7.00 | 17223.35 | 17094.69 | -0.75 | 17008.32 | 17488.06 | 2.82 >> RotateBenchmark.testRotateLeftB | 128.00 | 7.00 | 8944.98 | 8811.34 | -1.49 | 8878.17 | 9218.68 | 3.84 >> RotateBenchmark.testRotateLeftB | 128.00 | 15.00 | 17195.75 | 17137.32 | -0.34 | 16789.01 | 17780.34 | 5.90 >> RotateBenchmark.testRotateLeftB | 128.00 | 15.00 | 9052.67 | 8838.60 | -2.36 | 8814.62 | 9206.01 | 4.44 >> RotateBenchmark.testRotateLeftB | 128.00 | 31.00 | 17100.19 | 16950.64 | -0.87 | 16827.73 | 17720.37 | 5.30 >> RotateBenchmark.testRotateLeftB | 128.00 | 31.00 | 9079.95 | 8471.26 | -6.70 | 8888.44 | 9167.68 | 3.14 >> RotateBenchmark.testRotateLeftB | 256.00 | 7.00 | 21231.33 | 21513.08 | 1.33 | 21824.51 | 21479.48 | -1.58 >> RotateBenchmark.testRotateLeftB | 256.00 | 7.00 | 11103.62 | 11180.16 | 0.69 | 11173.67 | 11529.22 | 3.18 >> RotateBenchmark.testRotateLeftB | 256.00 | 15.00 | 21119.14 | 21552.04 | 2.05 | 21693.05 | 21915.37 | 1.02 >> RotateBenchmark.testRotateLeftB | 256.00 | 15.00 | 11048.68 | 11094.20 | 0.41 | 11049.90 | 11439.07 | 3.52 >> RotateBenchmark.testRotateLeftB | 256.00 | 31.00 | 21506.31 | 21391.41 | -0.53 | 21263.18 | 21986.29 | 3.40 >> RotateBenchmark.testRotateLeftB | 256.00 | 31.00 | 11056.12 | 11232.78 | 1.60 | 10941.59 | 11397.09 | 4.16 >> RotateBenchmark.testRotateLeftB | 512.00 | 7.00 | 17976.56 | 18180.85 | 1.14 | 1212.26 | 2533.34 | 108.98 >> RotateBenchmark.testRotateLeftB | 512.00 | 15.00 | 17553.70 | 18219.07 | 3.79 | 1256.73 | 2537.41 | 101.91 >> RotateBenchmark.testRotateLeftB | 512.00 | 31.00 | 17618.03 | 17738.15 | 0.68 | 1214.69 | 2533.83 | 108.60 >> RotateBenchmark.testRotateLeftI | 128.00 | 7.00 | 7258.87 | 7468.88 | 2.89 | 7115.12 | 7117.26 | 0.03 >> RotateBenchmark.testRotateLeftI | 128.00 | 7.00 | 3586.65 | 3950.85 | 10.15 | 3532.17 | 3595.80 | 1.80 >> RotateBenchmark.testRotateLeftI | 128.00 | 7.00 | 1835.07 | 1999.68 | 8.97 | 1789.90 | 1819.93 | 1.68 >> RotateBenchmark.testRotateLeftI | 
128.00 | 15.00 | 7273.36 | 7410.91 | 1.89 | 7198.60 | 6994.79 | -2.83 >> RotateBenchmark.testRotateLeftI | 128.00 | 15.00 | 3674.98 | 3926.27 | 6.84 | 3549.90 | 3755.09 | 5.78 >> RotateBenchmark.testRotateLeftI | 128.00 | 15.00 | 1840.94 | 1882.25 | 2.24 | 1801.56 | 1872.89 | 3.96 >> RotateBenchmark.testRotateLeftI | 128.00 | 31.00 | 7457.11 | 7361.48 | -1.28 | 6975.33 | 7385.94 | 5.89 >> RotateBenchmark.testRotateLeftI | 128.00 | 31.00 | 3570.74 | 3929.30 | 10.04 | 3635.37 | 3736.67 | 2.79 >> RotateBenchmark.testRotateLeftI | 128.00 | 31.00 | 1902.32 | 1960.46 | 3.06 | 1812.32 | 1813.88 | 0.09 >> RotateBenchmark.testRotateLeftI | 256.00 | 7.00 | 11174.24 | 12044.52 | 7.79 | 11509.87 | 11273.44 | -2.05 >> RotateBenchmark.testRotateLeftI | 256.00 | 7.00 | 5981.47 | 6073.70 | 1.54 | 5593.66 | 5661.93 | 1.22 >> RotateBenchmark.testRotateLeftI | 256.00 | 7.00 | 2932.49 | 3069.54 | 4.67 | 2950.86 | 2892.42 | -1.98 >> RotateBenchmark.testRotateLeftI | 256.00 | 15.00 | 11764.11 | 12098.63 | 2.84 | 11069.52 | 11476.93 | 3.68 >> RotateBenchmark.testRotateLeftI | 256.00 | 15.00 | 5855.20 | 6080.40 | 3.85 | 5919.11 | 5607.04 | -5.27 >> RotateBenchmark.testRotateLeftI | 256.00 | 15.00 | 2989.05 | 3048.56 | 1.99 | 2902.63 | 2821.83 | -2.78 >> RotateBenchmark.testRotateLeftI | 256.00 | 31.00 | 11652.84 | 11965.40 | 2.68 | 11525.62 | 11459.83 | -0.57 >> RotateBenchmark.testRotateLeftI | 256.00 | 31.00 | 5851.82 | 6164.94 | 5.35 | 5882.60 | 5842.30 | -0.69 >> RotateBenchmark.testRotateLeftI | 256.00 | 31.00 | 3015.99 | 3043.79 | 0.92 | 2963.71 | 2947.97 | -0.53 >> RotateBenchmark.testRotateLeftI | 512.00 | 7.00 | 16029.15 | 16189.79 | 1.00 | 860.43 | 2339.32 | 171.88 >> RotateBenchmark.testRotateLeftI | 512.00 | 7.00 | 8078.25 | 8081.84 | 0.04 | 427.39 | 1147.92 | 168.59 >> RotateBenchmark.testRotateLeftI | 512.00 | 7.00 | 4021.49 | 4294.03 | 6.78 | 209.25 | 582.28 | 178.27 >> RotateBenchmark.testRotateLeftI | 512.00 | 15.00 | 15912.98 | 16329.03 | 2.61 | 848.23 | 2296.78 | 170.77 
>> RotateBenchmark.testRotateLeftI | 512.00 | 15.00 | 8054.10 | 8306.37 | 3.13 | 429.93 | 1146.90 | 166.77 >> RotateBenchmark.testRotateLeftI | 512.00 | 15.00 | 4102.58 | 4071.08 | -0.77 | 217.86 | 582.20 | 167.24 >> RotateBenchmark.testRotateLeftI | 512.00 | 31.00 | 16177.79 | 16287.85 | 0.68 | 857.84 | 2243.15 | 161.49 >> RotateBenchmark.testRotateLeftI | 512.00 | 31.00 | 8187.47 | 8410.48 | 2.72 | 434.60 | 1128.20 | 159.60 >> RotateBenchmark.testRotateLeftI | 512.00 | 31.00 | 4109.15 | 4233.80 | 3.03 | 208.71 | 572.43 | 174.27 >> RotateBenchmark.testRotateLeftL | 128.00 | 7.00 | 3755.09 | 3930.29 | 4.67 | 3604.19 | 3598.47 | -0.16 >> RotateBenchmark.testRotateLeftL | 128.00 | 7.00 | 1829.03 | 1957.39 | 7.02 | 1833.95 | 1808.38 | -1.39 >> RotateBenchmark.testRotateLeftL | 128.00 | 7.00 | 915.35 | 970.55 | 6.03 | 916.25 | 899.08 | -1.87 >> RotateBenchmark.testRotateLeftL | 128.00 | 15.00 | 3664.85 | 3812.26 | 4.02 | 3629.37 | 3579.23 | -1.38 >> RotateBenchmark.testRotateLeftL | 128.00 | 15.00 | 1829.51 | 1877.76 | 2.64 | 1781.05 | 1807.57 | 1.49 >> RotateBenchmark.testRotateLeftL | 128.00 | 15.00 | 913.37 | 953.42 | 4.38 | 912.26 | 908.73 | -0.39 >> RotateBenchmark.testRotateLeftL | 128.00 | 31.00 | 3648.45 | 3899.20 | 6.87 | 3552.67 | 3581.04 | 0.80 >> RotateBenchmark.testRotateLeftL | 128.00 | 31.00 | 1816.50 | 1959.68 | 7.88 | 1820.88 | 1819.71 | -0.06 >> RotateBenchmark.testRotateLeftL | 128.00 | 31.00 | 901.05 | 955.13 | 6.00 | 913.74 | 907.90 | -0.64 >> RotateBenchmark.testRotateLeftL | 256.00 | 7.00 | 5850.99 | 6108.64 | 4.40 | 5882.65 | 5755.21 | -2.17 >> RotateBenchmark.testRotateLeftL | 256.00 | 7.00 | 2962.21 | 3060.47 | 3.32 | 2955.20 | 2909.18 | -1.56 >> RotateBenchmark.testRotateLeftL | 256.00 | 7.00 | 1480.46 | 1534.72 | 3.66 | 1467.78 | 1430.60 | -2.53 >> RotateBenchmark.testRotateLeftL | 256.00 | 15.00 | 5858.23 | 6047.51 | 3.23 | 5770.02 | 5773.19 | 0.05 >> RotateBenchmark.testRotateLeftL | 256.00 | 15.00 | 2951.49 | 3096.53 | 4.91 | 2885.21 | 
2899.31 | 0.49 >> RotateBenchmark.testRotateLeftL | 256.00 | 15.00 | 1486.26 | 1527.94 | 2.80 | 1441.93 | 1454.25 | 0.85 >> RotateBenchmark.testRotateLeftL | 256.00 | 31.00 | 5873.21 | 6089.75 | 3.69 | 5767.58 | 5664.11 | -1.79 >> RotateBenchmark.testRotateLeftL | 256.00 | 31.00 | 2969.67 | 3081.39 | 3.76 | 2878.50 | 2905.86 | 0.95 >> RotateBenchmark.testRotateLeftL | 256.00 | 31.00 | 1452.21 | 1520.03 | 4.67 | 1430.30 | 1485.63 | 3.87 >> RotateBenchmark.testRotateLeftL | 512.00 | 7.00 | 8088.65 | 8443.63 | 4.39 | 455.67 | 1226.33 | 169.13 >> RotateBenchmark.testRotateLeftL | 512.00 | 7.00 | 4011.95 | 4120.25 | 2.70 | 229.77 | 619.87 | 169.77 >> RotateBenchmark.testRotateLeftL | 512.00 | 7.00 | 2090.57 | 2109.53 | 0.91 | 115.21 | 310.36 | 169.37 >> RotateBenchmark.testRotateLeftL | 512.00 | 15.00 | 8166.84 | 8557.28 | 4.78 | 457.67 | 1242.86 | 171.56 >> RotateBenchmark.testRotateLeftL | 512.00 | 15.00 | 4137.02 | 4287.95 | 3.65 | 227.26 | 624.80 | 174.93 >> RotateBenchmark.testRotateLeftL | 512.00 | 15.00 | 2095.01 | 2102.86 | 0.37 | 114.26 | 310.83 | 172.03 >> RotateBenchmark.testRotateLeftL | 512.00 | 31.00 | 8082.68 | 8400.56 | 3.93 | 459.59 | 1230.07 | 167.64 >> RotateBenchmark.testRotateLeftL | 512.00 | 31.00 | 4047.67 | 4147.58 | 2.47 | 229.01 | 606.38 | 164.78 >> RotateBenchmark.testRotateLeftL | 512.00 | 31.00 | 2086.83 | 2126.72 | 1.91 | 111.93 | 305.66 | 173.08 >> RotateBenchmark.testRotateLeftS | 128.00 | 7.00 | 13597.19 | 13255.09 | -2.52 | 13818.39 | 13242.40 | -4.17 >> RotateBenchmark.testRotateLeftS | 128.00 | 7.00 | 7028.26 | 6826.59 | -2.87 | 6765.15 | 6907.87 | 2.11 >> RotateBenchmark.testRotateLeftS | 128.00 | 7.00 | 3570.40 | 3468.01 | -2.87 | 3449.66 | 3533.50 | 2.43 >> RotateBenchmark.testRotateLeftS | 128.00 | 15.00 | 13615.99 | 13464.40 | -1.11 | 13330.02 | 13870.57 | 4.06 >> RotateBenchmark.testRotateLeftS | 128.00 | 15.00 | 7043.31 | 6763.34 | -3.97 | 6928.88 | 7063.57 | 1.94 >> RotateBenchmark.testRotateLeftS | 128.00 | 15.00 | 3495.12 | 
3537.62 | 1.22 | 3503.41 | 3457.67 | -1.31 >> RotateBenchmark.testRotateLeftS | 128.00 | 31.00 | 13591.66 | 13665.84 | 0.55 | 13773.27 | 13126.08 | -4.70 >> RotateBenchmark.testRotateLeftS | 128.00 | 31.00 | 7027.08 | 7011.24 | -0.23 | 6974.98 | 6815.50 | -2.29 >> RotateBenchmark.testRotateLeftS | 128.00 | 31.00 | 3568.28 | 3569.62 | 0.04 | 3580.67 | 3463.58 | -3.27 >> RotateBenchmark.testRotateLeftS | 256.00 | 7.00 | 21154.03 | 21416.32 | 1.24 | 21187.01 | 21401.61 | 1.01 >> RotateBenchmark.testRotateLeftS | 256.00 | 7.00 | 11194.24 | 10865.47 | -2.94 | 11063.19 | 10977.60 | -0.77 >> RotateBenchmark.testRotateLeftS | 256.00 | 7.00 | 5797.80 | 5523.94 | -4.72 | 5654.63 | 5468.78 | -3.29 >> RotateBenchmark.testRotateLeftS | 256.00 | 15.00 | 21333.89 | 21412.74 | 0.37 | 21610.94 | 20908.96 | -3.25 >> RotateBenchmark.testRotateLeftS | 256.00 | 15.00 | 11327.07 | 11113.48 | -1.89 | 11148.25 | 10678.14 | -4.22 >> RotateBenchmark.testRotateLeftS | 256.00 | 15.00 | 5810.69 | 5569.72 | -4.15 | 5663.26 | 5618.87 | -0.78 >> RotateBenchmark.testRotateLeftS | 256.00 | 31.00 | 21753.20 | 21198.43 | -2.55 | 21567.90 | 21929.81 | 1.68 >> RotateBenchmark.testRotateLeftS | 256.00 | 31.00 | 11517.08 | 11039.64 | -4.15 | 11103.08 | 10871.59 | -2.08 >> RotateBenchmark.testRotateLeftS | 256.00 | 31.00 | 5897.16 | 5606.75 | -4.92 | 5459.87 | 5604.12 | 2.64 >> RotateBenchmark.testRotateLeftS | 512.00 | 7.00 | 29748.53 | 28883.73 | -2.91 | 1549.02 | 3928.53 | 153.61 >> RotateBenchmark.testRotateLeftS | 512.00 | 7.00 | 15197.09 | 15878.19 | 4.48 | 772.59 | 1924.35 | 149.08 >> RotateBenchmark.testRotateLeftS | 512.00 | 7.00 | 8046.30 | 8081.19 | 0.43 | 388.11 | 990.28 | 155.16 >> RotateBenchmark.testRotateLeftS | 512.00 | 15.00 | 30618.04 | 29419.19 | -3.92 | 1524.22 | 3915.97 | 156.92 >> RotateBenchmark.testRotateLeftS | 512.00 | 15.00 | 15854.43 | 15846.37 | -0.05 | 766.09 | 1953.60 | 155.01 >> RotateBenchmark.testRotateLeftS | 512.00 | 15.00 | 7814.77 | 7899.30 | 1.08 | 390.82 | 970.37 | 
148.29 >> RotateBenchmark.testRotateLeftS | 512.00 | 31.00 | 29596.82 | 28538.69 | -3.58 | 1530.45 | 3906.91 | 155.28 >> RotateBenchmark.testRotateLeftS | 512.00 | 31.00 | 15662.48 | 15849.25 | 1.19 | 778.08 | 1934.31 | 148.60 >> RotateBenchmark.testRotateLeftS | 512.00 | 31.00 | 8121.14 | 7758.59 | -4.46 | 392.78 | 959.73 | 144.34 >> RotateBenchmark.testRotateRightB | 128.00 | 7.00 | 17465.84 | 17069.34 | -2.27 | 16849.73 | 17842.08 | 5.89 >> RotateBenchmark.testRotateRightB | 128.00 | 7.00 | 9049.19 | 8864.15 | -2.04 | 8786.67 | 9105.34 | 3.63 >> RotateBenchmark.testRotateRightB | 128.00 | 15.00 | 17703.38 | 17070.98 | -3.57 | 16595.85 | 17784.68 | 7.16 >> RotateBenchmark.testRotateRightB | 128.00 | 15.00 | 9007.68 | 8817.41 | -2.11 | 8704.49 | 9185.87 | 5.53 >> RotateBenchmark.testRotateRightB | 128.00 | 31.00 | 17531.05 | 16983.40 | -3.12 | 16947.69 | 17655.40 | 4.18 >> RotateBenchmark.testRotateRightB | 128.00 | 31.00 | 8986.30 | 8794.15 | -2.14 | 8816.62 | 9225.95 | 4.64 >> RotateBenchmark.testRotateRightB | 256.00 | 7.00 | 21293.95 | 21506.74 | 1.00 | 21163.29 | 21854.03 | 3.26 >> RotateBenchmark.testRotateRightB | 256.00 | 7.00 | 11258.47 | 11072.92 | -1.65 | 11118.12 | 11338.96 | 1.99 >> RotateBenchmark.testRotateRightB | 256.00 | 15.00 | 21253.36 | 21292.37 | 0.18 | 21224.39 | 21763.88 | 2.54 >> RotateBenchmark.testRotateRightB | 256.00 | 15.00 | 11064.80 | 11198.35 | 1.21 | 10960.98 | 11294.14 | 3.04 >> RotateBenchmark.testRotateRightB | 256.00 | 31.00 | 21358.14 | 21346.21 | -0.06 | 21487.25 | 21854.42 | 1.71 >> RotateBenchmark.testRotateRightB | 256.00 | 31.00 | 11045.61 | 11208.26 | 1.47 | 10907.03 | 11415.18 | 4.66 >> RotateBenchmark.testRotateRightB | 512.00 | 7.00 | 17898.61 | 18307.54 | 2.28 | 1214.65 | 2546.64 | 109.66 >> RotateBenchmark.testRotateRightB | 512.00 | 15.00 | 17909.25 | 18242.51 | 1.86 | 1215.05 | 2563.98 | 111.02 >> RotateBenchmark.testRotateRightB | 512.00 | 31.00 | 17883.35 | 17928.44 | 0.25 | 1220.77 | 2543.30 | 108.34 >> 
RotateBenchmark.testRotateRightI | 128.00 | 7.00 | 7139.97 | 7626.72 | 6.82 | 6994.86 | 7075.65 | 1.15 >> RotateBenchmark.testRotateRightI | 128.00 | 7.00 | 3657.37 | 3898.34 | 6.59 | 3617.06 | 3576.12 | -1.13 >> RotateBenchmark.testRotateRightI | 128.00 | 7.00 | 1804.26 | 1969.19 | 9.14 | 1796.62 | 1858.84 | 3.46 >> RotateBenchmark.testRotateRightI | 128.00 | 15.00 | 7404.31 | 7760.09 | 4.80 | 7036.77 | 7401.52 | 5.18 >> RotateBenchmark.testRotateRightI | 128.00 | 15.00 | 3600.52 | 3956.35 | 9.88 | 3595.28 | 3560.36 | -0.97 >> RotateBenchmark.testRotateRightI | 128.00 | 15.00 | 1813.32 | 1966.41 | 8.44 | 1839.95 | 1852.53 | 0.68 >> RotateBenchmark.testRotateRightI | 128.00 | 31.00 | 7118.48 | 7724.81 | 8.52 | 7151.56 | 7021.09 | -1.82 >> RotateBenchmark.testRotateRightI | 128.00 | 31.00 | 3529.70 | 3881.63 | 9.97 | 3623.08 | 3601.01 | -0.61 >> RotateBenchmark.testRotateRightI | 128.00 | 31.00 | 1823.61 | 1961.34 | 7.55 | 1786.86 | 1748.85 | -2.13 >> RotateBenchmark.testRotateRightI | 256.00 | 7.00 | 11697.98 | 11835.25 | 1.17 | 11513.16 | 11184.87 | -2.85 >> RotateBenchmark.testRotateRightI | 256.00 | 7.00 | 5890.11 | 6102.57 | 3.61 | 5658.79 | 5696.08 | 0.66 >> RotateBenchmark.testRotateRightI | 256.00 | 7.00 | 2964.94 | 3070.26 | 3.55 | 2945.00 | 2962.08 | 0.58 >> RotateBenchmark.testRotateRightI | 256.00 | 15.00 | 11562.51 | 12151.29 | 5.09 | 11404.17 | 11120.28 | -2.49 >> RotateBenchmark.testRotateRightI | 256.00 | 15.00 | 5702.93 | 6130.57 | 7.50 | 5799.54 | 5779.08 | -0.35 >> RotateBenchmark.testRotateRightI | 256.00 | 15.00 | 2861.96 | 3051.44 | 6.62 | 2943.99 | 2860.65 | -2.83 >> RotateBenchmark.testRotateRightI | 256.00 | 31.00 | 11203.13 | 11710.59 | 4.53 | 11363.18 | 11112.16 | -2.21 >> RotateBenchmark.testRotateRightI | 256.00 | 31.00 | 5893.97 | 6070.71 | 3.00 | 5776.67 | 5648.84 | -2.21 >> RotateBenchmark.testRotateRightI | 256.00 | 31.00 | 2971.83 | 3046.76 | 2.52 | 2903.35 | 2833.88 | -2.39 >> RotateBenchmark.testRotateRightI | 512.00 | 7.00 | 
16064.71 | 15851.35 | -1.33 | 861.93 | 2256.88 | 161.84 >> RotateBenchmark.testRotateRightI | 512.00 | 7.00 | 7916.80 | 8462.65 | 6.89 | 430.23 | 1147.30 | 166.67 >> RotateBenchmark.testRotateRightI | 512.00 | 7.00 | 4104.64 | 4068.28 | -0.89 | 216.30 | 572.86 | 164.84 >> RotateBenchmark.testRotateRightI | 512.00 | 15.00 | 16133.09 | 16281.59 | 0.92 | 856.36 | 2229.58 | 160.35 >> RotateBenchmark.testRotateRightI | 512.00 | 15.00 | 8127.26 | 8117.59 | -0.12 | 419.16 | 1176.42 | 180.66 >> RotateBenchmark.testRotateRightI | 512.00 | 15.00 | 4080.11 | 4063.26 | -0.41 | 218.32 | 571.93 | 161.97 >> RotateBenchmark.testRotateRightI | 512.00 | 31.00 | 15834.26 | 16314.64 | 3.03 | 865.96 | 2297.74 | 165.34 >> RotateBenchmark.testRotateRightI | 512.00 | 31.00 | 7965.62 | 8270.48 | 3.83 | 428.55 | 1148.87 | 168.08 >> RotateBenchmark.testRotateRightI | 512.00 | 31.00 | 4161.69 | 4034.76 | -3.05 | 215.63 | 570.19 | 164.43 >> RotateBenchmark.testRotateRightL | 128.00 | 7.00 | 3556.70 | 3877.08 | 9.01 | 3596.46 | 3558.32 | -1.06 >> RotateBenchmark.testRotateRightL | 128.00 | 7.00 | 1772.93 | 1993.86 | 12.46 | 1856.79 | 1783.22 | -3.96 >> RotateBenchmark.testRotateRightL | 128.00 | 7.00 | 908.66 | 1000.37 | 10.09 | 944.79 | 922.91 | -2.32 >> RotateBenchmark.testRotateRightL | 128.00 | 15.00 | 3742.44 | 3748.41 | 0.16 | 3788.07 | 3570.67 | -5.74 >> RotateBenchmark.testRotateRightL | 128.00 | 15.00 | 1817.53 | 1985.69 | 9.25 | 1892.38 | 1833.16 | -3.13 >> RotateBenchmark.testRotateRightL | 128.00 | 15.00 | 941.03 | 952.68 | 1.24 | 915.79 | 910.21 | -0.61 >> RotateBenchmark.testRotateRightL | 128.00 | 31.00 | 3649.48 | 3896.56 | 6.77 | 3637.59 | 3557.53 | -2.20 >> RotateBenchmark.testRotateRightL | 128.00 | 31.00 | 1840.12 | 1997.19 | 8.54 | 1821.47 | 1799.82 | -1.19 >> RotateBenchmark.testRotateRightL | 128.00 | 31.00 | 901.33 | 995.67 | 10.47 | 909.20 | 902.73 | -0.71 >> RotateBenchmark.testRotateRightL | 256.00 | 7.00 | 5789.93 | 5960.54 | 2.95 | 5758.14 | 5736.30 | -0.38 >> 
RotateBenchmark.testRotateRightL | 256.00 | 7.00 | 2963.20 | 3063.30 | 3.38 | 2943.48 | 2833.84 | -3.72 >> RotateBenchmark.testRotateRightL | 256.00 | 7.00 | 1501.81 | 1510.23 | 0.56 | 1463.85 | 1462.26 | -0.11 >> RotateBenchmark.testRotateRightL | 256.00 | 15.00 | 5870.05 | 5951.43 | 1.39 | 5794.74 | 5604.58 | -3.28 >> RotateBenchmark.testRotateRightL | 256.00 | 15.00 | 2971.36 | 3047.00 | 2.55 | 2931.19 | 2907.30 | -0.82 >> RotateBenchmark.testRotateRightL | 256.00 | 15.00 | 1473.97 | 1530.54 | 3.84 | 1473.45 | 1442.40 | -2.11 >> RotateBenchmark.testRotateRightL | 256.00 | 31.00 | 5858.08 | 6080.49 | 3.80 | 5863.69 | 5549.85 | -5.35 >> RotateBenchmark.testRotateRightL | 256.00 | 31.00 | 2916.24 | 3045.77 | 4.44 | 2981.59 | 2815.07 | -5.58 >> RotateBenchmark.testRotateRightL | 256.00 | 31.00 | 1441.20 | 1531.56 | 6.27 | 1492.47 | 1473.25 | -1.29 >> RotateBenchmark.testRotateRightL | 512.00 | 7.00 | 8147.24 | 8310.05 | 2.00 | 469.45 | 1235.21 | 163.12 >> RotateBenchmark.testRotateRightL | 512.00 | 7.00 | 4142.95 | 4258.86 | 2.80 | 234.14 | 615.52 | 162.88 >> RotateBenchmark.testRotateRightL | 512.00 | 7.00 | 2095.48 | 2087.20 | -0.40 | 113.55 | 311.19 | 174.05 >> RotateBenchmark.testRotateRightL | 512.00 | 15.00 | 8222.94 | 8246.58 | 0.29 | 458.91 | 1244.32 | 171.15 >> RotateBenchmark.testRotateRightL | 512.00 | 15.00 | 4160.04 | 4226.46 | 1.60 | 227.78 | 625.38 | 174.56 >> RotateBenchmark.testRotateRightL | 512.00 | 15.00 | 2064.63 | 2162.44 | 4.74 | 113.27 | 314.15 | 177.36 >> RotateBenchmark.testRotateRightL | 512.00 | 31.00 | 8157.94 | 8466.90 | 3.79 | 450.26 | 1221.90 | 171.37 >> RotateBenchmark.testRotateRightL | 512.00 | 31.00 | 4039.74 | 4283.33 | 6.03 | 224.82 | 612.68 | 172.53 >> RotateBenchmark.testRotateRightL | 512.00 | 31.00 | 2066.88 | 2147.51 | 3.90 | 110.97 | 303.43 | 173.42 >> RotateBenchmark.testRotateRightS | 128.00 | 7.00 | 13548.39 | 13245.87 | -2.23 | 13490.93 | 13084.76 | -3.01 >> RotateBenchmark.testRotateRightS | 128.00 | 7.00 | 7020.16 | 
6768.85 | -3.58 | 6991.39 | 7044.32 | 0.76 >> RotateBenchmark.testRotateRightS | 128.00 | 7.00 | 3550.50 | 3505.19 | -1.28 | 3507.12 | 3612.86 | 3.01 >> RotateBenchmark.testRotateRightS | 128.00 | 15.00 | 13743.43 | 13325.44 | -3.04 | 13696.15 | 13255.80 | -3.22 >> RotateBenchmark.testRotateRightS | 128.00 | 15.00 | 6856.02 | 6969.18 | 1.65 | 6886.29 | 6834.12 | -0.76 >> RotateBenchmark.testRotateRightS | 128.00 | 15.00 | 3569.53 | 3492.76 | -2.15 | 3539.02 | 3470.02 | -1.95 >> RotateBenchmark.testRotateRightS | 128.00 | 31.00 | 13704.18 | 13495.07 | -1.53 | 13649.14 | 13583.87 | -0.48 >> RotateBenchmark.testRotateRightS | 128.00 | 31.00 | 7011.77 | 6953.93 | -0.82 | 6978.28 | 6740.30 | -3.41 >> RotateBenchmark.testRotateRightS | 128.00 | 31.00 | 3591.62 | 3620.12 | 0.79 | 3502.04 | 3510.05 | 0.23 >> RotateBenchmark.testRotateRightS | 256.00 | 7.00 | 21950.71 | 22113.60 | 0.74 | 21484.27 | 21596.64 | 0.52 >> RotateBenchmark.testRotateRightS | 256.00 | 7.00 | 11616.88 | 11099.73 | -4.45 | 11188.29 | 10737.68 | -4.03 >> RotateBenchmark.testRotateRightS | 256.00 | 7.00 | 5872.72 | 5579.12 | -5.00 | 5784.05 | 5454.57 | -5.70 >> RotateBenchmark.testRotateRightS | 256.00 | 15.00 | 22017.83 | 20817.97 | -5.45 | 21934.65 | 21356.90 | -2.63 >> RotateBenchmark.testRotateRightS | 256.00 | 15.00 | 11414.27 | 11044.86 | -3.24 | 11454.35 | 11140.34 | -2.74 >> RotateBenchmark.testRotateRightS | 256.00 | 15.00 | 5786.64 | 5634.05 | -2.64 | 5724.93 | 5639.99 | -1.48 >> RotateBenchmark.testRotateRightS | 256.00 | 31.00 | 21754.77 | 21466.01 | -1.33 | 21140.67 | 21970.03 | 3.92 >> RotateBenchmark.testRotateRightS | 256.00 | 31.00 | 11676.46 | 11358.64 | -2.72 | 11204.90 | 11213.48 | 0.08 >> RotateBenchmark.testRotateRightS | 256.00 | 31.00 | 5728.20 | 5772.49 | 0.77 | 5594.33 | 5544.25 | -0.90 >> RotateBenchmark.testRotateRightS | 512.00 | 7.00 | 30247.03 | 30179.41 | -0.22 | 1538.75 | 3975.82 | 158.38 >> RotateBenchmark.testRotateRightS | 512.00 | 7.00 | 15988.73 | 15621.42 | -2.30 
| 776.04 | 1910.91 | 146.24 >> RotateBenchmark.testRotateRightS | 512.00 | 7.00 | 8115.84 | 8025.28 | -1.12 | 389.12 | 984.46 | 152.99 >> RotateBenchmark.testRotateRightS | 512.00 | 15.00 | 30110.91 | 30200.69 | 0.30 | 1532.49 | 3983.77 | 159.95 >> RotateBenchmark.testRotateRightS | 512.00 | 15.00 | 15957.90 | 15690.73 | -1.67 | 774.90 | 1931.00 | 149.19 >> RotateBenchmark.testRotateRightS | 512.00 | 15.00 | 8113.26 | 8037.93 | -0.93 | 391.90 | 965.53 | 146.37 >> RotateBenchmark.testRotateRightS | 512.00 | 31.00 | 29816.97 | 29891.54 | 0.25 | 1538.12 | 3881.93 | 152.38 >> RotateBenchmark.testRotateRightS | 512.00 | 31.00 | 15405.95 | 15619.17 | 1.38 | 762.49 | 1871.00 | 145.38 >> RotateBenchmark.testRotateRightS | 512.00 | 31.00 | 7919.80 | 7957.35 | 0.47 | 393.63 | 972.49 | 147.06 > > Jatin Bhateja has updated the pull request incrementally with three additional commits since the last revision: > > - Merge branch 'JDK-8266054' of http://github.com/jatin-bhateja/jdk into JDK-8266054 > - 8266054: Code reorganization for efficient sharing of logic to check rotate operation support on a target platform. > - 8266054: Removing redundant test templates. Hi @iwanowww, @neliasso, can you kindly review compiler side changes and share your feedback. ------------- PR: https://git.openjdk.java.net/jdk/pull/3720 From vlivanov at openjdk.java.net Mon May 17 13:51:44 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Mon, 17 May 2021 13:51:44 GMT Subject: RFR: 8256973: Intrinsic creation for VectorMask query (lastTrue, firstTrue, trueCount) APIs [v4] In-Reply-To: References: <73lFD51hzmiF_KrQyPyE5c7lbf-Bp6V5vptzGo7JioY=.f34509d0-04c1-4c6d-878f-baa433b315a7@github.com> Message-ID: <6sgoak9hLqP2gY_tFdDNmwpd1WmCdbQxgu_s6PEhHSo=.0364c120-34fd-4792-878d-d09e9673b92f@github.com> On Mon, 17 May 2021 08:39:22 GMT, Jatin Bhateja wrote: >> This patch intrinsifies following mask query APIs using optimal instruction sequence for X86 target. >> 1) VectorMask.firstTrue. 
>> 2) VectorMask.lastTrue. >> 3) VectorMask.trueCount. >> >> The current implementations of the above APIs iterate over the underlying boolean array encapsulated in a mask instance to ascertain the count/position index of true bits. >> X86 AVX2 and AVX512 targets offer direct instructions to populate the masks held in the byte vector to a GP or an opmask register, thereby accelerating further querying. >> >> Intrinsification is not performed for vector species containing fewer than two vector lanes. >> >> Please find below the performance numbers for the benchmark included in the patch: >> Machine: Cascade Lake server (Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz 28C) >> >> >> Benchmark | VECTOR SIZE | ALGO | BASELINE AVX3 | WITH OPT AVX3 | GAIN >> -- | -- | -- | -- | -- | -- >> MaskQueryOperationsBenchmark.testFirstTrueByte | 128 | 1 | 338396.436 | 362711.622 | 1.071854143 >> MaskQueryOperationsBenchmark.testFirstTrueByte | 128 | 2 | 205477.472 | 362668.035 | 1.765001445 >> MaskQueryOperationsBenchmark.testFirstTrueByte | 128 | 3 | 185613.377 | 362518.206 | 1.953082326 >> MaskQueryOperationsBenchmark.testFirstTrueByte | 256 | 1 | 338522.114 | 328751.231 | 0.971136648 >> MaskQueryOperationsBenchmark.testFirstTrueByte | 256 | 2 | 148825.341 | 328783.35 | 2.209189294 >> MaskQueryOperationsBenchmark.testFirstTrueByte | 256 | 3 | 200854.856 | 328784.24 | 1.636924526 >> MaskQueryOperationsBenchmark.testFirstTrueByte | 512 | 1 | 338551.089 | 319908.361 | 0.944933782 >> MaskQueryOperationsBenchmark.testFirstTrueByte | 512 | 2 | 116338.756 | 320026.839 | 2.750818816 >> MaskQueryOperationsBenchmark.testFirstTrueByte | 512 | 3 | 200871.692 | 320008.208 | 1.593097588 >> MaskQueryOperationsBenchmark.testFirstTrueInt | 128 | 1 | 338489.157 | 190221.57 | 0.561972418 >> MaskQueryOperationsBenchmark.testFirstTrueInt | 128 | 2 | 205140.903 | 362387.766 | 1.766531007 >> MaskQueryOperationsBenchmark.testFirstTrueInt | 128 | 3 | 185508.994 | 362566.265 | 1.95444036 >>
MaskQueryOperationsBenchmark.testFirstTrueInt | 256 | 1 | 338403.999 | 328829.751 | 0.971707639 >> MaskQueryOperationsBenchmark.testFirstTrueInt | 256 | 2 | 148988.857 | 328835.479 | 2.207114583 >> MaskQueryOperationsBenchmark.testFirstTrueInt | 256 | 3 | 200815.907 | 328778.266 | 1.637212265 >> MaskQueryOperationsBenchmark.testFirstTrueInt | 512 | 1 | 338462.403 | 328796.84 | 0.971442728 >> MaskQueryOperationsBenchmark.testFirstTrueInt | 512 | 2 | 116355.623 | 328811.386 | 2.825917455 >> MaskQueryOperationsBenchmark.testFirstTrueInt | 512 | 3 | 200856.08 | 328773.859 | 1.636862867 >> MaskQueryOperationsBenchmark.testFirstTrueLong | 128 | 1 | 338451.783 | 204432.394 | 0.60402221 >> MaskQueryOperationsBenchmark.testFirstTrueLong | 128 | 2 | 204443.049 | 155670.633 | 0.761437641 >> MaskQueryOperationsBenchmark.testFirstTrueLong | 128 | 3 | 207254.769 | 155672.842 | 0.751118263 >> MaskQueryOperationsBenchmark.testFirstTrueLong | 256 | 1 | 338520.255 | 328789.176 | 0.971254072 >> MaskQueryOperationsBenchmark.testFirstTrueLong | 256 | 2 | 205883.123 | 328742.103 | 1.596741385 >> MaskQueryOperationsBenchmark.testFirstTrueLong | 256 | 3 | 185519.176 | 328733.537 | 1.771965271 >> MaskQueryOperationsBenchmark.testFirstTrueLong | 512 | 1 | 338605.11 | 328694.935 | 0.970732353 >> MaskQueryOperationsBenchmark.testFirstTrueLong | 512 | 2 | 148444.7 | 328352.346 | 2.211950619 >> MaskQueryOperationsBenchmark.testFirstTrueLong | 512 | 3 | 200884.874 | 328814.376 | 1.636829939 >> MaskQueryOperationsBenchmark.testFirstTrueShort | 128 | 1 | 338529.326 | 362293.877 | 1.070199387 >> MaskQueryOperationsBenchmark.testFirstTrueShort | 128 | 2 | 204676.583 | 362428.992 | 1.770739899 >> MaskQueryOperationsBenchmark.testFirstTrueShort | 128 | 3 | 185495.663 | 362422.835 | 1.953807594 >> MaskQueryOperationsBenchmark.testFirstTrueShort | 256 | 1 | 338533.82 | 328635.479 | 0.970761146 >> MaskQueryOperationsBenchmark.testFirstTrueShort | 256 | 2 | 148822.446 | 328803.55 | 2.209368001 >> 
MaskQueryOperationsBenchmark.testFirstTrueShort | 256 | 3 | 200752.028 | 328805.974 | 1.637871245 >> MaskQueryOperationsBenchmark.testFirstTrueShort | 512 | 1 | 338464.548 | 320054.91 | 0.945608371 >> MaskQueryOperationsBenchmark.testFirstTrueShort | 512 | 2 | 116329.063 | 328763.508 | 2.826151088 >> MaskQueryOperationsBenchmark.testFirstTrueShort | 512 | 3 | 199971.049 | 328819.066 | 1.644333355 >> MaskQueryOperationsBenchmark.testLastTrueByte | 128 | 1 | 325618.244 | 337629.441 | 1.036887359 >> MaskQueryOperationsBenchmark.testLastTrueByte | 128 | 2 | 197655.729 | 337544.012 | 1.707737052 >> MaskQueryOperationsBenchmark.testLastTrueByte | 128 | 3 | 325600.645 | 337256.796 | 1.035798919 >> MaskQueryOperationsBenchmark.testLastTrueByte | 256 | 1 | 325677.144 | 308312.588 | 0.946681687 >> MaskQueryOperationsBenchmark.testLastTrueByte | 256 | 2 | 138177.514 | 308293.997 | 2.231144476 >> MaskQueryOperationsBenchmark.testLastTrueByte | 256 | 3 | 201281.142 | 308353.239 | 1.531952949 >> MaskQueryOperationsBenchmark.testLastTrueByte | 512 | 1 | 325499.635 | 305103.491 | 0.937338965 >> MaskQueryOperationsBenchmark.testLastTrueByte | 512 | 2 | 98267.327 | 304803.64 | 3.101780106 >> MaskQueryOperationsBenchmark.testLastTrueByte | 512 | 3 | 201072.661 | 304969.972 | 1.516715253 >> MaskQueryOperationsBenchmark.testLastTrueInt | 128 | 1 | 325286.171 | 337337.209 | 1.037047496 >> MaskQueryOperationsBenchmark.testLastTrueInt | 128 | 2 | 197351.915 | 331432.723 | 1.679399579 >> MaskQueryOperationsBenchmark.testLastTrueInt | 128 | 3 | 325173.097 | 337518.586 | 1.037965899 >> MaskQueryOperationsBenchmark.testLastTrueInt | 256 | 1 | 325199.786 | 308436.805 | 0.948453284 >> MaskQueryOperationsBenchmark.testLastTrueInt | 256 | 2 | 138200.527 | 308405.442 | 2.231579348 >> MaskQueryOperationsBenchmark.testLastTrueInt | 256 | 3 | 201240.625 | 308234.527 | 1.531671485 >> MaskQueryOperationsBenchmark.testLastTrueInt | 512 | 1 | 325590.639 | 308381.757 | 0.947145649 >> 
MaskQueryOperationsBenchmark.testLastTrueInt | 512 | 2 | 98334.197 | 308440.373 | 3.13665421 >> MaskQueryOperationsBenchmark.testLastTrueInt | 512 | 3 | 200832.953 | 308431.355 | 1.535760693 >> MaskQueryOperationsBenchmark.testLastTrueLong | 128 | 1 | 325564.887 | 193981.861 | 0.595831641 >> MaskQueryOperationsBenchmark.testLastTrueLong | 128 | 2 | 214005.351 | 153667.869 | 0.718056199 >> MaskQueryOperationsBenchmark.testLastTrueLong | 128 | 3 | 214061.493 | 156337.24 | 0.730337988 >> MaskQueryOperationsBenchmark.testLastTrueLong | 256 | 1 | 325601.502 | 308291.032 | 0.946835411 >> MaskQueryOperationsBenchmark.testLastTrueLong | 256 | 2 | 197911.182 | 308292.149 | 1.557729815 >> MaskQueryOperationsBenchmark.testLastTrueLong | 256 | 3 | 325608.187 | 308405.393 | 0.947167195 >> MaskQueryOperationsBenchmark.testLastTrueLong | 512 | 1 | 325734.897 | 308321.619 | 0.946541564 >> MaskQueryOperationsBenchmark.testLastTrueLong | 512 | 2 | 137974.465 | 308131.475 | 2.233250008 >> MaskQueryOperationsBenchmark.testLastTrueLong | 512 | 3 | 205479.182 | 308311.636 | 1.500451934 >> MaskQueryOperationsBenchmark.testLastTrueShort | 128 | 1 | 325681.411 | 337663.377 | 1.036790451 >> MaskQueryOperationsBenchmark.testLastTrueShort | 128 | 2 | 198127.51 | 337287.453 | 1.702375672 >> MaskQueryOperationsBenchmark.testLastTrueShort | 128 | 3 | 325519.01 | 337453.387 | 1.036662612 >> MaskQueryOperationsBenchmark.testLastTrueShort | 256 | 1 | 325647.378 | 308266.5 | 0.946626691 >> MaskQueryOperationsBenchmark.testLastTrueShort | 256 | 2 | 138287.837 | 308402.656 | 2.230150263 >> MaskQueryOperationsBenchmark.testLastTrueShort | 256 | 3 | 205375.864 | 308418.101 | 1.501725154 >> MaskQueryOperationsBenchmark.testLastTrueShort | 512 | 1 | 325548.631 | 308137.064 | 0.946516233 >> MaskQueryOperationsBenchmark.testLastTrueShort | 512 | 2 | 98424.074 | 308145.17 | 3.130790644 >> MaskQueryOperationsBenchmark.testLastTrueShort | 512 | 3 | 205381.622 | 308345.763 | 1.50133084 >> 
MaskQueryOperationsBenchmark.testTrueCountByte | 128 | 1 | 197488.249 | 340490.471 | 1.724104967 >> MaskQueryOperationsBenchmark.testTrueCountByte | 128 | 2 | 191307.785 | 354400.26 | 1.852513529 >> MaskQueryOperationsBenchmark.testTrueCountByte | 128 | 3 | 181206.7 | 354512.75 | 1.956399791 >> MaskQueryOperationsBenchmark.testTrueCountByte | 256 | 1 | 144485.784 | 328347.7 | 2.272525995 >> MaskQueryOperationsBenchmark.testTrueCountByte | 256 | 2 | 136709.938 | 328318.229 | 2.401568122 >> MaskQueryOperationsBenchmark.testTrueCountByte | 256 | 3 | 141501.903 | 328274.337 | 2.319928779 >> MaskQueryOperationsBenchmark.testTrueCountByte | 512 | 1 | 108395.25 | 318599.11 | 2.939234976 >> MaskQueryOperationsBenchmark.testTrueCountByte | 512 | 2 | 98731.287 | 318651.791 | 3.22746518 >> MaskQueryOperationsBenchmark.testTrueCountByte | 512 | 3 | 106344.335 | 318657.098 | 2.99646519 >> MaskQueryOperationsBenchmark.testTrueCountInt | 128 | 1 | 124691.716 | 354457.62 | 2.842671762 >> MaskQueryOperationsBenchmark.testTrueCountInt | 128 | 2 | 191325.138 | 354360.523 | 1.852137815 >> MaskQueryOperationsBenchmark.testTrueCountInt | 128 | 3 | 181480.334 | 353746.697 | 1.949228818 >> MaskQueryOperationsBenchmark.testTrueCountInt | 256 | 1 | 144513.076 | 328404.916 | 2.27249274 >> MaskQueryOperationsBenchmark.testTrueCountInt | 256 | 2 | 136710.717 | 328516.92 | 2.403007805 >> MaskQueryOperationsBenchmark.testTrueCountInt | 256 | 3 | 141631.832 | 328432.841 | 2.318919669 >> MaskQueryOperationsBenchmark.testTrueCountInt | 512 | 1 | 108479.473 | 328405.877 | 3.027355019 >> MaskQueryOperationsBenchmark.testTrueCountInt | 512 | 2 | 98747.682 | 328300.378 | 3.324638831 >> MaskQueryOperationsBenchmark.testTrueCountInt | 512 | 3 | 106378.04 | 328384.537 | 3.086957957 >> MaskQueryOperationsBenchmark.testTrueCountLong | 128 | 1 | 213646.579 | 159098.437 | 0.74468048 >> MaskQueryOperationsBenchmark.testTrueCountLong | 128 | 2 | 212671.379 | 162528.924 | 0.764225655 >> 
MaskQueryOperationsBenchmark.testTrueCountLong | 128 | 3 | 212649.052 | 162530.898 | 0.764315178 >> MaskQueryOperationsBenchmark.testTrueCountLong | 256 | 1 | 197350.819 | 328365.924 | 1.663869072 >> MaskQueryOperationsBenchmark.testTrueCountLong | 256 | 2 | 191473.127 | 328501.883 | 1.715655289 >> MaskQueryOperationsBenchmark.testTrueCountLong | 256 | 3 | 185529.513 | 328428.64 | 1.770223156 >> MaskQueryOperationsBenchmark.testTrueCountLong | 512 | 1 | 144516.188 | 328334.76 | 2.27195835 >> MaskQueryOperationsBenchmark.testTrueCountLong | 512 | 2 | 136752.367 | 328505.571 | 2.402192943 >> MaskQueryOperationsBenchmark.testTrueCountLong | 512 | 3 | 141445.742 | 328392.887 | 2.321688036 >> MaskQueryOperationsBenchmark.testTrueCountShort | 128 | 1 | 197863.202 | 354533.342 | 1.791810394 >> MaskQueryOperationsBenchmark.testTrueCountShort | 128 | 2 | 191802.914 | 354377.939 | 1.84761499 >> MaskQueryOperationsBenchmark.testTrueCountShort | 128 | 3 | 181773.298 | 354374.525 | 1.949541153 >> MaskQueryOperationsBenchmark.testTrueCountShort | 256 | 1 | 144414.679 | 328435.088 | 2.27425003 >> MaskQueryOperationsBenchmark.testTrueCountShort | 256 | 2 | 136923.991 | 328267.898 | 2.397446171 >> MaskQueryOperationsBenchmark.testTrueCountShort | 256 | 3 | 141545.957 | 328308.681 | 2.319449371 >> MaskQueryOperationsBenchmark.testTrueCountShort | 512 | 1 | 108420.143 | 328282.998 | 3.027878297 >> MaskQueryOperationsBenchmark.testTrueCountShort | 512 | 2 | 98736.441 | 328420.616 | 3.326235103 >> MaskQueryOperationsBenchmark.testTrueCountShort | 512 | 3 | 106432.386 | 328245.585 | 3.084076166 >> >> ALGO (1=bestcase, 2=worstcast,3=avgcase) > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > 8256973: Review comments resolution. > Byte512Vector mandates the presence of AVX512BW as enforced by Matcher::match_rule_supported_vector()) thus removed the special code sequence for 512 bit vector in absence of AVX512BW feature. 
Please elaborate on why `Byte512Vector` matters here. Intrinsics are fed with the corresponding vector element type, so unconditionally rejecting the AVX512F case (w/ BW & VL absent) means that on Xeon Phis `VectorMask.lastTrue/firstTrue/trueCount` on 512-bit masks are useless (irrespective of element type) while some 512-bit vector shapes are supported. Is it intended? src/hotspot/share/opto/vectorIntrinsics.cpp line 432: > 430: BasicType elem_bt = elem_type->basic_type(); > 431: > 432: if (num_elem <= 2) { You mentioned that masks of length 2 are supported, but they are rejected here. ------------- PR: https://git.openjdk.java.net/jdk/pull/3916 From jbhateja at openjdk.java.net Mon May 17 14:16:50 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Mon, 17 May 2021 14:16:50 GMT Subject: RFR: 8256973: Intrinsic creation for VectorMask query (lastTrue, firstTrue, trueCount) APIs [v4] In-Reply-To: <6sgoak9hLqP2gY_tFdDNmwpd1WmCdbQxgu_s6PEhHSo=.0364c120-34fd-4792-878d-d09e9673b92f@github.com> References: <73lFD51hzmiF_KrQyPyE5c7lbf-Bp6V5vptzGo7JioY=.f34509d0-04c1-4c6d-878f-baa433b315a7@github.com> <6sgoak9hLqP2gY_tFdDNmwpd1WmCdbQxgu_s6PEhHSo=.0364c120-34fd-4792-878d-d09e9673b92f@github.com> Message-ID: On Mon, 17 May 2021 13:48:34 GMT, Vladimir Ivanov wrote: > > Byte512Vector mandates the presence of AVX512BW (as enforced by Matcher::match_rule_supported_vector()), thus removed the special code sequence for 512 bit vector in absence of AVX512BW feature. > > Please elaborate on why `Byte512Vector` matters here. > > Intrinsics are fed with the corresponding vector element type, so unconditionally rejecting the AVX512F case (w/ BW & VL absent) means that on Xeon Phis `VectorMask.lastTrue/firstTrue/trueCount` on 512-bit masks are useless (irrespective of element type) while some 512-bit vector shapes are supported. Is it intended? This is enforced by Matcher::match_rule_supported_vector(): a 512-bit vector of sub-word type is supported only if the target supports AVX512BW.
For types other than sub-word types, a 512-bit vector mask will be handled by the second instruction selection pattern, which is predicated on !VM_Version::supports_avx512vlbw(), since for them the maximum vector size needed to hold the byte vector containing the mask will always be <= 32 bytes. ------------- PR: https://git.openjdk.java.net/jdk/pull/3916 From psandoz at openjdk.java.net Mon May 17 15:32:43 2021 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Mon, 17 May 2021 15:32:43 GMT Subject: RFR: 8266951: Partial in-lining for vectorized mismatch operation using AVX512 masked instructions [v3] In-Reply-To: References: <0YtRuwnVZ-Ejs-22d0JDJeFzXiZ17XNuBT1o5Ma4ZkI=.9dd9e952-d452-4175-8ff5-8f41e990a555@github.com> Message-ID: <2Hq64WSZ6ulPHDZi9RC1sfOZPG6BB94G7NQDReQOPjM=.b1e01864-1d91-4516-b430-947916c139f5@github.com> On Mon, 17 May 2021 11:25:34 GMT, Jatin Bhateja wrote: >> ArraysSupport.vectorizedMismatch is a leaf-level comparison routine which gets called by various public Java APIs (Arrays.equals, Arrays.mismatch). The HotSpot C2 compiler intrinsifies the vectorizedMismatch routine and emits a call to a stub routine which uses vector instructions to compare the inputs. >> >> For small compare operations whose size fits in one vector register, i.e. < 32 bytes or <= 64 bytes, this patch employs a partial in-lining technique to emit the fast-path code at the call site, which does the vector comparison under the influence of a predicate register/mask computed as a function of the comparison length. >> >> If the length of the comparison is greater than the vector register size, then the slow path comprising the stub call is emitted. >> >> This avoids the call overhead associated with the stub call, which is significant compared to the actual comparison operation for small-sized comparisons. >> >> Partial in-lining works under the influence of a run-time flag -XX:UsePartialInlineSize=32/64 (default 32 bytes).
>> >> Following are performance number for an existing JMH benchmark (test/micro/org/openjdk/bench/java/util//ArrayMismatch.java) :- >> >> Machine : Cascade Lake server (Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz) >> >> JMH Benchmark | Size | BaseLine (ops/ms) | PI32 (ops/ms) | Gain (PI32/Baseline) | PI64 (ops/ms) | Gain (PI64/Baseline) >> -- | -- | -- | -- | -- | -- | -- >> ? | ? | ? | ? | ? | ? | ? >> ArraysMismatch.Byte.differentSubrangeMatches | 16 | 129196.612 | 165376.715 | 1.2800391 | 157553.42 | 1.219485694 >> ArraysMismatch.Byte.differentSubrangeMatches | 32 | 125583.404 | 163645.759 | 1.303084275 | 157645.879 | 1.255308217 >> ArraysMismatch.Byte.differentSubrangeMatches | 64 | 121969.731 | 170648.152 | 1.399102471 | 157993.449 | 1.295349655 >> ArraysMismatch.Byte.differentSubrangeMatches | 90 | 91819.571 | 96154.479 | 1.047211155 | 157983.324 | 1.720584427 >> ArraysMismatch.Byte.differentSubrangeMatches | 800 | 65236.047 | 67243.131 | 1.030766487 | 67759.48 | 1.038681574 >> ArraysMismatch.Byte.matches | 16 | 151805.68 | 203802.717 | 1.342523659 | 188334.618 | 1.240629586 >> ArraysMismatch.Byte.matches | 32 | 151624.747 | 203731.315 | 1.343654773 | 185719.086 | 1.224859989 >> ArraysMismatch.Byte.matches | 64 | 138350.648 | 124158.139 | 0.897416389 | 188935.388 | 1.365627055 >> ArraysMismatch.Byte.matches | 90 | 102366.983 | 101474.688 | 0.991283371 | 100674.414 | 0.983465675 >> ArraysMismatch.Byte.matches | 800 | 46319.352 | 49585.514 | 1.070513983 | 49594.262 | 1.070702846 >> ArraysMismatch.Byte.mismatchEnd | 16 | 162382.057 | 191602.366 | 1.179947893 | 182425.362 | 1.123433003 >> ArraysMismatch.Byte.mismatchEnd | 32 | 146656.702 | 193510.637 | 1.319480354 | 182571.741 | 1.244891904 >> ArraysMismatch.Byte.mismatchEnd | 64 | 140799.385 | 122505.816 | 0.870073516 | 182360.435 | 1.295179201 >> ArraysMismatch.Byte.mismatchEnd | 90 | 117439.002 | 107296.27 | 0.913634041 | 108081.174 | 0.920317545 >> ArraysMismatch.Byte.mismatchEnd | 800 | 47542.975 | 
47456.106 | 0.998172832 | 47289.082 | 0.994659716 >> ArraysMismatch.Byte.mismatchMid | 16 | 143112.591 | 189653.41 | 1.325204223 | 182411.81 | 1.274603504 >> ArraysMismatch.Byte.mismatchMid | 32 | 151759.608 | 193712.64 | 1.276443993 | 182689.18 | 1.203806351 >> ArraysMismatch.Byte.mismatchMid | 64 | 140756.035 | 122017.013 | 0.866868785 | 182508.473 | 1.296629825 >> ArraysMismatch.Byte.mismatchMid | 90 | 134230.235 | 122213.804 | 0.910478954 | 122566.133 | 0.913103765 >> ArraysMismatch.Byte.mismatchMid | 800 | 75512.985 | 64861.716 | 0.858947849 | 71607.794 | 0.94828451 >> ArraysMismatch.Byte.mismatchStart | 16 | 160628.501 | 193722.299 | 1.206026937 | 183190.972 | 1.140463684 >> ArraysMismatch.Byte.mismatchStart | 32 | 151629.56 | 193633.36 | 1.277015906 | 183230.666 | 1.20840993 >> ArraysMismatch.Byte.mismatchStart | 64 | 143345.272 | 130754.305 | 0.91216336 | 181837.864 | 1.268530601 >> ArraysMismatch.Byte.mismatchStart | 90 | 151557.205 | 130724.926 | 0.86254511 | 130962.682 | 0.864113864 >> ArraysMismatch.Byte.mismatchStart | 800 | 149416.06 | 130847.301 | 0.875724477 | 130952.683 | 0.876429769 >> ArraysMismatch.Char.differentSubrangeMatches | 16 | 124936.905 | 152375.103 | 1.219616438 | 146062.997 | 1.169094088 >> ArraysMismatch.Char.differentSubrangeMatches | 32 | 118878.291 | 158770.285 | 1.33557005 | 146561.488 | 1.232870079 >> ArraysMismatch.Char.differentSubrangeMatches | 64 | 110296.975 | 104885.041 | 0.95093307 | 146102.313 | 1.324626655 >> ArraysMismatch.Char.differentSubrangeMatches | 90 | 88056.395 | 90133.489 | 1.023588224 | 87883.169 | 0.998032783 >> ArraysMismatch.Char.differentSubrangeMatches | 800 | 41319.787 | 46257.464 | 1.119499091 | 46090.56 | 1.115459767 >> ArraysMismatch.Char.matches | 16 | 150428.182 | 197311.356 | 1.311664832 | 187199.805 | 1.24444637 >> ArraysMismatch.Char.matches | 32 | 132718.181 | 126373.231 | 0.952192307 | 187008.811 | 1.409067014 >> ArraysMismatch.Char.matches | 64 | 111659.84 | 107182.982 | 0.959906283 | 
109772.951 | 0.983101453 >> ArraysMismatch.Char.matches | 90 | 86184.209 | 91977.05 | 1.067214645 | 90389.147 | 1.048790121 >> ArraysMismatch.Char.matches | 800 | 26332.084 | 25284.001 | 0.960197491 | 25855.38 | 0.981896458 >> ArraysMismatch.Char.mismatchEnd | 16 | 148547.251 | 189151.018 | 1.273339067 | 179675.328 | 1.209550004 >> ArraysMismatch.Char.mismatchEnd | 32 | 138219.785 | 119017.203 | 0.861072118 | 178701.685 | 1.292880647 >> ArraysMismatch.Char.mismatchEnd | 64 | 110435.452 | 103940.023 | 0.94118348 | 102078.889 | 0.924330794 >> ArraysMismatch.Char.mismatchEnd | 90 | 89375.63 | 87698.736 | 0.981237682 | 88037.787 | 0.985031233 >> ArraysMismatch.Char.mismatchEnd | 800 | 23632.584 | 22963.757 | 0.971698948 | 20497.605 | 0.867345061 >> ArraysMismatch.Char.mismatchMid | 16 | 148666.26 | 189258.721 | 1.273044207 | 178820.938 | 1.202834712 >> ArraysMismatch.Char.mismatchMid | 32 | 131949.59 | 119320.489 | 0.904288441 | 178579.245 | 1.35338992 >> ArraysMismatch.Char.mismatchMid | 64 | 122148.315 | 111033.597 | 0.909006375 | 109455.953 | 0.896090568 >> ArraysMismatch.Char.mismatchMid | 90 | 125032.714 | 109837.581 | 0.878470742 | 110283.097 | 0.882033937 >> ArraysMismatch.Char.mismatchMid | 800 | 42255.059 | 48153.688 | 1.139595806 | 43087.476 | 1.019699819 >> ArraysMismatch.Char.mismatchStart | 16 | 148493.976 | 189247.176 | 1.274443456 | 178915.503 | 1.204867078 >> ArraysMismatch.Char.mismatchStart | 32 | 148724.462 | 126724.721 | 0.852077186 | 178887.041 | 1.202808459 >> ArraysMismatch.Char.mismatchStart | 64 | 148635.338 | 126716.274 | 0.852531274 | 126747.94 | 0.852744318 >> ArraysMismatch.Char.mismatchStart | 90 | 140359.351 | 126708.588 | 0.902744186 | 125618.245 | 0.894975961 >> ArraysMismatch.Char.mismatchStart | 800 | 144649.46 | 125727.381 | 0.86918666 | 126664.011 | 0.875661831 >> ArraysMismatch.Double.differentSubrangeMatches | 16 | 116255.827 | 116156.952 | 0.999149505 | 116557.568 | 1.002595491 >> ArraysMismatch.Double.differentSubrangeMatches | 
32 | 91940.498 | 97299.205 | 1.058284511 | 97466.224 | 1.06010111 >> ArraysMismatch.Double.differentSubrangeMatches | 64 | 78205.807 | 78189.378 | 0.999789926 | 78133.649 | 0.999077332 >> ArraysMismatch.Double.differentSubrangeMatches | 90 | 61330.454 | 68798.235 | 1.121763015 | 68524.188 | 1.117294648 >> ArraysMismatch.Double.differentSubrangeMatches | 800 | 14996.315 | 14979.647 | 0.998888527 | 15072.825 | 1.00510192 >> ArraysMismatch.Double.matches | 16 | 119342.024 | 120322.671 | 1.008217114 | 119531.315 | 1.001586122 >> ArraysMismatch.Double.matches | 32 | 88179.448 | 89069.505 | 1.010093701 | 88141.626 | 0.999571079 >> ArraysMismatch.Double.matches | 64 | 62622.253 | 62433.512 | 0.996986039 | 63041.774 | 1.006699232 >> ArraysMismatch.Double.matches | 90 | 49579.305 | 50632.739 | 1.021247454 | 46548.486 | 0.938869272 >> ArraysMismatch.Double.matches | 800 | 8850.013 | 8505.296 | 0.961048984 | 8490.327 | 0.959357574 >> ArraysMismatch.Double.mismatchEnd | 16 | 116594.224 | 119025.382 | 1.020851445 | 116310.567 | 0.997567144 >> ArraysMismatch.Double.mismatchEnd | 32 | 86183.542 | 86814.706 | 1.007323486 | 86258.696 | 1.000872023 >> ArraysMismatch.Double.mismatchEnd | 64 | 62695.058 | 62794.552 | 1.001586951 | 62769 | 1.001179391 >> ArraysMismatch.Double.mismatchEnd | 90 | 46899.021 | 47692.984 | 1.016929202 | 47598.715 | 1.01491916 >> ArraysMismatch.Double.mismatchEnd | 800 | 8132.64 | 8141.465 | 1.001085133 | 7176.583 | 0.882441987 >> ArraysMismatch.Double.mismatchMid | 16 | 110505.284 | 113732.521 | 1.029204368 | 113249.451 | 1.024832903 >> ArraysMismatch.Double.mismatchMid | 32 | 94259.439 | 93242.776 | 0.989214205 | 94420.206 | 1.00170558 >> ArraysMismatch.Double.mismatchMid | 64 | 76392.603 | 76344.962 | 0.999376366 | 76369.689 | 0.999700049 >> ArraysMismatch.Double.mismatchMid | 90 | 71578.538 | 71637.235 | 1.000820036 | 71582.34 | 1.000053116 >> ArraysMismatch.Double.mismatchMid | 800 | 14993.414 | 12701.251 | 0.84712201 | 14998.937 | 1.000368362 >> 
ArraysMismatch.Double.mismatchStart | 16 | 141188.616 | 141430.91 | 1.001716102 | 141517.873 | 1.002332036 >> ArraysMismatch.Double.mismatchStart | 32 | 141489.906 | 139633.297 | 0.986878152 | 141729.555 | 1.001693753 >> ArraysMismatch.Double.mismatchStart | 64 | 141502.44 | 139656.902 | 0.986957554 | 141488.272 | 0.999899875 >> ArraysMismatch.Double.mismatchStart | 90 | 141782.57 | 141508.142 | 0.998064445 | 141579.135 | 0.998565162 >> ArraysMismatch.Double.mismatchStart | 800 | 144565.191 | 139525.413 | 0.965138371 | 144607.95 | 1.000295777 >> ArraysMismatch.Float.differentSubrangeMatches | 16 | 120041.868 | 119986.512 | 0.999538861 | 120009.683 | 0.999731885 >> ArraysMismatch.Float.differentSubrangeMatches | 32 | 111402.873 | 111414.633 | 1.000105563 | 111442.964 | 1.000359874 >> ArraysMismatch.Float.differentSubrangeMatches | 64 | 85388.728 | 93884.13 | 1.099490907 | 95120.892 | 1.113974809 >> ArraysMismatch.Float.differentSubrangeMatches | 90 | 67617.865 | 75865.226 | 1.121970148 | 76179.814 | 1.126622587 >> ArraysMismatch.Float.differentSubrangeMatches | 800 | 24994.376 | 25011.775 | 1.000696117 | 24944.2 | 0.997992508 >> ArraysMismatch.Float.matches | 16 | 133159.39 | 137937.688 | 1.035884048 | 139461.652 | 1.047328709 >> ArraysMismatch.Float.matches | 32 | 111959.987 | 115420.6 | 1.030909373 | 117002.141 | 1.045035321 >> ArraysMismatch.Float.matches | 64 | 86892.65 | 87395.62 | 1.005788407 | 87345.458 | 1.00521112 >> ArraysMismatch.Float.matches | 90 | 67690.279 | 69156.772 | 1.02166475 | 69082.962 | 1.020574343 >> ArraysMismatch.Float.matches | 800 | 14894.94 | 15341.034 | 1.029949365 | 15779.117 | 1.059360897 >> ArraysMismatch.Float.mismatchEnd | 16 | 128854.048 | 128925.913 | 1.000557724 | 128985.299 | 1.001018602 >> ArraysMismatch.Float.mismatchEnd | 32 | 99825.842 | 104613.873 | 1.047963843 | 103876.271 | 1.040574955 >> ArraysMismatch.Float.mismatchEnd | 64 | 80190.706 | 84665.053 | 1.055796329 | 84582.712 | 1.054769514 >> 
ArraysMismatch.Float.mismatchEnd | 90 | 71406.594 | 76730.083 | 1.074551784 | 76596.258 | 1.072677658 >> ArraysMismatch.Float.mismatchEnd | 800 | 14348.159 | 14306.535 | 0.997099001 | 14360.603 | 1.000867289 >> ArraysMismatch.Float.mismatchMid | 16 | 123753.791 | 124291.601 | 1.004345806 | 123649.378 | 0.999156284 >> ArraysMismatch.Float.mismatchMid | 32 | 109105.215 | 111447.183 | 1.021465225 | 111494.37 | 1.021897716 >> ArraysMismatch.Float.mismatchMid | 64 | 93600.363 | 93741.993 | 1.001513135 | 93658.042 | 1.000616226 >> ArraysMismatch.Float.mismatchMid | 90 | 89991.128 | 89712.471 | 0.996903506 | 90031.763 | 1.000451545 >> ArraysMismatch.Float.mismatchMid | 800 | 23974.331 | 24301.075 | 1.01362891 | 24354.29 | 1.015848576 >> ArraysMismatch.Float.mismatchStart | 16 | 140889.393 | 140535.617 | 0.997488981 | 140222.656 | 0.995267657 >> ArraysMismatch.Float.mismatchStart | 32 | 140871.915 | 140318.765 | 0.996073383 | 140242.783 | 0.995534014 >> ArraysMismatch.Float.mismatchStart | 64 | 141197.313 | 140413.639 | 0.994449795 | 140792.879 | 0.997135682 >> ArraysMismatch.Float.mismatchStart | 90 | 139663.079 | 139775.065 | 1.00080183 | 143880.133 | 1.03019448 >> ArraysMismatch.Float.mismatchStart | 800 | 143930.882 | 143878.412 | 0.99963545 | 143923.022 | 0.99994539 >> ArraysMismatch.Int.differentSubrangeMatches | 16 | 110820.026 | 130943.67 | 1.181588515 | 131076.904 | 1.182790771 >> ArraysMismatch.Int.differentSubrangeMatches | 32 | 111706.868 | 121119.544 | 1.084262285 | 122049.921 | 1.092591021 >> ArraysMismatch.Int.differentSubrangeMatches | 64 | 93916.026 | 101624.789 | 1.082081444 | 100103.617 | 1.065884293 >> ArraysMismatch.Int.differentSubrangeMatches | 90 | 67478.955 | 83517.957 | 1.237688951 | 83549.562 | 1.238157319 >> ArraysMismatch.Int.differentSubrangeMatches | 800 | 24920.868 | 25100.838 | 1.007221659 | 25376.679 | 1.018290334 >> ArraysMismatch.Int.matches | 16 | 138004.078 | 142579.711 | 1.033155781 | 143465.516 | 1.039574468 >> 
ArraysMismatch.Int.matches | 32 | 111790.949 | 119018.169 | 1.06464942 | 119864.971 | 1.072224291 >> ArraysMismatch.Int.matches | 64 | 86997.004 | 88476.088 | 1.017001551 | 87755.688 | 1.008720806 >> ArraysMismatch.Int.matches | 90 | 69366.581 | 71427.315 | 1.029707879 | 71203.035 | 1.026474622 >> ArraysMismatch.Int.matches | 800 | 15119.02 | 15529.095 | 1.02712312 | 15828.336 | 1.046915475 >> ArraysMismatch.Int.mismatchEnd | 16 | 139862.143 | 135639.435 | 0.96980807 | 135661.244 | 0.969964002 >> ArraysMismatch.Int.mismatchEnd | 32 | 114870.328 | 115455.901 | 1.005097687 | 114992.965 | 1.001067613 >> ArraysMismatch.Int.mismatchEnd | 64 | 85291.637 | 85115.665 | 0.99793682 | 85179.114 | 0.998680726 >> ArraysMismatch.Int.mismatchEnd | 90 | 73049.868 | 78798.949 | 1.078700772 | 73365.106 | 1.004315381 >> ArraysMismatch.Int.mismatchEnd | 800 | 14597.509 | 12861.87 | 0.88110033 | 12845.178 | 0.879956847 >> ArraysMismatch.Int.mismatchMid | 16 | 131615.489 | 134691.219 | 1.023369058 | 134503.225 | 1.0219407 >> ArraysMismatch.Int.mismatchMid | 32 | 119291.19 | 121970.431 | 1.022459672 | 120647.357 | 1.011368543 >> ArraysMismatch.Int.mismatchMid | 64 | 100133.019 | 99827.03 | 0.996944175 | 98327.743 | 0.981971222 >> ArraysMismatch.Int.mismatchMid | 90 | 93062.689 | 95269.725 | 1.023715584 | 95457.632 | 1.025734728 >> ArraysMismatch.Int.mismatchMid | 800 | 24614.985 | 20853.102 | 0.847171022 | 20857.528 | 0.847350831 >> ArraysMismatch.Int.mismatchStart | 16 | 140229.222 | 147607.561 | 1.052616273 | 146278.15 | 1.043136002 >> ArraysMismatch.Int.mismatchStart | 32 | 140354.53 | 147448.421 | 1.050542658 | 146287.931 | 1.042274382 >> ArraysMismatch.Int.mismatchStart | 64 | 140256.12 | 147353.466 | 1.050602754 | 146094.059 | 1.041623417 >> ArraysMismatch.Int.mismatchStart | 90 | 135753.229 | 151205.439 | 1.113825727 | 152070.776 | 1.120200065 >> ArraysMismatch.Int.mismatchStart | 800 | 151565.887 | 145991.819 | 0.963223466 | 152020.842 | 1.003001698 >> 
ArraysMismatch.Long.differentSubrangeMatches | 16 | 125569.009 | 121469.175 | 0.967349953 | 121319.155 | 0.966155232 >> ArraysMismatch.Long.differentSubrangeMatches | 32 | 100126.557 | 103303.047 | 1.03172475 | 101476.788 | 1.013485243 >> ArraysMismatch.Long.differentSubrangeMatches | 64 | 80870.342 | 82334.336 | 1.018102978 | 82395.962 | 1.018865012 >> ArraysMismatch.Long.differentSubrangeMatches | 90 | 70673.831 | 72440.193 | 1.024993155 | 72067.497 | 1.019719689 >> ArraysMismatch.Long.differentSubrangeMatches | 800 | 15224.864 | 15077.429 | 0.99031617 | 15163.827 | 0.995990966 >> ArraysMismatch.Long.matches | 16 | 119857.871 | 123784.673 | 1.032762154 | 122968.267 | 1.025950703 >> ArraysMismatch.Long.matches | 32 | 88284.162 | 90825.719 | 1.028788369 | 91303.549 | 1.034200778 >> ArraysMismatch.Long.matches | 64 | 62827.102 | 63614.876 | 1.012538761 | 64469.82 | 1.026146646 >> ArraysMismatch.Long.matches | 90 | 49351.299 | 51199.947 | 1.037458953 | 51103.813 | 1.035511 >> ArraysMismatch.Long.matches | 800 | 8822.867 | 8512.064 | 0.964773015 | 8848.35 | 1.00288829 >> ArraysMismatch.Long.mismatchEnd | 16 | 124902.804 | 128237.911 | 1.026701618 | 128410.897 | 1.028086583 >> ArraysMismatch.Long.mismatchEnd | 32 | 86728.545 | 90519.608 | 1.043711825 | 88782.445 | 1.023681938 >> ArraysMismatch.Long.mismatchEnd | 64 | 64431.36 | 62735.702 | 0.973682722 | 64766.52 | 1.005201815 >> ArraysMismatch.Long.mismatchEnd | 90 | 47764.996 | 47635.982 | 0.997298984 | 47562.461 | 0.995759761 >> ArraysMismatch.Long.mismatchEnd | 800 | 8124.901 | 7194.444 | 0.88548082 | 7197.163 | 0.88581547 >> ArraysMismatch.Long.mismatchMid | 16 | 122857.442 | 121708.317 | 0.99064668 | 121071.994 | 0.985467319 >> ArraysMismatch.Long.mismatchMid | 32 | 99406.603 | 99376.972 | 0.999701921 | 97379.046 | 0.979603397 >> ArraysMismatch.Long.mismatchMid | 64 | 78596.148 | 76559.205 | 0.974083425 | 76538.811 | 0.973823946 >> ArraysMismatch.Long.mismatchMid | 90 | 74253.699 | 73267.252 | 0.98671518 | 
74874.856 | 1.008365334 >> ArraysMismatch.Long.mismatchMid | 800 | 12739.526 | 12773.563 | 1.002671763 | 15215.721 | 1.194371046 >> ArraysMismatch.Long.mismatchStart | 16 | 143429.003 | 147610.51 | 1.029153846 | 146953.182 | 1.024570895 >> ArraysMismatch.Long.mismatchStart | 32 | 149771.413 | 149898.955 | 1.000851578 | 147743.864 | 0.986462377 >> ArraysMismatch.Long.mismatchStart | 64 | 149812.094 | 147738.977 | 0.986161885 | 147818.236 | 0.986690941 >> ArraysMismatch.Long.mismatchStart | 90 | 149834.855 | 147878.978 | 0.986946448 | 149768.864 | 0.999559575 >> ArraysMismatch.Long.mismatchStart | 800 | 150266.332 | 147175.353 | 0.979429996 | 153305.049 | 1.020222208 >> ArraysMismatch.Short.differentSubrangeMatches | 16 | 124956.808 | 152398.079 | 1.21960605 | 146222.898 | 1.170187526 >> ArraysMismatch.Short.differentSubrangeMatches | 32 | 118644.114 | 158832.405 | 1.338729749 | 146589.485 | 1.235539464 >> ArraysMismatch.Short.differentSubrangeMatches | 64 | 111036.197 | 106078.375 | 0.955349497 | 146122.18 | 1.315986894 >> ArraysMismatch.Short.differentSubrangeMatches | 90 | 79114.347 | 90244.347 | 1.140682448 | 91059.171 | 1.150981768 >> ArraysMismatch.Short.differentSubrangeMatches | 800 | 44794.065 | 46302.944 | 1.033684797 | 46086.671 | 1.028856635 >> ArraysMismatch.Short.matches | 16 | 150201.123 | 193264.21 | 1.28670283 | 185129.029 | 1.232540911 >> ArraysMismatch.Short.matches | 32 | 137672.122 | 126543.04 | 0.919162414 | 187187.586 | 1.359662242 >> ArraysMismatch.Short.matches | 64 | 113952.11 | 110124.025 | 0.966406195 | 109228.551 | 0.958547858 >> ArraysMismatch.Short.matches | 90 | 89491.351 | 91045.251 | 1.017363689 | 90362.175 | 1.009730817 >> ArraysMismatch.Short.matches | 800 | 25941.449 | 25887.28 | 0.997911875 | 25191.983 | 0.971109324 >> ArraysMismatch.Short.mismatchEnd | 16 | 142494.648 | 189203.368 | 1.327792802 | 176318.454 | 1.237368957 >> ArraysMismatch.Short.mismatchEnd | 32 | 139928.97 | 119098.052 | 0.851132199 | 178840.438 | 1.278080143 >> 
ArraysMismatch.Short.mismatchEnd | 64 | 115583.3 | 104264.811 | 0.902075049 | 102376.369 | 0.885736685 >> ArraysMismatch.Short.mismatchEnd | 90 | 86641.922 | 87669.462 | 1.011859617 | 87745.796 | 1.012740645 >> ArraysMismatch.Short.mismatchEnd | 800 | 23741.295 | 22911.558 | 0.965050895 | 22937.297 | 0.96613504 >> ArraysMismatch.Short.mismatchMid | 16 | 148684.747 | 189160.851 | 1.272227682 | 178776.065 | 1.202383355 >> ArraysMismatch.Short.mismatchMid | 32 | 133281.625 | 118690.88 | 0.890526957 | 178478.46 | 1.339107773 >> ArraysMismatch.Short.mismatchMid | 64 | 122399.072 | 110333.504 | 0.901424351 | 111504.705 | 0.910993059 >> ArraysMismatch.Short.mismatchMid | 90 | 119317.633 | 110483.29 | 0.925959451 | 111346.724 | 0.933195884 >> ArraysMismatch.Short.mismatchMid | 800 | 50742.831 | 43058.305 | 0.848559376 | 47917.118 | 0.94431306 >> ArraysMismatch.Short.mismatchStart | 16 | 148861.935 | 191984.933 | 1.289684519 | 178706.176 | 1.200482689 >> ArraysMismatch.Short.mismatchStart | 32 | 148701.043 | 126690.118 | 0.851978678 | 178702.06 | 1.201753911 >> ArraysMismatch.Short.mismatchStart | 64 | 148560.877 | 126747.337 | 0.853167668 | 126657.473 | 0.852562771 >> ArraysMismatch.Short.mismatchStart | 90 | 149824.411 | 126605.818 | 0.845027971 | 125719.231 | 0.839110464 >> ArraysMismatch.Short.mismatchStart | 800 | 152583.036 | 126437.329 | 0.828646043 | 126698.741 | 0.830359287 > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > 8266951: Enable partial in-lining if UsePartialInlineSize=64, adding a benchmark for small sized conversions of various primitive types. test/micro/org/openjdk/bench/java/util/ArraysMismatch.java line 65: > 63: leftStartRange = size / 4; > 64: leftEndRange = size - size / 4; > 65: rightStartRange = size / 4 + 1; Since you changed `10` to `1` perhaps make this a parameter defaulting to the new value? 
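A minimal sketch of the suggestion above, not code from the patch: the offset between the left and right sub-ranges becomes a parameter that defaults to the new value `1`. The names `rangeOffset`, `DEFAULT_RANGE_OFFSET`, `shiftedCopy` and the `rightEndRange` computation are assumptions made for illustration; only `leftStartRange`, `leftEndRange` and `rightStartRange` appear in the quoted diff. In the JMH benchmark the knob would presumably be a `@Param` field rather than a method argument; plain Java is used here so the sketch runs without JMH.

```java
import java.util.Arrays;

public class RangeOffsetSketch {
    // Hypothetical default, matching the new hard-coded value in the quoted diff.
    static final int DEFAULT_RANGE_OFFSET = 1;

    // Builds a right-hand array in which right[i + offset] == left[i].
    static byte[] shiftedCopy(byte[] left, int offset) {
        byte[] right = new byte[left.length + offset];
        System.arraycopy(left, 0, right, offset, left.length);
        return right;
    }

    // Compares left[size/4, size - size/4) with the right-hand range shifted by rangeOffset.
    // rightEndRange is an assumption; the quoted diff does not show it.
    static int subrangeMismatch(byte[] left, byte[] right, int size, int rangeOffset) {
        int leftStartRange = size / 4;
        int leftEndRange = size - size / 4;
        int rightStartRange = size / 4 + rangeOffset;
        int rightEndRange = leftEndRange + rangeOffset;
        return Arrays.mismatch(left, leftStartRange, leftEndRange,
                               right, rightStartRange, rightEndRange);
    }

    public static void main(String[] args) {
        int size = 16;
        byte[] left = new byte[size];
        for (int i = 0; i < size; i++) {
            left[i] = (byte) i;
        }
        byte[] right = shiftedCopy(left, DEFAULT_RANGE_OFFSET);
        // The two sub-ranges hold the same bytes, so no mismatch is reported.
        System.out.println(subrangeMismatch(left, right, size, DEFAULT_RANGE_OFFSET));
    }
}
```

Keeping the offset as a parameter would let the old value `10` be re-run for comparison without editing the benchmark source.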
------------- PR: https://git.openjdk.java.net/jdk/pull/3999 From psandoz at openjdk.java.net Mon May 17 15:37:50 2021 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Mon, 17 May 2021 15:37:50 GMT Subject: RFR: 8266951: Partial in-lining for vectorized mismatch operation using AVX512 masked instructions [v3] In-Reply-To: References: <0YtRuwnVZ-Ejs-22d0JDJeFzXiZ17XNuBT1o5Ma4ZkI=.9dd9e952-d452-4175-8ff5-8f41e990a555@github.com> Message-ID: On Mon, 17 May 2021 11:25:34 GMT, Jatin Bhateja wrote: >> ArraysSupport.vectorizedMismatch is a leaf-level comparison routine which gets called by various public Java APIs (Arrays.equals, Arrays.mismatch). The HotSpot C2 compiler intrinsifies the vectorizedMismatch routine and emits a call to a stub routine which uses vector instructions to compare the inputs. >> >> For small compare operations whose size fits in one vector register, i.e. < 32 bytes or <= 64 bytes, this patch employs a partial in-lining technique to emit the fast-path code at the call site, which does the vector comparison under the influence of a predicate register/mask computed as a function of the comparison length. >> >> If the length of the comparison is greater than the vector register size, then the slow path comprising the stub call is emitted. >> >> This avoids the call overhead associated with the stub call, which is significant compared to the actual comparison operation for small-sized comparisons. >> >> Partial in-lining works under the influence of a run-time flag -XX:UsePartialInlineSize=32/64 (default 32 bytes). >> >> Following are performance numbers for an existing JMH benchmark (test/micro/org/openjdk/bench/java/util/ArraysMismatch.java): >> >> Machine: Cascade Lake server (Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz) >> >> JMH Benchmark | Size | BaseLine (ops/ms) | PI32 (ops/ms) | Gain (PI32/Baseline) | PI64 (ops/ms) | Gain (PI64/Baseline) >> -- | -- | -- | -- | -- | -- | --
>> ArraysMismatch.Byte.differentSubrangeMatches | 16 | 129196.612 | 165376.715 | 1.2800391 | 157553.42 | 1.219485694 >> ArraysMismatch.Byte.differentSubrangeMatches | 32 | 125583.404 | 163645.759 | 1.303084275 | 157645.879 | 1.255308217 >> ArraysMismatch.Byte.differentSubrangeMatches | 64 | 121969.731 | 170648.152 | 1.399102471 | 157993.449 | 1.295349655 >> ArraysMismatch.Byte.differentSubrangeMatches | 90 | 91819.571 | 96154.479 | 1.047211155 | 157983.324 | 1.720584427 >> ArraysMismatch.Byte.differentSubrangeMatches | 800 | 65236.047 | 67243.131 | 1.030766487 | 67759.48 | 1.038681574 >> ArraysMismatch.Byte.matches | 16 | 151805.68 | 203802.717 | 1.342523659 | 188334.618 | 1.240629586 >> ArraysMismatch.Byte.matches | 32 | 151624.747 | 203731.315 | 1.343654773 | 185719.086 | 1.224859989 >> ArraysMismatch.Byte.matches | 64 | 138350.648 | 124158.139 | 0.897416389 | 188935.388 | 1.365627055 >> ArraysMismatch.Byte.matches | 90 | 102366.983 | 101474.688 | 0.991283371 | 100674.414 | 0.983465675 >> ArraysMismatch.Byte.matches | 800 | 46319.352 | 49585.514 | 1.070513983 | 49594.262 | 1.070702846 >> ArraysMismatch.Byte.mismatchEnd | 16 | 162382.057 | 191602.366 | 1.179947893 | 182425.362 | 1.123433003 >> ArraysMismatch.Byte.mismatchEnd | 32 | 146656.702 | 193510.637 | 1.319480354 | 182571.741 | 1.244891904 >> ArraysMismatch.Byte.mismatchEnd | 64 | 140799.385 | 122505.816 | 0.870073516 | 182360.435 | 1.295179201 >> ArraysMismatch.Byte.mismatchEnd | 90 | 117439.002 | 107296.27 | 0.913634041 | 108081.174 | 0.920317545 >> ArraysMismatch.Byte.mismatchEnd | 800 | 47542.975 | 47456.106 | 0.998172832 | 47289.082 | 0.994659716 >> ArraysMismatch.Byte.mismatchMid | 16 | 143112.591 | 189653.41 | 1.325204223 | 182411.81 | 1.274603504 >> ArraysMismatch.Byte.mismatchMid | 32 | 151759.608 | 193712.64 | 1.276443993 | 182689.18 | 1.203806351 >> ArraysMismatch.Byte.mismatchMid | 64 | 140756.035 | 122017.013 | 0.866868785 | 182508.473 | 1.296629825 >> ArraysMismatch.Byte.mismatchMid | 90 | 
134230.235 | 122213.804 | 0.910478954 | 122566.133 | 0.913103765 >> ArraysMismatch.Byte.mismatchMid | 800 | 75512.985 | 64861.716 | 0.858947849 | 71607.794 | 0.94828451 >> ArraysMismatch.Byte.mismatchStart | 16 | 160628.501 | 193722.299 | 1.206026937 | 183190.972 | 1.140463684 >> ArraysMismatch.Byte.mismatchStart | 32 | 151629.56 | 193633.36 | 1.277015906 | 183230.666 | 1.20840993 >> ArraysMismatch.Byte.mismatchStart | 64 | 143345.272 | 130754.305 | 0.91216336 | 181837.864 | 1.268530601 >> ArraysMismatch.Byte.mismatchStart | 90 | 151557.205 | 130724.926 | 0.86254511 | 130962.682 | 0.864113864 >> ArraysMismatch.Byte.mismatchStart | 800 | 149416.06 | 130847.301 | 0.875724477 | 130952.683 | 0.876429769 >> ArraysMismatch.Char.differentSubrangeMatches | 16 | 124936.905 | 152375.103 | 1.219616438 | 146062.997 | 1.169094088 >> ArraysMismatch.Char.differentSubrangeMatches | 32 | 118878.291 | 158770.285 | 1.33557005 | 146561.488 | 1.232870079 >> ArraysMismatch.Char.differentSubrangeMatches | 64 | 110296.975 | 104885.041 | 0.95093307 | 146102.313 | 1.324626655 >> ArraysMismatch.Char.differentSubrangeMatches | 90 | 88056.395 | 90133.489 | 1.023588224 | 87883.169 | 0.998032783 >> ArraysMismatch.Char.differentSubrangeMatches | 800 | 41319.787 | 46257.464 | 1.119499091 | 46090.56 | 1.115459767 >> ArraysMismatch.Char.matches | 16 | 150428.182 | 197311.356 | 1.311664832 | 187199.805 | 1.24444637 >> ArraysMismatch.Char.matches | 32 | 132718.181 | 126373.231 | 0.952192307 | 187008.811 | 1.409067014 >> ArraysMismatch.Char.matches | 64 | 111659.84 | 107182.982 | 0.959906283 | 109772.951 | 0.983101453 >> ArraysMismatch.Char.matches | 90 | 86184.209 | 91977.05 | 1.067214645 | 90389.147 | 1.048790121 >> ArraysMismatch.Char.matches | 800 | 26332.084 | 25284.001 | 0.960197491 | 25855.38 | 0.981896458 >> ArraysMismatch.Char.mismatchEnd | 16 | 148547.251 | 189151.018 | 1.273339067 | 179675.328 | 1.209550004 >> ArraysMismatch.Char.mismatchEnd | 32 | 138219.785 | 119017.203 | 0.861072118 | 
178701.685 | 1.292880647 >> ArraysMismatch.Char.mismatchEnd | 64 | 110435.452 | 103940.023 | 0.94118348 | 102078.889 | 0.924330794 >> ArraysMismatch.Char.mismatchEnd | 90 | 89375.63 | 87698.736 | 0.981237682 | 88037.787 | 0.985031233 >> ArraysMismatch.Char.mismatchEnd | 800 | 23632.584 | 22963.757 | 0.971698948 | 20497.605 | 0.867345061 >> ArraysMismatch.Char.mismatchMid | 16 | 148666.26 | 189258.721 | 1.273044207 | 178820.938 | 1.202834712 >> ArraysMismatch.Char.mismatchMid | 32 | 131949.59 | 119320.489 | 0.904288441 | 178579.245 | 1.35338992 >> ArraysMismatch.Char.mismatchMid | 64 | 122148.315 | 111033.597 | 0.909006375 | 109455.953 | 0.896090568 >> ArraysMismatch.Char.mismatchMid | 90 | 125032.714 | 109837.581 | 0.878470742 | 110283.097 | 0.882033937 >> ArraysMismatch.Char.mismatchMid | 800 | 42255.059 | 48153.688 | 1.139595806 | 43087.476 | 1.019699819 >> ArraysMismatch.Char.mismatchStart | 16 | 148493.976 | 189247.176 | 1.274443456 | 178915.503 | 1.204867078 >> ArraysMismatch.Char.mismatchStart | 32 | 148724.462 | 126724.721 | 0.852077186 | 178887.041 | 1.202808459 >> ArraysMismatch.Char.mismatchStart | 64 | 148635.338 | 126716.274 | 0.852531274 | 126747.94 | 0.852744318 >> ArraysMismatch.Char.mismatchStart | 90 | 140359.351 | 126708.588 | 0.902744186 | 125618.245 | 0.894975961 >> ArraysMismatch.Char.mismatchStart | 800 | 144649.46 | 125727.381 | 0.86918666 | 126664.011 | 0.875661831 >> ArraysMismatch.Double.differentSubrangeMatches | 16 | 116255.827 | 116156.952 | 0.999149505 | 116557.568 | 1.002595491 >> ArraysMismatch.Double.differentSubrangeMatches | 32 | 91940.498 | 97299.205 | 1.058284511 | 97466.224 | 1.06010111 >> ArraysMismatch.Double.differentSubrangeMatches | 64 | 78205.807 | 78189.378 | 0.999789926 | 78133.649 | 0.999077332 >> ArraysMismatch.Double.differentSubrangeMatches | 90 | 61330.454 | 68798.235 | 1.121763015 | 68524.188 | 1.117294648 >> ArraysMismatch.Double.differentSubrangeMatches | 800 | 14996.315 | 14979.647 | 0.998888527 | 15072.825 | 
1.00510192 >> ArraysMismatch.Double.matches | 16 | 119342.024 | 120322.671 | 1.008217114 | 119531.315 | 1.001586122 >> ArraysMismatch.Double.matches | 32 | 88179.448 | 89069.505 | 1.010093701 | 88141.626 | 0.999571079 >> ArraysMismatch.Double.matches | 64 | 62622.253 | 62433.512 | 0.996986039 | 63041.774 | 1.006699232 >> ArraysMismatch.Double.matches | 90 | 49579.305 | 50632.739 | 1.021247454 | 46548.486 | 0.938869272 >> ArraysMismatch.Double.matches | 800 | 8850.013 | 8505.296 | 0.961048984 | 8490.327 | 0.959357574 >> ArraysMismatch.Double.mismatchEnd | 16 | 116594.224 | 119025.382 | 1.020851445 | 116310.567 | 0.997567144 >> ArraysMismatch.Double.mismatchEnd | 32 | 86183.542 | 86814.706 | 1.007323486 | 86258.696 | 1.000872023 >> ArraysMismatch.Double.mismatchEnd | 64 | 62695.058 | 62794.552 | 1.001586951 | 62769 | 1.001179391 >> ArraysMismatch.Double.mismatchEnd | 90 | 46899.021 | 47692.984 | 1.016929202 | 47598.715 | 1.01491916 >> ArraysMismatch.Double.mismatchEnd | 800 | 8132.64 | 8141.465 | 1.001085133 | 7176.583 | 0.882441987 >> ArraysMismatch.Double.mismatchMid | 16 | 110505.284 | 113732.521 | 1.029204368 | 113249.451 | 1.024832903 >> ArraysMismatch.Double.mismatchMid | 32 | 94259.439 | 93242.776 | 0.989214205 | 94420.206 | 1.00170558 >> ArraysMismatch.Double.mismatchMid | 64 | 76392.603 | 76344.962 | 0.999376366 | 76369.689 | 0.999700049 >> ArraysMismatch.Double.mismatchMid | 90 | 71578.538 | 71637.235 | 1.000820036 | 71582.34 | 1.000053116 >> ArraysMismatch.Double.mismatchMid | 800 | 14993.414 | 12701.251 | 0.84712201 | 14998.937 | 1.000368362 >> ArraysMismatch.Double.mismatchStart | 16 | 141188.616 | 141430.91 | 1.001716102 | 141517.873 | 1.002332036 >> ArraysMismatch.Double.mismatchStart | 32 | 141489.906 | 139633.297 | 0.986878152 | 141729.555 | 1.001693753 >> ArraysMismatch.Double.mismatchStart | 64 | 141502.44 | 139656.902 | 0.986957554 | 141488.272 | 0.999899875 >> ArraysMismatch.Double.mismatchStart | 90 | 141782.57 | 141508.142 | 0.998064445 | 
141579.135 | 0.998565162 >> ArraysMismatch.Double.mismatchStart | 800 | 144565.191 | 139525.413 | 0.965138371 | 144607.95 | 1.000295777 >> ArraysMismatch.Float.differentSubrangeMatches | 16 | 120041.868 | 119986.512 | 0.999538861 | 120009.683 | 0.999731885 >> ArraysMismatch.Float.differentSubrangeMatches | 32 | 111402.873 | 111414.633 | 1.000105563 | 111442.964 | 1.000359874 >> ArraysMismatch.Float.differentSubrangeMatches | 64 | 85388.728 | 93884.13 | 1.099490907 | 95120.892 | 1.113974809 >> ArraysMismatch.Float.differentSubrangeMatches | 90 | 67617.865 | 75865.226 | 1.121970148 | 76179.814 | 1.126622587 >> ArraysMismatch.Float.differentSubrangeMatches | 800 | 24994.376 | 25011.775 | 1.000696117 | 24944.2 | 0.997992508 >> ArraysMismatch.Float.matches | 16 | 133159.39 | 137937.688 | 1.035884048 | 139461.652 | 1.047328709 >> ArraysMismatch.Float.matches | 32 | 111959.987 | 115420.6 | 1.030909373 | 117002.141 | 1.045035321 >> ArraysMismatch.Float.matches | 64 | 86892.65 | 87395.62 | 1.005788407 | 87345.458 | 1.00521112 >> ArraysMismatch.Float.matches | 90 | 67690.279 | 69156.772 | 1.02166475 | 69082.962 | 1.020574343 >> ArraysMismatch.Float.matches | 800 | 14894.94 | 15341.034 | 1.029949365 | 15779.117 | 1.059360897 >> ArraysMismatch.Float.mismatchEnd | 16 | 128854.048 | 128925.913 | 1.000557724 | 128985.299 | 1.001018602 >> ArraysMismatch.Float.mismatchEnd | 32 | 99825.842 | 104613.873 | 1.047963843 | 103876.271 | 1.040574955 >> ArraysMismatch.Float.mismatchEnd | 64 | 80190.706 | 84665.053 | 1.055796329 | 84582.712 | 1.054769514 >> ArraysMismatch.Float.mismatchEnd | 90 | 71406.594 | 76730.083 | 1.074551784 | 76596.258 | 1.072677658 >> ArraysMismatch.Float.mismatchEnd | 800 | 14348.159 | 14306.535 | 0.997099001 | 14360.603 | 1.000867289 >> ArraysMismatch.Float.mismatchMid | 16 | 123753.791 | 124291.601 | 1.004345806 | 123649.378 | 0.999156284 >> ArraysMismatch.Float.mismatchMid | 32 | 109105.215 | 111447.183 | 1.021465225 | 111494.37 | 1.021897716 >> 
ArraysMismatch.Float.mismatchMid | 64 | 93600.363 | 93741.993 | 1.001513135 | 93658.042 | 1.000616226 >> ArraysMismatch.Float.mismatchMid | 90 | 89991.128 | 89712.471 | 0.996903506 | 90031.763 | 1.000451545 >> ArraysMismatch.Float.mismatchMid | 800 | 23974.331 | 24301.075 | 1.01362891 | 24354.29 | 1.015848576 >> ArraysMismatch.Float.mismatchStart | 16 | 140889.393 | 140535.617 | 0.997488981 | 140222.656 | 0.995267657 >> ArraysMismatch.Float.mismatchStart | 32 | 140871.915 | 140318.765 | 0.996073383 | 140242.783 | 0.995534014 >> ArraysMismatch.Float.mismatchStart | 64 | 141197.313 | 140413.639 | 0.994449795 | 140792.879 | 0.997135682 >> ArraysMismatch.Float.mismatchStart | 90 | 139663.079 | 139775.065 | 1.00080183 | 143880.133 | 1.03019448 >> ArraysMismatch.Float.mismatchStart | 800 | 143930.882 | 143878.412 | 0.99963545 | 143923.022 | 0.99994539 >> ArraysMismatch.Int.differentSubrangeMatches | 16 | 110820.026 | 130943.67 | 1.181588515 | 131076.904 | 1.182790771 >> ArraysMismatch.Int.differentSubrangeMatches | 32 | 111706.868 | 121119.544 | 1.084262285 | 122049.921 | 1.092591021 >> ArraysMismatch.Int.differentSubrangeMatches | 64 | 93916.026 | 101624.789 | 1.082081444 | 100103.617 | 1.065884293 >> ArraysMismatch.Int.differentSubrangeMatches | 90 | 67478.955 | 83517.957 | 1.237688951 | 83549.562 | 1.238157319 >> ArraysMismatch.Int.differentSubrangeMatches | 800 | 24920.868 | 25100.838 | 1.007221659 | 25376.679 | 1.018290334 >> ArraysMismatch.Int.matches | 16 | 138004.078 | 142579.711 | 1.033155781 | 143465.516 | 1.039574468 >> ArraysMismatch.Int.matches | 32 | 111790.949 | 119018.169 | 1.06464942 | 119864.971 | 1.072224291 >> ArraysMismatch.Int.matches | 64 | 86997.004 | 88476.088 | 1.017001551 | 87755.688 | 1.008720806 >> ArraysMismatch.Int.matches | 90 | 69366.581 | 71427.315 | 1.029707879 | 71203.035 | 1.026474622 >> ArraysMismatch.Int.matches | 800 | 15119.02 | 15529.095 | 1.02712312 | 15828.336 | 1.046915475 >> ArraysMismatch.Int.mismatchEnd | 16 | 139862.143 | 
135639.435 | 0.96980807 | 135661.244 | 0.969964002 >> ArraysMismatch.Int.mismatchEnd | 32 | 114870.328 | 115455.901 | 1.005097687 | 114992.965 | 1.001067613 >> ArraysMismatch.Int.mismatchEnd | 64 | 85291.637 | 85115.665 | 0.99793682 | 85179.114 | 0.998680726 >> ArraysMismatch.Int.mismatchEnd | 90 | 73049.868 | 78798.949 | 1.078700772 | 73365.106 | 1.004315381 >> ArraysMismatch.Int.mismatchEnd | 800 | 14597.509 | 12861.87 | 0.88110033 | 12845.178 | 0.879956847 >> ArraysMismatch.Int.mismatchMid | 16 | 131615.489 | 134691.219 | 1.023369058 | 134503.225 | 1.0219407 >> ArraysMismatch.Int.mismatchMid | 32 | 119291.19 | 121970.431 | 1.022459672 | 120647.357 | 1.011368543 >> ArraysMismatch.Int.mismatchMid | 64 | 100133.019 | 99827.03 | 0.996944175 | 98327.743 | 0.981971222 >> ArraysMismatch.Int.mismatchMid | 90 | 93062.689 | 95269.725 | 1.023715584 | 95457.632 | 1.025734728 >> ArraysMismatch.Int.mismatchMid | 800 | 24614.985 | 20853.102 | 0.847171022 | 20857.528 | 0.847350831 >> ArraysMismatch.Int.mismatchStart | 16 | 140229.222 | 147607.561 | 1.052616273 | 146278.15 | 1.043136002 >> ArraysMismatch.Int.mismatchStart | 32 | 140354.53 | 147448.421 | 1.050542658 | 146287.931 | 1.042274382 >> ArraysMismatch.Int.mismatchStart | 64 | 140256.12 | 147353.466 | 1.050602754 | 146094.059 | 1.041623417 >> ArraysMismatch.Int.mismatchStart | 90 | 135753.229 | 151205.439 | 1.113825727 | 152070.776 | 1.120200065 >> ArraysMismatch.Int.mismatchStart | 800 | 151565.887 | 145991.819 | 0.963223466 | 152020.842 | 1.003001698 >> ArraysMismatch.Long.differentSubrangeMatches | 16 | 125569.009 | 121469.175 | 0.967349953 | 121319.155 | 0.966155232 >> ArraysMismatch.Long.differentSubrangeMatches | 32 | 100126.557 | 103303.047 | 1.03172475 | 101476.788 | 1.013485243 >> ArraysMismatch.Long.differentSubrangeMatches | 64 | 80870.342 | 82334.336 | 1.018102978 | 82395.962 | 1.018865012 >> ArraysMismatch.Long.differentSubrangeMatches | 90 | 70673.831 | 72440.193 | 1.024993155 | 72067.497 | 1.019719689 >> 
ArraysMismatch.Long.differentSubrangeMatches | 800 | 15224.864 | 15077.429 | 0.99031617 | 15163.827 | 0.995990966 >> ArraysMismatch.Long.matches | 16 | 119857.871 | 123784.673 | 1.032762154 | 122968.267 | 1.025950703 >> ArraysMismatch.Long.matches | 32 | 88284.162 | 90825.719 | 1.028788369 | 91303.549 | 1.034200778 >> ArraysMismatch.Long.matches | 64 | 62827.102 | 63614.876 | 1.012538761 | 64469.82 | 1.026146646 >> ArraysMismatch.Long.matches | 90 | 49351.299 | 51199.947 | 1.037458953 | 51103.813 | 1.035511 >> ArraysMismatch.Long.matches | 800 | 8822.867 | 8512.064 | 0.964773015 | 8848.35 | 1.00288829 >> ArraysMismatch.Long.mismatchEnd | 16 | 124902.804 | 128237.911 | 1.026701618 | 128410.897 | 1.028086583 >> ArraysMismatch.Long.mismatchEnd | 32 | 86728.545 | 90519.608 | 1.043711825 | 88782.445 | 1.023681938 >> ArraysMismatch.Long.mismatchEnd | 64 | 64431.36 | 62735.702 | 0.973682722 | 64766.52 | 1.005201815 >> ArraysMismatch.Long.mismatchEnd | 90 | 47764.996 | 47635.982 | 0.997298984 | 47562.461 | 0.995759761 >> ArraysMismatch.Long.mismatchEnd | 800 | 8124.901 | 7194.444 | 0.88548082 | 7197.163 | 0.88581547 >> ArraysMismatch.Long.mismatchMid | 16 | 122857.442 | 121708.317 | 0.99064668 | 121071.994 | 0.985467319 >> ArraysMismatch.Long.mismatchMid | 32 | 99406.603 | 99376.972 | 0.999701921 | 97379.046 | 0.979603397 >> ArraysMismatch.Long.mismatchMid | 64 | 78596.148 | 76559.205 | 0.974083425 | 76538.811 | 0.973823946 >> ArraysMismatch.Long.mismatchMid | 90 | 74253.699 | 73267.252 | 0.98671518 | 74874.856 | 1.008365334 >> ArraysMismatch.Long.mismatchMid | 800 | 12739.526 | 12773.563 | 1.002671763 | 15215.721 | 1.194371046 >> ArraysMismatch.Long.mismatchStart | 16 | 143429.003 | 147610.51 | 1.029153846 | 146953.182 | 1.024570895 >> ArraysMismatch.Long.mismatchStart | 32 | 149771.413 | 149898.955 | 1.000851578 | 147743.864 | 0.986462377 >> ArraysMismatch.Long.mismatchStart | 64 | 149812.094 | 147738.977 | 0.986161885 | 147818.236 | 0.986690941 >> 
ArraysMismatch.Long.mismatchStart | 90 | 149834.855 | 147878.978 | 0.986946448 | 149768.864 | 0.999559575 >> ArraysMismatch.Long.mismatchStart | 800 | 150266.332 | 147175.353 | 0.979429996 | 153305.049 | 1.020222208 >> ArraysMismatch.Short.differentSubrangeMatches | 16 | 124956.808 | 152398.079 | 1.21960605 | 146222.898 | 1.170187526 >> ArraysMismatch.Short.differentSubrangeMatches | 32 | 118644.114 | 158832.405 | 1.338729749 | 146589.485 | 1.235539464 >> ArraysMismatch.Short.differentSubrangeMatches | 64 | 111036.197 | 106078.375 | 0.955349497 | 146122.18 | 1.315986894 >> ArraysMismatch.Short.differentSubrangeMatches | 90 | 79114.347 | 90244.347 | 1.140682448 | 91059.171 | 1.150981768 >> ArraysMismatch.Short.differentSubrangeMatches | 800 | 44794.065 | 46302.944 | 1.033684797 | 46086.671 | 1.028856635 >> ArraysMismatch.Short.matches | 16 | 150201.123 | 193264.21 | 1.28670283 | 185129.029 | 1.232540911 >> ArraysMismatch.Short.matches | 32 | 137672.122 | 126543.04 | 0.919162414 | 187187.586 | 1.359662242 >> ArraysMismatch.Short.matches | 64 | 113952.11 | 110124.025 | 0.966406195 | 109228.551 | 0.958547858 >> ArraysMismatch.Short.matches | 90 | 89491.351 | 91045.251 | 1.017363689 | 90362.175 | 1.009730817 >> ArraysMismatch.Short.matches | 800 | 25941.449 | 25887.28 | 0.997911875 | 25191.983 | 0.971109324 >> ArraysMismatch.Short.mismatchEnd | 16 | 142494.648 | 189203.368 | 1.327792802 | 176318.454 | 1.237368957 >> ArraysMismatch.Short.mismatchEnd | 32 | 139928.97 | 119098.052 | 0.851132199 | 178840.438 | 1.278080143 >> ArraysMismatch.Short.mismatchEnd | 64 | 115583.3 | 104264.811 | 0.902075049 | 102376.369 | 0.885736685 >> ArraysMismatch.Short.mismatchEnd | 90 | 86641.922 | 87669.462 | 1.011859617 | 87745.796 | 1.012740645 >> ArraysMismatch.Short.mismatchEnd | 800 | 23741.295 | 22911.558 | 0.965050895 | 22937.297 | 0.96613504 >> ArraysMismatch.Short.mismatchMid | 16 | 148684.747 | 189160.851 | 1.272227682 | 178776.065 | 1.202383355 >> ArraysMismatch.Short.mismatchMid 
| 32 | 133281.625 | 118690.88 | 0.890526957 | 178478.46 | 1.339107773 >> ArraysMismatch.Short.mismatchMid | 64 | 122399.072 | 110333.504 | 0.901424351 | 111504.705 | 0.910993059 >> ArraysMismatch.Short.mismatchMid | 90 | 119317.633 | 110483.29 | 0.925959451 | 111346.724 | 0.933195884 >> ArraysMismatch.Short.mismatchMid | 800 | 50742.831 | 43058.305 | 0.848559376 | 47917.118 | 0.94431306 >> ArraysMismatch.Short.mismatchStart | 16 | 148861.935 | 191984.933 | 1.289684519 | 178706.176 | 1.200482689 >> ArraysMismatch.Short.mismatchStart | 32 | 148701.043 | 126690.118 | 0.851978678 | 178702.06 | 1.201753911 >> ArraysMismatch.Short.mismatchStart | 64 | 148560.877 | 126747.337 | 0.853167668 | 126657.473 | 0.852562771 >> ArraysMismatch.Short.mismatchStart | 90 | 149824.411 | 126605.818 | 0.845027971 | 125719.231 | 0.839110464 >> ArraysMismatch.Short.mismatchStart | 800 | 152583.036 | 126437.329 | 0.828646043 | 126698.741 | 0.830359287 > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > 8266951: Enable partial in-lining if UsePartialInlineSize=64, adding a benchmark for small sized conversions of various primitive types. Thanks for investigating further. I agree with your assessment. ------------- PR: https://git.openjdk.java.net/jdk/pull/3999 From psandoz at openjdk.java.net Mon May 17 15:37:51 2021 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Mon, 17 May 2021 15:37:51 GMT Subject: RFR: 8266951: Partial in-lining for vectorized mismatch operation using AVX512 masked instructions [v3] In-Reply-To: References: <0YtRuwnVZ-Ejs-22d0JDJeFzXiZ17XNuBT1o5Ma4ZkI=.9dd9e952-d452-4175-8ff5-8f41e990a555@github.com> Message-ID: On Fri, 14 May 2021 11:28:10 GMT, Jatin Bhateja wrote: >> src/hotspot/share/opto/c2_globals.hpp line 85: >> >>> 83: range(0, max_jint) \ >>> 84: \ >>> 85: product(intx, UsePartialInlineSize, -1, DIAGNOSTIC, \ >> >> Unsure if the name change requires a CSR. Members of HotSpot can advise. 
>> >> Also, please check for any tests that might use this flag. > > -XX:UsePartialInlineSize is a diagnostic option and not a product option. Thus CSR may not be relevant for this case. Yes, not needed: https://wiki.openjdk.java.net/display/HotSpot/Hotspot+Command-line+Flags%3A+Kinds%2C+Lifecycle+and+the+CSR+Process ------------- PR: https://git.openjdk.java.net/jdk/pull/3999 From aph at openjdk.java.net Mon May 17 16:01:43 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 17 May 2021 16:01:43 GMT Subject: RFR: 8266332: Adler32 intrinsic for x86 64-bit platforms [v9] In-Reply-To: References: Message-ID: <4KqXJjwOAZfYQGe-guwozZskn8lf2RR7oTBu5aHSUQo=.ca01baa1-601b-4cab-b546-134173ce4ce9@github.com> On Mon, 17 May 2021 05:47:39 GMT, Xubo Zhang wrote: >> Implement Adler32 intrinsic for x86 64-bit platform using vector instructions. >> >> For the following benchmark: >> http://cr.openjdk.java.net/~pli/rfr/8216259/TestAdler32.java >> >> The optimization shows ~5x improvement. >> >> Base: >> Benchmark (count) Mode Cnt Score Error Units >> TestAdler32Perf.testAdler32Update 64 avgt 25 0.084 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 128 avgt 25 0.104 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 256 avgt 25 0.146 ± 0.002 us/op >> TestAdler32Perf.testAdler32Update 512 avgt 25 0.226 ± 0.002 us/op >> TestAdler32Perf.testAdler32Update 1024 avgt 25 0.390 ± 0.005 us/op >> TestAdler32Perf.testAdler32Update 2048 avgt 25 0.714 ± 0.007 us/op >> TestAdler32Perf.testAdler32Update 4096 avgt 25 1.359 ± 0.014 us/op >> TestAdler32Perf.testAdler32Update 8192 avgt 25 2.751 ± 0.023 us/op >> TestAdler32Perf.testAdler32Update 16384 avgt 25 5.494 ± 0.077 us/op >> TestAdler32Perf.testAdler32Update 32768 avgt 25 11.058 ± 0.160 us/op >> TestAdler32Perf.testAdler32Update 65536 avgt 25 22.198 ± 0.319 us/op >> >> >> With patch: >> Benchmark (count) Mode Cnt Score Error Units >> TestAdler32Perf.testAdler32Update 64 avgt 25 0.020 ±
0.001 us/op >> TestAdler32Perf.testAdler32Update 128 avgt 25 0.025 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 256 avgt 25 0.031 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 512 avgt 25 0.048 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 1024 avgt 25 0.078 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 2048 avgt 25 0.139 ± 0.002 us/op >> TestAdler32Perf.testAdler32Update 4096 avgt 25 0.262 ± 0.004 us/op >> TestAdler32Perf.testAdler32Update 8192 avgt 25 0.524 ± 0.010 us/op >> TestAdler32Perf.testAdler32Update 16384 avgt 25 1.017 ± 0.022 us/op >> TestAdler32Perf.testAdler32Update 32768 avgt 25 2.058 ± 0.052 us/op >> TestAdler32Perf.testAdler32Update 65536 avgt 25 3.994 ± 0.013 us/op > > Xubo Zhang has updated the pull request incrementally with one additional commit since the last revision: > > Add jmh test for Adler32 I'm not a lawyer, but Pengfei, please contribute this benchmark. All you have to do is copy it into cr.openjdk.java.net. That should be enough for someone else to take it from there. And AFAICR files should have a copyright header, which you should do too. ------------- PR: https://git.openjdk.java.net/jdk/pull/3806 From github.com+58006833+xbzhang99 at openjdk.java.net Mon May 17 16:47:49 2021 From: github.com+58006833+xbzhang99 at openjdk.java.net (Xubo Zhang) Date: Mon, 17 May 2021 16:47:49 GMT Subject: RFR: 8266332: Adler32 intrinsic for x86 64-bit platforms [v9] In-Reply-To: References: Message-ID: <2rU8Od9JNk3hPyz6KQ2pGo-zo_Nkc0ZAviuKW0DeGGA=.830d1a33-99b2-466e-9db3-dec8ccfb0774@github.com> On Mon, 17 May 2021 05:47:39 GMT, Xubo Zhang wrote: >> Implement Adler32 intrinsic for x86 64-bit platform using vector instructions. >> >> The benchmark test/micro/org/openjdk/bench/java/util/TestAdler32.java is contributed by Pengfei Li (pli, Pengfei.Li at arm.com). >> >> For this benchmark, the optimization shows ~5x improvement.
>> >> Base: >> Benchmark (count) Mode Cnt Score Error Units >> TestAdler32Perf.testAdler32Update 64 avgt 25 0.084 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 128 avgt 25 0.104 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 256 avgt 25 0.146 ± 0.002 us/op >> TestAdler32Perf.testAdler32Update 512 avgt 25 0.226 ± 0.002 us/op >> TestAdler32Perf.testAdler32Update 1024 avgt 25 0.390 ± 0.005 us/op >> TestAdler32Perf.testAdler32Update 2048 avgt 25 0.714 ± 0.007 us/op >> TestAdler32Perf.testAdler32Update 4096 avgt 25 1.359 ± 0.014 us/op >> TestAdler32Perf.testAdler32Update 8192 avgt 25 2.751 ± 0.023 us/op >> TestAdler32Perf.testAdler32Update 16384 avgt 25 5.494 ± 0.077 us/op >> TestAdler32Perf.testAdler32Update 32768 avgt 25 11.058 ± 0.160 us/op >> TestAdler32Perf.testAdler32Update 65536 avgt 25 22.198 ± 0.319 us/op >> >> >> With patch: >> Benchmark (count) Mode Cnt Score Error Units >> TestAdler32Perf.testAdler32Update 64 avgt 25 0.020 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 128 avgt 25 0.025 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 256 avgt 25 0.031 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 512 avgt 25 0.048 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 1024 avgt 25 0.078 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 2048 avgt 25 0.139 ± 0.002 us/op >> TestAdler32Perf.testAdler32Update 4096 avgt 25 0.262 ± 0.004 us/op >> TestAdler32Perf.testAdler32Update 8192 avgt 25 0.524 ± 0.010 us/op >> TestAdler32Perf.testAdler32Update 16384 avgt 25 1.017 ± 0.022 us/op >> TestAdler32Perf.testAdler32Update 32768 avgt 25 2.058 ± 0.052 us/op >> TestAdler32Perf.testAdler32Update 65536 avgt 25 3.994 ± 0.013 us/op > > Xubo Zhang has updated the pull request incrementally with one additional commit since the last revision: > > Add jmh test for Adler32 The benchmark test/micro/org/openjdk/bench/java/util/TestAdler32.java is contributed by Pengfei Li (pli, Pengfei.Li at arm.com).
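For readers who want a reference point for what the checksum under discussion computes, here is a minimal scalar sketch of Adler-32 as defined in RFC 1950, cross-checked against java.util.zip.Adler32. It is not the code from the patch: the class name Adler32Sketch and its two helper methods are made up for illustration, and the vectorized x86 stub itself is not shown.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Adler32;

// Illustrative only: a plain scalar Adler-32, not the vectorized intrinsic from the patch.
final class Adler32Sketch {
    private static final int MOD_ADLER = 65521; // largest prime below 2^16

    // Hypothetical helper: Adler-32 over the whole array, one byte at a time.
    static long scalarAdler32(byte[] data) {
        int a = 1; // running sum of bytes, seeded with 1
        int b = 0; // running sum of the successive values of a
        for (byte value : data) {
            a = (a + (value & 0xff)) % MOD_ADLER;
            b = (b + a) % MOD_ADLER;
        }
        return ((long) b << 16) | a;
    }

    // Same input through the JDK class, used here as the reference result.
    static long jdkAdler32(byte[] data) {
        Adler32 adler = new Adler32();
        adler.update(data, 0, data.length);
        return adler.getValue();
    }

    public static void main(String[] args) {
        byte[] data = "Wikipedia".getBytes(StandardCharsets.US_ASCII);
        System.out.printf("scalar=0x%08X jdk=0x%08X%n", scalarAdler32(data), jdkAdler32(data));
    }
}
```

The per-byte modulo keeps the sketch short; production implementations usually defer the modulo over blocks of bytes, which is also what makes the loop amenable to vector instructions.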
------------- PR: https://git.openjdk.java.net/jdk/pull/3806 From github.com+58006833+xbzhang99 at openjdk.java.net Mon May 17 16:47:49 2021 From: github.com+58006833+xbzhang99 at openjdk.java.net (Xubo Zhang) Date: Mon, 17 May 2021 16:47:49 GMT Subject: RFR: 8266332: Adler32 intrinsic for x86 64-bit platforms [v9] In-Reply-To: <4KqXJjwOAZfYQGe-guwozZskn8lf2RR7oTBu5aHSUQo=.ca01baa1-601b-4cab-b546-134173ce4ce9@github.com> References: <4KqXJjwOAZfYQGe-guwozZskn8lf2RR7oTBu5aHSUQo=.ca01baa1-601b-4cab-b546-134173ce4ce9@github.com> Message-ID: On Mon, 17 May 2021 15:58:21 GMT, Andrew Haley wrote: >> Xubo Zhang has updated the pull request incrementally with one additional commit since the last revision: >> >> Add jmh test for Adler32 > > I'm not a lawyer, but Pengfei, please contribute this benchmark. All you have to do is copy it into cr.openjdk.java.net. That should be enough for someone else to take it from there. And AFAICR files should have a copyright header, which you should do too. @theRealAph I have given attribution to Pengfei Li for the micro benchmark. ------------- PR: https://git.openjdk.java.net/jdk/pull/3806 From neliasso at openjdk.java.net Mon May 17 16:55:00 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Mon, 17 May 2021 16:55:00 GMT Subject: RFR: 8265262: CITime - 'other' incorrectly calculated Message-ID: This CR fixes a few issues with the CITime output for C2: 1) The other category for _t_optimize is not removing time spent in _t_vector 2) Some of the _t_incrInline sub counters are called from different contexts - calculating 'other' from total time spent in _t_incrInline expects that the counter usage is strictly hierarchical. 3) I've placed the non-hierarchical counters in braces. 4) Code Installation is a part of Code Emission (_t_output). Indentation fixed. 5) Moved "renumber live" after "Vector" so that they appear in order.
6) Added sub counters "shorten branches" and "fill buffer" to "Code Emission" phase, and added an other category. Before more than 50% of time in Code Emission was unaccounted for, now it's less than 25%. Please review, Best regards, Nils Eliasson ------------- Commit messages: - removed whitespace - Add timers - fix_counters Changes: https://git.openjdk.java.net/jdk/pull/4065/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4065&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8265262 Stats: 36 lines in 3 files changed: 24 ins; 3 del; 9 mod Patch: https://git.openjdk.java.net/jdk/pull/4065.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4065/head:pull/4065 PR: https://git.openjdk.java.net/jdk/pull/4065 From github.com+58006833+xbzhang99 at openjdk.java.net Mon May 17 16:56:44 2021 From: github.com+58006833+xbzhang99 at openjdk.java.net (Xubo Zhang) Date: Mon, 17 May 2021 16:56:44 GMT Subject: RFR: 8266332: Adler32 intrinsic for x86 64-bit platforms [v7] In-Reply-To: <6U5EDcQli0NUlrhdOBtzkujogfwN_39V3bPs_7oIUX0=.e51f26d8-b503-4680-8d6e-34311a1b300f@github.com> References: <6U5EDcQli0NUlrhdOBtzkujogfwN_39V3bPs_7oIUX0=.e51f26d8-b503-4680-8d6e-34311a1b300f@github.com> Message-ID: On Fri, 14 May 2021 20:17:53 GMT, Vladimir Kozlov wrote: >> Xubo Zhang has updated the pull request incrementally with one additional commit since the last revision: >> >> Add @run case > > Ping @pfustc about permission to add his JMH micro or write your own based on examples in `test/micro/org/openjdk/bench/java/util/` @vnkozlov I implemented your review comments. Could you please take a look. 
------------- PR: https://git.openjdk.java.net/jdk/pull/3806 From redestad at openjdk.java.net Mon May 17 17:11:46 2021 From: redestad at openjdk.java.net (Claes Redestad) Date: Mon, 17 May 2021 17:11:46 GMT Subject: RFR: 8267237: ARM32: bad AD file in matcher.cpp after 8266810 In-Reply-To: References: Message-ID: <5pFcvZjl047P-j6FFBI09dotTEO4f_FuHVQL_-oAfE8=.1e0b4434-790b-49cc-a9cd-dd6b1ac335d0@github.com> On Mon, 17 May 2021 10:48:17 GMT, Christoph Göttschkes wrote: > It appears that [JDK-8266810](https://bugs.openjdk.java.net/browse/JDK-8266810) introduced regression into aarch32. Many JTreg tests are failing with: > > # Internal Error (/var/jnode/openjdk-build-ws/workspace/openjdk-build/jdk/jdk-arm-linux-gnueabihf/jdk/src/hotspot/share/opto/matcher.cpp:1670), pid=15030, tid=15047 > # assert(false) failed: bad AD file > > > Testing: hotspot tier1 on ARMv7-A / linux @mychris I'd be happy to sponsor this. Initiate integration by commenting with /integrate ------------- PR: https://git.openjdk.java.net/jdk/pull/4053 From shade at openjdk.java.net Mon May 17 17:14:40 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 17 May 2021 17:14:40 GMT Subject: RFR: 8267237: ARM32: bad AD file in matcher.cpp after 8266810 In-Reply-To: References: Message-ID: <31H0v6rcDRdvSJviFvDh1PAo4vtfJ-tSrjM7EB_1JL8=.48c41be9-5b89-46c4-a9c3-b285c3eda327@github.com> On Mon, 17 May 2021 10:48:17 GMT, Christoph Göttschkes wrote: > It appears that [JDK-8266810](https://bugs.openjdk.java.net/browse/JDK-8266810) introduced regression into aarch32. Many JTreg tests are failing with: > > # Internal Error (/var/jnode/openjdk-build-ws/workspace/openjdk-build/jdk/jdk-arm-linux-gnueabihf/jdk/src/hotspot/share/opto/matcher.cpp:1670), pid=15030, tid=15047 > # assert(false) failed: bad AD file > > > Testing: hotspot tier1 on ARMv7-A / linux Um, this is concerning: I would have thought #3947 was an accurate mechanical move from relevant `.ad`-s to relevant `.hpp` / `.cpp`-s.
This patch contradicts that thought. So, is someone assigned to comb through #3947 to see if there are other problems like this? ------------- PR: https://git.openjdk.java.net/jdk/pull/4053 From redestad at openjdk.java.net Mon May 17 17:50:36 2021 From: redestad at openjdk.java.net (Claes Redestad) Date: Mon, 17 May 2021 17:50:36 GMT Subject: RFR: 8267237: ARM32: bad AD file in matcher.cpp after 8266810 In-Reply-To: References: Message-ID: On Mon, 17 May 2021 10:48:17 GMT, Christoph Göttschkes wrote: > It appears that [JDK-8266810](https://bugs.openjdk.java.net/browse/JDK-8266810) introduced regression into aarch32. Many JTreg tests are failing with: > > # Internal Error (/var/jnode/openjdk-build-ws/workspace/openjdk-build/jdk/jdk-arm-linux-gnueabihf/jdk/src/hotspot/share/opto/matcher.cpp:1670), pid=15030, tid=15047 > # assert(false) failed: bad AD file > > > Testing: hotspot tier1 on ARMv7-A / linux FWIW I've combed through #3947 again but haven't found any additional errors in the translation. ------------- PR: https://git.openjdk.java.net/jdk/pull/4053 From paul.sandoz at oracle.com Mon May 17 17:51:45 2021 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Mon, 17 May 2021 17:51:45 +0000 Subject: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics In-Reply-To: <72216fcc-67e7-c700-8fee-2d8c752a0f0c@redhat.com> References: <72216fcc-67e7-c700-8fee-2d8c752a0f0c@redhat.com> Message-ID: <8346BF97-D8F6-4521-8589-66C679618DB7@oracle.com> Hi Andrew, I'll let Sandhya talk more about the provenance and numerical accuracy. I think we can add more comments/details in that respect. IMO this is a reasonable compromise, at least for incubation with follow on investigation to determine if we can leverage possible enhancements to Panama FFM (see JEP 414 section on SVML). We would like to encourage experimentation of numerical data-parallel algorithms. The performance gains using SVML are compelling in that regard.
Note that the code is only accessed by the Vector API and is not present in a JDK image if the incubating vector module is not present. Paul. > On May 15, 2021, at 3:29 AM, Andrew Haley wrote: > > On 4/22/21 11:27 PM, Sandhya Viswanathan wrote: >> Intel Short Vector Math Library (SVML) based intrinsics in native x86 assembly provide optimized implementation for Vector API transcendental and trigonometric methods. >> These methods are built into a separate library instead of being part of libjvm.so or jvm.dll. > > Is this really acceptable code quality for OpenJDK? No comments, no > explanation of the derivation of algorithms, no explanation or proofs > of accuracy. There doesn't even seem to be any source code, just compiler > output. > > -- > Andrew Haley (he/him) > Java Platform Lead Engineer > Red Hat UK Ltd. > https://keybase.io/andrewhaley > EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 > From shade at openjdk.java.net Mon May 17 17:53:51 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 17 May 2021 17:53:51 GMT Subject: RFR: 8267237: ARM32: bad AD file in matcher.cpp after 8266810 In-Reply-To: References: Message-ID: On Mon, 17 May 2021 10:48:17 GMT, Christoph Göttschkes wrote: > It appears that [JDK-8266810](https://bugs.openjdk.java.net/browse/JDK-8266810) introduced regression into aarch32. Many JTreg tests are failing with: > > # Internal Error (/var/jnode/openjdk-build-ws/workspace/openjdk-build/jdk/jdk-arm-linux-gnueabihf/jdk/src/hotspot/share/opto/matcher.cpp:1670), pid=15030, tid=15047 > # assert(false) failed: bad AD file > > > Testing: hotspot tier1 on ARMv7-A / linux Okay then, thanks!
------------- PR: https://git.openjdk.java.net/jdk/pull/4053 From github.com+58006833+xbzhang99 at openjdk.java.net Mon May 17 18:57:28 2021 From: github.com+58006833+xbzhang99 at openjdk.java.net (Xubo Zhang) Date: Mon, 17 May 2021 18:57:28 GMT Subject: RFR: 8266332: Adler32 intrinsic for x86 64-bit platforms [v10] In-Reply-To: References: Message-ID: > Implement Adler32 intrinsic for x86 64-bit platform using vector instructions. > > The benchmark test/micro/org/openjdk/bench/java/util/TestAdler32.java is contributed by Pengfei Li (pli, Pengfei.Li at arm.com). > > For this benchmark, the optimization shows ~5x improvement. > > Base: > Benchmark (count) Mode Cnt Score Error Units > TestAdler32Perf.testAdler32Update 64 avgt 25 0.084 ± 0.001 us/op > TestAdler32Perf.testAdler32Update 128 avgt 25 0.104 ± 0.001 us/op > TestAdler32Perf.testAdler32Update 256 avgt 25 0.146 ± 0.002 us/op > TestAdler32Perf.testAdler32Update 512 avgt 25 0.226 ± 0.002 us/op > TestAdler32Perf.testAdler32Update 1024 avgt 25 0.390 ± 0.005 us/op > TestAdler32Perf.testAdler32Update 2048 avgt 25 0.714 ± 0.007 us/op > TestAdler32Perf.testAdler32Update 4096 avgt 25 1.359 ± 0.014 us/op > TestAdler32Perf.testAdler32Update 8192 avgt 25 2.751 ± 0.023 us/op > TestAdler32Perf.testAdler32Update 16384 avgt 25 5.494 ± 0.077 us/op > TestAdler32Perf.testAdler32Update 32768 avgt 25 11.058 ± 0.160 us/op > TestAdler32Perf.testAdler32Update 65536 avgt 25 22.198 ± 0.319 us/op > > > With patch: > Benchmark (count) Mode Cnt Score Error Units > TestAdler32Perf.testAdler32Update 64 avgt 25 0.020 ± 0.001 us/op > TestAdler32Perf.testAdler32Update 128 avgt 25 0.025 ± 0.001 us/op > TestAdler32Perf.testAdler32Update 256 avgt 25 0.031 ± 0.001 us/op > TestAdler32Perf.testAdler32Update 512 avgt 25 0.048 ± 0.001 us/op > TestAdler32Perf.testAdler32Update 1024 avgt 25 0.078 ± 0.001 us/op > TestAdler32Perf.testAdler32Update 2048 avgt 25 0.139 ± 0.002 us/op > TestAdler32Perf.testAdler32Update 4096 avgt 25 0.262 ±
0.004 us/op > TestAdler32Perf.testAdler32Update 8192 avgt 25 0.524 ? 0.010 us/op > TestAdler32Perf.testAdler32Update 16384 avgt 25 1.017 ? 0.022 us/op > TestAdler32Perf.testAdler32Update 32768 avgt 25 2.058 ? 0.052 us/op > TestAdler32Perf.testAdler32Update 65536 avgt 25 3.994 ? 0.013 us/op Xubo Zhang has updated the pull request incrementally with one additional commit since the last revision: Remove -XX:+UseAdler32Intrinsics, as it will fail on non-supported platforms ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/3806/files - new: https://git.openjdk.java.net/jdk/pull/3806/files/c8e2ab05..d6a58166 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3806&range=09 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3806&range=08-09 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/3806.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3806/head:pull/3806 PR: https://git.openjdk.java.net/jdk/pull/3806 From kvn at openjdk.java.net Mon May 17 19:15:40 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 17 May 2021 19:15:40 GMT Subject: RFR: 8266332: Adler32 intrinsic for x86 64-bit platforms [v9] In-Reply-To: <4KqXJjwOAZfYQGe-guwozZskn8lf2RR7oTBu5aHSUQo=.ca01baa1-601b-4cab-b546-134173ce4ce9@github.com> References: <4KqXJjwOAZfYQGe-guwozZskn8lf2RR7oTBu5aHSUQo=.ca01baa1-601b-4cab-b546-134173ce4ce9@github.com> Message-ID: On Mon, 17 May 2021 15:58:21 GMT, Andrew Haley wrote: > I'm not a lawyer, but Pengfei, please contribute this benchmark. All you have to do is copy it into cr.openjdk.java.net. That should be enough for someone else to take it from there. And AFAICR files should have a copyright header, which you should do too. @theRealAph micro is already there for long time: https://cr.openjdk.java.net/~pli/rfr/8216259/TestAdler32.java It missed copyright header which is added in these changes. 
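For readers following the Adler32 thread without the checksum in mind, here is a minimal scalar sketch of what is being computed. It is not the intrinsic from the pull request and not the benchmark under discussion; the class name `Adler32Sketch` and its helper methods are hypothetical, and only `java.util.zip.Adler32` is the real JDK class. The byte-at-a-time loop follows the standard Adler-32 definition (two running sums modulo 65521) and is written for clarity, not speed.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Adler32;

// Hypothetical illustration class; not part of the JDK or of the patch.
final class Adler32Sketch {
    private static final int MOD_ADLER = 65521; // largest prime below 2^16

    // Scalar reference: 'a' is the running byte sum, 'b' the running sum of 'a'.
    static long reference(byte[] data) {
        int a = 1;
        int b = 0;
        for (byte value : data) {
            a = (a + (value & 0xff)) % MOD_ADLER;
            b = (b + a) % MOD_ADLER;
        }
        return ((long) b << 16) | a;
    }

    // Same checksum via the JDK class that an Adler32 intrinsic would accelerate.
    static long jdk(byte[] data) {
        Adler32 adler = new Adler32();
        adler.update(data, 0, data.length);
        return adler.getValue();
    }

    public static void main(String[] args) {
        byte[] data = "Wikipedia".getBytes(StandardCharsets.US_ASCII);
        System.out.printf("reference=%08x jdk=%08x%n", reference(data), jdk(data));
    }
}
```

The two methods should agree for any input. A vectorized version has to preserve exactly this result, which is why comparing against the scalar loop is a convenient sanity check.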
------------- PR: https://git.openjdk.java.net/jdk/pull/3806 From kvn at openjdk.java.net Mon May 17 19:24:41 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 17 May 2021 19:24:41 GMT Subject: RFR: 8266332: Adler32 intrinsic for x86 64-bit platforms [v10] In-Reply-To: References: Message-ID: On Mon, 17 May 2021 18:57:28 GMT, Xubo Zhang wrote: >> Implement Adler32 intrinsic for x86 64-bit platform using vector instructions. >> >> The benchmark test/micro/org/openjdk/bench/java/util/TestAdler32.java is contributed by Pengfei Li (pli, Pengfei.Li at arm.com). >> >> For this benchmark, the optimization shows ~5x improvement.
>>
>> Base:
>> Benchmark (count) Mode Cnt Score Error Units
>> TestAdler32Perf.testAdler32Update 64 avgt 25 0.084 ± 0.001 us/op
>> TestAdler32Perf.testAdler32Update 128 avgt 25 0.104 ± 0.001 us/op
>> TestAdler32Perf.testAdler32Update 256 avgt 25 0.146 ± 0.002 us/op
>> TestAdler32Perf.testAdler32Update 512 avgt 25 0.226 ± 0.002 us/op
>> TestAdler32Perf.testAdler32Update 1024 avgt 25 0.390 ± 0.005 us/op
>> TestAdler32Perf.testAdler32Update 2048 avgt 25 0.714 ± 0.007 us/op
>> TestAdler32Perf.testAdler32Update 4096 avgt 25 1.359 ± 0.014 us/op
>> TestAdler32Perf.testAdler32Update 8192 avgt 25 2.751 ± 0.023 us/op
>> TestAdler32Perf.testAdler32Update 16384 avgt 25 5.494 ± 0.077 us/op
>> TestAdler32Perf.testAdler32Update 32768 avgt 25 11.058 ± 0.160 us/op
>> TestAdler32Perf.testAdler32Update 65536 avgt 25 22.198 ± 0.319 us/op
>>
>>
>> With patch:
>> Benchmark (count) Mode Cnt Score Error Units
>> TestAdler32Perf.testAdler32Update 64 avgt 25 0.020 ± 0.001 us/op
>> TestAdler32Perf.testAdler32Update 128 avgt 25 0.025 ± 0.001 us/op
>> TestAdler32Perf.testAdler32Update 256 avgt 25 0.031 ± 0.001 us/op
>> TestAdler32Perf.testAdler32Update 512 avgt 25 0.048 ± 0.001 us/op
>> TestAdler32Perf.testAdler32Update 1024 avgt 25 0.078 ± 0.001 us/op
>> TestAdler32Perf.testAdler32Update 2048 avgt 25 0.139 ± 0.002 us/op
>> TestAdler32Perf.testAdler32Update 4096 avgt 25 0.262 ± 0.004 us/op
>> TestAdler32Perf.testAdler32Update 8192 avgt 25 0.524 ± 0.010 us/op
>> TestAdler32Perf.testAdler32Update 16384 avgt 25 1.017 ± 0.022 us/op
>> TestAdler32Perf.testAdler32Update 32768 avgt 25 2.058 ± 0.052 us/op
>> TestAdler32Perf.testAdler32Update 65536 avgt 25 3.994 ± 0.013 us/op
> > Xubo Zhang has updated the pull request incrementally with one additional commit since the last revision: > > Remove -XX:+UseAdler32Intrinsics, as it will fail on non-supported platforms I have 2 comments. src/hotspot/cpu/x86/macroAssembler_x86.hpp line 1322: > 1320: Assembler::vpmulld(dst, nds, src, vector_len); > 1321: } > 1322: void vpmulld(XMMRegister dst, XMMRegister nds, AddressLiteral src, int vector_len, Register scratch_reg = rscratch1); Looks like my comment was lost. I see only last version of method is used in stub. Why you need additional 2 wrapper methods? Also the code always pass `scratch_reg` - you don't need to set default value. src/hotspot/cpu/x86/vm_version_x86.cpp line 907: > 905: } > 906: } else if (UseAdler32Intrinsics) { > 907: if (!FLAG_IS_DEFAULT(UseAdler32Intrinsics)) Add `{}`. ------------- Changes requested by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/3806 From kvn at openjdk.java.net Mon May 17 19:30:41 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 17 May 2021 19:30:41 GMT Subject: RFR: 8265711: C1: Intrinsify Class.getModifier method [v4] In-Reply-To: References: Message-ID: On Mon, 17 May 2021 02:20:37 GMT, Yi Yang wrote: >> It's relatively a common case to get modifiers from a constant Class instance, i.e. ThirdPartyClass.class.getModifiers(). Currently, C1 Canonicalizer missed the opportunity of replacing Class.getModifiers intrinsic calls with compile-time constants.
> > Yi Yang has updated the pull request incrementally with one additional commit since the last revision: > > caonicalize in interpreter/c1/c2 modes Just one small comment you can fix before push. Otherwise it is good. test/hotspot/jtreg/compiler/c1/CanonicalizeGetModifiers.java line 52: > 50: * @requires vm.compiler2.enabled > 51: * @library /test/lib > 52: * @run main/othervm -XX:TieredStopAtLevel=4 -XX:-TieredCompilation `-XX:TieredStopAtLevel=4` does not work when Tiered is off. You don't need this flag. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/3616 From kvn at openjdk.java.net Mon May 17 19:32:41 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 17 May 2021 19:32:41 GMT Subject: RFR: 8267112: JVMCI compiler modules should be kept upgradable In-Reply-To: References: Message-ID: On Thu, 13 May 2021 16:37:38 GMT, Vladimir Kozlov wrote: > [JDK-8264806](https://bugs.openjdk.java.net/browse/JDK-8264806) changes removed sources and also removed JVMCI compiler from list of upgradable modules. JVMCI compiler modules should be upgradable in JDK to work with GraalVM. > > Make these modules upgradable again and empty by leaving only reference to JVMCI (jdk.internal.vm.ci) module. It does not restore sources - only `module-info.java` files are kept. > > Note, we continue discussion about [JDK-8265091](https://bugs.openjdk.java.net/browse/JDK-8265091): "Use Module API to export JVMCI packages at runtime" to see if we can remove these `module-info.java` files. > > Changes were proposed by @dougxc after testing [JDK-8264806](https://bugs.openjdk.java.net/browse/JDK-8264806) changes with GraalVM. > I restored related code in some tests for them to pass. > > Testing: full tier1-tier3. @erikj79, are you okay with these changes? 
------------- PR: https://git.openjdk.java.net/jdk/pull/4014 From kvn at openjdk.java.net Mon May 17 19:40:38 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 17 May 2021 19:40:38 GMT Subject: RFR: 8266528: Optimize C2 VerifyIterativeGVN execution time In-Reply-To: References: Message-ID: On Mon, 17 May 2021 05:23:12 GMT, Hui Shi wrote: > Please help review this enhancement for VerifyIterativeGVN, which reduces execution time by about 3x - 200x when VerifyIterativeGVN is on. > > In simple test "-Xcomp -XX:+VerifyIterativeGVN -XX:-TieredCompilation -version", time reduced from 8.67s to 2.4s. > In extreme case hotspot/test/jtreg/compiler/escapeAnalysis/Test6689060.java, time reduced from 20000s to 95s. > > Test with "-Xbatch -XX:+VerifyIterativeGVN -XX:-TieredCompilation", tier1/2/3 with fastdebug and no regression. > > 1. Remove node_arena()->contains checking for verifying nodes. _verify_window is reset before every PhaseIterGVN::optimize. Searching from root or nodes in _verify_window will not meet nodes whose _idx is not unique (PhaseIterGVN::optimize is not triggered in the middle of PhaseRenumberLive). Asserting in Node::verify that every node is in the current node_arena() passes tier1/2/3 checks (with -Xbatch -XX:+VerifyIterativeGVN -XX:-TieredCompilation); no assertion failure happens. > > 2. Combine verification for nodes in _verify_window into one worklist and skip redundant nodes in _verify_window. > > 3. Optimize duplicate checking for same input nodes, skipping if current input index is not its first occurrence. > > 4. Optimize field access: Replace "n->in(j)" with "n->_in[j]", same with outcnt calculation for input node x. src/hotspot/share/opto/node.cpp line 2249: > 2247: } > 2248: } > 2249: if (cnt == 2) { I think it should be `cnt > 1`.
------------- PR: https://git.openjdk.java.net/jdk/pull/4045 From kvn at openjdk.java.net Mon May 17 19:57:40 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 17 May 2021 19:57:40 GMT Subject: RFR: 8266528: Optimize C2 VerifyIterativeGVN execution time In-Reply-To: References: Message-ID: <7BWWnQZvM5GFU-tETp5VVGCLdtS6VdM-eoz3Hb5O4cA=.df57c593-8533-4170-854a-48ef6e2dcfc1@github.com> On Mon, 17 May 2021 05:23:12 GMT, Hui Shi wrote: > Please help review this enhancement for VerifyIterativeGVN, which reduces execution time by about 3x - 200x when VerifyIterativeGVN is on. > > In simple test "-Xcomp -XX:+VerifyIterativeGVN -XX:-TieredCompilation -version", time reduced from 8.67s to 2.4s. > In extreme case hotspot/test/jtreg/compiler/escapeAnalysis/Test6689060.java, time reduced from 20000s to 95s. > > Test with "-Xbatch -XX:+VerifyIterativeGVN -XX:-TieredCompilation", tier1/2/3 with fastdebug and no regression. > > 1. Remove node_arena()->contains checking for verifying nodes. _verify_window is reset before every PhaseIterGVN::optimize. Searching from root or nodes in _verify_window will not meet nodes whose _idx is not unique (PhaseIterGVN::optimize is not triggered in the middle of PhaseRenumberLive). Asserting in Node::verify that every node is in the current node_arena() passes tier1/2/3 checks (with -Xbatch -XX:+VerifyIterativeGVN -XX:-TieredCompilation); no assertion failure happens. > > 2. Combine verification for nodes in _verify_window into one worklist and skip redundant nodes in _verify_window. > > 3. Optimize duplicate checking for same input nodes, skipping if current input index is not its first occurrence. > > 4. Optimize field access: Replace "n->in(j)" with "n->_in[j]", same with outcnt calculation for input node x. Changes requested by kvn (Reviewer).
test/hotspot/jtreg/compiler/debug/TraceIterativeGVN.java line 27: > 25: /* > 26: * @test > 27: * @requires vm.debug == true & vm.flavor == "server" Use `vm.compiler2.enabled` instead of `vm.flavor == "server"` ------------- PR: https://git.openjdk.java.net/jdk/pull/4045 From erikj at openjdk.java.net Mon May 17 20:13:42 2021 From: erikj at openjdk.java.net (Erik Joelsson) Date: Mon, 17 May 2021 20:13:42 GMT Subject: RFR: 8267112: JVMCI compiler modules should be kept upgradable In-Reply-To: References: Message-ID: On Thu, 13 May 2021 16:37:38 GMT, Vladimir Kozlov wrote: > [JDK-8264806](https://bugs.openjdk.java.net/browse/JDK-8264806) changes removed sources and also removed JVMCI compiler from list of upgradable modules. JVMCI compiler modules should be upgradable in JDK to work with GraalVM. > > Make these modules upgradable again and empty by leaving only reference to JVMCI (jdk.internal.vm.ci) module. It does not restore sources - only `module-info.java` files are kept. > > Note, we continue discussion about [JDK-8265091](https://bugs.openjdk.java.net/browse/JDK-8265091): "Use Module API to export JVMCI packages at runtime" to see if we can remove these `module-info.java` files. > > Changes were proposed by @dougxc after testing [JDK-8264806](https://bugs.openjdk.java.net/browse/JDK-8264806) changes with GraalVM. > I restored related code in some tests for them to pass. > > Testing: full tier1-tier3. Marked as reviewed by erikj (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/4014 From kvn at openjdk.java.net Mon May 17 20:13:42 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 17 May 2021 20:13:42 GMT Subject: RFR: 8267112: JVMCI compiler modules should be kept upgradable In-Reply-To: References: Message-ID: On Thu, 13 May 2021 16:37:38 GMT, Vladimir Kozlov wrote: > [JDK-8264806](https://bugs.openjdk.java.net/browse/JDK-8264806) changes removed sources and also removed JVMCI compiler from list of upgradable modules. 
JVMCI compiler modules should be upgradable in JDK to work with GraalVM. > > Make these modules upgradable again and empty by leaving only reference to JVMCI (jdk.internal.vm.ci) module. It does not restore sources - only `module-info.java` files are kept. > > Note, we continue discussion about [JDK-8265091](https://bugs.openjdk.java.net/browse/JDK-8265091): "Use Module API to export JVMCI packages at runtime" to see if we can remove these `module-info.java` files. > > Changes were proposed by @dougxc after testing [JDK-8264806](https://bugs.openjdk.java.net/browse/JDK-8264806) changes with GraalVM. > I restored related code in some tests for them to pass. > > Testing: full tier1-tier3. Thank you, Erik. ------------- PR: https://git.openjdk.java.net/jdk/pull/4014 From kvn at openjdk.java.net Mon May 17 20:13:43 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 17 May 2021 20:13:43 GMT Subject: Integrated: 8267112: JVMCI compiler modules should be kept upgradable In-Reply-To: References: Message-ID: <90anbXn1snIdlOvyjnIDTOjLGlJAIpktD336jmT6OjU=.3ac10a6b-20e3-4604-bb18-e00b44a75546@github.com> On Thu, 13 May 2021 16:37:38 GMT, Vladimir Kozlov wrote: > [JDK-8264806](https://bugs.openjdk.java.net/browse/JDK-8264806) changes removed sources and also removed JVMCI compiler from list of upgradable modules. JVMCI compiler modules should be upgradable in JDK to work with GraalVM. > > Make these modules upgradable again and empty by leaving only reference to JVMCI (jdk.internal.vm.ci) module. It does not restore sources - only `module-info.java` files are kept. > > Note, we continue discussion about [JDK-8265091](https://bugs.openjdk.java.net/browse/JDK-8265091): "Use Module API to export JVMCI packages at runtime" to see if we can remove these `module-info.java` files. > > Changes were proposed by @dougxc after testing [JDK-8264806](https://bugs.openjdk.java.net/browse/JDK-8264806) changes with GraalVM. > I restored related code in some tests for them to pass. 
> > Testing: full tier1-tier3. This pull request has now been integrated. Changeset: 2effdd1b Author: Vladimir Kozlov URL: https://git.openjdk.java.net/jdk/commit/2effdd1b6799a15a766b2b2a6cba4806d92122f3 Stats: 83 lines in 9 files changed: 34 ins; 42 del; 7 mod 8267112: JVMCI compiler modules should be kept upgradable Reviewed-by: mchung, erikj, dnsimon ------------- PR: https://git.openjdk.java.net/jdk/pull/4014 From github.com+58006833+xbzhang99 at openjdk.java.net Mon May 17 20:25:59 2021 From: github.com+58006833+xbzhang99 at openjdk.java.net (Xubo Zhang) Date: Mon, 17 May 2021 20:25:59 GMT Subject: RFR: 8266332: Adler32 intrinsic for x86 64-bit platforms [v10] In-Reply-To: References: Message-ID: On Mon, 17 May 2021 19:21:17 GMT, Vladimir Kozlov wrote: >> Xubo Zhang has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove -XX:+UseAdler32Intrinsics, as it will fail on non-supported platforms > > src/hotspot/cpu/x86/macroAssembler_x86.hpp line 1322: > >> 1320: Assembler::vpmulld(dst, nds, src, vector_len); >> 1321: } >> 1322: void vpmulld(XMMRegister dst, XMMRegister nds, AddressLiteral src, int vector_len, Register scratch_reg = rscratch1); > > Looks like my comment was lost. > I see only last version of method is used in stub. Why you need additional 2 wrapper methods? > Also the code always pass `scratch_reg` - you don't need to set default value. I think the first two were introduced by other patches will remove the scratch_reg ------------- PR: https://git.openjdk.java.net/jdk/pull/3806 From kvn at openjdk.java.net Mon May 17 20:27:35 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 17 May 2021 20:27:35 GMT Subject: RFR: 8266615: C2 incorrectly folds subtype checks involving an interface array In-Reply-To: References: Message-ID: On Mon, 17 May 2021 12:55:06 GMT, Tobias Hartmann wrote: > C2 incorrectly folds the subtype checks in `TestInterfaceArraySubtypeCheck::test1/test2`. 
As a result, an unexpected `ClassCastException` is thrown at `checkcast` and `instanceof` returns a wrong result. The problem is in `Compile::static_subtype_check` where we incorrectly return `SSC_always_false` for the `MyInterface[] <: MyClassA[]` check because `MyClassA[]` is not a subtype of `MyInterface[]` (after checking that `MyInterface[]` is not a subtype of `MyClassA[]`). > > The fix is to check that `subelem` is not an interface. This is very old code and not a recent regression. > > Thanks, > Tobias Marked as reviewed by kvn (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/4060 From kvn at openjdk.java.net Mon May 17 22:20:44 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 17 May 2021 22:20:44 GMT Subject: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v4] In-Reply-To: References: Message-ID: On Sat, 15 May 2021 02:06:29 GMT, Sandhya Viswanathan wrote: >> This PR contains Short Vector Math Library support related changes for [JEP-414 Vector API (Second Incubator)](https://openjdk.java.net/jeps/414), in preparation for when targeted. >> >> Intel Short Vector Math Library (SVML) based intrinsics in native x86 assembly provide optimized implementation for Vector API transcendental and trigonometric methods. >> These methods are built into a separate library instead of being part of libjvm.so or jvm.dll. >> >> The following changes are made: >> The source for these methods is placed in the jdk.incubator.vector module under src/jdk.incubator.vector/linux/native/libsvml and src/jdk.incubator.vector/windows/native/libsvml. >> The assembly source files are named as "*.S" and include files are named as "*.S.inc". >> The corresponding build script is placed at make/modules/jdk.incubator.vector/Lib.gmk. >> Changes are made to build system to support dependency tracking for assembly files with includes.
>> The built native libraries (libsvml.so/svml.dll) are placed in bin directory of JDK on Windows and lib directory of JDK on Linux. >> The C2 JIT uses the dll_load and dll_lookup to get the addresses of optimized methods from this library. >> >> Build system changes and module library build scripts are contributed by Magnus (magnus.ihse.bursie at oracle.com). >> >> Looking forward to your review and feedback. >> >> Performance: >> Micro benchmark Base Optimized Unit Gain(Optimized/Base) >> Double128Vector.ACOS 45.91 87.34 ops/ms 1.90 >> Double128Vector.ASIN 45.06 92.36 ops/ms 2.05 >> Double128Vector.ATAN 19.92 118.36 ops/ms 5.94 >> Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79 >> Double128Vector.CBRT 45.77 208.36 ops/ms 4.55 >> Double128Vector.COS 49.94 245.89 ops/ms 4.92 >> Double128Vector.COSH 26.91 126.00 ops/ms 4.68 >> Double128Vector.EXP 71.64 379.65 ops/ms 5.30 >> Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18 >> Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44 >> Double128Vector.LOG 61.95 279.84 ops/ms 4.52 >> Double128Vector.LOG10 59.34 239.05 ops/ms 4.03 >> Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79 >> Double128Vector.SIN 49.36 240.79 ops/ms 4.88 >> Double128Vector.SINH 26.59 103.75 ops/ms 3.90 >> Double128Vector.TAN 41.05 152.39 ops/ms 3.71 >> Double128Vector.TANH 45.29 169.53 ops/ms 3.74 >> Double256Vector.ACOS 54.21 106.39 ops/ms 1.96 >> Double256Vector.ASIN 53.60 107.99 ops/ms 2.01 >> Double256Vector.ATAN 21.53 189.11 ops/ms 8.78 >> Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44 >> Double256Vector.CBRT 56.45 397.13 ops/ms 7.04 >> Double256Vector.COS 58.26 389.77 ops/ms 6.69 >> Double256Vector.COSH 29.44 151.11 ops/ms 5.13 >> Double256Vector.EXP 86.67 564.68 ops/ms 6.52 >> Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80 >> Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62 >> Double256Vector.LOG 71.52 394.90 ops/ms 5.52 >> Double256Vector.LOG10 65.43 362.32 ops/ms 5.54 >> Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05 >> Double256Vector.SIN 57.06 380.98 
ops/ms 6.68 >> Double256Vector.SINH 29.40 117.37 ops/ms 3.99 >> Double256Vector.TAN 44.90 279.90 ops/ms 6.23 >> Double256Vector.TANH 54.08 274.71 ops/ms 5.08 >> Double512Vector.ACOS 55.65 687.54 ops/ms 12.35 >> Double512Vector.ASIN 57.31 777.72 ops/ms 13.57 >> Double512Vector.ATAN 21.42 729.21 ops/ms 34.04 >> Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32 >> Double512Vector.CBRT 56.78 834.38 ops/ms 14.69 >> Double512Vector.COS 59.88 837.04 ops/ms 13.98 >> Double512Vector.COSH 30.34 172.76 ops/ms 5.70 >> Double512Vector.EXP 99.66 1608.12 ops/ms 16.14 >> Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34 >> Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34 >> Double512Vector.LOG 74.84 996.00 ops/ms 13.31 >> Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72 >> Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34 >> Double512Vector.POW 37.42 384.13 ops/ms 10.26 >> Double512Vector.SIN 59.74 728.45 ops/ms 12.19 >> Double512Vector.SINH 29.47 143.38 ops/ms 4.87 >> Double512Vector.TAN 46.20 587.21 ops/ms 12.71 >> Double512Vector.TANH 57.36 495.42 ops/ms 8.64 >> Double64Vector.ACOS 24.04 73.67 ops/ms 3.06 >> Double64Vector.ASIN 23.78 75.11 ops/ms 3.16 >> Double64Vector.ATAN 14.14 62.81 ops/ms 4.44 >> Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28 >> Double64Vector.CBRT 16.47 107.50 ops/ms 6.53 >> Double64Vector.COS 23.42 152.01 ops/ms 6.49 >> Double64Vector.COSH 17.34 113.34 ops/ms 6.54 >> Double64Vector.EXP 27.08 203.53 ops/ms 7.52 >> Double64Vector.EXPM1 18.77 96.73 ops/ms 5.15 >> Double64Vector.HYPOT 18.54 103.62 ops/ms 5.59 >> Double64Vector.LOG 26.75 142.63 ops/ms 5.33 >> Double64Vector.LOG10 25.85 139.71 ops/ms 5.40 >> Double64Vector.LOG1P 13.26 97.94 ops/ms 7.38 >> Double64Vector.SIN 23.28 146.91 ops/ms 6.31 >> Double64Vector.SINH 17.62 88.59 ops/ms 5.03 >> Double64Vector.TAN 21.00 86.43 ops/ms 4.12 >> Double64Vector.TANH 23.75 111.35 ops/ms 4.69 >> Float128Vector.ACOS 57.52 110.65 ops/ms 1.92 >> Float128Vector.ASIN 57.15 117.95 ops/ms 2.06 >> Float128Vector.ATAN 22.52 318.74 
ops/ms 14.15 >> Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42 >> Float128Vector.CBRT 29.72 443.74 ops/ms 14.93 >> Float128Vector.COS 42.82 803.02 ops/ms 18.75 >> Float128Vector.COSH 31.44 118.34 ops/ms 3.76 >> Float128Vector.EXP 72.43 855.33 ops/ms 11.81 >> Float128Vector.EXPM1 37.82 127.85 ops/ms 3.38 >> Float128Vector.HYPOT 53.20 591.68 ops/ms 11.12 >> Float128Vector.LOG 52.95 877.94 ops/ms 16.58 >> Float128Vector.LOG10 49.26 603.72 ops/ms 12.26 >> Float128Vector.LOG1P 20.89 430.59 ops/ms 20.61 >> Float128Vector.SIN 43.38 745.31 ops/ms 17.18 >> Float128Vector.SINH 31.11 112.91 ops/ms 3.63 >> Float128Vector.TAN 37.25 332.13 ops/ms 8.92 >> Float128Vector.TANH 57.63 453.77 ops/ms 7.87 >> Float256Vector.ACOS 65.23 123.73 ops/ms 1.90 >> Float256Vector.ASIN 63.41 132.86 ops/ms 2.10 >> Float256Vector.ATAN 23.51 649.02 ops/ms 27.61 >> Float256Vector.ATAN2 18.19 455.95 ops/ms 25.07 >> Float256Vector.CBRT 45.99 594.81 ops/ms 12.93 >> Float256Vector.COS 43.75 926.69 ops/ms 21.18 >> Float256Vector.COSH 33.52 130.46 ops/ms 3.89 >> Float256Vector.EXP 75.70 1366.72 ops/ms 18.05 >> Float256Vector.EXPM1 39.00 149.72 ops/ms 3.84 >> Float256Vector.HYPOT 52.91 1023.18 ops/ms 19.34 >> Float256Vector.LOG 53.31 1545.77 ops/ms 29.00 >> Float256Vector.LOG10 50.31 863.80 ops/ms 17.17 >> Float256Vector.LOG1P 21.51 616.59 ops/ms 28.66 >> Float256Vector.SIN 44.07 911.04 ops/ms 20.67 >> Float256Vector.SINH 33.16 122.50 ops/ms 3.69 >> Float256Vector.TAN 37.85 497.75 ops/ms 13.15 >> Float256Vector.TANH 64.27 537.20 ops/ms 8.36 >> Float512Vector.ACOS 67.33 1718.00 ops/ms 25.52 >> Float512Vector.ASIN 66.12 1780.85 ops/ms 26.93 >> Float512Vector.ATAN 22.63 1780.31 ops/ms 78.69 >> Float512Vector.ATAN2 17.52 1113.93 ops/ms 63.57 >> Float512Vector.CBRT 54.78 2087.58 ops/ms 38.11 >> Float512Vector.COS 40.92 1567.93 ops/ms 38.32 >> Float512Vector.COSH 33.42 138.36 ops/ms 4.14 >> Float512Vector.EXP 70.51 3835.97 ops/ms 54.41 >> Float512Vector.EXPM1 38.06 279.80 ops/ms 7.35 >> Float512Vector.HYPOT 50.99 
3287.55 ops/ms 64.47 >> Float512Vector.LOG 49.61 3156.99 ops/ms 63.64 >> Float512Vector.LOG10 46.94 2489.16 ops/ms 53.02 >> Float512Vector.LOG1P 20.66 1689.86 ops/ms 81.81 >> Float512Vector.POW 32.73 1015.85 ops/ms 31.04 >> Float512Vector.SIN 41.17 1587.71 ops/ms 38.56 >> Float512Vector.SINH 33.05 129.39 ops/ms 3.91 >> Float512Vector.TAN 35.60 1336.11 ops/ms 37.53 >> Float512Vector.TANH 65.77 2295.28 ops/ms 34.90 >> Float64Vector.ACOS 48.41 89.34 ops/ms 1.85 >> Float64Vector.ASIN 47.30 95.72 ops/ms 2.02 >> Float64Vector.ATAN 20.62 49.45 ops/ms 2.40 >> Float64Vector.ATAN2 15.95 112.35 ops/ms 7.04 >> Float64Vector.CBRT 24.03 134.57 ops/ms 5.60 >> Float64Vector.COS 44.28 394.33 ops/ms 8.91 >> Float64Vector.COSH 28.35 95.27 ops/ms 3.36 >> Float64Vector.EXP 65.80 486.37 ops/ms 7.39 >> Float64Vector.EXPM1 34.61 85.99 ops/ms 2.48 >> Float64Vector.HYPOT 50.40 147.82 ops/ms 2.93 >> Float64Vector.LOG 51.93 163.25 ops/ms 3.14 >> Float64Vector.LOG10 49.53 147.98 ops/ms 2.99 >> Float64Vector.LOG1P 19.20 206.81 ops/ms 10.77 >> Float64Vector.SIN 44.41 382.09 ops/ms 8.60 >> Float64Vector.SINH 28.20 90.68 ops/ms 3.22 >> Float64Vector.TAN 36.29 160.89 ops/ms 4.43 >> Float64Vector.TANH 47.65 214.04 ops/ms 4.49 > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > Add missing Lib.gmk I reviewed only the HotSpot part. My main complaint is the long list of library entry points (vector functions) repeated in several files. Maybe use macros/lambdas with loops to reduce the number of lines and consolidate them in one place. Add comments explaining the numbers and letters in the names.
src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 7290: > 7288: StubRoutines::_vector_cbrt_float128 = (address)os::dll_lookup(libsvml, "__svml_cbrtf4_ha_ex"); > 7289: StubRoutines::_vector_cbrt_double64 = (address)os::dll_lookup(libsvml, "__svml_cbrt1_ha_ex"); > 7290: StubRoutines::_vector_cbrt_double128 = (address)os::dll_lookup(libsvml, "__svml_cbrt2_ha_ex"); Maybe use macros with **comments**? In this code it is very easy to make `copy-paste` errors. And it is not at all clear to me what all these numbers and letters in the names mean. I can guess, but there should be comments. src/hotspot/cpu/x86/x86_64.ad line 1713: > 1711: return OptoRegPair(hi, lo); > 1712: } > 1713: Should these methods check the `EnableVectorSupport` flag too? Or `UseVectorStubs`? src/hotspot/share/opto/callnode.cpp line 747: > 745: > 746: // If the return is in vector, compute appropriate regmask taking into account the whole range > 747: if(ideal_reg >= Op_VecS && ideal_reg <= Op_VecZ) { Should this be done only for CallLeafVector? Can `Valhalla` return a big object in one of the vector registers? src/hotspot/share/opto/vectorIntrinsics.cpp line 353: > 351: if (operation == NULL) { > 352: if (C->print_intrinsics()) { > 353: tty->print_cr(" ** svml call failed"); Also print information about the call which failed. src/hotspot/share/opto/vectorIntrinsics.cpp line 1682: > 1680: default: Unimplemented(); break; > 1681: } > 1682: } Macros? src/hotspot/share/runtime/stubRoutines.cpp line 330: > 328: address StubRoutines::_vector_atan2_double256 = NULL; > 329: address StubRoutines::_vector_atan2_double512 = NULL; > 330: #endif // __VECTOR_API_MATH_INTRINSICS_COMMON Macros? ------------- Changes requested by kvn (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/3638 From kvn at openjdk.java.net Mon May 17 22:27:02 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 17 May 2021 22:27:02 GMT Subject: RFR: 8266950: Remove vestigial support for non-strict floating-point execution In-Reply-To: References: Message-ID: On Wed, 12 May 2021 05:33:14 GMT, David Holmes wrote: > As part of JEP 306, the vestiges of HotSpot support for non-strict floating-point execution can be removed. All methods implicitly have strictfp semantics so the explicit checks for is_strict() can be replaced by true and the code reformulated accordingly. > > There are still some names that include "strict" that could potentially be renamed to remove it, but the fact we have to have strict fp semantics is still important on some platforms, so the names help reinforce that IMO. > > Testing: tiers 1-3 > > Thanks, > David Marked as reviewed by kvn (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/3991 From kvn at openjdk.java.net Mon May 17 22:27:02 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 17 May 2021 22:27:02 GMT Subject: RFR: 8266950: Remove vestigial support for non-strict floating-point execution In-Reply-To: References: Message-ID: On Fri, 14 May 2021 16:29:41 GMT, Vladimir Ivanov wrote: >> As part of JEP 306, the vestiges of HotSpot support for non-strict floating-point execution can be removed. All methods implicitly have strictfp semantics so the explicit checks for is_strict() can be replaced by true and the code reformulated accordingly. >> >> There are still some names that include "strict" that could potentially be renamed to remove it, but the fact we have to have strict fp semantics is still important on some platforms, so the names help reinforce that IMO. >> >> Testing: tiers 1-3 >> >> Thanks, >> David > > Overall, it looks very good. > Thanks for taking care of compiler part, David. 
> > I think it makes sense to remove lir_div_strictfp and lir_mul_strictfp in C1 as well: > https://github.com/openjdk/jdk/pull/4027 > > Feel free to incorporate the patch into the current PR if you agree with the change. > (Passed hs-tier1 - hs-tier4 testing and x86_32 build.) > > Otherwise, I'll handle it as a separate PR. @iwanowww I agree with your suggestion but lets do it in separate RFE. ------------- PR: https://git.openjdk.java.net/jdk/pull/3991 From github.com+58006833+xbzhang99 at openjdk.java.net Mon May 17 23:30:51 2021 From: github.com+58006833+xbzhang99 at openjdk.java.net (Xubo Zhang) Date: Mon, 17 May 2021 23:30:51 GMT Subject: RFR: 8266332: Adler32 intrinsic for x86 64-bit platforms [v10] In-Reply-To: References: Message-ID: On Mon, 17 May 2021 20:23:05 GMT, Xubo Zhang wrote: >> src/hotspot/cpu/x86/macroAssembler_x86.hpp line 1322: >> >>> 1320: Assembler::vpmulld(dst, nds, src, vector_len); >>> 1321: } >>> 1322: void vpmulld(XMMRegister dst, XMMRegister nds, AddressLiteral src, int vector_len, Register scratch_reg = rscratch1); >> >> Looks like my comment was lost. >> I see only last version of method is used in stub. Why you need additional 2 wrapper methods? >> Also the code always pass `scratch_reg` - you don't need to set default value. > > I think the first two were introduced by other patches > will remove the scratch_reg Sorry, I added first two. The vpmulld is overloaded in base Assembler class. If I override one method in MacroAssembler class, the C++ compiler doesn't seem to find the other overloaded functions, they somehow become hidden.
So, I need to override those as well in macroAssembler, otherwise I get the following error: ./src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp: In member function 'void C2_MacroAssembler::reduce_operation_256(BasicType, int, XMMRegister, XMMRegister, XMMRegister)': ./src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp:1573:64: error: no matching function for call to 'C2_MacroAssembler::vpmulld(XMMRegisterImpl*&, XMMRegisterImpl*&, XMMRegisterImpl*&, int&)' ------------- PR: https://git.openjdk.java.net/jdk/pull/3806 From github.com+58006833+xbzhang99 at openjdk.java.net Tue May 18 00:18:08 2021 From: github.com+58006833+xbzhang99 at openjdk.java.net (Xubo Zhang) Date: Tue, 18 May 2021 00:18:08 GMT Subject: RFR: 8266332: Adler32 intrinsic for x86 64-bit platforms [v11] In-Reply-To: References: Message-ID: <0mKzVE9RTWU0ZxjILDLkFx6EW-skdsp3lshNPbucuik=.4e19c3b0-7a3a-47f0-a008-21d999f24c15@github.com> > Implement Adler32 intrinsic for x86 64-bit platform using vector instructions. > > The benchmark test/micro/org/openjdk/bench/java/util/TestAdler32.java is contributed by Pengfei Li (pli, Pengfei.Li at arm.com). > > For this benchmark, the optimization shows ~5x improvement.
>
> Base:
> Benchmark (count) Mode Cnt Score Error Units
> TestAdler32Perf.testAdler32Update 64 avgt 25 0.084 ± 0.001 us/op
> TestAdler32Perf.testAdler32Update 128 avgt 25 0.104 ± 0.001 us/op
> TestAdler32Perf.testAdler32Update 256 avgt 25 0.146 ± 0.002 us/op
> TestAdler32Perf.testAdler32Update 512 avgt 25 0.226 ± 0.002 us/op
> TestAdler32Perf.testAdler32Update 1024 avgt 25 0.390 ± 0.005 us/op
> TestAdler32Perf.testAdler32Update 2048 avgt 25 0.714 ± 0.007 us/op
> TestAdler32Perf.testAdler32Update 4096 avgt 25 1.359 ± 0.014 us/op
> TestAdler32Perf.testAdler32Update 8192 avgt 25 2.751 ± 0.023 us/op
> TestAdler32Perf.testAdler32Update 16384 avgt 25 5.494 ± 0.077 us/op
> TestAdler32Perf.testAdler32Update 32768 avgt 25 11.058 ± 0.160 us/op
> TestAdler32Perf.testAdler32Update 65536 avgt 25 22.198 ± 0.319 us/op
>
>
> With patch:
> Benchmark (count) Mode Cnt Score Error Units
> TestAdler32Perf.testAdler32Update 64 avgt 25 0.020 ± 0.001 us/op
> TestAdler32Perf.testAdler32Update 128 avgt 25 0.025 ± 0.001 us/op
> TestAdler32Perf.testAdler32Update 256 avgt 25 0.031 ± 0.001 us/op
> TestAdler32Perf.testAdler32Update 512 avgt 25 0.048 ± 0.001 us/op
> TestAdler32Perf.testAdler32Update 1024 avgt 25 0.078 ± 0.001 us/op
> TestAdler32Perf.testAdler32Update 2048 avgt 25 0.139 ± 0.002 us/op
> TestAdler32Perf.testAdler32Update 4096 avgt 25 0.262 ± 0.004 us/op
> TestAdler32Perf.testAdler32Update 8192 avgt 25 0.524 ± 0.010 us/op
> TestAdler32Perf.testAdler32Update 16384 avgt 25 1.017 ± 0.022 us/op
> TestAdler32Perf.testAdler32Update 32768 avgt 25 2.058 ± 0.052 us/op
> TestAdler32Perf.testAdler32Update 65536 avgt 25 3.994 ± 0.013 us/op
Xubo Zhang has updated the pull request incrementally with one additional commit since the last revision: remove scratch register from vpmulld ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/3806/files - new: https://git.openjdk.java.net/jdk/pull/3806/files/d6a58166..0583b2cb Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3806&range=10 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3806&range=09-10 Stats: 3 lines in 2 files changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/3806.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3806/head:pull/3806 PR: https://git.openjdk.java.net/jdk/pull/3806 From pli at openjdk.java.net Tue May 18 01:45:51 2021 From: pli at openjdk.java.net (Pengfei Li) Date: Tue, 18 May 2021 01:45:51 GMT Subject: RFR: 8266332: Adler32 intrinsic for x86 64-bit platforms [v7] In-Reply-To: References: <6U5EDcQli0NUlrhdOBtzkujogfwN_39V3bPs_7oIUX0=.e51f26d8-b503-4680-8d6e-34311a1b300f@github.com> Message-ID: <-cRDqPdnR_r_5hu_J_S3oiBhuu3xFsY2dU1caSmKZ-c=.6320af8a-3c2f-436b-ab68-f5b228d26904@github.com> On Mon, 17 May 2021 16:53:16 GMT, Xubo Zhang wrote: >>
Ping @pfustc about permission to add his JMH micro or write your own based on examples in `test/micro/org/openjdk/bench/java/util/` > > @vnkozlov I implemented your review comments. Could you please take a look. I've copied @xbzhang99's modified test case into http://cr.openjdk.java.net/~pli/rfr/8216259/TestAdler32.java The original one w/o copyright header is backed up at http://cr.openjdk.java.net/~pli/rfr/8216259/TestAdler32.java.old Please let me know if I should do anything else. ------------- PR: https://git.openjdk.java.net/jdk/pull/3806 From yyang at openjdk.java.net Tue May 18 02:27:17 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Tue, 18 May 2021 02:27:17 GMT Subject: RFR: 8265711: C1: Intrinsify Class.getModifier method [v5] In-Reply-To: References: Message-ID: > It's a relatively common case to get modifiers from a constant Class instance, i.e. ThirdPartyClass.class.getModifiers(). Currently, the C1 Canonicalizer misses the opportunity to replace Class.getModifiers intrinsic calls with compile-time constants.
Yi Yang has updated the pull request incrementally with one additional commit since the last revision: remove -XX:TieredStopAtLevel=4 ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/3616/files - new: https://git.openjdk.java.net/jdk/pull/3616/files/3ba69f6a..687b899d Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3616&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3616&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/3616.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3616/head:pull/3616 PR: https://git.openjdk.java.net/jdk/pull/3616 From yyang at openjdk.java.net Tue May 18 02:27:21 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Tue, 18 May 2021 02:27:21 GMT Subject: RFR: 8265711: C1: Intrinsify Class.getModifier method [v4] In-Reply-To: References: Message-ID: On Mon, 17 May 2021 19:25:58 GMT, Vladimir Kozlov wrote: >> Yi Yang has updated the pull request incrementally with one additional commit since the last revision: >> >> caonicalize in interpreter/c1/c2 modes > > test/hotspot/jtreg/compiler/c1/CanonicalizeGetModifiers.java line 52: > >> 50: * @requires vm.compiler2.enabled >> 51: * @library /test/lib >> 52: * @run main/othervm -XX:TieredStopAtLevel=4 -XX:-TieredCompilation > > `-XX:TieredStopAtLevel=4` does not work when Tiered is off. You don't need this flag. Okay, I learned this from [Valhalla project](https://github.com/openjdk/valhalla/blob/c8d7c8260921ccb2b2e72dcc6cc04144ff7649fc/test/hotspot/jtreg/compiler/valhalla/inlinetypes/TestC1.java#L61). Maybe it's redundant if -XX:-TieredCompilation is disabled. 
------------- PR: https://git.openjdk.java.net/jdk/pull/3616 From hshi at openjdk.java.net Tue May 18 02:28:09 2021 From: hshi at openjdk.java.net (Hui Shi) Date: Tue, 18 May 2021 02:28:09 GMT Subject: RFR: 8266528: Optimize C2 VerifyIterativeGVN execution time [v2] In-Reply-To: References: Message-ID: > Please help review this enhancement for VerifyIterativeGVN, which reduces execution time by about 3x to 200x when VerifyIterativeGVN is on. > > In the simple test "-Xcomp -XX:+VerifyIterativeGVN -XX:-TieredCompilation -version", time is reduced from 8.67s to 2.4s. > In the extreme case hotspot/test/jtreg/compiler/escapeAnalysis/Test6689060.java, time is reduced from 20000s to 95s. > > Tested with "-Xbatch -XX:+VerifyIterativeGVN -XX:-TieredCompilation", tier1/2/3 with fastdebug, no regression. > > 1. Remove the node_arena()->contains check when verifying nodes. _verify_window is reset before every PhaseIterGVN::optimize. Searching from root or from nodes in _verify_window will not meet nodes whose _idx is not unique (PhaseIterGVN::optimize is not triggered in the middle of PhaseRenumberLive). Asserting in Node::verify that every node is in the current node_arena() passes tier1/2/3 checks (with -Xbatch -XX:+VerifyIterativeGVN -XX:-TieredCompilation); no assertion failure happens. > > 2. Combine verification for nodes in _verify_window into one worklist and skip redundant nodes in _verify_window. > > 3. Optimize duplicate checking for the same input nodes, skipping if the current input index is not its first occurrence. > > 4. Optimize field access: replace "n->in(j)" with "n->_in[j]", and do the same for the outcnt calculation for input node x.
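Item 3 in the list above can be illustrated with a small sketch. It is not the code from the patch: `Node` here is a hypothetical, stripped-down struct and the function names are made up for illustration. The idea shown is that when the same input appears at several indices of a node, the per-input check only needs to run at the index where that input first occurs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified node: an id plus raw input pointers.
// Inputs may be null and may repeat, as they can in a real IR graph.
struct Node {
    int id;
    std::vector<const Node*> in;
};

// True if in[j] already occurs at a smaller index, i.e. index j is
// not the first occurrence of that input.
bool is_repeated_input(const Node& n, std::size_t j) {
    for (std::size_t i = 0; i < j; ++i) {
        if (n.in[i] == n.in[j]) return true;
    }
    return false;
}

// Counts how many inputs would actually be verified when every
// distinct, non-null input is checked exactly once.
std::size_t inputs_to_verify(const Node& n) {
    std::size_t count = 0;
    for (std::size_t j = 0; j < n.in.size(); ++j) {
        if (n.in[j] == nullptr) continue;        // nothing to verify
        if (is_repeated_input(n, j)) continue;   // handled at first occurrence
        ++count;
    }
    return count;
}
```

For a node with inputs `{a, b, a, null, b}` only two inputs are verified instead of four, which is where the saving for heavily duplicated inputs comes from.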
Hui Shi has updated the pull request incrementally with one additional commit since the last revision: update test requires from "vm.flavor == "server"" to vm.compiler2.enabled ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4045/files - new: https://git.openjdk.java.net/jdk/pull/4045/files/2ed9763e..9635c9b9 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4045&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4045&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/4045.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4045/head:pull/4045 PR: https://git.openjdk.java.net/jdk/pull/4045 From hshi at openjdk.java.net Tue May 18 02:28:11 2021 From: hshi at openjdk.java.net (Hui Shi) Date: Tue, 18 May 2021 02:28:11 GMT Subject: RFR: 8266528: Optimize C2 VerifyIterativeGVN execution time [v2] In-Reply-To: References: Message-ID: On Mon, 17 May 2021 19:37:28 GMT, Vladimir Kozlov wrote: >> Hui Shi has updated the pull request incrementally with one additional commit since the last revision: >> >> update test requires from "vm.flavor == "server"" to vm.compiler2.enabled > > src/hotspot/share/opto/node.cpp line 2249: > >> 2247: } >> 2248: } >> 2249: if (cnt == 2) { > > I think it should be `cnt > 1`. previous loop breaks when meet first input which is same with x, so cnt must be 2 if x is duplicated with previous input. > test/hotspot/jtreg/compiler/debug/TraceIterativeGVN.java line 27: > >> 25: /* >> 26: * @test >> 27: * @requires vm.debug == true & vm.flavor == "server" > > Use `vm.compiler2.enabled` instead of `vm.flavor == "server"` Got! updated in new commit. 
------------- PR: https://git.openjdk.java.net/jdk/pull/4045 From yyang at openjdk.java.net Tue May 18 03:02:40 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Tue, 18 May 2021 03:02:40 GMT Subject: RFR: 8265711: C1: Intrinsify Class.getModifier method [v5] In-Reply-To: References: Message-ID: On Tue, 18 May 2021 02:27:17 GMT, Yi Yang wrote: >> It's relatively a common case to get modifiers from a constant Class instance, i.e. ThirdPartyClass.class.getModifiers(). Currently, C1 Canonicalizer missed the opportunity of replacing Class.getModifiers intrinsic calls with compile-time constants. > > Yi Yang has updated the pull request incrementally with one additional commit since the last revision: > > remove -XX:TieredStopAtLevel=4 Thanks Tobias and Vladimir for reviews! ------------- PR: https://git.openjdk.java.net/jdk/pull/3616 From david.holmes at oracle.com Tue May 18 03:08:06 2021 From: david.holmes at oracle.com (David Holmes) Date: Tue, 18 May 2021 13:08:06 +1000 Subject: RFR: 8266950: Remove vestigial support for non-strict floating-point execution In-Reply-To: References: Message-ID: Hi Vladimir, Thanks for the review. On 18/05/2021 8:27 am, Vladimir Kozlov wrote: > On Fri, 14 May 2021 16:29:41 GMT, Vladimir Ivanov wrote: > >>> As part of JEP 306, the vestiges of HotSpot support for non-strict floating-point execution can be removed. All methods implicitly have strictfp semantics so the explicit checks for is_strict() can be replaced by true and the code reformulated accordingly. >>> >>> There are still some names that include "strict" that could potentially be renamed to remove it, but the fact we have to have strict fp semantics is still important on some platforms, so the names help reinforce that IMO. >>> >>> Testing: tiers 1-3 >>> >>> Thanks, >>> David >> >> Overall, it looks very good. >> Thanks for taking care of compiler part, David. 
>> >> I think it makes sense to remove lir_div_strictfp and lir_mul_strictfp in C1 as well: >> https://github.com/openjdk/jdk/pull/4027 >> >> Feel free to incorporate the patch into the current PR if you agree with the change. >> (Passed hs-tier1 - hs-tier4 testing and x86_32 build.) >> >> Otherwise, I'll handle it as a separate PR. > > @iwanowww I agree with your suggestion but let's do it in a separate RFE. I can pull it into this change as there is no rush to integrate this given I need to wait for JEP 306 to be targeted. Thanks, David > ------------- > > PR: https://git.openjdk.java.net/jdk/pull/3991 > From yyang at openjdk.java.net Tue May 18 03:24:58 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Tue, 18 May 2021 03:24:58 GMT Subject: RFR: 8267151: C2: Don't create dummy Opaque1Node for outmost unswitched IfNode Message-ID: <52F-48LXl0cjM3C9rcAYKp_VdOsUMVetxr5ydepHB2Y=.d361fb09-0b4e-4a4e-b5d2-f44254fb8816@github.com> In create_slow_version_of_loop(), C2 creates the outmost unswitched IfNode (i.e. **if(xx)**{ for{} }else{ for{} }) with a dummy opaque bool node as its condition input. https://github.com/openjdk/jdk/blob/cd1c17c0a6416a8d16cf2035f3e97dba95b6b8af/src/hotspot/share/opto/loopUnswitch.cpp#L265-L271 After that, it sets the _prob (missing _fcnt?) of the outmost unswitched IfNode in do_unswitching(). https://github.com/openjdk/jdk/blob/cd1c17c0a6416a8d16cf2035f3e97dba95b6b8af/src/hotspot/share/opto/loopUnswitch.cpp#L186-L191 I think we can merge these two steps into a single step, that is, create the outmost unswitched IfNode while setting its condition input, _prob and _fcnt, without creating the dummy opaque bool node.
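For readers unfamiliar with loop unswitching, the `if(xx){ for{} }else{ for{} }` shape mentioned above can be shown at source level. This is only an illustrative sketch with made-up function names; C2 performs the transformation on its own IR, and the IfNode, Opaque1Node, `_prob` and `_fcnt` details discussed in this message are not modelled here.

```cpp
#include <cassert>
#include <vector>

// Original shape: the loop-invariant condition is tested on every iteration.
long sum_original(const std::vector<int>& v, bool negate) {
    long sum = 0;
    for (int x : v) {
        if (negate) {
            sum -= x;
        } else {
            sum += x;
        }
    }
    return sum;
}

// Unswitched shape: the invariant test is hoisted out and the loop is
// cloned, one copy per branch. The outer `if` corresponds to the
// outermost unswitched test referred to above.
long sum_unswitched(const std::vector<int>& v, bool negate) {
    long sum = 0;
    if (negate) {
        for (int x : v) sum -= x;
    } else {
        for (int x : v) sum += x;
    }
    return sum;
}
```

Both functions compute the same result; the second trades code size for a loop body without the per-iteration branch.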
Testing: - hotspot/jtreg/compiler(slowdebug) ------------- Commit messages: - loop unswitching opt Changes: https://git.openjdk.java.net/jdk/pull/4079/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4079&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8267151 Stats: 28 lines in 2 files changed: 3 ins; 14 del; 11 mod Patch: https://git.openjdk.java.net/jdk/pull/4079.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4079/head:pull/4079 PR: https://git.openjdk.java.net/jdk/pull/4079 From dholmes at openjdk.java.net Tue May 18 04:26:00 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Tue, 18 May 2021 04:26:00 GMT Subject: RFR: 8266950: Remove vestigial support for non-strict floating-point execution [v2] In-Reply-To: References: Message-ID: > As part of JEP 306, the vestiges of HotSpot support for non-strict floating-point execution can be removed. All methods implicitly have strictfp semantics so the explicit checks for is_strict() can be replaced by true and the code reformulated accordingly. > > There are still some names that include "strict" that could potentially be renamed to remove it, but the fact we have to have strict fp semantics is still important on some platforms, so the names help reinforce that IMO. 
> > Testing: tiers 1-3 > > Thanks, > David David Holmes has updated the pull request incrementally with one additional commit since the last revision: lir_div_strictfp and lir_mul_strictfp ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/3991/files - new: https://git.openjdk.java.net/jdk/pull/3991/files/683da141..c0c35a77 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3991&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3991&range=00-01 Stats: 116 lines in 20 files changed: 9 ins; 74 del; 33 mod Patch: https://git.openjdk.java.net/jdk/pull/3991.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3991/head:pull/3991 PR: https://git.openjdk.java.net/jdk/pull/3991 From dholmes at openjdk.java.net Tue May 18 04:36:41 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Tue, 18 May 2021 04:36:41 GMT Subject: RFR: 8266950: Remove vestigial support for non-strict floating-point execution [v2] In-Reply-To: References: Message-ID: On Fri, 14 May 2021 16:29:41 GMT, Vladimir Ivanov wrote: >> David Holmes has updated the pull request incrementally with one additional commit since the last revision: >> >> lir_div_strictfp and lir_mul_strictfp > > Overall, it looks very good. > Thanks for taking care of compiler part, David. > > I think it makes sense to remove lir_div_strictfp and lir_mul_strictfp in C1 as well: > https://github.com/openjdk/jdk/pull/4027 > > Feel free to incorporate the patch into the current PR if you agree with the change. > (Passed hs-tier1 - hs-tier4 testing and x86_32 build.) > > Otherwise, I'll handle it as a separate PR. @iwanowww and @vnkozlov I have merged @iwanowww 's changes with this PR and am re-testing. 
Thanks, David ------------- PR: https://git.openjdk.java.net/jdk/pull/3991 From jbhateja at openjdk.java.net Tue May 18 05:21:06 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Tue, 18 May 2021 05:21:06 GMT Subject: RFR: 8266951: Partial in-lining for vectorized mismatch operation using AVX512 masked instructions [v4] In-Reply-To: <0YtRuwnVZ-Ejs-22d0JDJeFzXiZ17XNuBT1o5Ma4ZkI=.9dd9e952-d452-4175-8ff5-8f41e990a555@github.com> References: <0YtRuwnVZ-Ejs-22d0JDJeFzXiZ17XNuBT1o5Ma4ZkI=.9dd9e952-d452-4175-8ff5-8f41e990a555@github.com> Message-ID: <6phAafS9kz8v8Nwo4XGynTzt8K1KaNv7jhcnMZ59mew=.de2cbb11-261a-4974-bf21-adf47f6e8482@github.com> > ArraySupport.vectorizedMismatch is a leaf level comparison routine which gets called by various public Java APIs (Arrays.equals, Arrays.mismatch). Hotspot C2 compiler intrinsifies vectorizedMismatch routine and emits a call to a stub routine which uses vector instruction to compare the inputs. > > For small compare operation whose size fits in one vector register i.e. < 32 bytes or <= 64 bytes, this patch employ partial in-lining technique to emit the fast path code at the call site which does vector comparison under the influence of a predicate register/mask computed as a function of comparison length. > > If the length of comparison is greater than the vector register size then the slow path comprising of stub call is emitted. > > This prevents the call overhead associated with stub call which is significant compared to actual comparison operation for small sized comparisons. > > Partial in-lining works under the influence of a run time flag -XX:UsePartialInlineSize=32/64 (default 32 bytes). > > Following are performance number for an existing JMH benchmark (test/micro/org/openjdk/bench/java/util//ArrayMismatch.java) :- > > Machine : Cascade Lake server (Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz)
>
> BENCHMARK | SIZE | Baseline (ops/ms) | PI32 (ops/ms) | Gain | PI64 (ops/ms) | Gain
> -- | -- | -- | -- | -- | -- | --
> ArraysMismatchPartialInlining.testByteMatch | 3 | 209915.663 | 209126.291 | 0.996239576 | 209073.888 | 0.995989937
> ArraysMismatchPartialInlining.testByteMatch | 4 | 157757.866 | 157763.787 | 1.000037532 | 157766.023 | 1.000051706
> ArraysMismatchPartialInlining.testByteMatch | 5 | 181182.854 | 180450.433 | 0.995957559 | 180465.978 | 0.996043356
> ArraysMismatchPartialInlining.testByteMatch | 6 | 146279.651 | 146276.69 | 0.999979758 | 146274.73 | 0.999966359
> ArraysMismatchPartialInlining.testByteMatch | 7 | 139099.287 | 137887.433 | 0.991287849 | 139159.131 | 1.000430225
> ArraysMismatchPartialInlining.testByteMatch | 15 | 127720.176 | 175732.078 | 1.375914781 | 169252.948 | 1.325185678
> ArraysMismatchPartialInlining.testByteMatch | 31 | 116472.861 | 176768.126 | 1.517676517 | 169773.326 | 1.457621325
> ArraysMismatchPartialInlining.testByteMatch | 63 | 104636.064 | 91564.893 | 0.875079676 | 160845.908 | 1.537193792
> ArraysMismatchPartialInlining.testByteMatch | 95 | 101099.48 | 89657.806 | 0.886827568 | 87334.192 | 0.863844127
> ArraysMismatchPartialInlining.testByteMatch | 800 | 45022.411 | 47905.179 | 1.064029623 | 47969.355 | 1.065455046
> ArraysMismatchPartialInlining.testCharMatch | 3 | 219405.496 | 219710.643 | 1.00139079 | 219242.048 | 0.999255041
> ArraysMismatchPartialInlining.testCharMatch | 4 | 170629.006 | 193121.02 | 1.131818233 | 182593.776 | 1.070121548
> ArraysMismatchPartialInlining.testCharMatch | 5 | 155518.733 | 169650.324 | 1.090867452 | 159963.097 | 1.028577676
> ArraysMismatchPartialInlining.testCharMatch | 6 | 154395.07 | 175616.979 | 1.137451986 | 147860.366 | 0.957675436
> ArraysMismatchPartialInlining.testCharMatch | 7 | 147630.171 | 168639.547 | 1.142310856 | 112467.214 | 0.761817271
> ArraysMismatchPartialInlining.testCharMatch | 15 | 130251.837 | 171755.645 | 1.318642784 | 159656.911 | 1.225755542
> ArraysMismatchPartialInlining.testCharMatch | 31 | 115510.532 | 106310.328 | 0.920351817 | 159957.379 | 1.384786099
> ArraysMismatchPartialInlining.testCharMatch | 63 | 96443.648 | 92545.364 | 0.959579671 | 92850.782 | 0.962746473
> ArraysMismatchPartialInlining.testCharMatch | 95 | 90001.485 | 81753.152 | 0.908353368 | 83890.742 | 0.932103976
> ArraysMismatchPartialInlining.testCharMatch | 800 | 22929.764 | 20699.791 | 0.902747669 | 22017.534 | 0.960216337
> ArraysMismatchPartialInlining.testDoubleMatch | 3 | 137422.911 | 134792.332 | 0.980857784 | 137047.846 | 0.997270724
> ArraysMismatchPartialInlining.testDoubleMatch | 4 | 140124.192 | 128321.199 | 0.915767628 | 128573.012 | 0.917564699
> ArraysMismatchPartialInlining.testDoubleMatch | 5 | 132385.81 | 132099.177 | 0.997834866 | 132337.729 | 0.999636812
> ArraysMismatchPartialInlining.testDoubleMatch | 6 | 122472.829 | 122301.343 | 0.998599804 | 122235.558 | 0.998062664
> ArraysMismatchPartialInlining.testDoubleMatch | 7 | 123867.736 | 123042.597 | 0.993338548 | 123060.617 | 0.993484026
> ArraysMismatchPartialInlining.testDoubleMatch | 15 | 102561.684 | 102697.933 | 1.001328459 | 100258.701 | 0.977545386
> ArraysMismatchPartialInlining.testDoubleMatch | 31 | 87019.261 | 87292.743 | 1.003142775 | 85003.323 | 0.976833428
> ArraysMismatchPartialInlining.testDoubleMatch | 63 | 62251.609 | 57261.214 | 0.919835084 | 62732.816 | 1.007730033
> ArraysMismatchPartialInlining.testDoubleMatch | 95 | 50885.381 | 48282.534 | 0.948848826 | 48533.009 | 0.953771163
> ArraysMismatchPartialInlining.testDoubleMatch | 800 | 7160.957 | 8209.345 | 1.146403337 | 7158.649 | 0.999677697
> ArraysMismatchPartialInlining.testFloatMatch | 3 | 144215.295 | 141572.656 | 0.981675737 | 117351.089 | 0.81372152
> ArraysMismatchPartialInlining.testFloatMatch | 4 | 149935.526 | 140116.547 | 0.934511992 | 138351.846 | 0.922742259
> ArraysMismatchPartialInlining.testFloatMatch | 5 | 134682.06 | 133892.853 | 0.994140222 | 139040.985 | 1.032364555
> ArraysMismatchPartialInlining.testFloatMatch | 6 | 139176.866 | 139452.984 | 1.001983936 | 158309.784 | 1.13747197
> ArraysMismatchPartialInlining.testFloatMatch | 7 | 127274.07 | 126137.824 | 0.991072447 | 146418.871 | 1.150421849
> ArraysMismatchPartialInlining.testFloatMatch | 15 | 115897.616 | 101808.969 | 0.878438854 | 108451.212 | 0.935750154
> ArraysMismatchPartialInlining.testFloatMatch | 31 | 96568.619 | 101492.986 | 1.05099345 | 88662.187 | 0.918126281
> ArraysMismatchPartialInlining.testFloatMatch | 63 | 75565.484 | 85526.546 | 1.131820263 | 74575.198 | 0.986894996
> ArraysMismatchPartialInlining.testFloatMatch | 95 | 69535.621 | 71823.072 | 1.032896104 | 64910.105 | 0.933479907
> ArraysMismatchPartialInlining.testFloatMatch | 800 | 13959.085 | 12768.069 | 0.914678075 | 12698.311 | 0.909680756
> ArraysMismatchPartialInlining.testIntMatch | 3 | 151925.753 | 152001.543 | 1.000498862 | 150351.321 | 0.989636833
> ArraysMismatchPartialInlining.testIntMatch | 4 | 151411.152 | 161021.852 | 1.063474188 | 152115.869 | 1.004654327
> ArraysMismatchPartialInlining.testIntMatch | 5 | 142305.114 | 134841.275 | 0.947550451 | 122718.584 | 0.862362431
> ArraysMismatchPartialInlining.testIntMatch | 6 | 144870.73 | 144186.562 | 0.99527739 | 166569.418 | 1.149779655
> ArraysMismatchPartialInlining.testIntMatch | 7 | 135132.736 | 131937.154 | 0.976352273 | 150670.855 | 1.114984122
> ArraysMismatchPartialInlining.testIntMatch | 15 | 118831.765 | 119947.806 | 1.009391773 | 161039.149 | 1.35518604
> ArraysMismatchPartialInlining.testIntMatch | 31 | 97247.157 | 95123.241 | 0.978159608 | 92586.255 | 0.952071586
> ArraysMismatchPartialInlining.testIntMatch | 63 | 78537.993 | 72904.05 | 0.928264744 | 72075.128 | 0.917710337
> ArraysMismatchPartialInlining.testIntMatch | 95 | 69356.234 | 69021.893 | 0.995179366 | 67435.202 | 0.972301956
> ArraysMismatchPartialInlining.testIntMatch | 800 | 14410.374 | 12715.733 | 0.882401317 | 12527.15 | 0.869314703
> ArraysMismatchPartialInlining.testLongMatch | 3 | 145434.777 | 147236.142 | 1.012386068 | 144269.34 | 0.991986532
> ArraysMismatchPartialInlining.testLongMatch | 4 | 149850.908 | 117182.939 | 0.781996857 | 116983.308 | 0.780664659
> ArraysMismatchPartialInlining.testLongMatch | 5 | 140694.62 | 141039.138 | 1.002448693 | 140721.407 | 1.000190391
> ArraysMismatchPartialInlining.testLongMatch | 6 | 136901.515 | 136215.609 | 0.994989785 | 136216.591 | 0.994996958
> ArraysMismatchPartialInlining.testLongMatch | 7 | 132233.847 | 131289.142 | 0.9928558 | 131315.326 | 0.993053813
> ArraysMismatchPartialInlining.testLongMatch | 15 | 108677.77 | 105050.548 | 0.966624067 | 108574.143 | 0.999046475
> ArraysMismatchPartialInlining.testLongMatch | 31 | 79476.103 | 79391.426 | 0.99893456 | 79519.006 | 1.000539823
> ArraysMismatchPartialInlining.testLongMatch | 63 | 58949.181 | 59102.766 | 1.00260538 | 59095.306 | 1.00247883
> ArraysMismatchPartialInlining.testLongMatch | 95 | 49438.419 | 49422.93 | 0.999686701 | 49390.033 | 0.999021287
> ArraysMismatchPartialInlining.testLongMatch | 800 | 7195.783 | 7201.554 | 1.000801998 | 7186.757 | 0.998745654
> ArraysMismatchPartialInlining.testShortMatch | 3 | 219642.309 | 219414.684 | 0.998963656 | 219760.127 | 1.000536408
> ArraysMismatchPartialInlining.testShortMatch | 4 | 169235.371 | 193907.437 | 1.145785517 | 170667.561 | 1.008462711
> ArraysMismatchPartialInlining.testShortMatch | 5 | 155537.852 | 147014.758 | 0.945202445 | 116770.798 | 0.750754858
> ArraysMismatchPartialInlining.testShortMatch | 6 | 155059.272 | 173756.546 | 1.120581464 | 152323.759 | 0.982358275
> ArraysMismatchPartialInlining.testShortMatch | 7 | 147370.359 | 154934.348 | 1.051326393 | 138398.19 | 0.939118225
> ArraysMismatchPartialInlining.testShortMatch | 15 | 130353.196 | 171653.208 | 1.316831603 | 160047.047 | 1.227795343
> ArraysMismatchPartialInlining.testShortMatch | 31 | 118458.443 | 106239.301 | 0.896848703 | 159726.936 | 1.348379499
> ArraysMismatchPartialInlining.testShortMatch | 63 | 97519.691 | 91591.145 | 0.939206678 | 91847.817 | 0.94183868
> ArraysMismatchPartialInlining.testShortMatch | 95 | 90818.111 | 77626.093 | 0.854742431 | 77653.086 | 0.855039652
> ArraysMismatchPartialInlining.testShortMatch | 800 | 21382.8 | 22841.791 | 1.06823199 | 22683.388 | 1.060824027

Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: 8266951: Removing the changes to existing benchmark since a separate benchmark has been added to partial in-lining. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/3999/files - new: https://git.openjdk.java.net/jdk/pull/3999/files/1070ab55..946e997a Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3999&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3999&range=02-03 Stats: 5 lines in 1 file changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/3999.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3999/head:pull/3999 PR: https://git.openjdk.java.net/jdk/pull/3999 From jbhateja at openjdk.java.net Tue May 18 05:21:07 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Tue, 18 May 2021 05:21:07 GMT Subject: RFR: 8266951: Partial in-lining for vectorized mismatch operation using AVX512 masked instructions [v3] In-Reply-To: <2Hq64WSZ6ulPHDZi9RC1sfOZPG6BB94G7NQDReQOPjM=.b1e01864-1d91-4516-b430-947916c139f5@github.com> References: <0YtRuwnVZ-Ejs-22d0JDJeFzXiZ17XNuBT1o5Ma4ZkI=.9dd9e952-d452-4175-8ff5-8f41e990a555@github.com> <2Hq64WSZ6ulPHDZi9RC1sfOZPG6BB94G7NQDReQOPjM=.b1e01864-1d91-4516-b430-947916c139f5@github.com> Message-ID:
<3xXsVzOyKbZcAQ8jA2KKHf6mZUGZMCVIti2RkegVig8=.80d9c8dd-0d85-4c8c-841c-6754ded3a658@github.com> On Mon, 17 May 2021 15:29:54 GMT, Paul Sandoz wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> 8266951: Enable partial in-lining if UsePartialInlineSize=64, adding a benchmark for small sized conversions of various primitive types. > > test/micro/org/openjdk/bench/java/util/ArraysMismatch.java line 65: > >> 63: leftStartRange = size / 4; >> 64: leftEndRange = size - size / 4; >> 65: rightStartRange = size / 4 + 1; > > Since you changed `10` to `1` perhaps make this a parameter defaulting to the new value? I have reverted the changes in existing benchmark since a new benchmark has been added for partial in-lining cases. PR has been updated with its performance results. ------------- PR: https://git.openjdk.java.net/jdk/pull/3999 From ksakata at openjdk.java.net Tue May 18 05:26:56 2021 From: ksakata at openjdk.java.net (Koichi Sakata) Date: Tue, 18 May 2021 05:26:56 GMT Subject: RFR: 8263385: IGV: Graph is not opened in the window that has focus. Message-ID: This pull request enables IGV opens a graph in the window that is focused. At the moment IGV opens a graph in the window that has the graph and is found first. So in this pull request I used preferentially the active EditorTopComponent. I tested the following scenarios manually: 1. Open a graph, open clone, then open another graph (as described in the bug report). It replaces the clone graph with the last opened graph. 2. Open a graph, open clone, swap tabs by dragging the clone graph, then open another graph. It replaces the clone graph with the last opened graph. 3. Open a graph, open clone, change the focus from the clone graph to the first graph, then open another graph. It replaces the first graph with the last opened graph. 4. 
Open a graph, open clone, open the same graph xml file from the toolbar, open a graph in the second folder, then open a graph in the first folder. It replaces the leftmost graph, which was opened first, with the last opened graph. ------------- Commit messages: - Load a graph to the active top comonent if the active component has the graph Changes: https://git.openjdk.java.net/jdk/pull/4078/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4078&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8263385 Stats: 7 lines in 1 file changed: 6 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/4078.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4078/head:pull/4078 PR: https://git.openjdk.java.net/jdk/pull/4078 From cgo at openjdk.java.net Tue May 18 06:53:46 2021 From: cgo at openjdk.java.net (Christoph Göttschkes) Date: Tue, 18 May 2021 06:53:46 GMT Subject: RFR: 8267237: ARM32: bad AD file in matcher.cpp after 8266810 In-Reply-To: References: Message-ID: On Mon, 17 May 2021 10:48:17 GMT, Christoph Göttschkes wrote: > It appears that [JDK-8266810](https://bugs.openjdk.java.net/browse/JDK-8266810) introduced a regression into aarch32. Many JTreg tests are failing with: > > # Internal Error (/var/jnode/openjdk-build-ws/workspace/openjdk-build/jdk/jdk-arm-linux-gnueabihf/jdk/src/hotspot/share/opto/matcher.cpp:1670), pid=15030, tid=15047 > # assert(false) failed: bad AD file > > > Testing: hotspot tier1 on ARMv7-A / linux Thanks for the reviews, tests are green again. ------------- PR: https://git.openjdk.java.net/jdk/pull/4053 From rrich at openjdk.java.net Tue May 18 07:19:39 2021 From: rrich at openjdk.java.net (Richard Reingruber) Date: Tue, 18 May 2021 07:19:39 GMT Subject: RFR: 8263385: IGV: Graph is not opened in the window that has focus. In-Reply-To: References: Message-ID: On Tue, 18 May 2021 03:01:58 GMT, Koichi Sakata wrote: > This pull request enables IGV to open a graph in the window that is focused.
> > At the moment IGV opens a graph in the window that has the graph and is found first. So in this pull request I used preferentially the active EditorTopComponent. > > I tested the following scenarios manually: > > 1. Open a graph, open clone, then open another graph (as described in the bug report). It replaces the clone graph with the last opened graph. > 2. Open a graph, open clone, swap tabs by dragging the clone graph, then open another graph. It replaces the clone graph with the last opened graph. > 3. Open a graph, open clone, change the focus from the clone graph to the first graph, then open another graph. It replaces the first graph with the last opened graph. > 4. Open a graph, open clone, open the same graph xml file from the toolbar, open a graph in the second folder, then open a graph in the first folder. It replaces the leftmost graph that was opened the first with the last opened graph. Hello Koichi, thanks for taking care of this issue. I've built and tested this pull request and found that it works in most cases. Here's what did not work: 1. Open Graph -> new Tab T1 is created 2. Open Clone -> new Tab T2 is created 3. Use the mouse to drag T2 down in the lower part of the window until the red frame indicates that the window will be split horizontally -> The window will be split horizontally. T2 is the lower window and has focus. 4. Open another Graph in the outline 5. IGV shows that graph in T1 even though T2 had focus. This is unexpected. Despite that I think your change is good. Unfortunately I can only test but not review the change itself as I am not familiar with IGV source code. Thanks, Richard. 
------------- PR: https://git.openjdk.java.net/jdk/pull/4078 From roland at openjdk.java.net Tue May 18 07:28:39 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Tue, 18 May 2021 07:28:39 GMT Subject: RFR: 8267212: test/jdk/java/util/Collections/FindSubList.java intermittent crash with "no reachable node should have no use" In-Reply-To: References: Message-ID: On Mon, 17 May 2021 11:44:48 GMT, Hui Shi wrote: > … crash with "no reachable node should have no use" > > Please help review this fix. > > StrIntrinsicNode::Ideal uses Node::set_req to replace the memory input; the old memory input might then have 0 uses but is not added to the PhaseGVN worklist. Using set_req_X ensures that an old memory input node with 0 outs is added to the PhaseGVN worklist. > > Found two other similar pieces of problematic code in LoadNode::Ideal. > > Tier1/2/3 pass with release/fastdebug build. > test/jdk/java/util/Collections/FindSubList.java doesn't fail in 100 runs (before the fix, 2/3 failures in 10 runs). Looks good to me. ------------- Marked as reviewed by roland (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/4055 From thartmann at openjdk.java.net Tue May 18 07:28:40 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 18 May 2021 07:28:40 GMT Subject: RFR: 8266615: C2 incorrectly folds subtype checks involving an interface array In-Reply-To: References: Message-ID: <7f2rV8E8dV4XVCutpH0aS3OR23pke4TG2Vsjd12u6D4=.4bdaf9e1-8862-4704-8e05-72384e94ab14@github.com> On Mon, 17 May 2021 12:55:06 GMT, Tobias Hartmann wrote: > C2 incorrectly folds the subtype checks in `TestInterfaceArraySubtypeCheck::test1/test2`. As a result, an unexpected `ClassCastException` is thrown at `checkcast` and `instanceof` returns a wrong result. The problem is in `Compile::static_subtype_check` where we incorrectly return `SSC_always_false` for the `MyInterface[] <: MyClassA[]` check because `MyClassA[]` is not a subtype of `MyInterface[]` (after checking that `MyInterface[]` is not a subtype of `MyClassA[]`).
> > The fix is to check that `subelem` is not an interface. This is very old code and not a recent regression. > > Thanks, > Tobias Thanks for the review, Vladimir! ------------- PR: https://git.openjdk.java.net/jdk/pull/4060 From yyang at openjdk.java.net Tue May 18 07:32:48 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Tue, 18 May 2021 07:32:48 GMT Subject: Integrated: 8265711: C1: Intrinsify Class.getModifier method In-Reply-To: References: Message-ID: On Thu, 22 Apr 2021 07:02:29 GMT, Yi Yang wrote: > It's a relatively common case to get modifiers from a constant Class instance, e.g. ThirdPartyClass.class.getModifiers(). Currently, the C1 Canonicalizer misses the opportunity to replace Class.getModifiers intrinsic calls with compile-time constants. This pull request has now been integrated. Changeset: 905b41ac Author: Yi Yang Committer: Tobias Hartmann URL: https://git.openjdk.java.net/jdk/commit/905b41ac6ae44e5adb51cff37995cff534db47f0 Stats: 174 lines in 5 files changed: 174 ins; 0 del; 0 mod 8265711: C1: Intrinsify Class.getModifier method Reviewed-by: thartmann, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/3616 From cgo at openjdk.java.net Tue May 18 07:33:42 2021 From: cgo at openjdk.java.net (Christoph Göttschkes) Date: Tue, 18 May 2021 07:33:42 GMT Subject: Integrated: 8267237: ARM32: bad AD file in matcher.cpp after 8266810 In-Reply-To: References: Message-ID: <7eSPecS6ETimd7e34iwbDPnMqS0LTqqftVEXHvBUIso=.84e18380-d06a-44c2-82eb-c9336f89f0f1@github.com> On Mon, 17 May 2021 10:48:17 GMT, Christoph Göttschkes wrote: > It appears that [JDK-8266810](https://bugs.openjdk.java.net/browse/JDK-8266810) introduced a regression on aarch32. 
Many JTreg tests are failing with: > > # Internal Error (/var/jnode/openjdk-build-ws/workspace/openjdk-build/jdk/jdk-arm-linux-gnueabihf/jdk/src/hotspot/share/opto/matcher.cpp:1670), pid=15030, tid=15047 > # assert(false) failed: bad AD file > > > Testing: hotspot tier1 on ARMv7-A / linux This pull request has now been integrated. Changeset: b60975dd Author: Christoph Göttschkes Committer: Tobias Hartmann URL: https://git.openjdk.java.net/jdk/commit/b60975dd85d62d38e3c13c87db611c6fd08dc698 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8267237: ARM32: bad AD file in matcher.cpp after 8266810 Reviewed-by: redestad, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/4053 From thartmann at openjdk.java.net Tue May 18 08:07:43 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 18 May 2021 08:07:43 GMT Subject: RFR: 8267212: test/jdk/java/util/Collections/FindSubList.java intermittent crash with "no reachable node should have no use" In-Reply-To: References: Message-ID: On Mon, 17 May 2021 11:44:48 GMT, Hui Shi wrote: > ... crash with "no reachable node should have no use" > > Please help review this fix. > > StrIntrinsicNode::Ideal uses Node::set_req to replace the memory input; the old memory input might be left with 0 uses without being added to the PhaseGVN worklist. Using set_req_X ensures that an old memory input node with 0 outputs is added to the PhaseGVN worklist. > > Found two other pieces of similarly problematic code in LoadNode::Ideal. > > Tier1/2/3 pass with release/fastdebug builds. > test/jdk/java/util/Collections/FindSubList.java doesn't fail in 100 runs (before the fix: 2/3 failures in 10 runs). Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/4055 From yyang at openjdk.java.net Tue May 18 08:28:58 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Tue, 18 May 2021 08:28:58 GMT Subject: RFR: 8267239: C1: RangeCheckElimination for % operator Message-ID: The % operator follows the rule that the result of the remainder operation can be negative only if the dividend is negative, and can be positive only if the dividend is positive. Moreover, the magnitude of the result is always less than the magnitude of the divisor (see [JLS 15.17.3](https://docs.oracle.com/javase/specs/jls/se8/html/jls-15.html#jls-15.17.3)). So if `y` is a constant integer and not equal to 0, then we can deduce the bounds of the remainder operation: - x % -y ==> [0, y - 1] RCE - x % y ==> [0, y - 1] RCE - -x % y ==> [-y + 1, 0] - -x % -y ==> [-y + 1, 0] Based on the above rationale, we can apply RCE to remainder operations whose divisor is a constant integer and whose dividend is >= 0, e.g.: for (int i = 0; i < 1000; i++) { int top5 = arr[i % 5]; // Apply RCE if arr is a loop invariant .... } For more detailed RCE results, please check out the attachment on JBS; it was generated by ArithmeticRemRCE with the additional flags -XX:+TraceRangeCheckElimination -XX:+PrintIR. 
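The bounds listed above can be checked with a small standalone program. This is only a sketch under stated assumptions: the class `RemainderBounds` and its helper `inBounds` are hypothetical names and are not part of the patch; the code merely exercises the language rule that the remainder takes the sign of the dividend and is smaller in magnitude than the divisor.

```java
// Hypothetical helper for illustration; not code from the pull request.
final class RemainderBounds {
    // Returns true if x % y lies in the range implied by the sign of x:
    // [0, |y| - 1] for x >= 0 and [-(|y| - 1), 0] for x < 0. y must be non-zero.
    static boolean inBounds(int x, int y) {
        int r = x % y;
        // Widen to long so that |Integer.MIN_VALUE| does not overflow.
        long limit = Math.abs((long) y) - 1;
        return x >= 0 ? (0 <= r && r <= limit) : (-limit <= r && r <= 0);
    }

    public static void main(String[] args) {
        int[] arr = {10, 20, 30, 40, 50};
        int sum = 0;
        for (int i = 0; i < 1000; i++) {
            // i >= 0 and the divisor is the constant 5, so the index is
            // always within [0, 4] and the range check is redundant.
            sum += arr[i % 5];
        }
        System.out.println(sum);
        System.out.println(inBounds(-7, 5) && inBounds(7, -5));
    }
}
```

Whether C1 actually removes the range check in the loop depends on the patch and the flags in use; the program only demonstrates that the index can never leave the array bounds.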
Testing: - test/hotspot/jtreg/compiler/c1/ (slowdebug) ------------- Commit messages: - rce_opt Changes: https://git.openjdk.java.net/jdk/pull/4083/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4083&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8267239 Stats: 105 lines in 3 files changed: 94 ins; 10 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/4083.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4083/head:pull/4083 PR: https://git.openjdk.java.net/jdk/pull/4083 From yyang at openjdk.java.net Tue May 18 08:40:15 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Tue, 18 May 2021 08:40:15 GMT Subject: RFR: 8267239: C1: RangeCheckElimination for % operator [v2] In-Reply-To: References: Message-ID: > The % operator follows the rule that the result of the remainder operation can be negative only if the dividend is negative, and can be positive only if the dividend is positive. Moreover, the magnitude of the result is always less than the magnitude of the divisor (see [JLS 15.17.3](https://docs.oracle.com/javase/specs/jls/se8/html/jls-15.html#jls-15.17.3)). > > So if `y` is a constant integer and not equal to 0, then we can deduce the bounds of the remainder operation: > - x % -y ==> [0, y - 1] RCE > - x % y ==> [0, y - 1] RCE > - -x % y ==> [-y + 1, 0] > - -x % -y ==> [-y + 1, 0] > > Based on the above rationale, we can apply RCE to remainder operations whose divisor is a constant integer and whose dividend is >= 0, e.g.: > > > for (int i = 0; i < 1000; i++) { > int top5 = arr[i % 5]; // Apply RCE if arr is a loop invariant > .... > } > > > For more detailed RCE results, please check out the attachment on JBS; it was generated by ArithmeticRemRCE with the additional flags -XX:+TraceRangeCheckElimination -XX:+PrintIR. 
> > Testing: > - test/hotspot/jtreg/compiler/c1/(slowdebug) Yi Yang has updated the pull request incrementally with one additional commit since the last revision: more comment for test ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4083/files - new: https://git.openjdk.java.net/jdk/pull/4083/files/44ea27a7..64bdf0f2 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4083&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4083&range=00-01 Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/4083.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4083/head:pull/4083 PR: https://git.openjdk.java.net/jdk/pull/4083 From aph at openjdk.java.net Tue May 18 09:05:41 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Tue, 18 May 2021 09:05:41 GMT Subject: RFR: 8267098: AArch64: C1 StubFrames end confusingly In-Reply-To: References: Message-ID: <5LrtCco5YrCLPyuFDw_4EHTARs2AUt4g3eDs81ncnVM=.22eae424-1983-4400-9352-8688a5fb5ca0@github.com> On Mon, 17 May 2021 02:38:59 GMT, Nick Gasson wrote: > > There's an urge from some contributors: when I suggest doing something in an easy-to-understand and clean way, people want to change everything else to match. This urge can be resisted, and IMVHO should be in this case. Churn is, in itself, bad. > > Consistency is at least somewhat important though, right? Certainly, yes. But please resist the temptation to do things in a less-good way to be consistent with existing code. Code that has been running solidly for a decade is a benefit too. It's a judgement call. Hard-and-fast rules are a Bad Thing, because they get in the way of carefully-considered judgement calls. > This code is read much more often than it's modified, and in the case of the platform ports, often by people with less experience of OpenJDK as a whole. It seems worth spending a little time to do cleanups when modifying adjacent code, if it makes it easier to understand. 
True, but bear in mind that people also read diffs; and sometimes you have to spend time trying to figure out if a diff was actually meant to change something. > `dont_gc_arguments` is equally or more confusing because its value is false and then it gets passed to an argument `must_gc_arguments` whose sense is inverted. I don't see what's wrong with: > > ```c++ > StubFrame f(sasm, "blah", /* must_gc_arguments */ false, /* does_not_return */ true); > ``` The fact that you need comments to mark the arguments is a code smell too, IMO. Sure, if you're going to pass a bunch of booleans the comments are essential. But we are where we are. ------------- PR: https://git.openjdk.java.net/jdk/pull/4030 From aph at openjdk.java.net Tue May 18 09:10:48 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Tue, 18 May 2021 09:10:48 GMT Subject: RFR: 8267098: AArch64: C1 StubFrames end confusingly In-Reply-To: <5LrtCco5YrCLPyuFDw_4EHTARs2AUt4g3eDs81ncnVM=.22eae424-1983-4400-9352-8688a5fb5ca0@github.com> References: <5LrtCco5YrCLPyuFDw_4EHTARs2AUt4g3eDs81ncnVM=.22eae424-1983-4400-9352-8688a5fb5ca0@github.com> Message-ID: On Tue, 18 May 2021 09:02:59 GMT, Andrew Haley wrote: >>> >>> There's an urge from some contributors: when I suggest doing something in an easy-to-understand and clean way, people want to change everything else to match. This urge can be resisted, and IMVHO should be in this case. Churn is, in itself, bad. >> >> Consistency is at least somewhat important though, right? This code is read much more often than it's modified, and in the case of the platform ports, often by people with less experience of OpenJDK as a whole. It seems worth spending a little time to do cleanups when modifying adjacent code, if it makes it easier to understand. >> >>> >>> And this case, is special, I think, because `does_not_return` uses the "don't do" anti-pattern, where the `true` case was `does_not_return`. 
>> >> `dont_gc_arguments` is equally or more confusing because its value is false and then it gets passed to an argument `must_gc_arguments` whose sense is inverted. I don't see what's wrong with: >> >> ```c++ >> StubFrame f(sasm, "blah", /* must_gc_arguments */ false, /* does_not_return */ true); > >> > There's an urge from some contributors: when I suggest doing something in an easy-to-understand and clean way, people want to change everything else to match. This urge can be resisted, and IMVHO should be in this case. Churn is, in itself, bad. >> >> Consistency is at least somewhat important though, right? > > Certainly, yes. But please resist the temptation to do things in a less-good way to be consistent with existing code. Code that has been running solidly for a decade is a benefit too. It's a judgement call. Hard-and-fast rules are a Bad Thing, because they get in the way of carefully-considered judgement calls. > >> This code is read much more often than it's modified, and in the case of the platform ports, often by people with less experience of OpenJDK as a whole. It seems worth spending a little time to do cleanups when modifying adjacent code, if it makes it easier to understand. > > True, but bear in mind that people also read diffs; and sometimes you have to spend time trying to figure out if a diff was actually meant to change something. > >> `dont_gc_arguments` is equally or more confusing because its value is false and then it gets passed to an argument `must_gc_arguments` whose sense is inverted. I don't see what's wrong with: >> >> ```c++ >> StubFrame f(sasm, "blah", /* must_gc_arguments */ false, /* does_not_return */ true); >> ``` > > The fact that you need comments to mark the arguments is a code smell too, IMO. Sure, if you're going to pass a bunch of booleans the comments are essential. But we are where we are. Bear in mind also that this patch adds some complexity but provides (almost) no benefit to users. 
It's pretty marginal as a change, and therefore it's hard to justify much churn. ------------- PR: https://git.openjdk.java.net/jdk/pull/4030 From neliasso at openjdk.java.net Tue May 18 09:42:55 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Tue, 18 May 2021 09:42:55 GMT Subject: RFR: 8266615: C2 incorrectly folds subtype checks involving an interface array In-Reply-To: References: Message-ID: On Mon, 17 May 2021 12:55:06 GMT, Tobias Hartmann wrote: > C2 incorrectly folds the subtype checks in `TestInterfaceArraySubtypeCheck::test1/test2`. As a result, an unexpected `ClassCastException` is thrown at `checkcast` and `instanceof` returns a wrong result. The problem is in `Compile::static_subtype_check` where we incorrectly return `SSC_always_false` for the `MyInterface[] <: MyClassA[]` check because `MyClassA[]` is not a subtype of `MyInterface[]` (after checking that `MyInterface[]` is not a subtype of `MyClassA[]`). > > The fix is to check that `subelem` is not an interface. This is very old code and not a recent regression. > > Thanks, > Tobias Looks good. ------------- Marked as reviewed by neliasso (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/4060 From ksakata at openjdk.java.net Tue May 18 09:51:53 2021 From: ksakata at openjdk.java.net (Koichi Sakata) Date: Tue, 18 May 2021 09:51:53 GMT Subject: RFR: 8260360: IGV: Short name of combined nodes is hidden by background color Message-ID: This pull request makes the short name of combined nodes readable. At present those nodes are painted over in black because their OutputSlot color is null. So this pull request sets the original color on the OutputSlot. I tested the following scenario manually: - Open a graph, then enable "Simplify graph" (as described in the bug report). The results are shown in the following images. There are two graphs: one is black and white only, the other is colored. [Two screenshots, named "2021-05-18 16 25 03" and "2021-05-18 16 25 33", are not preserved in the archive.]
------------- Commit messages: - Set the original color to OutputSlot Changes: https://git.openjdk.java.net/jdk/pull/4082/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4082&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8260360 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/4082.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4082/head:pull/4082 PR: https://git.openjdk.java.net/jdk/pull/4082 From neliasso at openjdk.java.net Tue May 18 09:55:39 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Tue, 18 May 2021 09:55:39 GMT Subject: RFR: 8266962: Add arch supporting check for "Op_VectorLoadConst" before creating the node In-Reply-To: References: Message-ID: On Fri, 14 May 2021 06:04:45 GMT, Xiaohong Gong wrote: > When creating the vector shuffle, the `"VectorLoadConstNode"` will be created to get an initial index vector. Before creating it, the compiler should check whether the current platform supports this opcode, so that the JVM does not crash with `"bad ad file"`. The compiler should abandon the intrinsification and fall back to the default Java implementation if the backend doesn't support it. > > Tested tier1 and jdk::tier3. Looks good. ------------- Marked as reviewed by neliasso (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/4023 From ksakata at openjdk.java.net Tue May 18 09:57:41 2021 From: ksakata at openjdk.java.net (Koichi Sakata) Date: Tue, 18 May 2021 09:57:41 GMT Subject: RFR: 8263385: IGV: Graph is not opened in the window that has focus. In-Reply-To: References: Message-ID: On Tue, 18 May 2021 03:01:58 GMT, Koichi Sakata wrote: > This pull request makes IGV open a graph in the window that has focus. > > At the moment IGV opens a graph in the first window it finds that contains the graph. So in this pull request I preferentially use the active EditorTopComponent. > > I tested the following scenarios manually: > > 1. 
Open a graph, open clone, then open another graph (as described in the bug report). It replaces the clone graph with the last opened graph. > 2. Open a graph, open clone, swap tabs by dragging the clone graph, then open another graph. It replaces the clone graph with the last opened graph. > 3. Open a graph, open clone, change the focus from the clone graph to the first graph, then open another graph. It replaces the first graph with the last opened graph. > 4. Open a graph, open clone, open the same graph xml file from the toolbar, open a graph in the second folder, then open a graph in the first folder. It replaces the leftmost graph that was opened the first with the last opened graph. Thank you for confirming, Richard. I appreciate it. I'll start to fix that behavior. ------------- PR: https://git.openjdk.java.net/jdk/pull/4078 From neliasso at openjdk.java.net Tue May 18 10:04:43 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Tue, 18 May 2021 10:04:43 GMT Subject: RFR: 8267130: Memory Overflow in Disassembler::load_library In-Reply-To: <5d59O0bG7Vu16Ub4lzq2ISDo8MJLXfv35SZ8iLzHzs4=.bc2c4cda-54ee-4610-b849-71cf42a94003@github.com> References: <5d59O0bG7Vu16Ub4lzq2ISDo8MJLXfv35SZ8iLzHzs4=.bc2c4cda-54ee-4610-b849-71cf42a94003@github.com> Message-ID: <45LJtc3shxMI0RmSQVUFv-o7WxW25qbU-TzHlGtnxhs=.928a2021-41c9-4a1c-b17f-476e7448edbe@github.com> On Fri, 14 May 2021 02:17:29 GMT, Wang Huang wrote: > * reproduce: > put your libjvm.so in a long enough path, such like Just noting (not requiring you to fix in this CR). The error printing should be converted to use unified logging. Approved. ------------- Marked as reviewed by neliasso (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/4020 From thartmann at openjdk.java.net Tue May 18 10:11:39 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 18 May 2021 10:11:39 GMT Subject: RFR: 8266615: C2 incorrectly folds subtype checks involving an interface array In-Reply-To: References: Message-ID: On Mon, 17 May 2021 12:55:06 GMT, Tobias Hartmann wrote: > C2 incorrectly folds the subtype checks in `TestInterfaceArraySubtypeCheck::test1/test2`. As a result, an unexpected `ClassCastException` is thrown at `checkcast` and `instanceof` returns a wrong result. The problem is in `Compile::static_subtype_check` where we incorrectly return `SSC_always_false` for the `MyInterface[] <: MyClassA[]` check because `MyClassA[]` is not a subtype of `MyInterface[]` (after checking that `MyInterface[]` is not a subtype of `MyClassA[]`). > > The fix is to check that `subelem` is not an interface. This is very old code and not a recent regression. > > Thanks, > Tobias Thanks Nils! ------------- PR: https://git.openjdk.java.net/jdk/pull/4060 From hshi at openjdk.java.net Tue May 18 10:33:38 2021 From: hshi at openjdk.java.net (Hui Shi) Date: Tue, 18 May 2021 10:33:38 GMT Subject: RFR: 8267212: test/jdk/java/util/Collections/FindSubList.java intermittent crash with "no reachable node should have no use" In-Reply-To: References: Message-ID: On Tue, 18 May 2021 08:04:35 GMT, Tobias Hartmann wrote: >> ... crash with "no reachable node should have no use" >> >> Please help review this fix. >> >> StrIntrinsicNode::Ideal uses Node::set_req to replace the memory input; the old memory input might be left with 0 uses without being added to the PhaseGVN worklist. Using set_req_X ensures that an old memory input node with 0 outputs is added to the PhaseGVN worklist. >> >> Found two other pieces of similarly problematic code in LoadNode::Ideal. >> >> Tier1/2/3 pass with release/fastdebug builds. >> test/jdk/java/util/Collections/FindSubList.java doesn't fail in 100 runs (before the fix: 2/3 failures in 10 runs). > > Looks good to me too. 
Thanks @TobiHartmann @rwestrel ! ------------- PR: https://git.openjdk.java.net/jdk/pull/4055 From vlivanov at openjdk.java.net Tue May 18 11:08:52 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Tue, 18 May 2021 11:08:52 GMT Subject: RFR: 8256973: Intrinsic creation for VectorMask query (lastTrue, firstTrue, trueCount) APIs [v4] In-Reply-To: References: <73lFD51hzmiF_KrQyPyE5c7lbf-Bp6V5vptzGo7JioY=.f34509d0-04c1-4c6d-878f-baa433b315a7@github.com> Message-ID: <7M7I3lIRyEO6YW-K_glajPiCRKaEbjktMZc0_UFZsME=.521074d6-a6a1-4b93-8d00-f3572aec6836@github.com> On Mon, 17 May 2021 08:39:22 GMT, Jatin Bhateja wrote: >> This patch intrinsifies the following mask query APIs using an optimal instruction sequence for the X86 target. >> 1) VectorMask.firstTrue. >> 2) VectorMask.lastTrue. >> 3) VectorMask.trueCount. >> >> The current implementations of the above APIs iterate over the underlying boolean array encapsulated in a mask instance to ascertain the count/position index of true bits. >> X86 AVX2 and AVX512 targets offer direct instructions to move the masks held in the byte vector to a GP or an opmask register, thereby accelerating further querying. >> >> Intrinsification is not performed for vector species containing fewer than two vector lanes. 
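For readers unfamiliar with the three queries, the array-walking behaviour described above can be sketched over a plain `boolean[]`. This is a minimal illustration, not the JDK source: the class `ScalarMaskQueries` is a hypothetical name, and the return values for an all-false mask (the lane count for `firstTrue`, -1 for `lastTrue`) follow the documented `VectorMask` conventions as far as I know.

```java
// Hypothetical scalar reference for illustration; not the jdk.incubator.vector code.
final class ScalarMaskQueries {
    // Index of the first set lane, or bits.length if no lane is set.
    static int firstTrue(boolean[] bits) {
        for (int i = 0; i < bits.length; i++) {
            if (bits[i]) {
                return i;
            }
        }
        return bits.length;
    }

    // Index of the last set lane, or -1 if no lane is set.
    static int lastTrue(boolean[] bits) {
        for (int i = bits.length - 1; i >= 0; i--) {
            if (bits[i]) {
                return i;
            }
        }
        return -1;
    }

    // Number of set lanes.
    static int trueCount(boolean[] bits) {
        int count = 0;
        for (boolean bit : bits) {
            if (bit) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        boolean[] mask = {false, true, false, true};
        System.out.println(firstTrue(mask) + " " + lastTrue(mask) + " " + trueCount(mask));
    }
}
```

Each query is linear in the number of lanes, which is the cost the intrinsics aim to avoid by moving the mask into a general-purpose or opmask register and querying it there.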
>> >> Please find below the performance number for benchmark included in the patch: >> Machine: Cascade Lake server (Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz 28C) >> >> >> VectorMask.trueCount | VECTOR SIZE | ALGO | BASELINE AVX3 | WITH OPT AVX3 | GAIN >> -- | -- | -- | -- | -- | -- >> MaskQueryOperationsBenchmark.testFirstTrueByte | 128 | 1 | 338396.436 | 362711.622 | 1.071854143 >> MaskQueryOperationsBenchmark.testFirstTrueByte | 128 | 2 | 205477.472 | 362668.035 | 1.765001445 >> MaskQueryOperationsBenchmark.testFirstTrueByte | 128 | 3 | 185613.377 | 362518.206 | 1.953082326 >> MaskQueryOperationsBenchmark.testFirstTrueByte | 256 | 1 | 338522.114 | 328751.231 | 0.971136648 >> MaskQueryOperationsBenchmark.testFirstTrueByte | 256 | 2 | 148825.341 | 328783.35 | 2.209189294 >> MaskQueryOperationsBenchmark.testFirstTrueByte | 256 | 3 | 200854.856 | 328784.24 | 1.636924526 >> MaskQueryOperationsBenchmark.testFirstTrueByte | 512 | 1 | 338551.089 | 319908.361 | 0.944933782 >> MaskQueryOperationsBenchmark.testFirstTrueByte | 512 | 2 | 116338.756 | 320026.839 | 2.750818816 >> MaskQueryOperationsBenchmark.testFirstTrueByte | 512 | 3 | 200871.692 | 320008.208 | 1.593097588 >> MaskQueryOperationsBenchmark.testFirstTrueInt | 128 | 1 | 338489.157 | 190221.57 | 0.561972418 >> MaskQueryOperationsBenchmark.testFirstTrueInt | 128 | 2 | 205140.903 | 362387.766 | 1.766531007 >> MaskQueryOperationsBenchmark.testFirstTrueInt | 128 | 3 | 185508.994 | 362566.265 | 1.95444036 >> MaskQueryOperationsBenchmark.testFirstTrueInt | 256 | 1 | 338403.999 | 328829.751 | 0.971707639 >> MaskQueryOperationsBenchmark.testFirstTrueInt | 256 | 2 | 148988.857 | 328835.479 | 2.207114583 >> MaskQueryOperationsBenchmark.testFirstTrueInt | 256 | 3 | 200815.907 | 328778.266 | 1.637212265 >> MaskQueryOperationsBenchmark.testFirstTrueInt | 512 | 1 | 338462.403 | 328796.84 | 0.971442728 >> MaskQueryOperationsBenchmark.testFirstTrueInt | 512 | 2 | 116355.623 | 328811.386 | 2.825917455 >> 
MaskQueryOperationsBenchmark.testFirstTrueInt | 512 | 3 | 200856.08 | 328773.859 | 1.636862867 >> MaskQueryOperationsBenchmark.testFirstTrueLong | 128 | 1 | 338451.783 | 204432.394 | 0.60402221 >> MaskQueryOperationsBenchmark.testFirstTrueLong | 128 | 2 | 204443.049 | 155670.633 | 0.761437641 >> MaskQueryOperationsBenchmark.testFirstTrueLong | 128 | 3 | 207254.769 | 155672.842 | 0.751118263 >> MaskQueryOperationsBenchmark.testFirstTrueLong | 256 | 1 | 338520.255 | 328789.176 | 0.971254072 >> MaskQueryOperationsBenchmark.testFirstTrueLong | 256 | 2 | 205883.123 | 328742.103 | 1.596741385 >> MaskQueryOperationsBenchmark.testFirstTrueLong | 256 | 3 | 185519.176 | 328733.537 | 1.771965271 >> MaskQueryOperationsBenchmark.testFirstTrueLong | 512 | 1 | 338605.11 | 328694.935 | 0.970732353 >> MaskQueryOperationsBenchmark.testFirstTrueLong | 512 | 2 | 148444.7 | 328352.346 | 2.211950619 >> MaskQueryOperationsBenchmark.testFirstTrueLong | 512 | 3 | 200884.874 | 328814.376 | 1.636829939 >> MaskQueryOperationsBenchmark.testFirstTrueShort | 128 | 1 | 338529.326 | 362293.877 | 1.070199387 >> MaskQueryOperationsBenchmark.testFirstTrueShort | 128 | 2 | 204676.583 | 362428.992 | 1.770739899 >> MaskQueryOperationsBenchmark.testFirstTrueShort | 128 | 3 | 185495.663 | 362422.835 | 1.953807594 >> MaskQueryOperationsBenchmark.testFirstTrueShort | 256 | 1 | 338533.82 | 328635.479 | 0.970761146 >> MaskQueryOperationsBenchmark.testFirstTrueShort | 256 | 2 | 148822.446 | 328803.55 | 2.209368001 >> MaskQueryOperationsBenchmark.testFirstTrueShort | 256 | 3 | 200752.028 | 328805.974 | 1.637871245 >> MaskQueryOperationsBenchmark.testFirstTrueShort | 512 | 1 | 338464.548 | 320054.91 | 0.945608371 >> MaskQueryOperationsBenchmark.testFirstTrueShort | 512 | 2 | 116329.063 | 328763.508 | 2.826151088 >> MaskQueryOperationsBenchmark.testFirstTrueShort | 512 | 3 | 199971.049 | 328819.066 | 1.644333355 >> MaskQueryOperationsBenchmark.testLastTrueByte | 128 | 1 | 325618.244 | 337629.441 | 1.036887359 >> 
MaskQueryOperationsBenchmark.testLastTrueByte | 128 | 2 | 197655.729 | 337544.012 | 1.707737052 >> MaskQueryOperationsBenchmark.testLastTrueByte | 128 | 3 | 325600.645 | 337256.796 | 1.035798919 >> MaskQueryOperationsBenchmark.testLastTrueByte | 256 | 1 | 325677.144 | 308312.588 | 0.946681687 >> MaskQueryOperationsBenchmark.testLastTrueByte | 256 | 2 | 138177.514 | 308293.997 | 2.231144476 >> MaskQueryOperationsBenchmark.testLastTrueByte | 256 | 3 | 201281.142 | 308353.239 | 1.531952949 >> MaskQueryOperationsBenchmark.testLastTrueByte | 512 | 1 | 325499.635 | 305103.491 | 0.937338965 >> MaskQueryOperationsBenchmark.testLastTrueByte | 512 | 2 | 98267.327 | 304803.64 | 3.101780106 >> MaskQueryOperationsBenchmark.testLastTrueByte | 512 | 3 | 201072.661 | 304969.972 | 1.516715253 >> MaskQueryOperationsBenchmark.testLastTrueInt | 128 | 1 | 325286.171 | 337337.209 | 1.037047496 >> MaskQueryOperationsBenchmark.testLastTrueInt | 128 | 2 | 197351.915 | 331432.723 | 1.679399579 >> MaskQueryOperationsBenchmark.testLastTrueInt | 128 | 3 | 325173.097 | 337518.586 | 1.037965899 >> MaskQueryOperationsBenchmark.testLastTrueInt | 256 | 1 | 325199.786 | 308436.805 | 0.948453284 >> MaskQueryOperationsBenchmark.testLastTrueInt | 256 | 2 | 138200.527 | 308405.442 | 2.231579348 >> MaskQueryOperationsBenchmark.testLastTrueInt | 256 | 3 | 201240.625 | 308234.527 | 1.531671485 >> MaskQueryOperationsBenchmark.testLastTrueInt | 512 | 1 | 325590.639 | 308381.757 | 0.947145649 >> MaskQueryOperationsBenchmark.testLastTrueInt | 512 | 2 | 98334.197 | 308440.373 | 3.13665421 >> MaskQueryOperationsBenchmark.testLastTrueInt | 512 | 3 | 200832.953 | 308431.355 | 1.535760693 >> MaskQueryOperationsBenchmark.testLastTrueLong | 128 | 1 | 325564.887 | 193981.861 | 0.595831641 >> MaskQueryOperationsBenchmark.testLastTrueLong | 128 | 2 | 214005.351 | 153667.869 | 0.718056199 >> MaskQueryOperationsBenchmark.testLastTrueLong | 128 | 3 | 214061.493 | 156337.24 | 0.730337988 >> 
MaskQueryOperationsBenchmark.testLastTrueLong | 256 | 1 | 325601.502 | 308291.032 | 0.946835411 >> MaskQueryOperationsBenchmark.testLastTrueLong | 256 | 2 | 197911.182 | 308292.149 | 1.557729815 >> MaskQueryOperationsBenchmark.testLastTrueLong | 256 | 3 | 325608.187 | 308405.393 | 0.947167195 >> MaskQueryOperationsBenchmark.testLastTrueLong | 512 | 1 | 325734.897 | 308321.619 | 0.946541564 >> MaskQueryOperationsBenchmark.testLastTrueLong | 512 | 2 | 137974.465 | 308131.475 | 2.233250008 >> MaskQueryOperationsBenchmark.testLastTrueLong | 512 | 3 | 205479.182 | 308311.636 | 1.500451934 >> MaskQueryOperationsBenchmark.testLastTrueShort | 128 | 1 | 325681.411 | 337663.377 | 1.036790451 >> MaskQueryOperationsBenchmark.testLastTrueShort | 128 | 2 | 198127.51 | 337287.453 | 1.702375672 >> MaskQueryOperationsBenchmark.testLastTrueShort | 128 | 3 | 325519.01 | 337453.387 | 1.036662612 >> MaskQueryOperationsBenchmark.testLastTrueShort | 256 | 1 | 325647.378 | 308266.5 | 0.946626691 >> MaskQueryOperationsBenchmark.testLastTrueShort | 256 | 2 | 138287.837 | 308402.656 | 2.230150263 >> MaskQueryOperationsBenchmark.testLastTrueShort | 256 | 3 | 205375.864 | 308418.101 | 1.501725154 >> MaskQueryOperationsBenchmark.testLastTrueShort | 512 | 1 | 325548.631 | 308137.064 | 0.946516233 >> MaskQueryOperationsBenchmark.testLastTrueShort | 512 | 2 | 98424.074 | 308145.17 | 3.130790644 >> MaskQueryOperationsBenchmark.testLastTrueShort | 512 | 3 | 205381.622 | 308345.763 | 1.50133084 >> MaskQueryOperationsBenchmark.testTrueCountByte | 128 | 1 | 197488.249 | 340490.471 | 1.724104967 >> MaskQueryOperationsBenchmark.testTrueCountByte | 128 | 2 | 191307.785 | 354400.26 | 1.852513529 >> MaskQueryOperationsBenchmark.testTrueCountByte | 128 | 3 | 181206.7 | 354512.75 | 1.956399791 >> MaskQueryOperationsBenchmark.testTrueCountByte | 256 | 1 | 144485.784 | 328347.7 | 2.272525995 >> MaskQueryOperationsBenchmark.testTrueCountByte | 256 | 2 | 136709.938 | 328318.229 | 2.401568122 >> 
MaskQueryOperationsBenchmark.testTrueCountByte | 256 | 3 | 141501.903 | 328274.337 | 2.319928779 >> MaskQueryOperationsBenchmark.testTrueCountByte | 512 | 1 | 108395.25 | 318599.11 | 2.939234976 >> MaskQueryOperationsBenchmark.testTrueCountByte | 512 | 2 | 98731.287 | 318651.791 | 3.22746518 >> MaskQueryOperationsBenchmark.testTrueCountByte | 512 | 3 | 106344.335 | 318657.098 | 2.99646519 >> MaskQueryOperationsBenchmark.testTrueCountInt | 128 | 1 | 124691.716 | 354457.62 | 2.842671762 >> MaskQueryOperationsBenchmark.testTrueCountInt | 128 | 2 | 191325.138 | 354360.523 | 1.852137815 >> MaskQueryOperationsBenchmark.testTrueCountInt | 128 | 3 | 181480.334 | 353746.697 | 1.949228818 >> MaskQueryOperationsBenchmark.testTrueCountInt | 256 | 1 | 144513.076 | 328404.916 | 2.27249274 >> MaskQueryOperationsBenchmark.testTrueCountInt | 256 | 2 | 136710.717 | 328516.92 | 2.403007805 >> MaskQueryOperationsBenchmark.testTrueCountInt | 256 | 3 | 141631.832 | 328432.841 | 2.318919669 >> MaskQueryOperationsBenchmark.testTrueCountInt | 512 | 1 | 108479.473 | 328405.877 | 3.027355019 >> MaskQueryOperationsBenchmark.testTrueCountInt | 512 | 2 | 98747.682 | 328300.378 | 3.324638831 >> MaskQueryOperationsBenchmark.testTrueCountInt | 512 | 3 | 106378.04 | 328384.537 | 3.086957957 >> MaskQueryOperationsBenchmark.testTrueCountLong | 128 | 1 | 213646.579 | 159098.437 | 0.74468048 >> MaskQueryOperationsBenchmark.testTrueCountLong | 128 | 2 | 212671.379 | 162528.924 | 0.764225655 >> MaskQueryOperationsBenchmark.testTrueCountLong | 128 | 3 | 212649.052 | 162530.898 | 0.764315178 >> MaskQueryOperationsBenchmark.testTrueCountLong | 256 | 1 | 197350.819 | 328365.924 | 1.663869072 >> MaskQueryOperationsBenchmark.testTrueCountLong | 256 | 2 | 191473.127 | 328501.883 | 1.715655289 >> MaskQueryOperationsBenchmark.testTrueCountLong | 256 | 3 | 185529.513 | 328428.64 | 1.770223156 >> MaskQueryOperationsBenchmark.testTrueCountLong | 512 | 1 | 144516.188 | 328334.76 | 2.27195835 >> 
MaskQueryOperationsBenchmark.testTrueCountLong | 512 | 2 | 136752.367 | 328505.571 | 2.402192943 >> MaskQueryOperationsBenchmark.testTrueCountLong | 512 | 3 | 141445.742 | 328392.887 | 2.321688036 >> MaskQueryOperationsBenchmark.testTrueCountShort | 128 | 1 | 197863.202 | 354533.342 | 1.791810394 >> MaskQueryOperationsBenchmark.testTrueCountShort | 128 | 2 | 191802.914 | 354377.939 | 1.84761499 >> MaskQueryOperationsBenchmark.testTrueCountShort | 128 | 3 | 181773.298 | 354374.525 | 1.949541153 >> MaskQueryOperationsBenchmark.testTrueCountShort | 256 | 1 | 144414.679 | 328435.088 | 2.27425003 >> MaskQueryOperationsBenchmark.testTrueCountShort | 256 | 2 | 136923.991 | 328267.898 | 2.397446171 >> MaskQueryOperationsBenchmark.testTrueCountShort | 256 | 3 | 141545.957 | 328308.681 | 2.319449371 >> MaskQueryOperationsBenchmark.testTrueCountShort | 512 | 1 | 108420.143 | 328282.998 | 3.027878297 >> MaskQueryOperationsBenchmark.testTrueCountShort | 512 | 2 | 98736.441 | 328420.616 | 3.326235103 >> MaskQueryOperationsBenchmark.testTrueCountShort | 512 | 3 | 106432.386 | 328245.585 | 3.084076166 >> >> ALGO (1=bestcase, 2=worstcast,3=avgcase) > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > 8256973: Review comments resolution. Leaving the check on mask length aside (`num_elem <= 2` in `LibraryCallKit::inline_vector_mask_operation`), the patch looks good. A couple minor suggestions follow. src/hotspot/cpu/x86/x86.ad line 8073: > 8071: const MachNode* mask_node = static_cast(this->in(this->operand_index($mask))); > 8072: assert(mask_node->bottom_type()->isa_vect(), ""); > 8073: int vector_len = vector_length_encoding(mask_node); I think you can just use `int vlen_enc = vector_length_encoding(this, $mask);` here. 
src/hotspot/cpu/x86/x86.ad line 8077: > 8075: int mask_len = mask_node->bottom_type()->is_vect()->length(); > 8076: __ vector_mask_operation(opcode, $dst$$Register, $mask$$XMMRegister, $xtmp$$XMMRegister, > 8077: $tmp$$Register, $ktmp$$KRegister, mask_len, vector_len); On naming: `vector_len` and `mask_len` are misleadingly similar. While the latter represents the number of elements, the former is x86-specific encoding of vector length. It makes sense to stress the difference w/ a different name. That's why I propose `vlen_enc`. Unfortunately, it's not uniformly used across `x86.ad` yet, but at least some code already migrated. ------------- Marked as reviewed by vlivanov (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/3916 From vlivanov at openjdk.java.net Tue May 18 11:08:52 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Tue, 18 May 2021 11:08:52 GMT Subject: RFR: 8256973: Intrinsic creation for VectorMask query (lastTrue, firstTrue, trueCount) APIs [v4] In-Reply-To: References: <73lFD51hzmiF_KrQyPyE5c7lbf-Bp6V5vptzGo7JioY=.f34509d0-04c1-4c6d-878f-baa433b315a7@github.com> <6sgoak9hLqP2gY_tFdDNmwpd1WmCdbQxgu_s6PEhHSo=.0364c120-34fd-4792-878d-d09e9673b92f@github.com> Message-ID: <1vlMmlyDAD1NwBiqyfcRhn1aq9V-Ymqbr78cy21-rUk=.f418d950-7303-4de7-85fb-8376d6f6eaa4@github.com> On Mon, 17 May 2021 14:13:16 GMT, Jatin Bhateja wrote: > This is being enforced by Matcher::match_rule_supported_vector(), for a 512 bit vector of sub-word type is supported only if target supports AVX512BW. For other types apart from sub-word types a 512 bit vector mask will be handled by the second instruction selection pattern which is predicated by !VM_Version::supports_avx512vlbw() since for them maximum vector size needed to hold the byte vector containing mask will always be <= 32 bytes. Ah, now I get it! Thanks for the clarifications. It's the consequence of canonical mask representation being consumed by the operations. Worth putting a comment stressing that aspect. 
-------------

PR: https://git.openjdk.java.net/jdk/pull/3916

From vlivanov at openjdk.java.net  Tue May 18 11:26:42 2021
From: vlivanov at openjdk.java.net (Vladimir Ivanov)
Date: Tue, 18 May 2021 11:26:42 GMT
Subject: RFR: 8266962: Add arch supporting check for "Op_VectorLoadConst" before creating the node
In-Reply-To:
References:
Message-ID: <_c7Ik2rZymkq9p0DqcHNeEWqbjs1ToH6WVmq_jR0f7U=.63a3cd81-b760-44d1-85b7-d654b0fd6240@github.com>

On Mon, 17 May 2021 03:18:34 GMT, Xiaohong Gong wrote:

> As far as I know the VectorLoadConst is used here to get the initial shuffle iota of the vector. I'm not clear about what you mean by the iota vector constant materialization.

`VectorLoadConst` is backed by a constant in `StubRoutines`. Instead, the constant can be materialized as an on-heap ByteVector instance, cached in a static final field, and passed into the intrinsic.

An alternative approach would be to replace `VectorLoadConst` with a `LoadVector` which performs raw vector access at the `StubRoutines::_vector_iota_indices` address.

All in all, I don't see `VectorLoadConst` as well-justified.

src/hotspot/cpu/aarch64/aarch64_neon.ad:
3369 instruct loadcon8B(vecD dst, immI0 src)
3374 match(Set dst (VectorLoadConst src));
3378 __ lea(rscratch1, ExternalAddress(StubRoutines::aarch64::vector_iota_indices()));

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp:
7055 StubRoutines::aarch64::_vector_iota_indices = generate_iota_indices("iota_indices");

618 // Generate indices for iota vector.
619 address generate_iota_indices(const char *stub_name) {
620   __ align(CodeEntryAlignment);
621   StubCodeMark mark(this, "StubRoutines", stub_name);
622   address start = __ pc();
623   __ emit_data64(0x0706050403020100, relocInfo::none);
624   __ emit_data64(0x0F0E0D0C0B0A0908, relocInfo::none);
625   return start;
626 }

-------------

PR: https://git.openjdk.java.net/jdk/pull/4023

From neliasso at openjdk.java.net  Tue May 18 12:19:40 2021
From: neliasso at openjdk.java.net (Nils Eliasson)
Date: Tue, 18 May 2021 12:19:40 GMT
Subject: RFR: 8260360: IGV: Short name of combined nodes is hidden by background color
In-Reply-To:
References:
Message-ID: <8NyfIku45EF3i0SGYY7EMp8dnvxy8TS1mRbqp-eFIFU=.764573c4-1e55-437d-b087-f0137bf10db1@github.com>

On Tue, 18 May 2021 07:33:01 GMT, Koichi Sakata wrote:

> This pull request makes the short name of combined nodes readable.
>
> At present those nodes are painted over in black because their OutputSlot color is null. So this pull request sets the original color on the OutputSlot.
>
> I tested the following scenario manually:
>
> - Open a graph, then enable "Simplify graph" (as described in the bug report).
>
> The result is shown in the following images. There are two graphs: one is a black-and-white graph, the other is a colored graph.
>
> [image 2021-05-18 16 25 03]
> [image 2021-05-18 16 25 33]

Looks good.

-------------

Marked as reviewed by neliasso (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/4082

From ddong at openjdk.java.net  Tue May 18 12:23:05 2021
From: ddong at openjdk.java.net (Denghui Dong)
Date: Tue, 18 May 2021 12:23:05 GMT
Subject: RFR: 8265129: Add intrinsic support for JVM.getClassId [v6]
In-Reply-To:
References:
Message-ID:

On Mon, 17 May 2021 07:20:18 GMT, Denghui Dong wrote:

>> 8265129: Add intrinsic support for JVM.getClassId
>
> Denghui Dong has updated the pull request incrementally with one additional commit since the last revision:
>
>   fix crash problem

Hi Tobias,

Thanks for your quick response!
Is there any other problem caused by the current implementation? Best, Denghui ------------- PR: https://git.openjdk.java.net/jdk/pull/3470 From thartmann at openjdk.java.net Tue May 18 12:24:56 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 18 May 2021 12:24:56 GMT Subject: Integrated: 8266615: C2 incorrectly folds subtype checks involving an interface array In-Reply-To: References: Message-ID: On Mon, 17 May 2021 12:55:06 GMT, Tobias Hartmann wrote: > C2 incorrectly folds the subtype checks in `TestInterfaceArraySubtypeCheck::test1/test2`. As a result, an unexpected `ClassCastException` is thrown at `checkcast` and `instanceof` returns a wrong result. The problem is in `Compile::static_subtype_check` where we incorrectly return `SSC_always_false` for the `MyInterface[] <: MyClassA[]` check because `MyClassA[]` is not a subtype of `MyInterface[]` (after checking that `MyInterface[]` is not a subtype of `MyClassA[]`). > > The fix is to check that `subelem` is not an interface. This is very old code and not a recent regression. > > Thanks, > Tobias This pull request has now been integrated. Changeset: ce88b334 Author: Tobias Hartmann URL: https://git.openjdk.java.net/jdk/commit/ce88b334884b6cc76bd938a8a8e6a9b28a777cb8 Stats: 87 lines in 2 files changed: 85 ins; 0 del; 2 mod 8266615: C2 incorrectly folds subtype checks involving an interface array Reviewed-by: kvn, neliasso ------------- PR: https://git.openjdk.java.net/jdk/pull/4060 From vlivanov at openjdk.java.net Tue May 18 12:50:04 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Tue, 18 May 2021 12:50:04 GMT Subject: RFR: 8266973: Migrate to ClassHierarchyIterator when enumerating subclasses [v3] In-Reply-To: References: Message-ID: <2dW5h_12hO4zO3lnnlf-4w54KH0ImwRgYxD2U4q1uQA=.ed0cef27-9e74-4a7c-85e9-75408382a406@github.com> > Replace ad-hoc recursion when enumerating subclasses with `ClassHierarchyIterator`. 
> > Found 3 occurrences: > - `Dependencies::find_finalizable_subclass()` > - `reinitialize_vtable_of()` > - `VM_RedefineClasses::increment_class_counter()` > > Testing: > - [x] hs-tier1 - hs-tier4 Vladimir Ivanov has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: - Merge branch 'master' into 8266973.iterator - JFR - 8266973: Migrate to ClassHierarchyIterator when enumerating subclasses ------------- Changes: https://git.openjdk.java.net/jdk/pull/3995/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3995&range=02 Stats: 101 lines in 9 files changed: 8 ins; 54 del; 39 mod Patch: https://git.openjdk.java.net/jdk/pull/3995.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3995/head:pull/3995 PR: https://git.openjdk.java.net/jdk/pull/3995 From vlivanov at openjdk.java.net Tue May 18 12:50:05 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Tue, 18 May 2021 12:50:05 GMT Subject: RFR: 8266973: Migrate to ClassHierarchyIterator when enumerating subclasses [v2] In-Reply-To: References: Message-ID: On Thu, 13 May 2021 09:36:40 GMT, Vladimir Ivanov wrote: >> Replace ad-hoc recursion when enumerating subclasses with `ClassHierarchyIterator`. >> >> Found 3 occurrences: >> - `Dependencies::find_finalizable_subclass()` >> - `reinitialize_vtable_of()` >> - `VM_RedefineClasses::increment_class_counter()` >> >> Testing: >> - [x] hs-tier1 - hs-tier4 > > Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: > > JFR Thanks for the reviews, Vladimir and Coleen. 
-------------

PR: https://git.openjdk.java.net/jdk/pull/3995

From vlivanov at openjdk.java.net  Tue May 18 12:50:06 2021
From: vlivanov at openjdk.java.net (Vladimir Ivanov)
Date: Tue, 18 May 2021 12:50:06 GMT
Subject: Integrated: 8266973: Migrate to ClassHierarchyIterator when enumerating subclasses
In-Reply-To:
References:
Message-ID:

On Wed, 12 May 2021 13:30:09 GMT, Vladimir Ivanov wrote:

> Replace ad-hoc recursion when enumerating subclasses with `ClassHierarchyIterator`.
>
> Found 3 occurrences:
> - `Dependencies::find_finalizable_subclass()`
> - `reinitialize_vtable_of()`
> - `VM_RedefineClasses::increment_class_counter()`
>
> Testing:
> - [x] hs-tier1 - hs-tier4

This pull request has now been integrated.

Changeset: 9d168e25
Author:    Vladimir Ivanov
URL:       https://git.openjdk.java.net/jdk/commit/9d168e25d1e2e8b662dc7aa6cda7516c423cef7d
Stats:     101 lines in 9 files changed: 8 ins; 54 del; 39 mod

8266973: Migrate to ClassHierarchyIterator when enumerating subclasses

Reviewed-by: kvn, coleenp

-------------

PR: https://git.openjdk.java.net/jdk/pull/3995

From neliasso at openjdk.java.net  Tue May 18 13:24:40 2021
From: neliasso at openjdk.java.net (Nils Eliasson)
Date: Tue, 18 May 2021 13:24:40 GMT
Subject: RFR: 8267239: C1: RangeCheckElimination for % operator [v2]
In-Reply-To:
References:
Message-ID:

On Tue, 18 May 2021 08:40:15 GMT, Yi Yang wrote:

>> For the % operator, the result of the remainder operation can be negative only if the dividend is negative, and can be positive only if the dividend is positive. Moreover, the magnitude of the result is always less than the magnitude of the divisor (see [JLS 15.17.3](https://docs.oracle.com/javase/specs/jls/se8/html/jls-15.html#jls-15.17.3)).
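As a quick illustration of the remainder rule quoted above, the following Java sketch checks the sign and magnitude properties for a few sample values. The class and method names (`RemainderBounds`, `obeysRemainderRule`) are made up for this example and are not part of the patch; the sketch only exercises plain Java `%` semantics and says nothing about what C1's range check elimination actually does.

```java
class RemainderBounds {
    /**
     * Returns true if x % y has the sign of the dividend (or is zero) and a
     * magnitude strictly smaller than that of the divisor. y must be non-zero,
     * otherwise the % operator throws ArithmeticException.
     */
    static boolean obeysRemainderRule(int x, int y) {
        int r = x % y;
        boolean signOk = (r == 0) || ((r < 0) == (x < 0));
        // Widen to long so that Math.abs cannot overflow for Integer.MIN_VALUE.
        boolean magnitudeOk = Math.abs((long) r) < Math.abs((long) y);
        return signOk && magnitudeOk;
    }

    public static void main(String[] args) {
        int[] dividends = {7, -7, 0, 1000, Integer.MIN_VALUE, Integer.MAX_VALUE};
        int[] divisors = {5, -5, 1, -1};
        for (int x : dividends) {
            for (int y : divisors) {
                if (!obeysRemainderRule(x, y)) {
                    throw new AssertionError(x + " % " + y);
                }
            }
        }
        System.out.println(7 % 5);    // 2
        System.out.println(7 % -5);   // 2
        System.out.println(-7 % 5);   // -2
        System.out.println(-7 % -5);  // -2
    }
}
```

For a non-negative dividend and a divisor of 5, the result therefore always lies in [0, 4], which is the kind of bound the patch description derives next.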
>>
>> So if `y` is a constant integer and not equal to 0, then we can deduce the bounds of the remainder operation:
>> - x % -y ==> [0, y - 1] RCE
>> - x % y ==> [0, y - 1] RCE
>> - -x % y ==> [-y + 1, 0]
>> - -x % -y ==> [-y + 1, 0]
>>
>> Based on the above rationale, we can apply RCE to remainder operations whose divisor is a non-zero constant integer and whose dividend is >= 0, e.g.:
>>
>>
>> for (int i = 0; i < 1000; i++) {
>>   int top5 = arr[i % 5]; // Apply RCE if arr is a loop invariant
>>   ....
>> }
>>
>>
>> For more detailed RCE results, please check out the attachment on JBS; it was generated by ArithmeticRemRCE with the additional flags -XX:+TraceRangeCheckElimination -XX:+PrintIR.
>>
>> Testing:
>> - test/hotspot/jtreg/compiler/c1/ (slowdebug)
>
> Yi Yang has updated the pull request incrementally with one additional commit since the last revision:
>
>   more comment for test

Have you checked how C2 handles this case?

-------------

PR: https://git.openjdk.java.net/jdk/pull/4083

From thartmann at openjdk.java.net  Tue May 18 14:26:03 2021
From: thartmann at openjdk.java.net (Tobias Hartmann)
Date: Tue, 18 May 2021 14:26:03 GMT
Subject: RFR: 8266480: Implicit null check optimization does not update control of hoisted memory operation
Message-ID:

C2 replaces explicit null checks by hoisting a nearby memory operation to the null check and using it as an implicit null check. In some cases, control of that memory operation is not updated correctly, leading to assert failures during `PhaseCFG::verify()` because a use is no longer dominated by its definition.

After matching, the graph looks like this:

`64 testP_reg` is an explicit null check and `78 loadD`, `73 storeD` and `77 storeImmI` are candidates for an implicit null check because they are operating on the same oop. `PhaseCFG::implicit_null_check` decides to hoist the `77 storeImmI` from the `not_null_block` B12 to the null check in B11/B13:

Now the problem is that control of `77 storeImmI` was not updated and still points into the non-dominating block B15.
The following code is supposed to fix this:
https://github.com/openjdk/jdk/blob/9d168e25d1e2e8b662dc7aa6cda7516c423cef7d/src/hotspot/share/opto/lcm.cpp#L413-L418

However, it does not trigger because control is not the `not_null_block->head()` but `59 MachProj`, which is the control projection from `60 CallLeafDirect` emitted by a `drem`. The fix is to simply check `get_block_for_node(ctrl)` instead.

This is an old issue that was only caught by the assert recently introduced by [JDK-8263227](https://bugs.openjdk.java.net/browse/JDK-8263227).

Thanks,
Tobias

-------------

Commit messages:
 - 8266480: Implicit null check optimization does not update control of hoisted memory operation

Changes: https://git.openjdk.java.net/jdk/pull/4093/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4093&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8266480
  Stats: 64 lines in 2 files changed: 63 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/4093.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/4093/head:pull/4093

PR: https://git.openjdk.java.net/jdk/pull/4093

From thartmann at openjdk.java.net  Tue May 18 14:34:39 2021
From: thartmann at openjdk.java.net (Tobias Hartmann)
Date: Tue, 18 May 2021 14:34:39 GMT
Subject: RFR: 8265262: CITime - 'other' incorrectly calculated
In-Reply-To:
References:
Message-ID:

On Mon, 17 May 2021 16:36:26 GMT, Nils Eliasson wrote:

> This CR fixes a few issues with the CITime output for C2:
>
> 1) The 'other' category for _t_optimize does not subtract the time spent in _t_vector.
>
> 2) Some of the _t_incrInline sub counters are called from different contexts. Calculating 'other' from the total time spent in _t_incrInline assumes that the counter usage is strictly hierarchical.
>
> 3) I've placed the non-hierarchical counters in braces.
>
> 4) Code Installation is a part of Code Emission (_t_output). Indentation fixed.
>
> 5) Moved "renumber live" after "Vector" so that they appear in order.
>
> 6) Added sub counters "shorten branches" and "fill buffer" to the "Code Emission" phase, and added an 'other' category. Previously more than 50% of the time in Code Emission was unaccounted for; now it's less than 25%.
>
> Please review,
> Best regards,
> Nils Eliasson

Looks good to me.

-------------

Marked as reviewed by thartmann (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/4065

From github.com+4146708+a74nh at openjdk.java.net  Tue May 18 14:36:05 2021
From: github.com+4146708+a74nh at openjdk.java.net (Alan Hayward)
Date: Tue, 18 May 2021 14:36:05 GMT
Subject: RFR: 8267098: AArch64: C1 StubFrames end confusingly [v2]
In-Reply-To:
References:
Message-ID:

> For many of the stub frames, a leave/ret is generated after the stub has
> already branched or returned. This is confusing. For these cases, replace
> the superfluous code with a should_not_reach_here.
>
> For handle exception, instead of storing the return from the exception
> handler on the stack, it can be moved directly into lr, replacing a store and
> load with a single move. (If/when PAC support is implemented, then this store
> would also have to be signed).

Alan Hayward has updated the pull request incrementally with two additional commits since the last revision:

 - Restore save to stack code

   Change-Id: I20c9d04d372c2da3fceb6921a3696074561666d9
 - Add return_state_t enum

   Change-Id: If4472d3230466ac4409e9b63dedfdb54013c6e3d

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/4030/files
  - new: https://git.openjdk.java.net/jdk/pull/4030/files/6f060c59..9d6a057a

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4030&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4030&range=00-01

  Stats: 33 lines in 1 file changed: 14 ins; 10 del; 9 mod
  Patch: https://git.openjdk.java.net/jdk/pull/4030.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/4030/head:pull/4030

PR: https://git.openjdk.java.net/jdk/pull/4030

From github.com+4146708+a74nh at openjdk.java.net  Tue May 18 14:36:05 2021
From: github.com+4146708+a74nh at openjdk.java.net (Alan Hayward)
Date: Tue, 18 May 2021 14:36:05 GMT
Subject: RFR: 8267098: AArch64: C1 StubFrames end confusingly
In-Reply-To:
References:
Message-ID:

On Fri, 14 May 2021 11:28:45 GMT, Alan Hayward wrote:

> For many of the stub frames, a leave/ret is generated after the stub has
> already branched or returned. This is confusing. For these cases, replace
> the superfluous code with a should_not_reach_here.
>
> For handle exception, instead of storing the return from the exception
> handler on the stack, it can be moved directly into lr, replacing a store and
> load with a single move. (If/when PAC support is implemented, then this store
> would also have to be signed).

* Restored the code writing return value to stack
* Added enum.
-------------

PR: https://git.openjdk.java.net/jdk/pull/4030

From thartmann at openjdk.java.net  Tue May 18 14:42:50 2021
From: thartmann at openjdk.java.net (Tobias Hartmann)
Date: Tue, 18 May 2021 14:42:50 GMT
Subject: RFR: 8265129: Add intrinsic support for JVM.getClassId [v6]
In-Reply-To:
References:
Message-ID:

On Mon, 17 May 2021 07:20:18 GMT, Denghui Dong wrote:

>> 8265129: Add intrinsic support for JVM.getClassId
>
> Denghui Dong has updated the pull request incrementally with one additional commit since the last revision:
>
>   fix crash problem

Hi Denghui,

My testing did not find any more issues but this should be reviewed by someone who knows the JFR internals better than me. Just some quick comments:
- Your bug/PR title is misleading. You are not adding an intrinsic for JVM.getClassId but you are updating the existing intrinsic to handle more cases.
- You are completely removing the C1 version of the intrinsic. I'm wondering if that is okay, assuming there was a good reason for adding it in the first place.
- You wrote "Therefore, intrinsifying this method will decrease the overhead for this usage." Is that really measurable?

Best regards,
Tobias

-------------

PR: https://git.openjdk.java.net/jdk/pull/3470

From thartmann at openjdk.java.net  Tue May 18 15:11:44 2021
From: thartmann at openjdk.java.net (Tobias Hartmann)
Date: Tue, 18 May 2021 15:11:44 GMT
Subject: RFR: 8267239: C1: RangeCheckElimination for % operator [v2]
In-Reply-To:
References:
Message-ID:

On Tue, 18 May 2021 08:40:15 GMT, Yi Yang wrote:

>> For the % operator, the result of the remainder operation can be negative only if the dividend is negative, and can be positive only if the dividend is positive. Moreover, the magnitude of the result is always less than the magnitude of the divisor (see [JLS 15.17.3](https://docs.oracle.com/javase/specs/jls/se8/html/jls-15.html#jls-15.17.3)).
>>
>> So if `y` is a constant integer and not equal to 0, then we can deduce the bounds of the remainder operation:
>> - x % -y ==> [0, y - 1] RCE
>> - x % y ==> [0, y - 1] RCE
>> - -x % y ==> [-y + 1, 0]
>> - -x % -y ==> [-y + 1, 0]
>>
>> Based on the above rationale, we can apply RCE to remainder operations whose divisor is a non-zero constant integer and whose dividend is >= 0, e.g.:
>>
>>
>> for (int i = 0; i < 1000; i++) {
>>   int top5 = arr[i % 5]; // Apply RCE if arr is a loop invariant
>>   ....
>> }
>>
>>
>> For more detailed RCE results, please check out the attachment on JBS; it was generated by ArithmeticRemRCE with the additional flags -XX:+TraceRangeCheckElimination -XX:+PrintIR.
>>
>> Testing:
>> - test/hotspot/jtreg/compiler/c1/ (slowdebug)
>
> Yi Yang has updated the pull request incrementally with one additional commit since the last revision:
>
>   more comment for test

Looks like C2 does not implement this optimization (should be in `ModINode::Value`). We should add it.

src/hotspot/share/c1/c1_RangeCheckElimination.cpp line 246:

> 244: _bound = new Bound();
> 245: }
> 246: }else {

Missing whitespace `}else`

-------------

Changes requested by thartmann (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/4083

From ddong at openjdk.java.net  Tue May 18 15:21:45 2021
From: ddong at openjdk.java.net (Denghui Dong)
Date: Tue, 18 May 2021 15:21:45 GMT
Subject: RFR: 8265129: Add intrinsic support for JVM.getClassId [v6]
In-Reply-To:
References:
Message-ID:

On Tue, 18 May 2021 14:39:42 GMT, Tobias Hartmann wrote:

> Hi Denghui,
>
> My testing did not find any more issues but this should be reviewed by someone who knows the JFR internals better than me. Just some quick comments:
>

Thank you. Erik suggested that this patch should be reviewed by the compiler team, so I'm not sure who should review the patch...

> * Your bug/PR title is misleading.
>   You are not adding an intrinsic for JVM.getClassId but you are updating the existing intrinsic to handle more cases
> * You are completely removing the C1 version of the intrinsic, I'm wondering if that is okay, assuming there was a good reason for adding it in the first place.

In fact, the existing intrinsic implementation for JVM.getClassId is not used. I think the reason is that the underlying implementation of getClassId changed, but the corresponding intrinsic implementation has not been maintained. In other words, the intrinsic implementation for this method in C1 and C2 is wrong. At the same time, we can find a method named getClassIdNonIntrinsic in JVM.java.

This title may not be appropriate. How about "Reimplementing the intrinsics of JVM.getClassId"?

I have already implemented the C1 part, but it contains some cross-platform code (only x86 and aarch64 are supported), so I think it's better to implement it in C2 first.

> * You wrote "Therefore, intrinsifying this method will decrease the overhead for this usage." Is that really measurable?
>

I only wrote a microbenchmark, and I can see an obvious performance improvement from it (see my previous message).
-------------

PR: https://git.openjdk.java.net/jdk/pull/3470

From psandoz at openjdk.java.net  Tue May 18 15:33:40 2021
From: psandoz at openjdk.java.net (Paul Sandoz)
Date: Tue, 18 May 2021 15:33:40 GMT
Subject: RFR: 8266951: Partial in-lining for vectorized mismatch operation using AVX512 masked instructions [v4]
In-Reply-To: <6phAafS9kz8v8Nwo4XGynTzt8K1KaNv7jhcnMZ59mew=.de2cbb11-261a-4974-bf21-adf47f6e8482@github.com>
References: <0YtRuwnVZ-Ejs-22d0JDJeFzXiZ17XNuBT1o5Ma4ZkI=.9dd9e952-d452-4175-8ff5-8f41e990a555@github.com> <6phAafS9kz8v8Nwo4XGynTzt8K1KaNv7jhcnMZ59mew=.de2cbb11-261a-4974-bf21-adf47f6e8482@github.com>
Message-ID:

On Tue, 18 May 2021 05:21:06 GMT, Jatin Bhateja wrote:

>> ArraySupport.vectorizedMismatch is a leaf-level comparison routine which gets called by various public Java APIs (Arrays.equals, Arrays.mismatch). The HotSpot C2 compiler intrinsifies the vectorizedMismatch routine and emits a call to a stub routine which uses vector instructions to compare the inputs.
>>
>> For small compare operations whose size fits in one vector register, i.e. < 32 bytes or <= 64 bytes, this patch employs a partial in-lining technique to emit the fast-path code at the call site, which does the vector comparison under a predicate register/mask computed as a function of the comparison length.
>>
>> If the length of the comparison is greater than the vector register size, then the slow path comprising the stub call is emitted.
>>
>> This avoids the call overhead associated with the stub call, which is significant compared to the actual comparison operation for small-sized comparisons.
>>
>> Partial in-lining is controlled by the run-time flag -XX:UsePartialInlineSize=32/64 (default 32 bytes).
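For readers less familiar with the public entry points named above, here is a minimal Java sketch of a small comparison of the kind the description targets. The class and helper names (`SmallMismatchDemo`, `firstDifference`) are invented for this illustration. Whether a particular call actually takes the partially in-lined fast path depends on the JIT, the CPU and the flag setting, none of which this example can observe; it only shows the API-level behavior.

```java
import java.util.Arrays;

class SmallMismatchDemo {
    // Hypothetical thin wrapper around the public API mentioned in the description.
    static int firstDifference(byte[] a, byte[] b) {
        return Arrays.mismatch(a, b);
    }

    public static void main(String[] args) {
        byte[] a = new byte[15];   // 15 bytes, below the 32-byte default quoted above
        byte[] b = new byte[15];
        System.out.println(firstDifference(a, b));   // -1, the arrays are identical
        b[9] = 1;
        System.out.println(firstDifference(a, b));   // 9, index of the first differing byte
        System.out.println(Arrays.equals(a, b));     // false
    }
}
```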
>>
>> Following are performance numbers for an existing JMH benchmark (test/micro/org/openjdk/bench/java/util/ArrayMismatch.java):
>>
>> Machine: Cascade Lake server (Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz)
>>
>> BENCHMARK | SIZE | Baseline (ops/ms) | PI32 (ops/ms) | Gain | PI64 (ops/ms) | Gain
>> -- | -- | -- | -- | -- | -- | --
>> ArraysMismatchPartialInlining.testByteMatch | 3 | 209915.663 | 209126.291 | 0.996239576 | 209073.888 | 0.995989937
>> ArraysMismatchPartialInlining.testByteMatch | 4 | 157757.866 | 157763.787 | 1.000037532 | 157766.023 | 1.000051706
>> ArraysMismatchPartialInlining.testByteMatch | 5 | 181182.854 | 180450.433 | 0.995957559 | 180465.978 | 0.996043356
>> ArraysMismatchPartialInlining.testByteMatch | 6 | 146279.651 | 146276.69 | 0.999979758 | 146274.73 | 0.999966359
>> ArraysMismatchPartialInlining.testByteMatch | 7 | 139099.287 | 137887.433 | 0.991287849 | 139159.131 | 1.000430225
>> ArraysMismatchPartialInlining.testByteMatch | 15 | 127720.176 | 175732.078 | 1.375914781 | 169252.948 | 1.325185678
>> ArraysMismatchPartialInlining.testByteMatch | 31 | 116472.861 | 176768.126 | 1.517676517 | 169773.326 | 1.457621325
>> ArraysMismatchPartialInlining.testByteMatch | 63 | 104636.064 | 91564.893 | 0.875079676 | 160845.908 | 1.537193792
>> ArraysMismatchPartialInlining.testByteMatch | 95 | 101099.48 | 89657.806 | 0.886827568 | 87334.192 | 0.863844127
>> ArraysMismatchPartialInlining.testByteMatch | 800 | 45022.411 | 47905.179 | 1.064029623 | 47969.355 | 1.065455046
>> ArraysMismatchPartialInlining.testCharMatch | 3 | 219405.496 | 219710.643 | 1.00139079 | 219242.048 | 0.999255041
>> ArraysMismatchPartialInlining.testCharMatch | 4 | 170629.006 | 193121.02 | 1.131818233 | 182593.776 | 1.070121548
>> ArraysMismatchPartialInlining.testCharMatch | 5 | 155518.733 | 169650.324 | 1.090867452 | 159963.097 | 1.028577676
>> ArraysMismatchPartialInlining.testCharMatch | 6 | 154395.07 | 175616.979 | 1.137451986 | 147860.366 | 0.957675436
>> ArraysMismatchPartialInlining.testCharMatch | 7 | 147630.171 | 168639.547 | 1.142310856 | 112467.214 | 0.761817271
>> ArraysMismatchPartialInlining.testCharMatch | 15 | 130251.837 | 171755.645 | 1.318642784 | 159656.911 | 1.225755542
>> ArraysMismatchPartialInlining.testCharMatch | 31 | 115510.532 | 106310.328 | 0.920351817 | 159957.379 | 1.384786099
>> ArraysMismatchPartialInlining.testCharMatch | 63 | 96443.648 | 92545.364 | 0.959579671 | 92850.782 | 0.962746473
>> ArraysMismatchPartialInlining.testCharMatch | 95 | 90001.485 | 81753.152 | 0.908353368 | 83890.742 | 0.932103976
>> ArraysMismatchPartialInlining.testCharMatch | 800 | 22929.764 | 20699.791 | 0.902747669 | 22017.534 | 0.960216337
>> ArraysMismatchPartialInlining.testDoubleMatch | 3 | 137422.911 | 134792.332 | 0.980857784 | 137047.846 | 0.997270724
>> ArraysMismatchPartialInlining.testDoubleMatch | 4 | 140124.192 | 128321.199 | 0.915767628 | 128573.012 | 0.917564699
>> ArraysMismatchPartialInlining.testDoubleMatch | 5 | 132385.81 | 132099.177 | 0.997834866 | 132337.729 | 0.999636812
>> ArraysMismatchPartialInlining.testDoubleMatch | 6 | 122472.829 | 122301.343 | 0.998599804 | 122235.558 | 0.998062664
>> ArraysMismatchPartialInlining.testDoubleMatch | 7 | 123867.736 | 123042.597 | 0.993338548 | 123060.617 | 0.993484026
>> ArraysMismatchPartialInlining.testDoubleMatch | 15 | 102561.684 | 102697.933 | 1.001328459 | 100258.701 | 0.977545386
>> ArraysMismatchPartialInlining.testDoubleMatch | 31 | 87019.261 | 87292.743 | 1.003142775 | 85003.323 | 0.976833428
>> ArraysMismatchPartialInlining.testDoubleMatch | 63 | 62251.609 | 57261.214 | 0.919835084 | 62732.816 | 1.007730033
>> ArraysMismatchPartialInlining.testDoubleMatch | 95 | 50885.381 | 48282.534 | 0.948848826 | 48533.009 | 0.953771163
>> ArraysMismatchPartialInlining.testDoubleMatch | 800 | 7160.957 | 8209.345 | 1.146403337 | 7158.649 | 0.999677697
>> ArraysMismatchPartialInlining.testFloatMatch | 3 | 144215.295 | 141572.656 | 0.981675737 | 117351.089 | 0.81372152
>> ArraysMismatchPartialInlining.testFloatMatch | 4 | 149935.526 | 140116.547 | 0.934511992 | 138351.846 | 0.922742259
>> ArraysMismatchPartialInlining.testFloatMatch | 5 | 134682.06 | 133892.853 | 0.994140222 | 139040.985 | 1.032364555
>> ArraysMismatchPartialInlining.testFloatMatch | 6 | 139176.866 | 139452.984 | 1.001983936 | 158309.784 | 1.13747197
>> ArraysMismatchPartialInlining.testFloatMatch | 7 | 127274.07 | 126137.824 | 0.991072447 | 146418.871 | 1.150421849
>> ArraysMismatchPartialInlining.testFloatMatch | 15 | 115897.616 | 101808.969 | 0.878438854 | 108451.212 | 0.935750154
>> ArraysMismatchPartialInlining.testFloatMatch | 31 | 96568.619 | 101492.986 | 1.05099345 | 88662.187 | 0.918126281
>> ArraysMismatchPartialInlining.testFloatMatch | 63 | 75565.484 | 85526.546 | 1.131820263 | 74575.198 | 0.986894996
>> ArraysMismatchPartialInlining.testFloatMatch | 95 | 69535.621 | 71823.072 | 1.032896104 | 64910.105 | 0.933479907
>> ArraysMismatchPartialInlining.testFloatMatch | 800 | 13959.085 | 12768.069 | 0.914678075 | 12698.311 | 0.909680756
>> ArraysMismatchPartialInlining.testIntMatch | 3 | 151925.753 | 152001.543 | 1.000498862 | 150351.321 | 0.989636833
>> ArraysMismatchPartialInlining.testIntMatch | 4 | 151411.152 | 161021.852 | 1.063474188 | 152115.869 | 1.004654327
>> ArraysMismatchPartialInlining.testIntMatch | 5 | 142305.114 | 134841.275 | 0.947550451 | 122718.584 | 0.862362431
>> ArraysMismatchPartialInlining.testIntMatch | 6 | 144870.73 | 144186.562 | 0.99527739 | 166569.418 | 1.149779655
>> ArraysMismatchPartialInlining.testIntMatch | 7 | 135132.736 | 131937.154 | 0.976352273 | 150670.855 | 1.114984122
>> ArraysMismatchPartialInlining.testIntMatch | 15 | 118831.765 | 119947.806 | 1.009391773 | 161039.149 | 1.35518604
>> ArraysMismatchPartialInlining.testIntMatch | 31 | 97247.157 | 95123.241 | 0.978159608 | 92586.255 | 0.952071586
>> ArraysMismatchPartialInlining.testIntMatch | 63 | 78537.993 | 72904.05 | 0.928264744 | 72075.128 | 0.917710337
>> ArraysMismatchPartialInlining.testIntMatch | 95 | 69356.234 | 69021.893 | 0.995179366 | 67435.202 | 0.972301956
>> ArraysMismatchPartialInlining.testIntMatch | 800 | 14410.374 | 12715.733 | 0.882401317 | 12527.15 | 0.869314703
>> ArraysMismatchPartialInlining.testLongMatch | 3 | 145434.777 | 147236.142 | 1.012386068 | 144269.34 | 0.991986532
>> ArraysMismatchPartialInlining.testLongMatch | 4 | 149850.908 | 117182.939 | 0.781996857 | 116983.308 | 0.780664659
>> ArraysMismatchPartialInlining.testLongMatch | 5 | 140694.62 | 141039.138 | 1.002448693 | 140721.407 | 1.000190391
>> ArraysMismatchPartialInlining.testLongMatch | 6 | 136901.515 | 136215.609 | 0.994989785 | 136216.591 | 0.994996958
>> ArraysMismatchPartialInlining.testLongMatch | 7 | 132233.847 | 131289.142 | 0.9928558 | 131315.326 | 0.993053813
>> ArraysMismatchPartialInlining.testLongMatch | 15 | 108677.77 | 105050.548 | 0.966624067 | 108574.143 | 0.999046475
>> ArraysMismatchPartialInlining.testLongMatch | 31 | 79476.103 | 79391.426 | 0.99893456 | 79519.006 | 1.000539823
>> ArraysMismatchPartialInlining.testLongMatch | 63 | 58949.181 | 59102.766 | 1.00260538 | 59095.306 | 1.00247883
>> ArraysMismatchPartialInlining.testLongMatch | 95 | 49438.419 | 49422.93 | 0.999686701 | 49390.033 | 0.999021287
>> ArraysMismatchPartialInlining.testLongMatch | 800 | 7195.783 | 7201.554 | 1.000801998 | 7186.757 | 0.998745654
>> ArraysMismatchPartialInlining.testShortMatch | 3 | 219642.309 | 219414.684 | 0.998963656 | 219760.127 | 1.000536408
>> ArraysMismatchPartialInlining.testShortMatch | 4 | 169235.371 | 193907.437 | 1.145785517 | 170667.561 | 1.008462711
>> ArraysMismatchPartialInlining.testShortMatch | 5 | 155537.852 | 147014.758 | 0.945202445 | 116770.798 | 0.750754858
>> ArraysMismatchPartialInlining.testShortMatch | 6 | 155059.272 | 173756.546 | 1.120581464 | 152323.759 | 0.982358275
>> ArraysMismatchPartialInlining.testShortMatch | 7 | 147370.359 | 154934.348 | 1.051326393 | 138398.19 | 0.939118225
>> ArraysMismatchPartialInlining.testShortMatch | 15 | 130353.196 | 171653.208 | 1.316831603 | 160047.047 | 1.227795343
>> ArraysMismatchPartialInlining.testShortMatch | 31 | 118458.443 | 106239.301 | 0.896848703 | 159726.936 | 1.348379499
>> ArraysMismatchPartialInlining.testShortMatch | 63 | 97519.691 | 91591.145 | 0.939206678 | 91847.817 | 0.94183868
>> ArraysMismatchPartialInlining.testShortMatch | 95 | 90818.111 | 77626.093 | 0.854742431 | 77653.086 | 0.855039652
>> ArraysMismatchPartialInlining.testShortMatch | 800 | 21382.8 | 22841.791 | 1.06823199 | 22683.388 | 1.060824027
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
>   8266951: Removing the changes to existing benchmark since a separate benchmark has been added to partial in-lining.

Now there is much less for me to review :-) Needs a HS review. I shall run it through tier1 to 3 tests (we wait for those results before integrating in case someone else reviews in the interim).

-------------

PR: https://git.openjdk.java.net/jdk/pull/3999

From neliasso at openjdk.java.net  Tue May 18 15:42:38 2021
From: neliasso at openjdk.java.net (Nils Eliasson)
Date: Tue, 18 May 2021 15:42:38 GMT
Subject: RFR: 8266480: Implicit null check optimization does not update control of hoisted memory operation
In-Reply-To:
References:
Message-ID:

On Tue, 18 May 2021 13:40:05 GMT, Tobias Hartmann wrote:

> C2 replaces explicit null checks by hoisting a nearby memory operation to the null check and using it as an implicit null check. In some cases, control of that memory operation is not updated correctly, leading to assert failures during `PhaseCFG::verify()` because a use is no longer dominated by its definition.
> > After matching, the graph looks like this: > > > > `64 testP_reg` is an explicit null check and `78 loadD`, `73 storeD` and `77 storeImmI` are candidates for an implicit null check because they are operating on the same oop. `PhaseCFG::implicit_null_check` decides to hoist the `77 storeImmI` from the `not_null_block` B12 to the null check in B11/B13: > > > > Now the problem is that control of `77 storeImmI` was not updated and still points into the non-dominating block B15. The following code is supposed to fix this: > https://github.com/openjdk/jdk/blob/9d168e25d1e2e8b662dc7aa6cda7516c423cef7d/src/hotspot/share/opto/lcm.cpp#L413-L418 > > However, it does not trigger because control is not the `not_null_block->head()` but `59 MachProj` which is the control projection from `60 CallLeafDirect` emitted by a `drem`. The fix is to simply check `get_block_for_node(ctrl)` instead. > > This is an old issue that was only caught by the assert recently introduced by [JDK-8263227](https://bugs.openjdk.java.net/browse/JDK-8263227). > > Thanks, > Tobias Do nodes 73, 77 and 78 touch different memory slices? Otherwise 78 should be anti-dependent on 73 and 77. ------------- PR: https://git.openjdk.java.net/jdk/pull/4093 From aph at openjdk.java.net Tue May 18 15:43:40 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Tue, 18 May 2021 15:43:40 GMT Subject: RFR: 8267098: AArch64: C1 StubFrames end confusingly [v2] In-Reply-To: References: Message-ID: On Tue, 18 May 2021 14:36:05 GMT, Alan Hayward wrote: >> For many of the stub frames, a leave/ret is generated after the stub has >> already branched or returned. This is confusing. For these cases, replace >> the superfluous code with a should_not_reach_here >> >> For handle exception, instead of storing return from the exception >> handler on the stack, it can be moved directly into lr, replacing a store and >> load with a single move. (If/when PAC support is implemented, then this store >> would also have to be signed).
> > Alan Hayward has updated the pull request incrementally with two additional commits since the last revision: > > - Restore save to stack code > > Change-Id: I20c9d04d372c2da3fceb6921a3696074561666d9 > - Add return_state_t enum > > Change-Id: If4472d3230466ac4409e9b63dedfdb54013c6e3d Marked as reviewed by aph (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/4030 From thartmann at openjdk.java.net Tue May 18 16:05:39 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 18 May 2021 16:05:39 GMT Subject: RFR: 8266480: Implicit null check optimization does not update control of hoisted memory operation In-Reply-To: References: Message-ID: On Tue, 18 May 2021 13:40:05 GMT, Tobias Hartmann wrote: > C2 replaces explicit null checks by hoisting a nearby memory operation to the null check and using it as implicit null check. In some cases, control of that memory operation is not updated correctly, leading to assert failures during `PhaseCFG::verify()` because a use is no longer dominated by its definition. > > After matching, the graph looks like this: > > > > `64 testP_reg` is an explicit null check and `78 loadD`, `73 storeD` and `77 storeImmI` are candidates for an implicit null check because they are operating on the same oop. `PhaseCFG::implicit_null_check` decides to hoist the `77 storeImmI` from the `not_null_block` B12 to the null check in B11/B13: > > > > Now the problem is that control of `77 storeImmI` was not updated and still points into the non-dominating block B15. The following code is supposed to fix this: > https://github.com/openjdk/jdk/blob/9d168e25d1e2e8b662dc7aa6cda7516c423cef7d/src/hotspot/share/opto/lcm.cpp#L413-L418 > > However, it does not trigger because control is not the `not_null_block->head()` but `59 MachProj` which is the control projection from `60 CallLeafDirect` emitted by a `drem`. The fix is to simply check `get_block_for_node(ctrl)` instead. 
> > This is an old issue that was only caught by the assert recently introduced by [JDK-8263227](https://bugs.openjdk.java.net/browse/JDK-8263227). > > Thanks, > Tobias Yes, exactly. They are accessing the `double` and `int` slices corresponding to `dFld` and `iFld` accesses. ------------- PR: https://git.openjdk.java.net/jdk/pull/4093 From neliasso at openjdk.java.net Tue May 18 17:06:42 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Tue, 18 May 2021 17:06:42 GMT Subject: RFR: 8266480: Implicit null check optimization does not update control of hoisted memory operation In-Reply-To: References: Message-ID: <-Fs8BPHJ9XyRodyyaLwYVl5YqFcSdcUVGsH1CHHoCTk=.9a9a2e3f-84bf-41ab-bf07-a654686caec6@github.com> On Tue, 18 May 2021 13:40:05 GMT, Tobias Hartmann wrote: > C2 replaces explicit null checks by hoisting a nearby memory operation to the null check and using it as implicit null check. In some cases, control of that memory operation is not updated correctly, leading to assert failures during `PhaseCFG::verify()` because a use is no longer dominated by its definition. > > After matching, the graph looks like this: > > > > `64 testP_reg` is an explicit null check and `78 loadD`, `73 storeD` and `77 storeImmI` are candidates for an implicit null check because they are operating on the same oop. `PhaseCFG::implicit_null_check` decides to hoist the `77 storeImmI` from the `not_null_block` B12 to the null check in B11/B13: > > > > Now the problem is that control of `77 storeImmI` was not updated and still points into the non-dominating block B15. The following code is supposed to fix this: > https://github.com/openjdk/jdk/blob/9d168e25d1e2e8b662dc7aa6cda7516c423cef7d/src/hotspot/share/opto/lcm.cpp#L413-L418 > > However, it does not trigger because control is not the `not_null_block->head()` but `59 MachProj` which is the control projection from `60 CallLeafDirect` emitted by a `drem`. The fix is to simply check `get_block_for_node(ctrl)` instead. 
> > This is an old issue that was only caught by the assert recently introduced by [JDK-8263227](https://bugs.openjdk.java.net/browse/JDK-8263227). > > Thanks, > Tobias Looks good! ------------- Marked as reviewed by neliasso (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/4093 From thartmann at openjdk.java.net Tue May 18 17:12:38 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 18 May 2021 17:12:38 GMT Subject: RFR: 8266480: Implicit null check optimization does not update control of hoisted memory operation In-Reply-To: References: Message-ID: On Tue, 18 May 2021 13:40:05 GMT, Tobias Hartmann wrote: > C2 replaces explicit null checks by hoisting a nearby memory operation to the null check and using it as implicit null check. In some cases, control of that memory operation is not updated correctly, leading to assert failures during `PhaseCFG::verify()` because a use is no longer dominated by its definition. > > After matching, the graph looks like this: > > > > `64 testP_reg` is an explicit null check and `78 loadD`, `73 storeD` and `77 storeImmI` are candidates for an implicit null check because they are operating on the same oop. `PhaseCFG::implicit_null_check` decides to hoist the `77 storeImmI` from the `not_null_block` B12 to the null check in B11/B13: > > > > Now the problem is that control of `77 storeImmI` was not updated and still points into the non-dominating block B15. The following code is supposed to fix this: > https://github.com/openjdk/jdk/blob/9d168e25d1e2e8b662dc7aa6cda7516c423cef7d/src/hotspot/share/opto/lcm.cpp#L413-L418 > > However, it does not trigger because control is not the `not_null_block->head()` but `59 MachProj` which is the control projection from `60 CallLeafDirect` emitted by a `drem`. The fix is to simply check `get_block_for_node(ctrl)` instead. > > This is an old issue that was only caught by the assert recently introduced by [JDK-8263227](https://bugs.openjdk.java.net/browse/JDK-8263227). 
> > Thanks, > Tobias Thanks, Nils! ------------- PR: https://git.openjdk.java.net/jdk/pull/4093 From jbhateja at openjdk.java.net Tue May 18 18:45:14 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Tue, 18 May 2021 18:45:14 GMT Subject: RFR: 8256973: Intrinsic creation for VectorMask query (lastTrue, firstTrue, trueCount) APIs [v5] In-Reply-To: <73lFD51hzmiF_KrQyPyE5c7lbf-Bp6V5vptzGo7JioY=.f34509d0-04c1-4c6d-878f-baa433b315a7@github.com> References: <73lFD51hzmiF_KrQyPyE5c7lbf-Bp6V5vptzGo7JioY=.f34509d0-04c1-4c6d-878f-baa433b315a7@github.com> Message-ID: > This patch intrinsifies the following mask query APIs using an optimal instruction sequence for the X86 target. > 1) VectorMask.firstTrue. > 2) VectorMask.lastTrue. > 3) VectorMask.trueCount. > > The current implementations of the above APIs iterate over the underlying boolean array encapsulated in a mask instance to ascertain the count or position index of true bits. > X86 AVX2 and AVX512 targets offer direct instructions to move the mask held in a byte vector into a GP or an opmask register, thereby accelerating further querying. > > Intrinsification is not performed for vector species containing fewer than two vector lanes.
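The difference between the two approaches described above can be sketched in plain Java. This is only an illustration, not the HotSpot intrinsic and not the Vector API internals: the class `MaskQuerySketch` and all of its method names are hypothetical, and a `long` stands in for the GP/opmask register (so at most 64 lanes are assumed). The return values for an all-false mask (lane count for firstTrue, -1 for lastTrue) are assumed to follow the convention documented for `VectorMask`.

```java
// Hypothetical sketch: scalar loops over a boolean[] versus bit queries on a packed mask.
final class MaskQuerySketch {

    // Baseline idea: walk the boolean lanes one by one.
    static int trueCountScalar(boolean[] lanes) {
        int count = 0;
        for (boolean lane : lanes) {
            if (lane) {
                count++;
            }
        }
        return count;
    }

    // Index of the first set lane, or lanes.length if no lane is set.
    static int firstTrueScalar(boolean[] lanes) {
        for (int i = 0; i < lanes.length; i++) {
            if (lanes[i]) {
                return i;
            }
        }
        return lanes.length;
    }

    // Index of the last set lane, or -1 if no lane is set.
    static int lastTrueScalar(boolean[] lanes) {
        for (int i = lanes.length - 1; i >= 0; i--) {
            if (lanes[i]) {
                return i;
            }
        }
        return -1;
    }

    // Stand-in for "move the mask into a general purpose register":
    // lane i becomes bit i of the result.
    static long toBits(boolean[] lanes) {
        if (lanes.length > 64) {
            throw new IllegalArgumentException("sketch supports at most 64 lanes");
        }
        long bits = 0L;
        for (int i = 0; i < lanes.length; i++) {
            if (lanes[i]) {
                bits |= 1L << i;
            }
        }
        return bits;
    }

    // Once the mask is a bit pattern, each query is a single bit-counting operation.
    static int trueCount(long bits) {
        return Long.bitCount(bits);
    }

    static int firstTrue(long bits, int laneCount) {
        return bits == 0 ? laneCount : Long.numberOfTrailingZeros(bits);
    }

    static int lastTrue(long bits) {
        // numberOfLeadingZeros(0) is 64, so an empty mask yields -1.
        return 63 - Long.numberOfLeadingZeros(bits);
    }
}
```

The bit-based variants do a constant amount of work regardless of where the set lanes are, which is one plausible reading of why the worst-case (ALGO 2) rows in the table below gain the most.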
> > Please find below the performance number for benchmark included in the patch: > Machine: Cascade Lake server (Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz 28C) > > > VectorMask.trueCount | VECTOR SIZE | ALGO | BASELINE AVX3 | WITH OPT AVX3 | GAIN > -- | -- | -- | -- | -- | -- > MaskQueryOperationsBenchmark.testFirstTrueByte | 128 | 1 | 338396.436 | 362711.622 | 1.071854143 > MaskQueryOperationsBenchmark.testFirstTrueByte | 128 | 2 | 205477.472 | 362668.035 | 1.765001445 > MaskQueryOperationsBenchmark.testFirstTrueByte | 128 | 3 | 185613.377 | 362518.206 | 1.953082326 > MaskQueryOperationsBenchmark.testFirstTrueByte | 256 | 1 | 338522.114 | 328751.231 | 0.971136648 > MaskQueryOperationsBenchmark.testFirstTrueByte | 256 | 2 | 148825.341 | 328783.35 | 2.209189294 > MaskQueryOperationsBenchmark.testFirstTrueByte | 256 | 3 | 200854.856 | 328784.24 | 1.636924526 > MaskQueryOperationsBenchmark.testFirstTrueByte | 512 | 1 | 338551.089 | 319908.361 | 0.944933782 > MaskQueryOperationsBenchmark.testFirstTrueByte | 512 | 2 | 116338.756 | 320026.839 | 2.750818816 > MaskQueryOperationsBenchmark.testFirstTrueByte | 512 | 3 | 200871.692 | 320008.208 | 1.593097588 > MaskQueryOperationsBenchmark.testFirstTrueInt | 128 | 1 | 338489.157 | 190221.57 | 0.561972418 > MaskQueryOperationsBenchmark.testFirstTrueInt | 128 | 2 | 205140.903 | 362387.766 | 1.766531007 > MaskQueryOperationsBenchmark.testFirstTrueInt | 128 | 3 | 185508.994 | 362566.265 | 1.95444036 > MaskQueryOperationsBenchmark.testFirstTrueInt | 256 | 1 | 338403.999 | 328829.751 | 0.971707639 > MaskQueryOperationsBenchmark.testFirstTrueInt | 256 | 2 | 148988.857 | 328835.479 | 2.207114583 > MaskQueryOperationsBenchmark.testFirstTrueInt | 256 | 3 | 200815.907 | 328778.266 | 1.637212265 > MaskQueryOperationsBenchmark.testFirstTrueInt | 512 | 1 | 338462.403 | 328796.84 | 0.971442728 > MaskQueryOperationsBenchmark.testFirstTrueInt | 512 | 2 | 116355.623 | 328811.386 | 2.825917455 > MaskQueryOperationsBenchmark.testFirstTrueInt 
| 512 | 3 | 200856.08 | 328773.859 | 1.636862867 > MaskQueryOperationsBenchmark.testFirstTrueLong | 128 | 1 | 338451.783 | 204432.394 | 0.60402221 > MaskQueryOperationsBenchmark.testFirstTrueLong | 128 | 2 | 204443.049 | 155670.633 | 0.761437641 > MaskQueryOperationsBenchmark.testFirstTrueLong | 128 | 3 | 207254.769 | 155672.842 | 0.751118263 > MaskQueryOperationsBenchmark.testFirstTrueLong | 256 | 1 | 338520.255 | 328789.176 | 0.971254072 > MaskQueryOperationsBenchmark.testFirstTrueLong | 256 | 2 | 205883.123 | 328742.103 | 1.596741385 > MaskQueryOperationsBenchmark.testFirstTrueLong | 256 | 3 | 185519.176 | 328733.537 | 1.771965271 > MaskQueryOperationsBenchmark.testFirstTrueLong | 512 | 1 | 338605.11 | 328694.935 | 0.970732353 > MaskQueryOperationsBenchmark.testFirstTrueLong | 512 | 2 | 148444.7 | 328352.346 | 2.211950619 > MaskQueryOperationsBenchmark.testFirstTrueLong | 512 | 3 | 200884.874 | 328814.376 | 1.636829939 > MaskQueryOperationsBenchmark.testFirstTrueShort | 128 | 1 | 338529.326 | 362293.877 | 1.070199387 > MaskQueryOperationsBenchmark.testFirstTrueShort | 128 | 2 | 204676.583 | 362428.992 | 1.770739899 > MaskQueryOperationsBenchmark.testFirstTrueShort | 128 | 3 | 185495.663 | 362422.835 | 1.953807594 > MaskQueryOperationsBenchmark.testFirstTrueShort | 256 | 1 | 338533.82 | 328635.479 | 0.970761146 > MaskQueryOperationsBenchmark.testFirstTrueShort | 256 | 2 | 148822.446 | 328803.55 | 2.209368001 > MaskQueryOperationsBenchmark.testFirstTrueShort | 256 | 3 | 200752.028 | 328805.974 | 1.637871245 > MaskQueryOperationsBenchmark.testFirstTrueShort | 512 | 1 | 338464.548 | 320054.91 | 0.945608371 > MaskQueryOperationsBenchmark.testFirstTrueShort | 512 | 2 | 116329.063 | 328763.508 | 2.826151088 > MaskQueryOperationsBenchmark.testFirstTrueShort | 512 | 3 | 199971.049 | 328819.066 | 1.644333355 > MaskQueryOperationsBenchmark.testLastTrueByte | 128 | 1 | 325618.244 | 337629.441 | 1.036887359 > MaskQueryOperationsBenchmark.testLastTrueByte | 128 | 2 | 
197655.729 | 337544.012 | 1.707737052 > MaskQueryOperationsBenchmark.testLastTrueByte | 128 | 3 | 325600.645 | 337256.796 | 1.035798919 > MaskQueryOperationsBenchmark.testLastTrueByte | 256 | 1 | 325677.144 | 308312.588 | 0.946681687 > MaskQueryOperationsBenchmark.testLastTrueByte | 256 | 2 | 138177.514 | 308293.997 | 2.231144476 > MaskQueryOperationsBenchmark.testLastTrueByte | 256 | 3 | 201281.142 | 308353.239 | 1.531952949 > MaskQueryOperationsBenchmark.testLastTrueByte | 512 | 1 | 325499.635 | 305103.491 | 0.937338965 > MaskQueryOperationsBenchmark.testLastTrueByte | 512 | 2 | 98267.327 | 304803.64 | 3.101780106 > MaskQueryOperationsBenchmark.testLastTrueByte | 512 | 3 | 201072.661 | 304969.972 | 1.516715253 > MaskQueryOperationsBenchmark.testLastTrueInt | 128 | 1 | 325286.171 | 337337.209 | 1.037047496 > MaskQueryOperationsBenchmark.testLastTrueInt | 128 | 2 | 197351.915 | 331432.723 | 1.679399579 > MaskQueryOperationsBenchmark.testLastTrueInt | 128 | 3 | 325173.097 | 337518.586 | 1.037965899 > MaskQueryOperationsBenchmark.testLastTrueInt | 256 | 1 | 325199.786 | 308436.805 | 0.948453284 > MaskQueryOperationsBenchmark.testLastTrueInt | 256 | 2 | 138200.527 | 308405.442 | 2.231579348 > MaskQueryOperationsBenchmark.testLastTrueInt | 256 | 3 | 201240.625 | 308234.527 | 1.531671485 > MaskQueryOperationsBenchmark.testLastTrueInt | 512 | 1 | 325590.639 | 308381.757 | 0.947145649 > MaskQueryOperationsBenchmark.testLastTrueInt | 512 | 2 | 98334.197 | 308440.373 | 3.13665421 > MaskQueryOperationsBenchmark.testLastTrueInt | 512 | 3 | 200832.953 | 308431.355 | 1.535760693 > MaskQueryOperationsBenchmark.testLastTrueLong | 128 | 1 | 325564.887 | 193981.861 | 0.595831641 > MaskQueryOperationsBenchmark.testLastTrueLong | 128 | 2 | 214005.351 | 153667.869 | 0.718056199 > MaskQueryOperationsBenchmark.testLastTrueLong | 128 | 3 | 214061.493 | 156337.24 | 0.730337988 > MaskQueryOperationsBenchmark.testLastTrueLong | 256 | 1 | 325601.502 | 308291.032 | 0.946835411 > 
MaskQueryOperationsBenchmark.testLastTrueLong | 256 | 2 | 197911.182 | 308292.149 | 1.557729815 > MaskQueryOperationsBenchmark.testLastTrueLong | 256 | 3 | 325608.187 | 308405.393 | 0.947167195 > MaskQueryOperationsBenchmark.testLastTrueLong | 512 | 1 | 325734.897 | 308321.619 | 0.946541564 > MaskQueryOperationsBenchmark.testLastTrueLong | 512 | 2 | 137974.465 | 308131.475 | 2.233250008 > MaskQueryOperationsBenchmark.testLastTrueLong | 512 | 3 | 205479.182 | 308311.636 | 1.500451934 > MaskQueryOperationsBenchmark.testLastTrueShort | 128 | 1 | 325681.411 | 337663.377 | 1.036790451 > MaskQueryOperationsBenchmark.testLastTrueShort | 128 | 2 | 198127.51 | 337287.453 | 1.702375672 > MaskQueryOperationsBenchmark.testLastTrueShort | 128 | 3 | 325519.01 | 337453.387 | 1.036662612 > MaskQueryOperationsBenchmark.testLastTrueShort | 256 | 1 | 325647.378 | 308266.5 | 0.946626691 > MaskQueryOperationsBenchmark.testLastTrueShort | 256 | 2 | 138287.837 | 308402.656 | 2.230150263 > MaskQueryOperationsBenchmark.testLastTrueShort | 256 | 3 | 205375.864 | 308418.101 | 1.501725154 > MaskQueryOperationsBenchmark.testLastTrueShort | 512 | 1 | 325548.631 | 308137.064 | 0.946516233 > MaskQueryOperationsBenchmark.testLastTrueShort | 512 | 2 | 98424.074 | 308145.17 | 3.130790644 > MaskQueryOperationsBenchmark.testLastTrueShort | 512 | 3 | 205381.622 | 308345.763 | 1.50133084 > MaskQueryOperationsBenchmark.testTrueCountByte | 128 | 1 | 197488.249 | 340490.471 | 1.724104967 > MaskQueryOperationsBenchmark.testTrueCountByte | 128 | 2 | 191307.785 | 354400.26 | 1.852513529 > MaskQueryOperationsBenchmark.testTrueCountByte | 128 | 3 | 181206.7 | 354512.75 | 1.956399791 > MaskQueryOperationsBenchmark.testTrueCountByte | 256 | 1 | 144485.784 | 328347.7 | 2.272525995 > MaskQueryOperationsBenchmark.testTrueCountByte | 256 | 2 | 136709.938 | 328318.229 | 2.401568122 > MaskQueryOperationsBenchmark.testTrueCountByte | 256 | 3 | 141501.903 | 328274.337 | 2.319928779 > 
MaskQueryOperationsBenchmark.testTrueCountByte | 512 | 1 | 108395.25 | 318599.11 | 2.939234976 > MaskQueryOperationsBenchmark.testTrueCountByte | 512 | 2 | 98731.287 | 318651.791 | 3.22746518 > MaskQueryOperationsBenchmark.testTrueCountByte | 512 | 3 | 106344.335 | 318657.098 | 2.99646519 > MaskQueryOperationsBenchmark.testTrueCountInt | 128 | 1 | 124691.716 | 354457.62 | 2.842671762 > MaskQueryOperationsBenchmark.testTrueCountInt | 128 | 2 | 191325.138 | 354360.523 | 1.852137815 > MaskQueryOperationsBenchmark.testTrueCountInt | 128 | 3 | 181480.334 | 353746.697 | 1.949228818 > MaskQueryOperationsBenchmark.testTrueCountInt | 256 | 1 | 144513.076 | 328404.916 | 2.27249274 > MaskQueryOperationsBenchmark.testTrueCountInt | 256 | 2 | 136710.717 | 328516.92 | 2.403007805 > MaskQueryOperationsBenchmark.testTrueCountInt | 256 | 3 | 141631.832 | 328432.841 | 2.318919669 > MaskQueryOperationsBenchmark.testTrueCountInt | 512 | 1 | 108479.473 | 328405.877 | 3.027355019 > MaskQueryOperationsBenchmark.testTrueCountInt | 512 | 2 | 98747.682 | 328300.378 | 3.324638831 > MaskQueryOperationsBenchmark.testTrueCountInt | 512 | 3 | 106378.04 | 328384.537 | 3.086957957 > MaskQueryOperationsBenchmark.testTrueCountLong | 128 | 1 | 213646.579 | 159098.437 | 0.74468048 > MaskQueryOperationsBenchmark.testTrueCountLong | 128 | 2 | 212671.379 | 162528.924 | 0.764225655 > MaskQueryOperationsBenchmark.testTrueCountLong | 128 | 3 | 212649.052 | 162530.898 | 0.764315178 > MaskQueryOperationsBenchmark.testTrueCountLong | 256 | 1 | 197350.819 | 328365.924 | 1.663869072 > MaskQueryOperationsBenchmark.testTrueCountLong | 256 | 2 | 191473.127 | 328501.883 | 1.715655289 > MaskQueryOperationsBenchmark.testTrueCountLong | 256 | 3 | 185529.513 | 328428.64 | 1.770223156 > MaskQueryOperationsBenchmark.testTrueCountLong | 512 | 1 | 144516.188 | 328334.76 | 2.27195835 > MaskQueryOperationsBenchmark.testTrueCountLong | 512 | 2 | 136752.367 | 328505.571 | 2.402192943 > 
MaskQueryOperationsBenchmark.testTrueCountLong | 512 | 3 | 141445.742 | 328392.887 | 2.321688036 > MaskQueryOperationsBenchmark.testTrueCountShort | 128 | 1 | 197863.202 | 354533.342 | 1.791810394 > MaskQueryOperationsBenchmark.testTrueCountShort | 128 | 2 | 191802.914 | 354377.939 | 1.84761499 > MaskQueryOperationsBenchmark.testTrueCountShort | 128 | 3 | 181773.298 | 354374.525 | 1.949541153 > MaskQueryOperationsBenchmark.testTrueCountShort | 256 | 1 | 144414.679 | 328435.088 | 2.27425003 > MaskQueryOperationsBenchmark.testTrueCountShort | 256 | 2 | 136923.991 | 328267.898 | 2.397446171 > MaskQueryOperationsBenchmark.testTrueCountShort | 256 | 3 | 141545.957 | 328308.681 | 2.319449371 > MaskQueryOperationsBenchmark.testTrueCountShort | 512 | 1 | 108420.143 | 328282.998 | 3.027878297 > MaskQueryOperationsBenchmark.testTrueCountShort | 512 | 2 | 98736.441 | 328420.616 | 3.326235103 > MaskQueryOperationsBenchmark.testTrueCountShort | 512 | 3 | 106432.386 | 328245.585 | 3.084076166 > > ALGO (1=bestcase, 2=worstcast,3=avgcase) Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: 8256973: Review comments resolution. 
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/3916/files - new: https://git.openjdk.java.net/jdk/pull/3916/files/95811bc3..0f420eac Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3916&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3916&range=03-04 Stats: 52 lines in 3 files changed: 47 ins; 0 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/3916.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3916/head:pull/3916 PR: https://git.openjdk.java.net/jdk/pull/3916 From dnsimon at openjdk.java.net Tue May 18 19:17:11 2021 From: dnsimon at openjdk.java.net (Doug Simon) Date: Tue, 18 May 2021 19:17:11 GMT Subject: RFR: 8267338: [JVMCI] revive JVMCI API removed by JDK-8243287 Message-ID: This PR revives ResolvedJavaType.getHostClass to preserve JVMCI compatibility. The revived method just returns `null`. ------------- Commit messages: - revive ResolvedJavaType.getHostClass Changes: https://git.openjdk.java.net/jdk/pull/4099/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4099&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8267338 Stats: 8 lines in 1 file changed: 8 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/4099.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4099/head:pull/4099 PR: https://git.openjdk.java.net/jdk/pull/4099 From github.com+2249648+johntortugo at openjdk.java.net Tue May 18 19:57:51 2021 From: github.com+2249648+johntortugo at openjdk.java.net (John Tortugo) Date: Tue, 18 May 2021 19:57:51 GMT Subject: RFR: 8266332: Adler32 intrinsic for x86 64-bit platforms [v11] In-Reply-To: <0mKzVE9RTWU0ZxjILDLkFx6EW-skdsp3lshNPbucuik=.4e19c3b0-7a3a-47f0-a008-21d999f24c15@github.com> References: <0mKzVE9RTWU0ZxjILDLkFx6EW-skdsp3lshNPbucuik=.4e19c3b0-7a3a-47f0-a008-21d999f24c15@github.com> Message-ID: On Tue, 18 May 2021 00:18:08 GMT, Xubo Zhang wrote: >> Implement Adler32 intrinsic for x86 64-bit platform using vector instructions. 
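For readers unfamiliar with the checksum being intrinsified, a scalar reference version of Adler-32 may help. This is a minimal sketch of the textbook definition, not the vectorized stub from the patch; the class name `Adler32Sketch` and its methods are hypothetical, and the only JDK API used is `java.util.zip.Adler32` for cross-checking.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Adler32;

// Hypothetical scalar reference for Adler-32; the patch under review replaces
// this kind of byte-at-a-time loop with vector instructions.
final class Adler32Sketch {
    private static final int MOD = 65521; // largest prime below 2^16

    static long adler32(byte[] data) {
        int a = 1; // running sum of bytes, starts at 1
        int b = 0; // running sum of the successive values of a
        for (byte value : data) {
            a = (a + (value & 0xff)) % MOD;
            b = (b + a) % MOD;
        }
        // b forms the high 16 bits, a the low 16 bits.
        return ((long) b << 16) | a;
    }

    // Cross-check against the JDK implementation.
    static boolean matchesJdk(byte[] data) {
        Adler32 reference = new Adler32();
        reference.update(data, 0, data.length);
        return reference.getValue() == adler32(data);
    }

    public static void main(String[] args) {
        byte[] sample = "Wikipedia".getBytes(StandardCharsets.US_ASCII);
        System.out.println(Long.toHexString(adler32(sample)) + " matches JDK: " + matchesJdk(sample));
    }
}
```

Taking the modulo on every byte keeps the sketch simple; faster implementations usually defer the reduction across a block of bytes.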
>> >> The benchmark test/micro/org/openjdk/bench/java/util/TestAdler32.java is contributed by Pengfei Li (pli, Pengfei.Li at arm.com). >> >> For this benchmark, the optimization shows ~5x improvement. >> >> Base: >> Benchmark (count) Mode Cnt Score Error Units >> TestAdler32Perf.testAdler32Update 64 avgt 25 0.084 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 128 avgt 25 0.104 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 256 avgt 25 0.146 ± 0.002 us/op >> TestAdler32Perf.testAdler32Update 512 avgt 25 0.226 ± 0.002 us/op >> TestAdler32Perf.testAdler32Update 1024 avgt 25 0.390 ± 0.005 us/op >> TestAdler32Perf.testAdler32Update 2048 avgt 25 0.714 ± 0.007 us/op >> TestAdler32Perf.testAdler32Update 4096 avgt 25 1.359 ± 0.014 us/op >> TestAdler32Perf.testAdler32Update 8192 avgt 25 2.751 ± 0.023 us/op >> TestAdler32Perf.testAdler32Update 16384 avgt 25 5.494 ± 0.077 us/op >> TestAdler32Perf.testAdler32Update 32768 avgt 25 11.058 ± 0.160 us/op >> TestAdler32Perf.testAdler32Update 65536 avgt 25 22.198 ± 0.319 us/op >> >> >> With patch: >> Benchmark (count) Mode Cnt Score Error Units >> TestAdler32Perf.testAdler32Update 64 avgt 25 0.020 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 128 avgt 25 0.025 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 256 avgt 25 0.031 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 512 avgt 25 0.048 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 1024 avgt 25 0.078 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 2048 avgt 25 0.139 ± 0.002 us/op >> TestAdler32Perf.testAdler32Update 4096 avgt 25 0.262 ± 0.004 us/op >> TestAdler32Perf.testAdler32Update 8192 avgt 25 0.524 ± 0.010 us/op >> TestAdler32Perf.testAdler32Update 16384 avgt 25 1.017 ± 0.022 us/op >> TestAdler32Perf.testAdler32Update 32768 avgt 25 2.058 ± 0.052 us/op >> TestAdler32Perf.testAdler32Update 65536 avgt 25 3.994 ±
0.013 us/op > > Xubo Zhang has updated the pull request incrementally with one additional commit since the last revision: > > remove scratch register from vpmulld src/hotspot/cpu/x86/assembler_x86.cpp line 7859: > 7857: void Assembler::vbroadcastf128(XMMRegister dst, Address src, int vector_len) { > 7858: assert(VM_Version::supports_avx(), ""); > 7859: assert(vector_len == AVX_256bit, ""); Looks like "vector_len" can only be AVX_256bit. Do we really need a parameter then? ------------- PR: https://git.openjdk.java.net/jdk/pull/3806 From cwimmer at openjdk.java.net Tue May 18 20:13:41 2021 From: cwimmer at openjdk.java.net (Christian Wimmer) Date: Tue, 18 May 2021 20:13:41 GMT Subject: RFR: 8267338: [JVMCI] revive JVMCI API removed by JDK-8243287 In-Reply-To: References: Message-ID: On Tue, 18 May 2021 19:01:38 GMT, Doug Simon wrote: > This PR revives ResolvedJavaType.getHostClass to preserve JVMCI compatibility. The revived method just returns `null`. src/jdk.internal.vm.ci/share/classes/jdk.vm.ci.meta/src/jdk/vm/ci/meta/ResolvedJavaType.java line 143: > 141: * This method is preserved for JVMCI backwards compatibility. > 142: */ > 143: default ResolvedJavaType getHostClass() { Mark the method as `deprecated`? ------------- PR: https://git.openjdk.java.net/jdk/pull/4099 From dnsimon at openjdk.java.net Tue May 18 20:35:57 2021 From: dnsimon at openjdk.java.net (Doug Simon) Date: Tue, 18 May 2021 20:35:57 GMT Subject: RFR: 8267338: [JVMCI] revive JVMCI API removed by JDK-8243287 [v2] In-Reply-To: References: Message-ID: > This PR revives ResolvedJavaType.getHostClass to preserve JVMCI compatibility. The revived method just returns `null`. 
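The shape of such a compatibility shim can be sketched as follows. This is not the actual JVMCI source: `ResolvedTypeSketch` and `SimpleType` are hypothetical stand-ins, and the sketch only illustrates the pattern discussed in this thread, namely a default interface method that is kept for old callers, always returns `null`, and is marked deprecated as suggested in the review.

```java
// Hypothetical stand-in for an API interface that keeps a removed method alive.
interface ResolvedTypeSketch {
    String getName();

    /**
     * Preserved only for backwards compatibility; there is never a host class,
     * so this always returns null.
     */
    @Deprecated
    default ResolvedTypeSketch getHostClass() {
        return null;
    }
}

// Existing implementations need no changes because the method has a default body.
final class SimpleType implements ResolvedTypeSketch {
    private final String name;

    SimpleType(String name) {
        this.name = name;
    }

    @Override
    public String getName() {
        return name;
    }
}
```

Callers compiled against the older API keep linking, while the deprecation warning nudges new code away from the method.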
Doug Simon has updated the pull request incrementally with one additional commit since the last revision: denote ResolvedJavaType.getHostClass as deprecated ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4099/files - new: https://git.openjdk.java.net/jdk/pull/4099/files/fc6e83a0..5bffb99a Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4099&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4099&range=00-01 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/4099.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4099/head:pull/4099 PR: https://git.openjdk.java.net/jdk/pull/4099 From psandoz at openjdk.java.net Tue May 18 20:36:43 2021 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Tue, 18 May 2021 20:36:43 GMT Subject: RFR: 8267190: Optimize Vector API test operations In-Reply-To: References: Message-ID: On Fri, 14 May 2021 23:58:38 GMT, Sandhya Viswanathan wrote: > Vector API test operations (IS_DEFAULT, IS_FINITE, IS_INFINITE, IS_NAN and IS_NEGATIVE) are computed in three steps: > 1) reinterpret the floating point vectors as integral vectors (int/long) > 2) perform the test in the integer domain to get an int/long mask > 3) reinterpret the int/long mask as a float/double mask > Step 3) is currently very slow. It can be optimized by modifying the Java code to utilize the existing reinterpret intrinsic. > > For the VectorTestPerf attached to the JBS for JDK-8267190, the performance improves as follows: > > Base: > Benchmark (size) Mode Cnt Score Error Units > VectorTestPerf.IS_DEFAULT 1024 thrpt 5 223.156 ± 90.452 ops/ms > VectorTestPerf.IS_FINITE 1024 thrpt 5 223.841 ± 91.685 ops/ms > VectorTestPerf.IS_INFINITE 1024 thrpt 5 224.561 ± 83.890 ops/ms > VectorTestPerf.IS_NAN 1024 thrpt 5 223.777 ± 70.629 ops/ms > VectorTestPerf.IS_NEGATIVE 1024 thrpt 5 218.392 ±
79.806 ops/ms > > With patch: > Benchmark (size) Mode Cnt Score Error Units > VectorTestPerf.IS_DEFAULT 1024 thrpt 5 8812.357 ± 40.477 ops/ms > VectorTestPerf.IS_FINITE 1024 thrpt 5 7425.739 ± 296.622 ops/ms > VectorTestPerf.IS_INFINITE 1024 thrpt 5 8932.730 ± 269.988 ops/ms > VectorTestPerf.IS_NAN 1024 thrpt 5 8574.872 ± 498.649 ops/ms > VectorTestPerf.IS_NEGATIVE 1024 thrpt 5 8838.400 ± 11.849 ops/ms > > Best Regards, > Sandhya Changes look good, some minor comments. I shall run it through tier 1 to 3 tests and report back. src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Byte128Vector.java line 609: > 607: @ForceInline > 608: final > 609: VectorMask defaultMaskReinterpret(VectorSpecies dsp) { Since this method is only called by `cast` we can make this method private and accept an argument of `AbstractSpecies`. Further, the length check is duplicated by `cast` so we could turn it into an assert. Extra bonus points for converting the statement switch into an expression switch. ------------- Marked as reviewed by psandoz (Reviewer).
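The first two steps of the scheme quoted above (reinterpret as integers, then test in the integer domain) have a simple scalar analogue. The following is a sketch under assumptions, not the Vector API implementation: `FloatBitTests` and its methods are hypothetical, it handles a single `float` rather than a vector, and the reading of IS_DEFAULT as "all bits zero" and IS_NEGATIVE as "sign bit set" is assumed from the operator descriptions rather than taken from this thread.

```java
// Hypothetical scalar analogue of testing float properties on the raw bit pattern.
final class FloatBitTests {
    private static final int ABS_MASK = 0x7fffffff; // clears the sign bit
    private static final int EXP_MASK = 0x7f800000; // all exponent bits set, mantissa zero

    // Step 1 of the scheme: reinterpret the float as an int without changing any bits.
    private static int bits(float f) {
        return Float.floatToRawIntBits(f);
    }

    // Step 2: every test is an integer comparison on the reinterpreted value.
    static boolean isDefault(float f) {
        return bits(f) == 0; // only +0.0f has an all-zero bit pattern
    }

    static boolean isNegative(float f) {
        return bits(f) < 0; // sign bit set; note this includes -0.0f
    }

    static boolean isNaN(float f) {
        return (bits(f) & ABS_MASK) > EXP_MASK; // max exponent, non-zero mantissa
    }

    static boolean isInfinite(float f) {
        return (bits(f) & ABS_MASK) == EXP_MASK; // max exponent, zero mantissa
    }

    static boolean isFinite(float f) {
        return (bits(f) & ABS_MASK) < EXP_MASK; // exponent below the maximum
    }
}
```

In the vector case each comparison yields an int/long mask, and the patch under review is about making the final reinterpretation of that mask back to a float/double mask cheap.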
PR: https://git.openjdk.java.net/jdk/pull/4039 From github.com+58006833+xbzhang99 at openjdk.java.net Tue May 18 21:01:51 2021 From: github.com+58006833+xbzhang99 at openjdk.java.net (Xubo Zhang) Date: Tue, 18 May 2021 21:01:51 GMT Subject: RFR: 8266332: Adler32 intrinsic for x86 64-bit platforms [v11] In-Reply-To: References: <0mKzVE9RTWU0ZxjILDLkFx6EW-skdsp3lshNPbucuik=.4e19c3b0-7a3a-47f0-a008-21d999f24c15@github.com> Message-ID: <3u4Jng7rm3u0yIMbNRVKk4w5MewUeJ_KVNryPBAi6xI=.0b0d52b9-ab33-443c-83d0-54cb5512c286@github.com> On Tue, 18 May 2021 19:45:52 GMT, John Tortugo wrote: >> Xubo Zhang has updated the pull request incrementally with one additional commit since the last revision: >> >> remove scratch register from vpmulld > > src/hotspot/cpu/x86/assembler_x86.cpp line 7859: > >> 7857: void Assembler::vbroadcastf128(XMMRegister dst, Address src, int vector_len) { >> 7858: assert(VM_Version::supports_avx(), ""); >> 7859: assert(vector_len == AVX_256bit, ""); > > Looks like "vector_len" can only be AVX_256bit. Do we really need a parameter then? You are right, for now it can only be AVX_256bit. But I think other lengths will be used in the future too, so we should keep a more generic signature. ------------- PR: https://git.openjdk.java.net/jdk/pull/3806 From neliasso at openjdk.java.net Tue May 18 21:36:45 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Tue, 18 May 2021 21:36:45 GMT Subject: RFR: 8261152: Refine the compiler/vectorapi/VectorRebracket128Test.java test In-Reply-To: References: Message-ID: On Sun, 7 Feb 2021 09:58:57 GMT, ?? wrote: >> test/hotspot/jtreg/compiler/vectorapi/VectorRebracket128Test.java line 63: >> >>> 61: while (true) { >>> 62: try { >>> 63: System.gc(); >> >> Please, give the following options a try: `-XX:ZCollectionInterval=0.01 -XX:ZFragmentationLimit=0`. >> According to ZGC folks, it should force continuous GC cycles w/ ZGC.
> > The original version with option 'CICompilerCount' passed (passed means the bug is not triggered) 5 times in 100 runs, the background gc version passed 8 times in 100 runs. `ZCollectionInterval` passed 44 in 100 runs. > > Explicit trigger gc in the background thread and timer-based gc triggering perform the same thing, it's really strange to behave differently in triggering the bug. The reason I guess is: the load barrier missing bug will only be triggered when the object is relocated and the pointer in another object is not remapped, which means the time window is very short, different options may pose different execution path (which creates different objects, threads, etc.). Why change the options if the original version was better? ------------- PR: https://git.openjdk.java.net/jdk/pull/2422 From kvn at openjdk.java.net Tue May 18 22:33:42 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 18 May 2021 22:33:42 GMT Subject: RFR: 8266480: Implicit null check optimization does not update control of hoisted memory operation In-Reply-To: References: Message-ID: On Tue, 18 May 2021 13:40:05 GMT, Tobias Hartmann wrote: > C2 replaces explicit null checks by hoisting a nearby memory operation to the null check and using it as implicit null check. In some cases, control of that memory operation is not updated correctly, leading to assert failures during `PhaseCFG::verify()` because a use is no longer dominated by its definition. > > After matching, the graph looks like this: > > > > `64 testP_reg` is an explicit null check and `78 loadD`, `73 storeD` and `77 storeImmI` are candidates for an implicit null check because they are operating on the same oop. `PhaseCFG::implicit_null_check` decides to hoist the `77 storeImmI` from the `not_null_block` B12 to the null check in B11/B13: > > > > Now the problem is that control of `77 storeImmI` was not updated and still points into the non-dominating block B15. 
The following code is supposed to fix this: > https://github.com/openjdk/jdk/blob/9d168e25d1e2e8b662dc7aa6cda7516c423cef7d/src/hotspot/share/opto/lcm.cpp#L413-L418 > > However, it does not trigger because control is not the `not_null_block->head()` but `59 MachProj` which is the control projection from `60 CallLeafDirect` emitted by a `drem`. The fix is to simply check `get_block_for_node(ctrl)` instead. > > This is an old issue that was only caught by the assert recently introduced by [JDK-8263227](https://bugs.openjdk.java.net/browse/JDK-8263227). > > Thanks, > Tobias Good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/4093 From mchung at openjdk.java.net Tue May 18 22:33:39 2021 From: mchung at openjdk.java.net (Mandy Chung) Date: Tue, 18 May 2021 22:33:39 GMT Subject: RFR: 8267338: [JVMCI] revive JVMCI API removed by JDK-8243287 [v2] In-Reply-To: References: Message-ID: On Tue, 18 May 2021 20:35:57 GMT, Doug Simon wrote: >> This PR revives ResolvedJavaType.getHostClass to preserve JVMCI compatibility. The revived method just returns `null`. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > denote ResolvedJavaType.getHostClass as deprecated I missed this in my review. Thanks for adding it back. ------------- Marked as reviewed by mchung (Reviewer). 
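As a rough illustration of the compatibility approach described in the JVMCI thread above (a removed API method is restored, always returns `null`, and is marked deprecated), here is a minimal sketch. The interface below is a hypothetical stand-in, not the real `jdk.vm.ci.meta.ResolvedJavaType`, and the use of a `default` method is an assumption made for the example rather than something stated in the thread.

```java
// Hypothetical stand-in for an API type; not the actual JVMCI interface.
interface TypeSketch {
    String getName();

    /**
     * Revived only so that existing callers keep compiling and linking.
     *
     * @deprecated there is no host class concept any more; this always returns null.
     */
    @Deprecated
    default TypeSketch getHostClass() {
        return null;
    }
}

final class DeprecatedApiSketch {
    public static void main(String[] args) {
        // TypeSketch has a single abstract method, so a lambda can implement it.
        TypeSketch type = () -> "java.lang.Object";

        // Old call sites still compile (with a deprecation warning) and must handle null.
        TypeSketch host = type.getHostClass();
        System.out.println(type.getName() + " host class: " + host);
    }
}
```

Callers written against the old API keep working, provided they already tolerated a `null` result.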
PR: https://git.openjdk.java.net/jdk/pull/4099

From kvn at openjdk.java.net Tue May 18 22:35:39 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Tue, 18 May 2021 22:35:39 GMT
Subject: RFR: 8265262: CITime - 'other' incorrectly calculated
In-Reply-To:
References:
Message-ID:

On Mon, 17 May 2021 16:36:26 GMT, Nils Eliasson wrote:

> This CR fixes a few issues with the CITime output for C2:
>
> 1) The 'other' category for _t_optimize does not subtract the time spent in _t_vector.
>
> 2) Some of the _t_incrInline sub counters are called from different contexts; calculating 'other' from the total time spent in _t_incrInline assumes that counter usage is strictly hierarchical.
>
> 3) I've placed the non-hierarchical counters in braces.
>
> 4) Code Installation is a part of Code Emission (_t_output). Indentation fixed.
>
> 5) Moved "renumber live" after "Vector" so that they appear in order.
>
> 6) Added sub counters "shorten branches" and "fill buffer" to the "Code Emission" phase, and added an 'other' category. Previously more than 50% of the time in Code Emission was unaccounted for; now it is less than 25%.
>
> Please review,
> Best regards,
> Nils Eliasson

Marked as reviewed by kvn (Reviewer).

-------------

PR: https://git.openjdk.java.net/jdk/pull/4065

From kvn at openjdk.java.net Tue May 18 22:38:45 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Tue, 18 May 2021 22:38:45 GMT
Subject: RFR: 8266332: Adler32 intrinsic for x86 64-bit platforms [v11]
In-Reply-To: <0mKzVE9RTWU0ZxjILDLkFx6EW-skdsp3lshNPbucuik=.4e19c3b0-7a3a-47f0-a008-21d999f24c15@github.com>
References: <0mKzVE9RTWU0ZxjILDLkFx6EW-skdsp3lshNPbucuik=.4e19c3b0-7a3a-47f0-a008-21d999f24c15@github.com>
Message-ID:

On Tue, 18 May 2021 00:18:08 GMT, Xubo Zhang wrote:

>> Implement Adler32 intrinsic for x86 64-bit platform using vector instructions.
>>
>> The benchmark test/micro/org/openjdk/bench/java/util/TestAdler32.java is contributed by Pengfei Li (pli, Pengfei.Li at arm.com).
>>
>> For this benchmark, the optimization shows ~5x improvement.
>>
>> Base:
>> Benchmark                          (count)  Mode  Cnt   Score   Error  Units
>> TestAdler32Perf.testAdler32Update       64  avgt   25   0.084 ± 0.001  us/op
>> TestAdler32Perf.testAdler32Update      128  avgt   25   0.104 ± 0.001  us/op
>> TestAdler32Perf.testAdler32Update      256  avgt   25   0.146 ± 0.002  us/op
>> TestAdler32Perf.testAdler32Update      512  avgt   25   0.226 ± 0.002  us/op
>> TestAdler32Perf.testAdler32Update     1024  avgt   25   0.390 ± 0.005  us/op
>> TestAdler32Perf.testAdler32Update     2048  avgt   25   0.714 ± 0.007  us/op
>> TestAdler32Perf.testAdler32Update     4096  avgt   25   1.359 ± 0.014  us/op
>> TestAdler32Perf.testAdler32Update     8192  avgt   25   2.751 ± 0.023  us/op
>> TestAdler32Perf.testAdler32Update    16384  avgt   25   5.494 ± 0.077  us/op
>> TestAdler32Perf.testAdler32Update    32768  avgt   25  11.058 ± 0.160  us/op
>> TestAdler32Perf.testAdler32Update    65536  avgt   25  22.198 ± 0.319  us/op
>>
>>
>> With patch:
>> Benchmark                          (count)  Mode  Cnt   Score   Error  Units
>> TestAdler32Perf.testAdler32Update       64  avgt   25   0.020 ± 0.001  us/op
>> TestAdler32Perf.testAdler32Update      128  avgt   25   0.025 ± 0.001  us/op
>> TestAdler32Perf.testAdler32Update      256  avgt   25   0.031 ± 0.001  us/op
>> TestAdler32Perf.testAdler32Update      512  avgt   25   0.048 ± 0.001  us/op
>> TestAdler32Perf.testAdler32Update     1024  avgt   25   0.078 ± 0.001  us/op
>> TestAdler32Perf.testAdler32Update     2048  avgt   25   0.139 ± 0.002  us/op
>> TestAdler32Perf.testAdler32Update     4096  avgt   25   0.262 ± 0.004  us/op
>> TestAdler32Perf.testAdler32Update     8192  avgt   25   0.524 ± 0.010  us/op
>> TestAdler32Perf.testAdler32Update    16384  avgt   25   1.017 ± 0.022  us/op
>> TestAdler32Perf.testAdler32Update    32768  avgt   25   2.058 ± 0.052  us/op
>> TestAdler32Perf.testAdler32Update    65536  avgt   25   3.994 ± 0.013  us/op
>
> Xubo Zhang has updated the pull request incrementally with one additional commit since the last revision:
>
> remove scratch register from vpmulld

Good.

-------------

Marked as reviewed by kvn (Reviewer).
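For context on what the intrinsic in this thread accelerates, the following is a minimal scalar sketch of the Adler-32 checksum, checked against `java.util.zip.Adler32`. It is a plain reference loop written for illustration, not the vectorized x86 stub from the patch; the class name `Adler32Sketch` and the sample input are assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Adler32;

// Illustrative scalar reference; the patch under review implements this with vector instructions.
final class Adler32Sketch {
    private static final int MOD = 65521; // largest prime below 2^16

    static long adler32(byte[] data) {
        int a = 1; // running sum of the bytes, plus one
        int b = 0; // running sum of the successive values of a
        for (byte value : data) {
            a = (a + (value & 0xff)) % MOD;
            b = (b + a) % MOD;
        }
        // b occupies the high 16 bits of the checksum, a the low 16 bits.
        return ((long) b << 16) | a;
    }

    public static void main(String[] args) {
        byte[] data = "Wikipedia".getBytes(StandardCharsets.US_ASCII);

        Adler32 reference = new Adler32();
        reference.update(data, 0, data.length);

        System.out.println("sketch:    0x" + Long.toHexString(adler32(data)));
        System.out.println("reference: 0x" + Long.toHexString(reference.getValue()));
    }
}
```

The per-byte dependency between `a` and `b` is what makes the loop slow in scalar form; taking the modulo on every byte keeps the sketch simple, whereas optimized versions typically defer it across blocks of input.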
PR: https://git.openjdk.java.net/jdk/pull/3806

From kvn at openjdk.java.net Tue May 18 23:23:41 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Tue, 18 May 2021 23:23:41 GMT
Subject: RFR: 8265129: Add intrinsic support for JVM.getClassId [v6]
In-Reply-To:
References:
Message-ID:

On Mon, 17 May 2021 07:20:18 GMT, Denghui Dong wrote:

>> 8265129: Add intrinsic support for JVM.getClassId
>
> Denghui Dong has updated the pull request incrementally with one additional commit since the last revision:
>
> fix crash problem

JDK-8265129 does not explain what code the intrinsic is meant to implement (except for pseudo code in comments in library_call.cpp).
The method `getClassId` is declared native in JFR code:
https://github.com/openjdk/jdk/blob/master/src/jdk.jfr/share/classes/jdk/jfr/internal/JVM.java#L134
but I had a hard time finding a native implementation for it or for the `getClassIdNonIntrinsic` method (I used `grep -i getClassIdNonIntrinsic -r src/`).
I eventually found this code:
https://github.com/openjdk/jdk/blob/master/src/hotspot/share/jfr/recorder/checkpoint/types/traceid/jfrTraceId.cpp#L178
which does not seem to be what is implemented here.

On the other hand, if you already have a native implementation, I think we will not benefit much from intrinsifying it: you save only one call.

Given the complexity of your current changes, I would prefer to simply remove the current intrinsic code in C1 and C2, which does not work anyway, as you said, and let JFR use the native implementation.

-------------

Changes requested by kvn (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/3470

From sviswanathan at openjdk.java.net Tue May 18 23:27:13 2021
From: sviswanathan at openjdk.java.net (Sandhya Viswanathan)
Date: Tue, 18 May 2021 23:27:13 GMT
Subject: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v5]
In-Reply-To:
References:
Message-ID:

> This PR contains Short Vector Math Library support related changes for [JEP-414 Vector API (Second Incubator)](https://openjdk.java.net/jeps/414), in preparation for when targeted.
>
> Intel Short Vector Math Library (SVML) based intrinsics in native x86 assembly provide optimized implementation for Vector API transcendental and trigonometric methods.
> These methods are built into a separate library instead of being part of libjvm.so or jvm.dll.
>
> The following changes are made:
> The source for these methods is placed in the jdk.incubator.vector module under src/jdk.incubator.vector/linux/native/libsvml and src/jdk.incubator.vector/windows/native/libsvml.
> The assembly source files are named as "*.S" and include files are named as "*.S.inc".
> The corresponding build script is placed at make/modules/jdk.incubator.vector/Lib.gmk.
> Changes are made to build system to support dependency tracking for assembly files with includes.
> The built native libraries (libsvml.so/svml.dll) are placed in bin directory of JDK on Windows and lib directory of JDK on Linux.
> The C2 JIT uses the dll_load and dll_lookup to get the addresses of optimized methods from this library.
>
> Build system changes and module library build scripts are contributed by Magnus (magnus.ihse.bursie at oracle.com).
>
> Looking forward to your review and feedback.
> > Performance: > Micro benchmark Base Optimized Unit Gain(Optimized/Base) > Double128Vector.ACOS 45.91 87.34 ops/ms 1.90 > Double128Vector.ASIN 45.06 92.36 ops/ms 2.05 > Double128Vector.ATAN 19.92 118.36 ops/ms 5.94 > Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79 > Double128Vector.CBRT 45.77 208.36 ops/ms 4.55 > Double128Vector.COS 49.94 245.89 ops/ms 4.92 > Double128Vector.COSH 26.91 126.00 ops/ms 4.68 > Double128Vector.EXP 71.64 379.65 ops/ms 5.30 > Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18 > Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44 > Double128Vector.LOG 61.95 279.84 ops/ms 4.52 > Double128Vector.LOG10 59.34 239.05 ops/ms 4.03 > Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79 > Double128Vector.SIN 49.36 240.79 ops/ms 4.88 > Double128Vector.SINH 26.59 103.75 ops/ms 3.90 > Double128Vector.TAN 41.05 152.39 ops/ms 3.71 > Double128Vector.TANH 45.29 169.53 ops/ms 3.74 > Double256Vector.ACOS 54.21 106.39 ops/ms 1.96 > Double256Vector.ASIN 53.60 107.99 ops/ms 2.01 > Double256Vector.ATAN 21.53 189.11 ops/ms 8.78 > Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44 > Double256Vector.CBRT 56.45 397.13 ops/ms 7.04 > Double256Vector.COS 58.26 389.77 ops/ms 6.69 > Double256Vector.COSH 29.44 151.11 ops/ms 5.13 > Double256Vector.EXP 86.67 564.68 ops/ms 6.52 > Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80 > Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62 > Double256Vector.LOG 71.52 394.90 ops/ms 5.52 > Double256Vector.LOG10 65.43 362.32 ops/ms 5.54 > Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05 > Double256Vector.SIN 57.06 380.98 ops/ms 6.68 > Double256Vector.SINH 29.40 117.37 ops/ms 3.99 > Double256Vector.TAN 44.90 279.90 ops/ms 6.23 > Double256Vector.TANH 54.08 274.71 ops/ms 5.08 > Double512Vector.ACOS 55.65 687.54 ops/ms 12.35 > Double512Vector.ASIN 57.31 777.72 ops/ms 13.57 > Double512Vector.ATAN 21.42 729.21 ops/ms 34.04 > Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32 > Double512Vector.CBRT 56.78 834.38 ops/ms 14.69 > Double512Vector.COS 59.88 837.04 ops/ms 
13.98 > Double512Vector.COSH 30.34 172.76 ops/ms 5.70 > Double512Vector.EXP 99.66 1608.12 ops/ms 16.14 > Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34 > Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34 > Double512Vector.LOG 74.84 996.00 ops/ms 13.31 > Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72 > Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34 > Double512Vector.POW 37.42 384.13 ops/ms 10.26 > Double512Vector.SIN 59.74 728.45 ops/ms 12.19 > Double512Vector.SINH 29.47 143.38 ops/ms 4.87 > Double512Vector.TAN 46.20 587.21 ops/ms 12.71 > Double512Vector.TANH 57.36 495.42 ops/ms 8.64 > Double64Vector.ACOS 24.04 73.67 ops/ms 3.06 > Double64Vector.ASIN 23.78 75.11 ops/ms 3.16 > Double64Vector.ATAN 14.14 62.81 ops/ms 4.44 > Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28 > Double64Vector.CBRT 16.47 107.50 ops/ms 6.53 > Double64Vector.COS 23.42 152.01 ops/ms 6.49 > Double64Vector.COSH 17.34 113.34 ops/ms 6.54 > Double64Vector.EXP 27.08 203.53 ops/ms 7.52 > Double64Vector.EXPM1 18.77 96.73 ops/ms 5.15 > Double64Vector.HYPOT 18.54 103.62 ops/ms 5.59 > Double64Vector.LOG 26.75 142.63 ops/ms 5.33 > Double64Vector.LOG10 25.85 139.71 ops/ms 5.40 > Double64Vector.LOG1P 13.26 97.94 ops/ms 7.38 > Double64Vector.SIN 23.28 146.91 ops/ms 6.31 > Double64Vector.SINH 17.62 88.59 ops/ms 5.03 > Double64Vector.TAN 21.00 86.43 ops/ms 4.12 > Double64Vector.TANH 23.75 111.35 ops/ms 4.69 > Float128Vector.ACOS 57.52 110.65 ops/ms 1.92 > Float128Vector.ASIN 57.15 117.95 ops/ms 2.06 > Float128Vector.ATAN 22.52 318.74 ops/ms 14.15 > Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42 > Float128Vector.CBRT 29.72 443.74 ops/ms 14.93 > Float128Vector.COS 42.82 803.02 ops/ms 18.75 > Float128Vector.COSH 31.44 118.34 ops/ms 3.76 > Float128Vector.EXP 72.43 855.33 ops/ms 11.81 > Float128Vector.EXPM1 37.82 127.85 ops/ms 3.38 > Float128Vector.HYPOT 53.20 591.68 ops/ms 11.12 > Float128Vector.LOG 52.95 877.94 ops/ms 16.58 > Float128Vector.LOG10 49.26 603.72 ops/ms 12.26 > Float128Vector.LOG1P 20.89 430.59 ops/ms 
20.61 > Float128Vector.SIN 43.38 745.31 ops/ms 17.18 > Float128Vector.SINH 31.11 112.91 ops/ms 3.63 > Float128Vector.TAN 37.25 332.13 ops/ms 8.92 > Float128Vector.TANH 57.63 453.77 ops/ms 7.87 > Float256Vector.ACOS 65.23 123.73 ops/ms 1.90 > Float256Vector.ASIN 63.41 132.86 ops/ms 2.10 > Float256Vector.ATAN 23.51 649.02 ops/ms 27.61 > Float256Vector.ATAN2 18.19 455.95 ops/ms 25.07 > Float256Vector.CBRT 45.99 594.81 ops/ms 12.93 > Float256Vector.COS 43.75 926.69 ops/ms 21.18 > Float256Vector.COSH 33.52 130.46 ops/ms 3.89 > Float256Vector.EXP 75.70 1366.72 ops/ms 18.05 > Float256Vector.EXPM1 39.00 149.72 ops/ms 3.84 > Float256Vector.HYPOT 52.91 1023.18 ops/ms 19.34 > Float256Vector.LOG 53.31 1545.77 ops/ms 29.00 > Float256Vector.LOG10 50.31 863.80 ops/ms 17.17 > Float256Vector.LOG1P 21.51 616.59 ops/ms 28.66 > Float256Vector.SIN 44.07 911.04 ops/ms 20.67 > Float256Vector.SINH 33.16 122.50 ops/ms 3.69 > Float256Vector.TAN 37.85 497.75 ops/ms 13.15 > Float256Vector.TANH 64.27 537.20 ops/ms 8.36 > Float512Vector.ACOS 67.33 1718.00 ops/ms 25.52 > Float512Vector.ASIN 66.12 1780.85 ops/ms 26.93 > Float512Vector.ATAN 22.63 1780.31 ops/ms 78.69 > Float512Vector.ATAN2 17.52 1113.93 ops/ms 63.57 > Float512Vector.CBRT 54.78 2087.58 ops/ms 38.11 > Float512Vector.COS 40.92 1567.93 ops/ms 38.32 > Float512Vector.COSH 33.42 138.36 ops/ms 4.14 > Float512Vector.EXP 70.51 3835.97 ops/ms 54.41 > Float512Vector.EXPM1 38.06 279.80 ops/ms 7.35 > Float512Vector.HYPOT 50.99 3287.55 ops/ms 64.47 > Float512Vector.LOG 49.61 3156.99 ops/ms 63.64 > Float512Vector.LOG10 46.94 2489.16 ops/ms 53.02 > Float512Vector.LOG1P 20.66 1689.86 ops/ms 81.81 > Float512Vector.POW 32.73 1015.85 ops/ms 31.04 > Float512Vector.SIN 41.17 1587.71 ops/ms 38.56 > Float512Vector.SINH 33.05 129.39 ops/ms 3.91 > Float512Vector.TAN 35.60 1336.11 ops/ms 37.53 > Float512Vector.TANH 65.77 2295.28 ops/ms 34.90 > Float64Vector.ACOS 48.41 89.34 ops/ms 1.85 > Float64Vector.ASIN 47.30 95.72 ops/ms 2.02 > Float64Vector.ATAN 20.62 
49.45 ops/ms 2.40 > Float64Vector.ATAN2 15.95 112.35 ops/ms 7.04 > Float64Vector.CBRT 24.03 134.57 ops/ms 5.60 > Float64Vector.COS 44.28 394.33 ops/ms 8.91 > Float64Vector.COSH 28.35 95.27 ops/ms 3.36 > Float64Vector.EXP 65.80 486.37 ops/ms 7.39 > Float64Vector.EXPM1 34.61 85.99 ops/ms 2.48 > Float64Vector.HYPOT 50.40 147.82 ops/ms 2.93 > Float64Vector.LOG 51.93 163.25 ops/ms 3.14 > Float64Vector.LOG10 49.53 147.98 ops/ms 2.99 > Float64Vector.LOG1P 19.20 206.81 ops/ms 10.77 > Float64Vector.SIN 44.41 382.09 ops/ms 8.60 > Float64Vector.SINH 28.20 90.68 ops/ms 3.22 > Float64Vector.TAN 36.29 160.89 ops/ms 4.43 > Float64Vector.TANH 47.65 214.04 ops/ms 4.49 Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: Implement review comments ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/3638/files - new: https://git.openjdk.java.net/jdk/pull/3638/files/01a549e4..9021a15c Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=03-04 Stats: 1220 lines in 8 files changed: 48 ins; 1104 del; 68 mod Patch: https://git.openjdk.java.net/jdk/pull/3638.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3638/head:pull/3638 PR: https://git.openjdk.java.net/jdk/pull/3638 From sviswanathan at openjdk.java.net Tue May 18 23:43:13 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Tue, 18 May 2021 23:43:13 GMT Subject: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v6] In-Reply-To: References: Message-ID: <6cWkz6rWmKC0L2U3sYiELeoOphNGlEHvCoSXcJHE-hE=.04e407ec-be8f-4d01-b527-ec7e1d30bb34@github.com> > This PR contains Short Vector Math Library support related changes for [JEP-414 Vector API (Second Incubator)](https://openjdk.java.net/jeps/414), in preparation for when targeted. 
>
> Intel Short Vector Math Library (SVML) based intrinsics in native x86 assembly provide optimized implementation for Vector API transcendental and trigonometric methods.
> These methods are built into a separate library instead of being part of libjvm.so or jvm.dll.
>
> The following changes are made:
> The source for these methods is placed in the jdk.incubator.vector module under src/jdk.incubator.vector/linux/native/libsvml and src/jdk.incubator.vector/windows/native/libsvml.
> The assembly source files are named as "*.S" and include files are named as "*.S.inc".
> The corresponding build script is placed at make/modules/jdk.incubator.vector/Lib.gmk.
> Changes are made to build system to support dependency tracking for assembly files with includes.
> The built native libraries (libsvml.so/svml.dll) are placed in bin directory of JDK on Windows and lib directory of JDK on Linux.
> The C2 JIT uses the dll_load and dll_lookup to get the addresses of optimized methods from this library.
>
> Build system changes and module library build scripts are contributed by Magnus (magnus.ihse.bursie at oracle.com).
>
> Looking forward to your review and feedback.
> > Performance: > Micro benchmark Base Optimized Unit Gain(Optimized/Base) > Double128Vector.ACOS 45.91 87.34 ops/ms 1.90 > Double128Vector.ASIN 45.06 92.36 ops/ms 2.05 > Double128Vector.ATAN 19.92 118.36 ops/ms 5.94 > Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79 > Double128Vector.CBRT 45.77 208.36 ops/ms 4.55 > Double128Vector.COS 49.94 245.89 ops/ms 4.92 > Double128Vector.COSH 26.91 126.00 ops/ms 4.68 > Double128Vector.EXP 71.64 379.65 ops/ms 5.30 > Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18 > Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44 > Double128Vector.LOG 61.95 279.84 ops/ms 4.52 > Double128Vector.LOG10 59.34 239.05 ops/ms 4.03 > Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79 > Double128Vector.SIN 49.36 240.79 ops/ms 4.88 > Double128Vector.SINH 26.59 103.75 ops/ms 3.90 > Double128Vector.TAN 41.05 152.39 ops/ms 3.71 > Double128Vector.TANH 45.29 169.53 ops/ms 3.74 > Double256Vector.ACOS 54.21 106.39 ops/ms 1.96 > Double256Vector.ASIN 53.60 107.99 ops/ms 2.01 > Double256Vector.ATAN 21.53 189.11 ops/ms 8.78 > Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44 > Double256Vector.CBRT 56.45 397.13 ops/ms 7.04 > Double256Vector.COS 58.26 389.77 ops/ms 6.69 > Double256Vector.COSH 29.44 151.11 ops/ms 5.13 > Double256Vector.EXP 86.67 564.68 ops/ms 6.52 > Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80 > Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62 > Double256Vector.LOG 71.52 394.90 ops/ms 5.52 > Double256Vector.LOG10 65.43 362.32 ops/ms 5.54 > Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05 > Double256Vector.SIN 57.06 380.98 ops/ms 6.68 > Double256Vector.SINH 29.40 117.37 ops/ms 3.99 > Double256Vector.TAN 44.90 279.90 ops/ms 6.23 > Double256Vector.TANH 54.08 274.71 ops/ms 5.08 > Double512Vector.ACOS 55.65 687.54 ops/ms 12.35 > Double512Vector.ASIN 57.31 777.72 ops/ms 13.57 > Double512Vector.ATAN 21.42 729.21 ops/ms 34.04 > Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32 > Double512Vector.CBRT 56.78 834.38 ops/ms 14.69 > Double512Vector.COS 59.88 837.04 ops/ms 
13.98 > Double512Vector.COSH 30.34 172.76 ops/ms 5.70 > Double512Vector.EXP 99.66 1608.12 ops/ms 16.14 > Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34 > Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34 > Double512Vector.LOG 74.84 996.00 ops/ms 13.31 > Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72 > Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34 > Double512Vector.POW 37.42 384.13 ops/ms 10.26 > Double512Vector.SIN 59.74 728.45 ops/ms 12.19 > Double512Vector.SINH 29.47 143.38 ops/ms 4.87 > Double512Vector.TAN 46.20 587.21 ops/ms 12.71 > Double512Vector.TANH 57.36 495.42 ops/ms 8.64 > Double64Vector.ACOS 24.04 73.67 ops/ms 3.06 > Double64Vector.ASIN 23.78 75.11 ops/ms 3.16 > Double64Vector.ATAN 14.14 62.81 ops/ms 4.44 > Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28 > Double64Vector.CBRT 16.47 107.50 ops/ms 6.53 > Double64Vector.COS 23.42 152.01 ops/ms 6.49 > Double64Vector.COSH 17.34 113.34 ops/ms 6.54 > Double64Vector.EXP 27.08 203.53 ops/ms 7.52 > Double64Vector.EXPM1 18.77 96.73 ops/ms 5.15 > Double64Vector.HYPOT 18.54 103.62 ops/ms 5.59 > Double64Vector.LOG 26.75 142.63 ops/ms 5.33 > Double64Vector.LOG10 25.85 139.71 ops/ms 5.40 > Double64Vector.LOG1P 13.26 97.94 ops/ms 7.38 > Double64Vector.SIN 23.28 146.91 ops/ms 6.31 > Double64Vector.SINH 17.62 88.59 ops/ms 5.03 > Double64Vector.TAN 21.00 86.43 ops/ms 4.12 > Double64Vector.TANH 23.75 111.35 ops/ms 4.69 > Float128Vector.ACOS 57.52 110.65 ops/ms 1.92 > Float128Vector.ASIN 57.15 117.95 ops/ms 2.06 > Float128Vector.ATAN 22.52 318.74 ops/ms 14.15 > Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42 > Float128Vector.CBRT 29.72 443.74 ops/ms 14.93 > Float128Vector.COS 42.82 803.02 ops/ms 18.75 > Float128Vector.COSH 31.44 118.34 ops/ms 3.76 > Float128Vector.EXP 72.43 855.33 ops/ms 11.81 > Float128Vector.EXPM1 37.82 127.85 ops/ms 3.38 > Float128Vector.HYPOT 53.20 591.68 ops/ms 11.12 > Float128Vector.LOG 52.95 877.94 ops/ms 16.58 > Float128Vector.LOG10 49.26 603.72 ops/ms 12.26 > Float128Vector.LOG1P 20.89 430.59 ops/ms 
20.61 > Float128Vector.SIN 43.38 745.31 ops/ms 17.18 > Float128Vector.SINH 31.11 112.91 ops/ms 3.63 > Float128Vector.TAN 37.25 332.13 ops/ms 8.92 > Float128Vector.TANH 57.63 453.77 ops/ms 7.87 > Float256Vector.ACOS 65.23 123.73 ops/ms 1.90 > Float256Vector.ASIN 63.41 132.86 ops/ms 2.10 > Float256Vector.ATAN 23.51 649.02 ops/ms 27.61 > Float256Vector.ATAN2 18.19 455.95 ops/ms 25.07 > Float256Vector.CBRT 45.99 594.81 ops/ms 12.93 > Float256Vector.COS 43.75 926.69 ops/ms 21.18 > Float256Vector.COSH 33.52 130.46 ops/ms 3.89 > Float256Vector.EXP 75.70 1366.72 ops/ms 18.05 > Float256Vector.EXPM1 39.00 149.72 ops/ms 3.84 > Float256Vector.HYPOT 52.91 1023.18 ops/ms 19.34 > Float256Vector.LOG 53.31 1545.77 ops/ms 29.00 > Float256Vector.LOG10 50.31 863.80 ops/ms 17.17 > Float256Vector.LOG1P 21.51 616.59 ops/ms 28.66 > Float256Vector.SIN 44.07 911.04 ops/ms 20.67 > Float256Vector.SINH 33.16 122.50 ops/ms 3.69 > Float256Vector.TAN 37.85 497.75 ops/ms 13.15 > Float256Vector.TANH 64.27 537.20 ops/ms 8.36 > Float512Vector.ACOS 67.33 1718.00 ops/ms 25.52 > Float512Vector.ASIN 66.12 1780.85 ops/ms 26.93 > Float512Vector.ATAN 22.63 1780.31 ops/ms 78.69 > Float512Vector.ATAN2 17.52 1113.93 ops/ms 63.57 > Float512Vector.CBRT 54.78 2087.58 ops/ms 38.11 > Float512Vector.COS 40.92 1567.93 ops/ms 38.32 > Float512Vector.COSH 33.42 138.36 ops/ms 4.14 > Float512Vector.EXP 70.51 3835.97 ops/ms 54.41 > Float512Vector.EXPM1 38.06 279.80 ops/ms 7.35 > Float512Vector.HYPOT 50.99 3287.55 ops/ms 64.47 > Float512Vector.LOG 49.61 3156.99 ops/ms 63.64 > Float512Vector.LOG10 46.94 2489.16 ops/ms 53.02 > Float512Vector.LOG1P 20.66 1689.86 ops/ms 81.81 > Float512Vector.POW 32.73 1015.85 ops/ms 31.04 > Float512Vector.SIN 41.17 1587.71 ops/ms 38.56 > Float512Vector.SINH 33.05 129.39 ops/ms 3.91 > Float512Vector.TAN 35.60 1336.11 ops/ms 37.53 > Float512Vector.TANH 65.77 2295.28 ops/ms 34.90 > Float64Vector.ACOS 48.41 89.34 ops/ms 1.85 > Float64Vector.ASIN 47.30 95.72 ops/ms 2.02 > Float64Vector.ATAN 20.62 
49.45 ops/ms 2.40 > Float64Vector.ATAN2 15.95 112.35 ops/ms 7.04 > Float64Vector.CBRT 24.03 134.57 ops/ms 5.60 > Float64Vector.COS 44.28 394.33 ops/ms 8.91 > Float64Vector.COSH 28.35 95.27 ops/ms 3.36 > Float64Vector.EXP 65.80 486.37 ops/ms 7.39 > Float64Vector.EXPM1 34.61 85.99 ops/ms 2.48 > Float64Vector.HYPOT 50.40 147.82 ops/ms 2.93 > Float64Vector.LOG 51.93 163.25 ops/ms 3.14 > Float64Vector.LOG10 49.53 147.98 ops/ms 2.99 > Float64Vector.LOG1P 19.20 206.81 ops/ms 10.77 > Float64Vector.SIN 44.41 382.09 ops/ms 8.60 > Float64Vector.SINH 28.20 90.68 ops/ms 3.22 > Float64Vector.TAN 36.29 160.89 ops/ms 4.43 > Float64Vector.TANH 47.65 214.04 ops/ms 4.49 Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: Print intrinsic fix ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/3638/files - new: https://git.openjdk.java.net/jdk/pull/3638/files/9021a15c..11528426 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=04-05 Stats: 9 lines in 1 file changed: 2 ins; 0 del; 7 mod Patch: https://git.openjdk.java.net/jdk/pull/3638.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3638/head:pull/3638 PR: https://git.openjdk.java.net/jdk/pull/3638 From kvn at openjdk.java.net Tue May 18 23:52:38 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 18 May 2021 23:52:38 GMT Subject: RFR: 8267338: [JVMCI] revive JVMCI API removed by JDK-8243287 [v2] In-Reply-To: References: Message-ID: <9DbzGJv8MQN1tmLVZ0bqifZ0xSVElIryfPiRK1j0MSY=.dc3c4e5b-a042-4333-8dd1-88d99b6f60ed@github.com> On Tue, 18 May 2021 20:35:57 GMT, Doug Simon wrote: >> This PR revives ResolvedJavaType.getHostClass to preserve JVMCI compatibility. The revived method just returns `null`. 
>
> Doug Simon has updated the pull request incrementally with one additional commit since the last revision:
>
> denote ResolvedJavaType.getHostClass as deprecated

It seems the fix is incomplete; the following test is failing:
compiler/jvmci/jdk.vm.ci.runtime.test/src/jdk/vm/ci/runtime/test/TestResolvedJavaType.java

-------------

PR: https://git.openjdk.java.net/jdk/pull/4099

From sviswanathan at openjdk.java.net Tue May 18 23:59:30 2021
From: sviswanathan at openjdk.java.net (Sandhya Viswanathan)
Date: Tue, 18 May 2021 23:59:30 GMT
Subject: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v6]
In-Reply-To: <6cWkz6rWmKC0L2U3sYiELeoOphNGlEHvCoSXcJHE-hE=.04e407ec-be8f-4d01-b527-ec7e1d30bb34@github.com>
References: <6cWkz6rWmKC0L2U3sYiELeoOphNGlEHvCoSXcJHE-hE=.04e407ec-be8f-4d01-b527-ec7e1d30bb34@github.com>
Message-ID:

On Tue, 18 May 2021 23:43:13 GMT, Sandhya Viswanathan wrote:

>> This PR contains Short Vector Math Library support related changes for [JEP-414 Vector API (Second Incubator)](https://openjdk.java.net/jeps/414), in preparation for when targeted.
>>
>> Intel Short Vector Math Library (SVML) based intrinsics in native x86 assembly provide optimized implementation for Vector API transcendental and trigonometric methods.
>> These methods are built into a separate library instead of being part of libjvm.so or jvm.dll.
>>
>> The following changes are made:
>> The source for these methods is placed in the jdk.incubator.vector module under src/jdk.incubator.vector/linux/native/libsvml and src/jdk.incubator.vector/windows/native/libsvml.
>> The assembly source files are named as "*.S" and include files are named as "*.S.inc".
>> The corresponding build script is placed at make/modules/jdk.incubator.vector/Lib.gmk.
>> Changes are made to build system to support dependency tracking for assembly files with includes.
>> The built native libraries (libsvml.so/svml.dll) are placed in bin directory of JDK on Windows and lib directory of JDK on Linux. >> The C2 JIT uses the dll_load and dll_lookup to get the addresses of optimized methods from this library. >> >> Build system changes and module library build scripts are contributed by Magnus (magnus.ihse.bursie at oracle.com). >> >> Looking forward to your review and feedback. >> >> Performance: >> Micro benchmark Base Optimized Unit Gain(Optimized/Base) >> Double128Vector.ACOS 45.91 87.34 ops/ms 1.90 >> Double128Vector.ASIN 45.06 92.36 ops/ms 2.05 >> Double128Vector.ATAN 19.92 118.36 ops/ms 5.94 >> Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79 >> Double128Vector.CBRT 45.77 208.36 ops/ms 4.55 >> Double128Vector.COS 49.94 245.89 ops/ms 4.92 >> Double128Vector.COSH 26.91 126.00 ops/ms 4.68 >> Double128Vector.EXP 71.64 379.65 ops/ms 5.30 >> Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18 >> Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44 >> Double128Vector.LOG 61.95 279.84 ops/ms 4.52 >> Double128Vector.LOG10 59.34 239.05 ops/ms 4.03 >> Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79 >> Double128Vector.SIN 49.36 240.79 ops/ms 4.88 >> Double128Vector.SINH 26.59 103.75 ops/ms 3.90 >> Double128Vector.TAN 41.05 152.39 ops/ms 3.71 >> Double128Vector.TANH 45.29 169.53 ops/ms 3.74 >> Double256Vector.ACOS 54.21 106.39 ops/ms 1.96 >> Double256Vector.ASIN 53.60 107.99 ops/ms 2.01 >> Double256Vector.ATAN 21.53 189.11 ops/ms 8.78 >> Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44 >> Double256Vector.CBRT 56.45 397.13 ops/ms 7.04 >> Double256Vector.COS 58.26 389.77 ops/ms 6.69 >> Double256Vector.COSH 29.44 151.11 ops/ms 5.13 >> Double256Vector.EXP 86.67 564.68 ops/ms 6.52 >> Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80 >> Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62 >> Double256Vector.LOG 71.52 394.90 ops/ms 5.52 >> Double256Vector.LOG10 65.43 362.32 ops/ms 5.54 >> Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05 >> Double256Vector.SIN 57.06 380.98 
ops/ms 6.68 >> Double256Vector.SINH 29.40 117.37 ops/ms 3.99 >> Double256Vector.TAN 44.90 279.90 ops/ms 6.23 >> Double256Vector.TANH 54.08 274.71 ops/ms 5.08 >> Double512Vector.ACOS 55.65 687.54 ops/ms 12.35 >> Double512Vector.ASIN 57.31 777.72 ops/ms 13.57 >> Double512Vector.ATAN 21.42 729.21 ops/ms 34.04 >> Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32 >> Double512Vector.CBRT 56.78 834.38 ops/ms 14.69 >> Double512Vector.COS 59.88 837.04 ops/ms 13.98 >> Double512Vector.COSH 30.34 172.76 ops/ms 5.70 >> Double512Vector.EXP 99.66 1608.12 ops/ms 16.14 >> Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34 >> Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34 >> Double512Vector.LOG 74.84 996.00 ops/ms 13.31 >> Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72 >> Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34 >> Double512Vector.POW 37.42 384.13 ops/ms 10.26 >> Double512Vector.SIN 59.74 728.45 ops/ms 12.19 >> Double512Vector.SINH 29.47 143.38 ops/ms 4.87 >> Double512Vector.TAN 46.20 587.21 ops/ms 12.71 >> Double512Vector.TANH 57.36 495.42 ops/ms 8.64 >> Double64Vector.ACOS 24.04 73.67 ops/ms 3.06 >> Double64Vector.ASIN 23.78 75.11 ops/ms 3.16 >> Double64Vector.ATAN 14.14 62.81 ops/ms 4.44 >> Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28 >> Double64Vector.CBRT 16.47 107.50 ops/ms 6.53 >> Double64Vector.COS 23.42 152.01 ops/ms 6.49 >> Double64Vector.COSH 17.34 113.34 ops/ms 6.54 >> Double64Vector.EXP 27.08 203.53 ops/ms 7.52 >> Double64Vector.EXPM1 18.77 96.73 ops/ms 5.15 >> Double64Vector.HYPOT 18.54 103.62 ops/ms 5.59 >> Double64Vector.LOG 26.75 142.63 ops/ms 5.33 >> Double64Vector.LOG10 25.85 139.71 ops/ms 5.40 >> Double64Vector.LOG1P 13.26 97.94 ops/ms 7.38 >> Double64Vector.SIN 23.28 146.91 ops/ms 6.31 >> Double64Vector.SINH 17.62 88.59 ops/ms 5.03 >> Double64Vector.TAN 21.00 86.43 ops/ms 4.12 >> Double64Vector.TANH 23.75 111.35 ops/ms 4.69 >> Float128Vector.ACOS 57.52 110.65 ops/ms 1.92 >> Float128Vector.ASIN 57.15 117.95 ops/ms 2.06 >> Float128Vector.ATAN 22.52 318.74 
ops/ms 14.15 >> Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42 >> Float128Vector.CBRT 29.72 443.74 ops/ms 14.93 >> Float128Vector.COS 42.82 803.02 ops/ms 18.75 >> Float128Vector.COSH 31.44 118.34 ops/ms 3.76 >> Float128Vector.EXP 72.43 855.33 ops/ms 11.81 >> Float128Vector.EXPM1 37.82 127.85 ops/ms 3.38 >> Float128Vector.HYPOT 53.20 591.68 ops/ms 11.12 >> Float128Vector.LOG 52.95 877.94 ops/ms 16.58 >> Float128Vector.LOG10 49.26 603.72 ops/ms 12.26 >> Float128Vector.LOG1P 20.89 430.59 ops/ms 20.61 >> Float128Vector.SIN 43.38 745.31 ops/ms 17.18 >> Float128Vector.SINH 31.11 112.91 ops/ms 3.63 >> Float128Vector.TAN 37.25 332.13 ops/ms 8.92 >> Float128Vector.TANH 57.63 453.77 ops/ms 7.87 >> Float256Vector.ACOS 65.23 123.73 ops/ms 1.90 >> Float256Vector.ASIN 63.41 132.86 ops/ms 2.10 >> Float256Vector.ATAN 23.51 649.02 ops/ms 27.61 >> Float256Vector.ATAN2 18.19 455.95 ops/ms 25.07 >> Float256Vector.CBRT 45.99 594.81 ops/ms 12.93 >> Float256Vector.COS 43.75 926.69 ops/ms 21.18 >> Float256Vector.COSH 33.52 130.46 ops/ms 3.89 >> Float256Vector.EXP 75.70 1366.72 ops/ms 18.05 >> Float256Vector.EXPM1 39.00 149.72 ops/ms 3.84 >> Float256Vector.HYPOT 52.91 1023.18 ops/ms 19.34 >> Float256Vector.LOG 53.31 1545.77 ops/ms 29.00 >> Float256Vector.LOG10 50.31 863.80 ops/ms 17.17 >> Float256Vector.LOG1P 21.51 616.59 ops/ms 28.66 >> Float256Vector.SIN 44.07 911.04 ops/ms 20.67 >> Float256Vector.SINH 33.16 122.50 ops/ms 3.69 >> Float256Vector.TAN 37.85 497.75 ops/ms 13.15 >> Float256Vector.TANH 64.27 537.20 ops/ms 8.36 >> Float512Vector.ACOS 67.33 1718.00 ops/ms 25.52 >> Float512Vector.ASIN 66.12 1780.85 ops/ms 26.93 >> Float512Vector.ATAN 22.63 1780.31 ops/ms 78.69 >> Float512Vector.ATAN2 17.52 1113.93 ops/ms 63.57 >> Float512Vector.CBRT 54.78 2087.58 ops/ms 38.11 >> Float512Vector.COS 40.92 1567.93 ops/ms 38.32 >> Float512Vector.COSH 33.42 138.36 ops/ms 4.14 >> Float512Vector.EXP 70.51 3835.97 ops/ms 54.41 >> Float512Vector.EXPM1 38.06 279.80 ops/ms 7.35 >> Float512Vector.HYPOT 50.99 
3287.55 ops/ms 64.47 >> Float512Vector.LOG 49.61 3156.99 ops/ms 63.64 >> Float512Vector.LOG10 46.94 2489.16 ops/ms 53.02 >> Float512Vector.LOG1P 20.66 1689.86 ops/ms 81.81 >> Float512Vector.POW 32.73 1015.85 ops/ms 31.04 >> Float512Vector.SIN 41.17 1587.71 ops/ms 38.56 >> Float512Vector.SINH 33.05 129.39 ops/ms 3.91 >> Float512Vector.TAN 35.60 1336.11 ops/ms 37.53 >> Float512Vector.TANH 65.77 2295.28 ops/ms 34.90 >> Float64Vector.ACOS 48.41 89.34 ops/ms 1.85 >> Float64Vector.ASIN 47.30 95.72 ops/ms 2.02 >> Float64Vector.ATAN 20.62 49.45 ops/ms 2.40 >> Float64Vector.ATAN2 15.95 112.35 ops/ms 7.04 >> Float64Vector.CBRT 24.03 134.57 ops/ms 5.60 >> Float64Vector.COS 44.28 394.33 ops/ms 8.91 >> Float64Vector.COSH 28.35 95.27 ops/ms 3.36 >> Float64Vector.EXP 65.80 486.37 ops/ms 7.39 >> Float64Vector.EXPM1 34.61 85.99 ops/ms 2.48 >> Float64Vector.HYPOT 50.40 147.82 ops/ms 2.93 >> Float64Vector.LOG 51.93 163.25 ops/ms 3.14 >> Float64Vector.LOG10 49.53 147.98 ops/ms 2.99 >> Float64Vector.LOG1P 19.20 206.81 ops/ms 10.77 >> Float64Vector.SIN 44.41 382.09 ops/ms 8.60 >> Float64Vector.SINH 28.20 90.68 ops/ms 3.22 >> Float64Vector.TAN 36.29 160.89 ops/ms 4.43 >> Float64Vector.TANH 47.65 214.04 ops/ms 4.49 > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > Print intrinsic fix @vnlozlov I have implemented all the review comments. Please let me know if the changes look ok to you. 
-------------

PR: https://git.openjdk.java.net/jdk/pull/3638

From sviswanathan at openjdk.java.net  Tue May 18 23:59:28 2021
From: sviswanathan at openjdk.java.net (Sandhya Viswanathan)
Date: Tue, 18 May 2021 23:59:28 GMT
Subject: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v7]
In-Reply-To:
References:
Message-ID: <8FlPfGWCt4m3-JqK3IimdEyhG767zaov3nfteXioR0c=.8e49bf89-d74a-474b-9bd0-cce80f744af3@github.com>

> This PR contains Short Vector Math Library support related changes for [JEP-414 Vector API (Second Incubator)](https://openjdk.java.net/jeps/414), in preparation for when targeted.
>
> Intel Short Vector Math Library (SVML) based intrinsics in native x86 assembly provide optimized implementation for Vector API transcendental and trigonometric methods.
> These methods are built into a separate library instead of being part of libjvm.so or jvm.dll.
>
> The following changes are made:
> The source for these methods is placed in the jdk.incubator.vector module under src/jdk.incubator.vector/linux/native/libsvml and src/jdk.incubator.vector/windows/native/libsvml.
> The assembly source files are named as "*.S" and include files are named as "*.S.inc".
> The corresponding build script is placed at make/modules/jdk.incubator.vector/Lib.gmk.
> Changes are made to build system to support dependency tracking for assembly files with includes.
> The built native libraries (libsvml.so/svml.dll) are placed in bin directory of JDK on Windows and lib directory of JDK on Linux.
> The C2 JIT uses the dll_load and dll_lookup to get the addresses of optimized methods from this library.
>
> Build system changes and module library build scripts are contributed by Magnus (magnus.ihse.bursie at oracle.com).
>
> Looking forward to your review and feedback.
> > Performance: > Micro benchmark Base Optimized Unit Gain(Optimized/Base) > Double128Vector.ACOS 45.91 87.34 ops/ms 1.90 > Double128Vector.ASIN 45.06 92.36 ops/ms 2.05 > Double128Vector.ATAN 19.92 118.36 ops/ms 5.94 > Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79 > Double128Vector.CBRT 45.77 208.36 ops/ms 4.55 > Double128Vector.COS 49.94 245.89 ops/ms 4.92 > Double128Vector.COSH 26.91 126.00 ops/ms 4.68 > Double128Vector.EXP 71.64 379.65 ops/ms 5.30 > Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18 > Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44 > Double128Vector.LOG 61.95 279.84 ops/ms 4.52 > Double128Vector.LOG10 59.34 239.05 ops/ms 4.03 > Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79 > Double128Vector.SIN 49.36 240.79 ops/ms 4.88 > Double128Vector.SINH 26.59 103.75 ops/ms 3.90 > Double128Vector.TAN 41.05 152.39 ops/ms 3.71 > Double128Vector.TANH 45.29 169.53 ops/ms 3.74 > Double256Vector.ACOS 54.21 106.39 ops/ms 1.96 > Double256Vector.ASIN 53.60 107.99 ops/ms 2.01 > Double256Vector.ATAN 21.53 189.11 ops/ms 8.78 > Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44 > Double256Vector.CBRT 56.45 397.13 ops/ms 7.04 > Double256Vector.COS 58.26 389.77 ops/ms 6.69 > Double256Vector.COSH 29.44 151.11 ops/ms 5.13 > Double256Vector.EXP 86.67 564.68 ops/ms 6.52 > Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80 > Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62 > Double256Vector.LOG 71.52 394.90 ops/ms 5.52 > Double256Vector.LOG10 65.43 362.32 ops/ms 5.54 > Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05 > Double256Vector.SIN 57.06 380.98 ops/ms 6.68 > Double256Vector.SINH 29.40 117.37 ops/ms 3.99 > Double256Vector.TAN 44.90 279.90 ops/ms 6.23 > Double256Vector.TANH 54.08 274.71 ops/ms 5.08 > Double512Vector.ACOS 55.65 687.54 ops/ms 12.35 > Double512Vector.ASIN 57.31 777.72 ops/ms 13.57 > Double512Vector.ATAN 21.42 729.21 ops/ms 34.04 > Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32 > Double512Vector.CBRT 56.78 834.38 ops/ms 14.69 > Double512Vector.COS 59.88 837.04 ops/ms 
13.98 > Double512Vector.COSH 30.34 172.76 ops/ms 5.70 > Double512Vector.EXP 99.66 1608.12 ops/ms 16.14 > Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34 > Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34 > Double512Vector.LOG 74.84 996.00 ops/ms 13.31 > Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72 > Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34 > Double512Vector.POW 37.42 384.13 ops/ms 10.26 > Double512Vector.SIN 59.74 728.45 ops/ms 12.19 > Double512Vector.SINH 29.47 143.38 ops/ms 4.87 > Double512Vector.TAN 46.20 587.21 ops/ms 12.71 > Double512Vector.TANH 57.36 495.42 ops/ms 8.64 > Double64Vector.ACOS 24.04 73.67 ops/ms 3.06 > Double64Vector.ASIN 23.78 75.11 ops/ms 3.16 > Double64Vector.ATAN 14.14 62.81 ops/ms 4.44 > Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28 > Double64Vector.CBRT 16.47 107.50 ops/ms 6.53 > Double64Vector.COS 23.42 152.01 ops/ms 6.49 > Double64Vector.COSH 17.34 113.34 ops/ms 6.54 > Double64Vector.EXP 27.08 203.53 ops/ms 7.52 > Double64Vector.EXPM1 18.77 96.73 ops/ms 5.15 > Double64Vector.HYPOT 18.54 103.62 ops/ms 5.59 > Double64Vector.LOG 26.75 142.63 ops/ms 5.33 > Double64Vector.LOG10 25.85 139.71 ops/ms 5.40 > Double64Vector.LOG1P 13.26 97.94 ops/ms 7.38 > Double64Vector.SIN 23.28 146.91 ops/ms 6.31 > Double64Vector.SINH 17.62 88.59 ops/ms 5.03 > Double64Vector.TAN 21.00 86.43 ops/ms 4.12 > Double64Vector.TANH 23.75 111.35 ops/ms 4.69 > Float128Vector.ACOS 57.52 110.65 ops/ms 1.92 > Float128Vector.ASIN 57.15 117.95 ops/ms 2.06 > Float128Vector.ATAN 22.52 318.74 ops/ms 14.15 > Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42 > Float128Vector.CBRT 29.72 443.74 ops/ms 14.93 > Float128Vector.COS 42.82 803.02 ops/ms 18.75 > Float128Vector.COSH 31.44 118.34 ops/ms 3.76 > Float128Vector.EXP 72.43 855.33 ops/ms 11.81 > Float128Vector.EXPM1 37.82 127.85 ops/ms 3.38 > Float128Vector.HYPOT 53.20 591.68 ops/ms 11.12 > Float128Vector.LOG 52.95 877.94 ops/ms 16.58 > Float128Vector.LOG10 49.26 603.72 ops/ms 12.26 > Float128Vector.LOG1P 20.89 430.59 ops/ms 
20.61 > Float128Vector.SIN 43.38 745.31 ops/ms 17.18 > Float128Vector.SINH 31.11 112.91 ops/ms 3.63 > Float128Vector.TAN 37.25 332.13 ops/ms 8.92 > Float128Vector.TANH 57.63 453.77 ops/ms 7.87 > Float256Vector.ACOS 65.23 123.73 ops/ms 1.90 > Float256Vector.ASIN 63.41 132.86 ops/ms 2.10 > Float256Vector.ATAN 23.51 649.02 ops/ms 27.61 > Float256Vector.ATAN2 18.19 455.95 ops/ms 25.07 > Float256Vector.CBRT 45.99 594.81 ops/ms 12.93 > Float256Vector.COS 43.75 926.69 ops/ms 21.18 > Float256Vector.COSH 33.52 130.46 ops/ms 3.89 > Float256Vector.EXP 75.70 1366.72 ops/ms 18.05 > Float256Vector.EXPM1 39.00 149.72 ops/ms 3.84 > Float256Vector.HYPOT 52.91 1023.18 ops/ms 19.34 > Float256Vector.LOG 53.31 1545.77 ops/ms 29.00 > Float256Vector.LOG10 50.31 863.80 ops/ms 17.17 > Float256Vector.LOG1P 21.51 616.59 ops/ms 28.66 > Float256Vector.SIN 44.07 911.04 ops/ms 20.67 > Float256Vector.SINH 33.16 122.50 ops/ms 3.69 > Float256Vector.TAN 37.85 497.75 ops/ms 13.15 > Float256Vector.TANH 64.27 537.20 ops/ms 8.36 > Float512Vector.ACOS 67.33 1718.00 ops/ms 25.52 > Float512Vector.ASIN 66.12 1780.85 ops/ms 26.93 > Float512Vector.ATAN 22.63 1780.31 ops/ms 78.69 > Float512Vector.ATAN2 17.52 1113.93 ops/ms 63.57 > Float512Vector.CBRT 54.78 2087.58 ops/ms 38.11 > Float512Vector.COS 40.92 1567.93 ops/ms 38.32 > Float512Vector.COSH 33.42 138.36 ops/ms 4.14 > Float512Vector.EXP 70.51 3835.97 ops/ms 54.41 > Float512Vector.EXPM1 38.06 279.80 ops/ms 7.35 > Float512Vector.HYPOT 50.99 3287.55 ops/ms 64.47 > Float512Vector.LOG 49.61 3156.99 ops/ms 63.64 > Float512Vector.LOG10 46.94 2489.16 ops/ms 53.02 > Float512Vector.LOG1P 20.66 1689.86 ops/ms 81.81 > Float512Vector.POW 32.73 1015.85 ops/ms 31.04 > Float512Vector.SIN 41.17 1587.71 ops/ms 38.56 > Float512Vector.SINH 33.05 129.39 ops/ms 3.91 > Float512Vector.TAN 35.60 1336.11 ops/ms 37.53 > Float512Vector.TANH 65.77 2295.28 ops/ms 34.90 > Float64Vector.ACOS 48.41 89.34 ops/ms 1.85 > Float64Vector.ASIN 47.30 95.72 ops/ms 2.02 > Float64Vector.ATAN 20.62 
49.45 ops/ms 2.40
> Float64Vector.ATAN2 15.95 112.35 ops/ms 7.04
> Float64Vector.CBRT 24.03 134.57 ops/ms 5.60
> Float64Vector.COS 44.28 394.33 ops/ms 8.91
> Float64Vector.COSH 28.35 95.27 ops/ms 3.36
> Float64Vector.EXP 65.80 486.37 ops/ms 7.39
> Float64Vector.EXPM1 34.61 85.99 ops/ms 2.48
> Float64Vector.HYPOT 50.40 147.82 ops/ms 2.93
> Float64Vector.LOG 51.93 163.25 ops/ms 3.14
> Float64Vector.LOG10 49.53 147.98 ops/ms 2.99
> Float64Vector.LOG1P 19.20 206.81 ops/ms 10.77
> Float64Vector.SIN 44.41 382.09 ops/ms 8.60
> Float64Vector.SINH 28.20 90.68 ops/ms 3.22
> Float64Vector.TAN 36.29 160.89 ops/ms 4.43
> Float64Vector.TANH 47.65 214.04 ops/ms 4.49

Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision:

  jcheck fixes

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3638/files
  - new: https://git.openjdk.java.net/jdk/pull/3638/files/11528426..0d1d0382

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=06
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=05-06

  Stats: 4 lines in 3 files changed: 0 ins; 0 del; 4 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3638.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3638/head:pull/3638

PR: https://git.openjdk.java.net/jdk/pull/3638

From kvn at openjdk.java.net  Wed May 19 00:17:38 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Wed, 19 May 2021 00:17:38 GMT
Subject: RFR: 8266528: Optimize C2 VerifyIterativeGVN execution time [v2]
In-Reply-To:
References:
Message-ID:

On Tue, 18 May 2021 02:28:09 GMT, Hui Shi wrote:

>> Please help review this enhancement for VerifyIterativeGVN, reduce about 3x - 200x execution time when VerifyIterativeGVN is on.
>>
>> In simple test "-Xcomp -XX:+VerifyIterativeGVN -XX:-TieredCompilation -version", time reduced from 8.67s to 2.4s.
>> In extreme case hotspot/test/jtreg/compiler/escapeAnalysis/Test6689060.java, time reduced from 20000s to 95s.
>>
>> Test with "-Xbatch -XX:+VerifyIterativeGVN -XX:-TieredCompilation", tier1/2/3 with fastdebug and no regression.
>>
>> 1. Remove node_arena()->contains checking for verifying nodes. _verify_window is reset before every PhaseIterGVN::optimize. Searching from root or nodes in _verify_window will not meet nodes whose _idx is not unique (PhaseIterGVN::optimize is not triggered in the middle of PhaseRenumberLive). Assertion every node is in current node_arena() in Node::verify, passes tier1/2/3 checks (with -Xbatch -XX:+VerifyIterativeGVN -XX:-TieredCompilation), no assertion failure happens.
>>
>> 2. Combine verification for nodes in _verify_window into one worklist and skipping redundant nodes in _verify_window.
>>
>> 3. Optimize duplicate checking for same input nodes, skipping if current input index is not its first occurrence.
>>
>> 4. Optimize field access: Replace "n->in(j)" with "n->_in[j]", same with outcnt calculation for input node x.
>
> Hui Shi has updated the pull request incrementally with one additional commit since the last revision:
>
>   update test requires from "vm.flavor == "server"" to vm.compiler2.enabled

Marked as reviewed by kvn (Reviewer).

-------------

PR: https://git.openjdk.java.net/jdk/pull/4045

From kvn at openjdk.java.net  Wed May 19 00:17:39 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Wed, 19 May 2021 00:17:39 GMT
Subject: RFR: 8266528: Optimize C2 VerifyIterativeGVN execution time [v2]
In-Reply-To:
References:
Message-ID: <2UyNU9-0AFHD5gc9a_NdhrLTjJj0mTPX95C1VY-eim8=.b4df5ac8-f816-4fbe-ac57-a091313c79e8@github.com>

On Tue, 18 May 2021 02:21:36 GMT, Hui Shi wrote:

>> src/hotspot/share/opto/node.cpp line 2249:
>>
>>> 2247:       }
>>> 2248:     }
>>> 2249:     if (cnt == 2) {
>>
>> I think it should be `cnt > 1`.
>
> previous loop breaks when meet first input which is same with x, so cnt must be 2 if x is duplicated with previous input.

You are right. I missed `break`.
Using another local VectorSet here may not be cheaper. Okay.
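The `cnt == 2` argument above can be checked on a small stand-alone model. The sketch below is not the node.cpp code: it replaces nodes with plain integers, and `count_edges_once` is a hypothetical name chosen for illustration. It shows why the backward scan with `break` can only leave `cnt` at 1 (first occurrence) or 2 (already handled), so no extra set is needed to detect duplicates.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical model: `inputs` stands in for a node's input array.
// Returns (value, occurrence count) pairs, computed only at the first
// occurrence of each value.
std::vector<std::pair<int, int>> count_edges_once(const std::vector<int>& inputs) {
    std::vector<std::pair<int, int>> result;
    for (std::size_t j = 0; j < inputs.size(); j++) {
        const int x = inputs[j];
        int cnt = 1;
        // Look backwards for an earlier occurrence and stop at the first hit,
        // so cnt is either 1 (none found) or exactly 2 (found one).
        for (std::size_t k = 0; k < j; k++) {
            if (inputs[k] == x) {
                cnt++;
                break;
            }
        }
        if (cnt == 2) {
            continue;  // x was already processed at an earlier index
        }
        // First occurrence: count the remaining occurrences after index j.
        for (std::size_t k = j + 1; k < inputs.size(); k++) {
            if (inputs[k] == x) {
                cnt++;
            }
        }
        result.emplace_back(x, cnt);
    }
    return result;
}
```

For the input {7, 3, 7, 7, 3} this yields (7, 3) and (3, 2); the later occurrences at indices 2, 3 and 4 are skipped by the `cnt == 2` test.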
Add comment here. Something about that we processed this node already.

-------------

PR: https://git.openjdk.java.net/jdk/pull/4045

From kvn at openjdk.java.net  Wed May 19 00:30:42 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Wed, 19 May 2021 00:30:42 GMT
Subject: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v7]
In-Reply-To: <8FlPfGWCt4m3-JqK3IimdEyhG767zaov3nfteXioR0c=.8e49bf89-d74a-474b-9bd0-cce80f744af3@github.com>
References: <8FlPfGWCt4m3-JqK3IimdEyhG767zaov3nfteXioR0c=.8e49bf89-d74a-474b-9bd0-cce80f744af3@github.com>
Message-ID: <-BR26RrPjxCrCj3TmA4xRVCvZxEk8njxsQh7kqjmmts=.df6b4208-d5a1-4d28-802b-1fa28848abce@github.com>

On Tue, 18 May 2021 23:59:28 GMT, Sandhya Viswanathan wrote:

>> This PR contains Short Vector Math Library support related changes for [JEP-414 Vector API (Second Incubator)](https://openjdk.java.net/jeps/414), in preparation for when targeted.
>>
>> Intel Short Vector Math Library (SVML) based intrinsics in native x86 assembly provide optimized implementation for Vector API transcendental and trigonometric methods.
>> These methods are built into a separate library instead of being part of libjvm.so or jvm.dll.
>>
>> The following changes are made:
>> The source for these methods is placed in the jdk.incubator.vector module under src/jdk.incubator.vector/linux/native/libsvml and src/jdk.incubator.vector/windows/native/libsvml.
>> The assembly source files are named as "*.S" and include files are named as "*.S.inc".
>> The corresponding build script is placed at make/modules/jdk.incubator.vector/Lib.gmk.
>> Changes are made to build system to support dependency tracking for assembly files with includes.
>> The built native libraries (libsvml.so/svml.dll) are placed in bin directory of JDK on Windows and lib directory of JDK on Linux.
>> The C2 JIT uses the dll_load and dll_lookup to get the addresses of optimized methods from this library.
>> >> Build system changes and module library build scripts are contributed by Magnus (magnus.ihse.bursie at oracle.com). >> >> Looking forward to your review and feedback. >> >> Performance: >> Micro benchmark Base Optimized Unit Gain(Optimized/Base) >> Double128Vector.ACOS 45.91 87.34 ops/ms 1.90 >> Double128Vector.ASIN 45.06 92.36 ops/ms 2.05 >> Double128Vector.ATAN 19.92 118.36 ops/ms 5.94 >> Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79 >> Double128Vector.CBRT 45.77 208.36 ops/ms 4.55 >> Double128Vector.COS 49.94 245.89 ops/ms 4.92 >> Double128Vector.COSH 26.91 126.00 ops/ms 4.68 >> Double128Vector.EXP 71.64 379.65 ops/ms 5.30 >> Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18 >> Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44 >> Double128Vector.LOG 61.95 279.84 ops/ms 4.52 >> Double128Vector.LOG10 59.34 239.05 ops/ms 4.03 >> Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79 >> Double128Vector.SIN 49.36 240.79 ops/ms 4.88 >> Double128Vector.SINH 26.59 103.75 ops/ms 3.90 >> Double128Vector.TAN 41.05 152.39 ops/ms 3.71 >> Double128Vector.TANH 45.29 169.53 ops/ms 3.74 >> Double256Vector.ACOS 54.21 106.39 ops/ms 1.96 >> Double256Vector.ASIN 53.60 107.99 ops/ms 2.01 >> Double256Vector.ATAN 21.53 189.11 ops/ms 8.78 >> Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44 >> Double256Vector.CBRT 56.45 397.13 ops/ms 7.04 >> Double256Vector.COS 58.26 389.77 ops/ms 6.69 >> Double256Vector.COSH 29.44 151.11 ops/ms 5.13 >> Double256Vector.EXP 86.67 564.68 ops/ms 6.52 >> Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80 >> Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62 >> Double256Vector.LOG 71.52 394.90 ops/ms 5.52 >> Double256Vector.LOG10 65.43 362.32 ops/ms 5.54 >> Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05 >> Double256Vector.SIN 57.06 380.98 ops/ms 6.68 >> Double256Vector.SINH 29.40 117.37 ops/ms 3.99 >> Double256Vector.TAN 44.90 279.90 ops/ms 6.23 >> Double256Vector.TANH 54.08 274.71 ops/ms 5.08 >> Double512Vector.ACOS 55.65 687.54 ops/ms 12.35 >> Double512Vector.ASIN 57.31 
777.72 ops/ms 13.57 >> Double512Vector.ATAN 21.42 729.21 ops/ms 34.04 >> Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32 >> Double512Vector.CBRT 56.78 834.38 ops/ms 14.69 >> Double512Vector.COS 59.88 837.04 ops/ms 13.98 >> Double512Vector.COSH 30.34 172.76 ops/ms 5.70 >> Double512Vector.EXP 99.66 1608.12 ops/ms 16.14 >> Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34 >> Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34 >> Double512Vector.LOG 74.84 996.00 ops/ms 13.31 >> Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72 >> Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34 >> Double512Vector.POW 37.42 384.13 ops/ms 10.26 >> Double512Vector.SIN 59.74 728.45 ops/ms 12.19 >> Double512Vector.SINH 29.47 143.38 ops/ms 4.87 >> Double512Vector.TAN 46.20 587.21 ops/ms 12.71 >> Double512Vector.TANH 57.36 495.42 ops/ms 8.64 >> Double64Vector.ACOS 24.04 73.67 ops/ms 3.06 >> Double64Vector.ASIN 23.78 75.11 ops/ms 3.16 >> Double64Vector.ATAN 14.14 62.81 ops/ms 4.44 >> Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28 >> Double64Vector.CBRT 16.47 107.50 ops/ms 6.53 >> Double64Vector.COS 23.42 152.01 ops/ms 6.49 >> Double64Vector.COSH 17.34 113.34 ops/ms 6.54 >> Double64Vector.EXP 27.08 203.53 ops/ms 7.52 >> Double64Vector.EXPM1 18.77 96.73 ops/ms 5.15 >> Double64Vector.HYPOT 18.54 103.62 ops/ms 5.59 >> Double64Vector.LOG 26.75 142.63 ops/ms 5.33 >> Double64Vector.LOG10 25.85 139.71 ops/ms 5.40 >> Double64Vector.LOG1P 13.26 97.94 ops/ms 7.38 >> Double64Vector.SIN 23.28 146.91 ops/ms 6.31 >> Double64Vector.SINH 17.62 88.59 ops/ms 5.03 >> Double64Vector.TAN 21.00 86.43 ops/ms 4.12 >> Double64Vector.TANH 23.75 111.35 ops/ms 4.69 >> Float128Vector.ACOS 57.52 110.65 ops/ms 1.92 >> Float128Vector.ASIN 57.15 117.95 ops/ms 2.06 >> Float128Vector.ATAN 22.52 318.74 ops/ms 14.15 >> Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42 >> Float128Vector.CBRT 29.72 443.74 ops/ms 14.93 >> Float128Vector.COS 42.82 803.02 ops/ms 18.75 >> Float128Vector.COSH 31.44 118.34 ops/ms 3.76 >> Float128Vector.EXP 72.43 
855.33 ops/ms 11.81 >> Float128Vector.EXPM1 37.82 127.85 ops/ms 3.38 >> Float128Vector.HYPOT 53.20 591.68 ops/ms 11.12 >> Float128Vector.LOG 52.95 877.94 ops/ms 16.58 >> Float128Vector.LOG10 49.26 603.72 ops/ms 12.26 >> Float128Vector.LOG1P 20.89 430.59 ops/ms 20.61 >> Float128Vector.SIN 43.38 745.31 ops/ms 17.18 >> Float128Vector.SINH 31.11 112.91 ops/ms 3.63 >> Float128Vector.TAN 37.25 332.13 ops/ms 8.92 >> Float128Vector.TANH 57.63 453.77 ops/ms 7.87 >> Float256Vector.ACOS 65.23 123.73 ops/ms 1.90 >> Float256Vector.ASIN 63.41 132.86 ops/ms 2.10 >> Float256Vector.ATAN 23.51 649.02 ops/ms 27.61 >> Float256Vector.ATAN2 18.19 455.95 ops/ms 25.07 >> Float256Vector.CBRT 45.99 594.81 ops/ms 12.93 >> Float256Vector.COS 43.75 926.69 ops/ms 21.18 >> Float256Vector.COSH 33.52 130.46 ops/ms 3.89 >> Float256Vector.EXP 75.70 1366.72 ops/ms 18.05 >> Float256Vector.EXPM1 39.00 149.72 ops/ms 3.84 >> Float256Vector.HYPOT 52.91 1023.18 ops/ms 19.34 >> Float256Vector.LOG 53.31 1545.77 ops/ms 29.00 >> Float256Vector.LOG10 50.31 863.80 ops/ms 17.17 >> Float256Vector.LOG1P 21.51 616.59 ops/ms 28.66 >> Float256Vector.SIN 44.07 911.04 ops/ms 20.67 >> Float256Vector.SINH 33.16 122.50 ops/ms 3.69 >> Float256Vector.TAN 37.85 497.75 ops/ms 13.15 >> Float256Vector.TANH 64.27 537.20 ops/ms 8.36 >> Float512Vector.ACOS 67.33 1718.00 ops/ms 25.52 >> Float512Vector.ASIN 66.12 1780.85 ops/ms 26.93 >> Float512Vector.ATAN 22.63 1780.31 ops/ms 78.69 >> Float512Vector.ATAN2 17.52 1113.93 ops/ms 63.57 >> Float512Vector.CBRT 54.78 2087.58 ops/ms 38.11 >> Float512Vector.COS 40.92 1567.93 ops/ms 38.32 >> Float512Vector.COSH 33.42 138.36 ops/ms 4.14 >> Float512Vector.EXP 70.51 3835.97 ops/ms 54.41 >> Float512Vector.EXPM1 38.06 279.80 ops/ms 7.35 >> Float512Vector.HYPOT 50.99 3287.55 ops/ms 64.47 >> Float512Vector.LOG 49.61 3156.99 ops/ms 63.64 >> Float512Vector.LOG10 46.94 2489.16 ops/ms 53.02 >> Float512Vector.LOG1P 20.66 1689.86 ops/ms 81.81 >> Float512Vector.POW 32.73 1015.85 ops/ms 31.04 >> 
Float512Vector.SIN 41.17 1587.71 ops/ms 38.56 >> Float512Vector.SINH 33.05 129.39 ops/ms 3.91 >> Float512Vector.TAN 35.60 1336.11 ops/ms 37.53 >> Float512Vector.TANH 65.77 2295.28 ops/ms 34.90 >> Float64Vector.ACOS 48.41 89.34 ops/ms 1.85 >> Float64Vector.ASIN 47.30 95.72 ops/ms 2.02 >> Float64Vector.ATAN 20.62 49.45 ops/ms 2.40 >> Float64Vector.ATAN2 15.95 112.35 ops/ms 7.04 >> Float64Vector.CBRT 24.03 134.57 ops/ms 5.60 >> Float64Vector.COS 44.28 394.33 ops/ms 8.91 >> Float64Vector.COSH 28.35 95.27 ops/ms 3.36 >> Float64Vector.EXP 65.80 486.37 ops/ms 7.39 >> Float64Vector.EXPM1 34.61 85.99 ops/ms 2.48 >> Float64Vector.HYPOT 50.40 147.82 ops/ms 2.93 >> Float64Vector.LOG 51.93 163.25 ops/ms 3.14 >> Float64Vector.LOG10 49.53 147.98 ops/ms 2.99 >> Float64Vector.LOG1P 19.20 206.81 ops/ms 10.77 >> Float64Vector.SIN 44.41 382.09 ops/ms 8.60 >> Float64Vector.SINH 28.20 90.68 ops/ms 3.22 >> Float64Vector.TAN 36.29 160.89 ops/ms 4.43 >> Float64Vector.TANH 47.65 214.04 ops/ms 4.49 > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > jcheck fixes This is much much better! Thank you for changing it. I am only asking now to add comment explaining names. src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 6975: > 6973: if (libsvml != NULL) { > 6974: log_info(library)("Loaded library %s, handle " INTPTR_FORMAT, JNI_LIB_PREFIX "svml" JNI_LIB_SUFFIX, p2i(libsvml)); > 6975: if (UseAVX > 2) { Please add comment here explaining naming convention you are using here. What `f16_ha_z0` mean? Why `8_ha_z0` and not `d8_ha_z0`? What is `l9`, `e9`, `ex`? ------------- Changes requested by kvn (Reviewer). 
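The load-then-look-up pattern discussed in this thread (dll_load followed by dll_lookup, guarded by a NULL check on the library handle) can be sketched with the POSIX calls `dlopen` and `dlsym`. This is only an illustration, assuming a POSIX system; `lookup_or_null` is a hypothetical helper, and the library and symbol names passed to it are placeholders, not HotSpot or SVML names.

```cpp
#include <dlfcn.h>
#include <cstdio>

// Hypothetical helper: returns the address of `symbol` inside `library`,
// or nullptr when the library cannot be loaded or the symbol is absent.
// A caller would keep its generic fallback path whenever nullptr comes back.
inline void* lookup_or_null(const char* library, const char* symbol) {
    void* handle = dlopen(library, RTLD_LAZY);
    if (handle == nullptr) {
        const char* err = dlerror();
        std::fprintf(stderr, "could not load %s: %s\n", library,
                     err != nullptr ? err : "unknown error");
        return nullptr;
    }
    void* addr = dlsym(handle, symbol);
    if (addr == nullptr) {
        // Nothing was resolved, so the handle is not needed any more.
        dlclose(handle);
        return nullptr;
    }
    // The handle stays open on purpose: the returned address is only valid
    // while the library remains loaded.
    return addr;
}
```

On older glibc versions this may need to be linked with `-ldl`.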
PR: https://git.openjdk.java.net/jdk/pull/3638

From kvn at openjdk.java.net  Wed May 19 00:34:39 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Wed, 19 May 2021 00:34:39 GMT
Subject: RFR: 8267190: Optimize Vector API test operations
In-Reply-To:
References:
Message-ID: <8ZVcXmmppPrp_92nHp3r_pCDsP2PcQGWW5wSChKyDHY=.3bbb6d31-b210-4417-a211-c4bc5b083f1e@github.com>

On Fri, 14 May 2021 23:58:38 GMT, Sandhya Viswanathan wrote:

> Vector API test operations (IS_DEFAULT, IS_FINITE, IS_INFINITE, IS_NAN and IS_NEGATIVE) are computed in three steps:
> 1) reinterpreting the floating point vectors as integral vectors (int/long)
> 2) perform the test in integer domain to get an int/long mask
> 3) reinterpret the int/long mask as float/double mask
> Step 3) currently is very slow. It can be optimized by modifying the Java code to utilize the existing reinterpret intrinsic.
>
> For the VectorTestPerf attached to the JBS for JDK-8267190, the performance improves as follows:
>
> Base:
> Benchmark                    (size)   Mode  Cnt     Score     Error   Units
> VectorTestPerf.IS_DEFAULT      1024  thrpt    5   223.156 ±  90.452  ops/ms
> VectorTestPerf.IS_FINITE       1024  thrpt    5   223.841 ±  91.685  ops/ms
> VectorTestPerf.IS_INFINITE     1024  thrpt    5   224.561 ±  83.890  ops/ms
> VectorTestPerf.IS_NAN          1024  thrpt    5   223.777 ±  70.629  ops/ms
> VectorTestPerf.IS_NEGATIVE     1024  thrpt    5   218.392 ±  79.806  ops/ms
>
> With patch:
> Benchmark                    (size)   Mode  Cnt     Score     Error   Units
> VectorTestPerf.IS_DEFAULT      1024  thrpt    5  8812.357 ±  40.477  ops/ms
> VectorTestPerf.IS_FINITE       1024  thrpt    5  7425.739 ± 296.622  ops/ms
> VectorTestPerf.IS_INFINITE     1024  thrpt    5  8932.730 ± 269.988  ops/ms
> VectorTestPerf.IS_NAN          1024  thrpt    5  8574.872 ± 498.649  ops/ms
> VectorTestPerf.IS_NEGATIVE     1024  thrpt    5  8838.400 ±  11.849  ops/ms
>
> Best Regards,
> Sandhya

Change in vectorIntrinsics.cpp seems fine. I did not look at the Java code.

-------------

Marked as reviewed by kvn (Reviewer).
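The three-step scheme quoted above (reinterpret the floating point lanes as integers, test in the integer domain, hand the result back) can be illustrated on a single float lane. This is a scalar C++ sketch, not the Vector API implementation. The function names are hypothetical, and the bit-level definition of each test (IS_DEFAULT as an all-zero bit pattern, IS_NEGATIVE as a set sign bit, the other three as comparisons against the infinity pattern after clearing the sign) is an assumption made for illustration.

```cpp
#include <cstdint>
#include <cstring>

// Step 1: reinterpret the float's bits as a 32-bit integer (no numeric conversion).
inline std::int32_t float_bits(float f) {
    std::int32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// Inverse reinterpretation, handy for building special values in examples.
inline float float_from_bits(std::int32_t bits) {
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

constexpr std::int32_t kAbsMask = 0x7FFFFFFF;  // clears the sign bit
constexpr std::int32_t kInfBits = 0x7F800000;  // bit pattern of +infinity

// Step 2: every test is an integer comparison on the bit pattern.
// (Assumed definitions; see the note above.)
inline bool is_default(float f)  { return float_bits(f) == 0; }   // false for -0.0f
inline bool is_negative(float f) { return float_bits(f) < 0; }    // true for -0.0f
inline bool is_finite(float f)   { return (float_bits(f) & kAbsMask) <  kInfBits; }
inline bool is_infinite(float f) { return (float_bits(f) & kAbsMask) == kInfBits; }
inline bool is_nan(float f)      { return (float_bits(f) & kAbsMask) >  kInfBits; }
```

In the scalar case step 3 disappears, because the result is already a plain `bool`; in the vector case it is the conversion of the int/long mask back to a float/double mask that the patch speeds up.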
PR: https://git.openjdk.java.net/jdk/pull/4039

From kvn at openjdk.java.net  Wed May 19 00:50:52 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Wed, 19 May 2021 00:50:52 GMT
Subject: RFR: 8252372: Check if cloning is required to move loads out of loops in PhaseIdealLoop::split_if_with_blocks_post() [v3]
In-Reply-To:
References:
Message-ID:

On Mon, 17 May 2021 08:12:46 GMT, Roland Westrelin wrote:

>> Sinking data nodes out of a loop when all uses are out of a loop has
>> several issues that this attempts to fix.
>>
>> 1- Only non control uses are considered which makes little sense (why
>> not sink if the data node is an argument to a call or a returned
>> value?)
>>
>> 2- Sinking of Loads is broken because of the handling of
>> anti-dependence: the get_late_ctrl(n, n_ctrl) call returns a control
>> in the loop because it takes all uses into account.
>>
>> 3- For data nodes for which a control edge can't be set, commoning of
>> clones back in the loop is prevented with:
>> _igvn._worklist.yank(x);
>> which gives no guarantee
>>
>> This patch tries to address all issues:
>>
>> 1- it looks at all uses, not only non control uses
>>
>> 2- anti-dependences are computed for each use independently
>>
>> 3- Cast nodes are used to pin clones out of loop
>>
>>
>> 2- requires refactoring of the PhaseIdealLoop::get_late_ctrl()
>> logic. While working on this, I noticed a bug in anti-dependence
>> analysis: when the use is a cfg node, the code sometimes looks at uses
>> of the memory state of the cfg. The logic uses the use of the cfg
>> which is a projection of adr_type identical to the cfg. It should
>> instead look at the use of the memory projection.
>>
>> The existing logic for sinking loads calls clear_dom_lca_tags() for
>> every load which seems like quite a waste. I added a
>> _dom_lca_tags_round variable that's or'ed with the tag_node's _idx. By
>> incrementing _dom_lca_tags_round, new tags that don't conflict with
>> existing ones are produced and there's no need for
>> clear_dom_lca_tags().
>>
>> For anti-dependence analysis to return a correct result, early control
>> of the load is needed. The only way to get it at this stage, AFAICT,
>> is to compute it by following the load's input until a pinned node is
>> reached.
>>
>> The existing logic pins cloned nodes next to their use. The logic I
>> propose pins them right out of the loop. This could possibly avoid
>> some redundant clones. It also makes some special handling for corner
>> cases with loop strip mining useless.
>>
>> For 3-, I added extra Cast nodes for float types. If a chain of data
>> nodes are sunk, the new logic tries to keep a single Cast for the
>> entire chain rather than one Cast per node.
>
> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision:
>
>  - Tobias' review
>  - Merge branch 'master' into JDK-8252372
>  - CastVV
>  - Merge branch 'master' into JDK-8252372
>  - extra comments
>  - fix

This looks reasonable to me.
Did we get performance results for it?

-------------

PR: https://git.openjdk.java.net/jdk/pull/3689

From sviswanathan at openjdk.java.net  Wed May 19 00:58:15 2021
From: sviswanathan at openjdk.java.net (Sandhya Viswanathan)
Date: Wed, 19 May 2021 00:58:15 GMT
Subject: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v8]
In-Reply-To:
References:
Message-ID:

> This PR contains Short Vector Math Library support related changes for [JEP-414 Vector API (Second Incubator)](https://openjdk.java.net/jeps/414), in preparation for when targeted.
>
> Intel Short Vector Math Library (SVML) based intrinsics in native x86 assembly provide optimized implementation for Vector API transcendental and trigonometric methods.
> These methods are built into a separate library instead of being part of libjvm.so or jvm.dll.
>
> The following changes are made:
> The source for these methods is placed in the jdk.incubator.vector module under src/jdk.incubator.vector/linux/native/libsvml and src/jdk.incubator.vector/windows/native/libsvml.
> The assembly source files are named as "*.S" and include files are named as "*.S.inc".
> The corresponding build script is placed at make/modules/jdk.incubator.vector/Lib.gmk.
> Changes are made to build system to support dependency tracking for assembly files with includes.
> The built native libraries (libsvml.so/svml.dll) are placed in bin directory of JDK on Windows and lib directory of JDK on Linux.
> The C2 JIT uses the dll_load and dll_lookup to get the addresses of optimized methods from this library.
>
> Build system changes and module library build scripts are contributed by Magnus (magnus.ihse.bursie at oracle.com).
>
> Looking forward to your review and feedback.
>
> Performance:
> Micro benchmark         Base  Optimized  Unit     Gain(Optimized/Base)
> Double128Vector.ACOS   45.91      87.34  ops/ms   1.90
> Double128Vector.ASIN   45.06      92.36  ops/ms   2.05
> Double128Vector.ATAN   19.92     118.36  ops/ms   5.94
> Double128Vector.ATAN2  15.24      88.17  ops/ms   5.79
> Double128Vector.CBRT   45.77     208.36  ops/ms   4.55
> Double128Vector.COS    49.94     245.89  ops/ms   4.92
> Double128Vector.COSH   26.91     126.00  ops/ms   4.68
> Double128Vector.EXP    71.64     379.65  ops/ms   5.30
> Double128Vector.EXPM1  35.95     150.37  ops/ms   4.18
> Double128Vector.HYPOT  50.67     174.10  ops/ms   3.44
> Double128Vector.LOG    61.95     279.84  ops/ms   4.52
> Double128Vector.LOG10  59.34     239.05  ops/ms   4.03
> Double128Vector.LOG1P  18.56     200.32  ops/ms  10.79
> Double128Vector.SIN    49.36     240.79  ops/ms   4.88
> Double128Vector.SINH   26.59     103.75  ops/ms   3.90
> Double128Vector.TAN    41.05     152.39  ops/ms   3.71
> Double128Vector.TANH   45.29     169.53  ops/ms   3.74
> Double256Vector.ACOS   54.21     106.39  ops/ms   1.96
> Double256Vector.ASIN   53.60     107.99  ops/ms   2.01
> Double256Vector.ATAN   21.53     189.11  ops/ms   8.78
> Double256Vector.ATAN2  16.67     140.76  ops/ms   8.44
> Double256Vector.CBRT   56.45     397.13  ops/ms   7.04
> Double256Vector.COS    58.26     389.77  ops/ms   6.69
> Double256Vector.COSH   29.44     151.11  ops/ms   5.13
> Double256Vector.EXP    86.67     564.68  ops/ms   6.52
> Double256Vector.EXPM1  41.96     201.28  ops/ms   4.80
> Double256Vector.HYPOT  66.18     305.74  ops/ms   4.62
> Double256Vector.LOG    71.52     394.90  ops/ms   5.52
> Double256Vector.LOG10  65.43     362.32  ops/ms   5.54
> Double256Vector.LOG1P  19.99     300.88  ops/ms  15.05
> Double256Vector.SIN    57.06     380.98  ops/ms   6.68
> Double256Vector.SINH   29.40     117.37  ops/ms   3.99
> Double256Vector.TAN    44.90     279.90  ops/ms   6.23
> Double256Vector.TANH   54.08     274.71  ops/ms   5.08
> Double512Vector.ACOS   55.65     687.54  ops/ms  12.35
> Double512Vector.ASIN   57.31     777.72  ops/ms  13.57
> Double512Vector.ATAN   21.42     729.21  ops/ms  34.04
> Double512Vector.ATAN2  16.37     414.33  ops/ms  25.32
> Double512Vector.CBRT   56.78     834.38  ops/ms  14.69
> Double512Vector.COS    59.88     837.04  ops/ms  13.98
> Double512Vector.COSH   30.34     172.76  ops/ms   5.70
> Double512Vector.EXP    99.66    1608.12  ops/ms  16.14
> Double512Vector.EXPM1  43.39     318.61  ops/ms   7.34
> Double512Vector.HYPOT  73.87    1502.72  ops/ms  20.34
> Double512Vector.LOG    74.84     996.00  ops/ms  13.31
> Double512Vector.LOG10  71.12    1046.52  ops/ms  14.72
> Double512Vector.LOG1P  19.75     776.87  ops/ms  39.34
> Double512Vector.POW    37.42     384.13  ops/ms  10.26
> Double512Vector.SIN    59.74     728.45  ops/ms  12.19
> Double512Vector.SINH   29.47     143.38  ops/ms   4.87
> Double512Vector.TAN    46.20     587.21  ops/ms  12.71
> Double512Vector.TANH   57.36     495.42  ops/ms   8.64
> Double64Vector.ACOS    24.04      73.67  ops/ms   3.06
> Double64Vector.ASIN    23.78      75.11  ops/ms   3.16
> Double64Vector.ATAN    14.14      62.81  ops/ms   4.44
> Double64Vector.ATAN2   10.38      44.43  ops/ms   4.28
> Double64Vector.CBRT    16.47     107.50  ops/ms   6.53
> Double64Vector.COS     23.42     152.01  ops/ms   6.49
> Double64Vector.COSH    17.34     113.34  ops/ms   6.54
> Double64Vector.EXP     27.08     203.53  ops/ms   7.52
> Double64Vector.EXPM1   18.77      96.73  ops/ms   5.15
> Double64Vector.HYPOT   18.54     103.62  ops/ms   5.59
> Double64Vector.LOG     26.75     142.63  ops/ms   5.33
> Double64Vector.LOG10   25.85     139.71  ops/ms   5.40
> Double64Vector.LOG1P   13.26      97.94  ops/ms   7.38
> Double64Vector.SIN     23.28     146.91  ops/ms   6.31
> Double64Vector.SINH    17.62      88.59  ops/ms   5.03
> Double64Vector.TAN     21.00      86.43  ops/ms   4.12
> Double64Vector.TANH    23.75     111.35  ops/ms   4.69
> Float128Vector.ACOS    57.52     110.65  ops/ms   1.92
> Float128Vector.ASIN    57.15     117.95  ops/ms   2.06
> Float128Vector.ATAN    22.52     318.74  ops/ms  14.15
> Float128Vector.ATAN2   17.06     246.07  ops/ms  14.42
> Float128Vector.CBRT    29.72     443.74  ops/ms  14.93
> Float128Vector.COS     42.82     803.02  ops/ms  18.75
> Float128Vector.COSH    31.44     118.34  ops/ms   3.76
> Float128Vector.EXP     72.43     855.33  ops/ms  11.81
> Float128Vector.EXPM1   37.82     127.85  ops/ms   3.38
> Float128Vector.HYPOT   53.20     591.68  ops/ms  11.12
> Float128Vector.LOG     52.95     877.94  ops/ms  16.58
> Float128Vector.LOG10   49.26     603.72  ops/ms  12.26
> Float128Vector.LOG1P   20.89     430.59  ops/ms  20.61
> Float128Vector.SIN     43.38     745.31  ops/ms  17.18
> Float128Vector.SINH    31.11     112.91  ops/ms   3.63
> Float128Vector.TAN     37.25     332.13  ops/ms   8.92
> Float128Vector.TANH    57.63     453.77  ops/ms   7.87
> Float256Vector.ACOS    65.23     123.73  ops/ms   1.90
> Float256Vector.ASIN    63.41     132.86  ops/ms   2.10
> Float256Vector.ATAN    23.51     649.02  ops/ms  27.61
> Float256Vector.ATAN2   18.19     455.95  ops/ms  25.07
> Float256Vector.CBRT    45.99     594.81  ops/ms  12.93
> Float256Vector.COS     43.75     926.69  ops/ms  21.18
> Float256Vector.COSH    33.52     130.46  ops/ms   3.89
> Float256Vector.EXP     75.70    1366.72  ops/ms  18.05
> Float256Vector.EXPM1   39.00     149.72  ops/ms   3.84
> Float256Vector.HYPOT   52.91    1023.18  ops/ms  19.34
> Float256Vector.LOG     53.31    1545.77  ops/ms  29.00
> Float256Vector.LOG10   50.31     863.80  ops/ms  17.17
> Float256Vector.LOG1P   21.51     616.59  ops/ms  28.66
> Float256Vector.SIN     44.07     911.04  ops/ms  20.67
> Float256Vector.SINH    33.16     122.50  ops/ms   3.69
> Float256Vector.TAN     37.85     497.75  ops/ms  13.15
> Float256Vector.TANH    64.27     537.20  ops/ms   8.36
> Float512Vector.ACOS    67.33    1718.00  ops/ms  25.52
> Float512Vector.ASIN    66.12    1780.85  ops/ms  26.93
> Float512Vector.ATAN    22.63    1780.31  ops/ms  78.69
> Float512Vector.ATAN2   17.52    1113.93  ops/ms  63.57
> Float512Vector.CBRT    54.78    2087.58  ops/ms  38.11
> Float512Vector.COS     40.92    1567.93  ops/ms  38.32
> Float512Vector.COSH    33.42     138.36  ops/ms   4.14
> Float512Vector.EXP     70.51    3835.97  ops/ms  54.41
> Float512Vector.EXPM1   38.06     279.80  ops/ms   7.35
> Float512Vector.HYPOT   50.99    3287.55  ops/ms  64.47
> Float512Vector.LOG     49.61    3156.99  ops/ms  63.64
> Float512Vector.LOG10   46.94    2489.16  ops/ms  53.02
> Float512Vector.LOG1P   20.66    1689.86  ops/ms  81.81
> Float512Vector.POW     32.73    1015.85  ops/ms  31.04
> Float512Vector.SIN     41.17    1587.71  ops/ms  38.56
> Float512Vector.SINH    33.05     129.39  ops/ms   3.91
> Float512Vector.TAN     35.60    1336.11  ops/ms  37.53
> Float512Vector.TANH    65.77    2295.28  ops/ms  34.90
> Float64Vector.ACOS     48.41      89.34  ops/ms   1.85
> Float64Vector.ASIN     47.30      95.72  ops/ms   2.02
> Float64Vector.ATAN     20.62      49.45  ops/ms   2.40
> Float64Vector.ATAN2    15.95     112.35  ops/ms   7.04
> Float64Vector.CBRT     24.03     134.57  ops/ms   5.60
> Float64Vector.COS      44.28     394.33  ops/ms   8.91
> Float64Vector.COSH     28.35      95.27  ops/ms   3.36
> Float64Vector.EXP      65.80     486.37  ops/ms   7.39
> Float64Vector.EXPM1    34.61      85.99  ops/ms   2.48
> Float64Vector.HYPOT    50.40     147.82  ops/ms   2.93
> Float64Vector.LOG      51.93     163.25  ops/ms   3.14
> Float64Vector.LOG10    49.53     147.98  ops/ms   2.99
> Float64Vector.LOG1P    19.20     206.81  ops/ms  10.77
> Float64Vector.SIN      44.41     382.09  ops/ms   8.60
> Float64Vector.SINH     28.20      90.68  ops/ms   3.22
> Float64Vector.TAN      36.29     160.89  ops/ms   4.43
> Float64Vector.TANH     47.65     214.04  ops/ms   4.49

Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision:

  Add comments explaining naming convention

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3638/files
  - new: https://git.openjdk.java.net/jdk/pull/3638/files/0d1d0382..45f20a34

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=07
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=06-07

  Stats: 15 lines in 1 file changed: 15 ins; 0 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3638.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3638/head:pull/3638

PR: https://git.openjdk.java.net/jdk/pull/3638

From sviswanathan at openjdk.java.net Wed May 19 00:58:18 2021
From: sviswanathan at openjdk.java.net (Sandhya Viswanathan)
Date: Wed, 19 May 2021 00:58:18 GMT
Subject: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v7]
In-Reply-To: <-BR26RrPjxCrCj3TmA4xRVCvZxEk8njxsQh7kqjmmts=.df6b4208-d5a1-4d28-802b-1fa28848abce@github.com>
References: <8FlPfGWCt4m3-JqK3IimdEyhG767zaov3nfteXioR0c=.8e49bf89-d74a-474b-9bd0-cce80f744af3@github.com> <-BR26RrPjxCrCj3TmA4xRVCvZxEk8njxsQh7kqjmmts=.df6b4208-d5a1-4d28-802b-1fa28848abce@github.com>
Message-ID:

On Wed, 19 May 2021 00:26:48 GMT, Vladimir Kozlov wrote:

>> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   jcheck fixes
>
> This is much much better! Thank you for changing it. I am only asking now to add comment explaining names.

@vnkozlov I have added comments explaining naming convention. Please let me know if this looks ok.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3638

From kvn at openjdk.java.net Wed May 19 01:07:42 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Wed, 19 May 2021 01:07:42 GMT
Subject: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v8]
In-Reply-To:
References:
Message-ID:

On Wed, 19 May 2021 00:58:15 GMT, Sandhya Viswanathan wrote:

>> This PR contains Short Vector Math Library support related changes for [JEP-414 Vector API (Second Incubator)](https://openjdk.java.net/jeps/414), in preparation for when targeted.
>>
>> [...]
>
> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision:
>
>   Add comments explaining naming convention

Good.

-------------

Marked as reviewed by kvn (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/3638

From hshi at openjdk.java.net Wed May 19 01:28:40 2021
From: hshi at openjdk.java.net (Hui Shi)
Date: Wed, 19 May 2021 01:28:40 GMT
Subject: Integrated: 8267212: test/jdk/java/util/Collections/FindSubList.java intermittent crash with "no reachable node should have no use"
In-Reply-To:
References:
Message-ID: <5rfjcaosrV_77MRg_OELtkK14yG7t1YXTnpeDFOCb_U=.6e67bfc9-814c-4b77-8dd7-13ddd2cc5e6c@github.com>

On Mon, 17 May 2021 11:44:48 GMT, Hui Shi wrote:

> ... crash with "no reachable node should have no use"
>
> Please help review this fix.
>
> StrIntrinsicNode::Ideal uses Node::set_req to replace memory input, old memory input might have 0 use, but not added into PhaseGVN worklist.
Use set_req_X to ensure that an old memory input node with no remaining uses is added to the PhaseGVN worklist.
>
> Found two other similar problematic pieces of code in LoadNode::Ideal.
>
> Tier1/2/3 pass with release/fastdebug build.
> test/jdk/java/util/Collections/FindSubList.java doesn't fail in 100 runs (before fix 2/3 failure in 10 runs).

This pull request has now been integrated.

Changeset: 324defe2
Author:    Hui Shi
Committer: Jie Fu
URL:       https://git.openjdk.java.net/jdk/commit/324defe2b6c83de76a37d1b4b360869a77bed036
Stats:     3 lines in 2 files changed: 0 ins; 0 del; 3 mod

8267212: test/jdk/java/util/Collections/FindSubList.java intermittent crash with "no reachable node should have no use"

Reviewed-by: roland, thartmann

-------------

PR: https://git.openjdk.java.net/jdk/pull/4055

From sviswanathan at openjdk.java.net Wed May 19 01:31:40 2021
From: sviswanathan at openjdk.java.net (Sandhya Viswanathan)
Date: Wed, 19 May 2021 01:31:40 GMT
Subject: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v8]
In-Reply-To:
References:
Message-ID:

On Wed, 19 May 2021 00:58:15 GMT, Sandhya Viswanathan wrote:

>> This PR contains Short Vector Math Library support related changes for [JEP-414 Vector API (Second Incubator)](https://openjdk.java.net/jeps/414), in preparation for when targeted.
>>
>> [...]
>
> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision:
>
>   Add comments explaining naming convention

Thanks a lot Vladimir!

-------------

PR: https://git.openjdk.java.net/jdk/pull/3638

From hshi at openjdk.java.net Wed May 19 01:42:07 2021
From: hshi at openjdk.java.net (Hui Shi)
Date: Wed, 19 May 2021 01:42:07 GMT
Subject: RFR: 8266528: Optimize C2 VerifyIterativeGVN execution time [v3]
In-Reply-To:
References:
Message-ID:

> Please help review this enhancement for VerifyIterativeGVN, which reduces execution time by about 3x - 200x when VerifyIterativeGVN is on.
>
> In simple test "-Xcomp -XX:+VerifyIterativeGVN -XX:-TieredCompilation -version", time reduced from 8.67s to 2.4s.
> In extreme case hotspot/test/jtreg/compiler/escapeAnalysis/Test6689060.java, time reduced from 20000s to 95s.
>
> Tested with "-Xbatch -XX:+VerifyIterativeGVN -XX:-TieredCompilation", tier1/2/3 with fastdebug, and no regression.
>
> 1. Remove node_arena()->contains checking for verifying nodes. _verify_window is reset before every PhaseIterGVN::optimize. Searching from root or nodes in _verify_window will not meet nodes whose _idx is not unique (PhaseIterGVN::optimize is not triggered in the middle of PhaseRenumberLive). Asserting in Node::verify that every node is in the current node_arena() passes tier1/2/3 checks (with -Xbatch -XX:+VerifyIterativeGVN -XX:-TieredCompilation); no assertion failure happens.
>
> 2. Combine verification for nodes in _verify_window into one worklist, skipping redundant nodes in _verify_window.
>
> 3. Optimize duplicate checking for same input nodes, skipping if current input index is not its first occurrence.
>
> 4. Optimize field access: replace "n->in(j)" with "n->_in[j]", and likewise for the outcnt calculation for input node x.
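
Item 3 in the list above can be illustrated with a small stand-alone sketch. This is a simplified model and not the HotSpot code: `Node`, `add_edge` and `def_use_consistent` are hypothetical names made up for illustration, and plain vectors stand in for C2's input and output arrays. The only point it tries to show is that an input that repeats in a node's input array needs to be checked against its out edges once, at its first occurrence.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for a C2 graph node (not HotSpot's Node):
// `in` holds def inputs (may contain nullptr and repeats), `out` holds users.
struct Node {
    std::vector<Node*> in;
    std::vector<Node*> out;
};

// Adds a def-use edge pair: `def` becomes an input of `user`.
void add_edge(Node* user, Node* def) {
    assert(user != nullptr && def != nullptr);
    user->in.push_back(def);
    def->out.push_back(user);
}

// Returns true if, for every distinct input x of n, the number of times x
// appears in n->in equals the number of times n appears in x->out.
bool def_use_consistent(const Node* n) {
    const auto first = n->in.begin();
    for (std::size_t j = 0; j < n->in.size(); ++j) {
        const Node* x = n->in[j];
        if (x == nullptr) continue;

        const auto here = first + static_cast<std::ptrdiff_t>(j);
        // Skip x if it already appeared at a smaller index: it was fully
        // checked at its first occurrence.
        if (std::find(first, here, x) != here) continue;

        const auto uses_in_n = std::count(here, n->in.end(), x);
        const auto outs_to_n = std::count(x->out.begin(), x->out.end(), n);
        if (uses_in_n != outs_to_n) return false;
    }
    return true;
}
```

In this sketch the first-occurrence test is a linear scan over the earlier inputs, which keeps it free of extra state; whether that matches the exact shape of the check in Node::verify is an assumption.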
Hui Shi has updated the pull request incrementally with one additional commit since the last revision:

  Add comments for duplicated input processing in Node::Verify

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/4045/files
  - new: https://git.openjdk.java.net/jdk/pull/4045/files/9635c9b9..1ca049ff

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4045&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4045&range=01-02

  Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/4045.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/4045/head:pull/4045

PR: https://git.openjdk.java.net/jdk/pull/4045

From hshi at openjdk.java.net Wed May 19 01:42:07 2021
From: hshi at openjdk.java.net (Hui Shi)
Date: Wed, 19 May 2021 01:42:07 GMT
Subject: RFR: 8266528: Optimize C2 VerifyIterativeGVN execution time [v3]
In-Reply-To: <2UyNU9-0AFHD5gc9a_NdhrLTjJj0mTPX95C1VY-eim8=.b4df5ac8-f816-4fbe-ac57-a091313c79e8@github.com>
References: <2UyNU9-0AFHD5gc9a_NdhrLTjJj0mTPX95C1VY-eim8=.b4df5ac8-f816-4fbe-ac57-a091313c79e8@github.com>
Message-ID:

On Wed, 19 May 2021 00:14:46 GMT, Vladimir Kozlov wrote:

>> The previous loop breaks at the first input that is the same as x, so cnt must be 2 if x duplicates a previous input.
>
> You are right. I missed `break`. Using another local VectorSet here may not be cheaper. Okay.
> Add comment here. Something about that we processed this node already.

Thanks for your review! The comment is added.
-------------

PR: https://git.openjdk.java.net/jdk/pull/4045

From github.com+4146708+a74nh at openjdk.java.net Wed May 19 01:51:43 2021
From: github.com+4146708+a74nh at openjdk.java.net (Alan Hayward)
Date: Wed, 19 May 2021 01:51:43 GMT
Subject: Integrated: 8267098: AArch64: C1 StubFrames end confusingly
In-Reply-To:
References:
Message-ID: <6DWQoQ1qGf4UYvdxvO_W_VuPU0x3C9xoRilFNJYgij8=.ed81b163-a4a9-4960-96e8-100aa3f9e83c@github.com>

On Fri, 14 May 2021 11:28:45 GMT, Alan Hayward wrote:

> For many of the stub frames, a leave/ret is generated after the stub has
> already branched or returned. This is confusing. For these cases, replace
> the superfluous code with a should_not_reach_here
>
> For handle exception, instead of storing return from the exception
> handler on the stack, it can be moved directly into lr, replacing a store and
> load with a single move. (If/when PAC support is implemented, then this store
> would also have to be signed).

This pull request has now been integrated.

Changeset: ff84577d
Author:    Alan Hayward
Committer: Nick Gasson
URL:       https://git.openjdk.java.net/jdk/commit/ff84577d72226da0bf1ce2c6d6852f3934feecf2
Stats:     38 lines in 1 file changed: 11 ins; 9 del; 18 mod

8267098: AArch64: C1 StubFrames end confusingly

Reviewed-by: aph

-------------

PR: https://git.openjdk.java.net/jdk/pull/4030

From yyang at openjdk.java.net Wed May 19 02:39:05 2021
From: yyang at openjdk.java.net (Yi Yang)
Date: Wed, 19 May 2021 02:39:05 GMT
Subject: RFR: 8267151: C2: Don't create dummy Opaque1Node for outmost unswitched IfNode [v2]
In-Reply-To: <52F-48LXl0cjM3C9rcAYKp_VdOsUMVetxr5ydepHB2Y=.d361fb09-0b4e-4a4e-b5d2-f44254fb8816@github.com>
References: <52F-48LXl0cjM3C9rcAYKp_VdOsUMVetxr5ydepHB2Y=.d361fb09-0b4e-4a4e-b5d2-f44254fb8816@github.com>
Message-ID: <9lr_DC8R0J8wu-Z3Xb74vWCbxcjtUUkzarCeHDbnpz0=.eb9054c5-a14d-4844-98a6-af45ba8c5914@github.com>

> In create_slow_version_of_loop(), C2 creates the outmost unswitched IfNode (i.e. **if(xx)**{ for{} }else{ for{} }) with a dummy opaque bool node as its condition input.
>
> https://github.com/openjdk/jdk/blob/cd1c17c0a6416a8d16cf2035f3e97dba95b6b8af/src/hotspot/share/opto/loopUnswitch.cpp#L265-L271
>
> After that, it sets the _prob(missing _fcnt?) of the outmost unswitched IfNode in do_unswitching().
>
> https://github.com/openjdk/jdk/blob/cd1c17c0a6416a8d16cf2035f3e97dba95b6b8af/src/hotspot/share/opto/loopUnswitch.cpp#L186-L191
>
> I think we can merge these two steps into a single step, that is, create the outmost unswitched IfNode while setting its condition input, _prob and _fcnt, without creating the dummy opaque bool node.
>
> Testing:
> - hotspot/jtreg/compiler(slowdebug)

Yi Yang has updated the pull request incrementally with one additional commit since the last revision:

  unused head->is_CountedLoop()

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/4079/files
  - new: https://git.openjdk.java.net/jdk/pull/4079/files/eb2e1370..55712395

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4079&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4079&range=00-01

  Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/4079.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/4079/head:pull/4079

PR: https://git.openjdk.java.net/jdk/pull/4079

From yyang at openjdk.java.net Wed May 19 02:42:03 2021
From: yyang at openjdk.java.net (Yi Yang)
Date: Wed, 19 May 2021 02:42:03 GMT
Subject: RFR: 8267239: C1: RangeCheckElimination for % operator if divisor is IntConstant [v3]
In-Reply-To:
References:
Message-ID: <40GCUvFwUxi08a4inydWtopQ9thvWsCOsFz7_-v0QzM=.7255306b-b11d-4fc4-b839-37410bea37a9@github.com>

> For the % operator, the result of the remainder operation can be negative only if the dividend is negative, and can be positive only if the dividend is positive.
> Moreover, the magnitude of the result is always less than the magnitude of the divisor (see [JLS 15.17.3](https://docs.oracle.com/javase/specs/jls/se8/html/jls-15.html#jls-15.17.3)).
>
> So if `y` is a constant integer and not equal to 0, we can deduce the bounds of the remainder operation (below, `x` and `y` stand for non-negative magnitudes, with signs written explicitly):
> - x % -y ==> [0, y - 1] RCE
> - x % y ==> [0, y - 1] RCE
> - -x % y ==> [-y + 1, 0]
> - -x % -y ==> [-y + 1, 0]
>
> Based on the above rationale, we can apply RCE to remainder operations whose divisor is a constant integer and whose dividend is >= 0, e.g.:
>
>     for (int i = 0; i < 1000; i++) {
>       int top5 = arr[i % 5]; // Apply RCE if arr is a loop invariant
>       ....
>     }
>
> For more detailed RCE results, please check out the attachment on JBS; it was generated by ArithmeticRemRCE with the additional flags -XX:+TraceRangeCheckElimination -XX:+PrintIR.
>
> Testing:
> - test/hotspot/jtreg/compiler/c1/(slowdebug)

Yi Yang has updated the pull request incrementally with one additional commit since the last revision:

  missing whitespace; more comment

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/4083/files
  - new: https://git.openjdk.java.net/jdk/pull/4083/files/64bdf0f2..35aaa375

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4083&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4083&range=01-02

  Stats: 5 lines in 1 file changed: 3 ins; 0 del; 2 mod
  Patch: https://git.openjdk.java.net/jdk/pull/4083.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/4083/head:pull/4083

PR: https://git.openjdk.java.net/jdk/pull/4083

From yyang at openjdk.java.net Wed May 19 02:42:04 2021
From: yyang at openjdk.java.net (Yi Yang)
Date: Wed, 19 May 2021 02:42:04 GMT
Subject: RFR: 8267239: C1: RangeCheckElimination for % operator if divisor is IntConstant [v2]
In-Reply-To:
References:
Message-ID:

On Tue, 18 May 2021 14:44:46 GMT, Tobias Hartmann wrote:

>> Yi Yang has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   more comment for test
>
> src/hotspot/share/c1/c1_RangeCheckElimination.cpp line 246:
>
>> 244: _bound = new Bound();
>> 245: }
>> 246: }else {
>
> Missing whitespace `}else`

> Looks like C2 does not implement this optimization (should be in ModINode::Value). We should add it.

Thanks for confirming the C2 part. I will double-check it and file an issue for that.

> Missing whitespace }else

Fixed.

-------------

PR: https://git.openjdk.java.net/jdk/pull/4083

From yyang at openjdk.java.net Wed May 19 02:44:38 2021
From: yyang at openjdk.java.net (Yi Yang)
Date: Wed, 19 May 2021 02:44:38 GMT
Subject: RFR: 8266746: C1: Replace UnsafeGetRaw with UnsafeGetObject when setting up OSR entry block [v2]
In-Reply-To:
References:
Message-ID:

On Sat, 8 May 2021 09:45:19 GMT, Yi Yang wrote:

>> After JDK-8150921, most Unsafe{Get,Put}Raw intrinsic methods can be replaced by Unsafe{Get,Put}Object.
>>
>> There is only one place where C1 refers to UnsafeGetRaw, in GraphBuilder::setup_osr_entry_block():
>>
>> https://github.com/openjdk/jdk/blob/74fecc070a6462e6a2d061525b53a63de15339f9/src/hotspot/share/c1/c1_GraphBuilder.cpp#L3143-L3157
>>
>> We can replace UnsafeGetRaw with UnsafeGetObject when setting up the OSR entry block. After that, Unsafe{Get,Put}Raw can be completely removed because nothing refers to them.
>>
>> (This patch actually does two things:
>> 1. `Replace UnsafeGetRaw with UnsafeGetObject when setting up OSR entry block`. This is the only occurrence where C1 refers to UnsafeGetRaw.
>> 2. `Cleanup unused Unsafe{Get,Put}Raw code`
>> They are related, so I put them together, but I still want to hear your suggestions; I will separate them into two patches if you think that is more reasonable.)
>>
>> Thanks!
>> Yang
>
> Yi Yang has updated the pull request incrementally with one additional commit since the last revision:
>
>   unaliged_move for ppc/s390

PING: May I ask your help to review this patch?
-------------

PR: https://git.openjdk.java.net/jdk/pull/3917

From ddong at openjdk.java.net Wed May 19 03:10:42 2021
From: ddong at openjdk.java.net (Denghui Dong)
Date: Wed, 19 May 2021 03:10:42 GMT
Subject: RFR: 8265129: Add intrinsic support for JVM.getClassId [v6]
In-Reply-To:
References:
Message-ID:

On Mon, 17 May 2021 07:20:18 GMT, Denghui Dong wrote:

>> 8265129: Add intrinsic support for JVM.getClassId
>
> Denghui Dong has updated the pull request incrementally with one additional commit since the last revision:
>
>   fix crash problem

Hi Vladimir,

Thanks for your comment.

Yes, the native implementation of `getClassIdNonIntrinsic`/`getClassId` is located in `jfrTraceId.cpp#L178`, just as you said. More specifically, there are two paths: one (JfrTraceId::load) for normal classes and one (load_primitive) for primitive classes (including void.class). My pseudo-code (the comment of `LibraryCallKit::inline_native_classID`) is consistent with the implementation of these two paths.

In the normal class path there is a fast path and a slow path (see JfrTraceIdLoadBarrier::load). Only a few comparison and shift operations are needed to obtain the class ID in the fast path, and that is where I think the intrinsic can bring performance improvements; I saw about a 20x improvement in my microbenchmark.

Judging from the current JFR implementation, some events already rely on this API: for example, `ExceptionThrownEvent` and `ErrorThrownEvent` use `thrownClass` to record the type of the exception. I also noticed a new PR (https://github.com/openjdk/jdk/pull/4101) that adds `FinalizerEvent`, which includes a field named `finalizedClass` to record the type information. Therefore, I have reason to believe that this API will be used frequently while JFR is active.
The current implementation is indeed a bit complicated. I think some simplifications can be made: for example, keep only the fast path for normal classes and implement the other paths by calling the native function directly. What do you think?

@egahlin @mgronlun
I hope the JFR folks can also give some suggestions on this PR :)

Best,
Denghui

-------------

PR: https://git.openjdk.java.net/jdk/pull/3470

From sviswanathan at openjdk.java.net Wed May 19 03:37:11 2021
From: sviswanathan at openjdk.java.net (Sandhya Viswanathan)
Date: Wed, 19 May 2021 03:37:11 GMT
Subject: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v9]
In-Reply-To:
References:
Message-ID:

> This PR contains Short Vector Math Library support related changes for [JEP-414 Vector API (Second Incubator)](https://openjdk.java.net/jeps/414), in preparation for when the JEP is targeted.
>
> Intel Short Vector Math Library (SVML) based intrinsics in native x86 assembly provide optimized implementations of the Vector API transcendental and trigonometric methods.
> These methods are built into a separate library instead of being part of libjvm.so or jvm.dll.
>
> The following changes are made:
> The source for these methods is placed in the jdk.incubator.vector module under src/jdk.incubator.vector/linux/native/libsvml and src/jdk.incubator.vector/windows/native/libsvml.
> The assembly source files are named "*.S" and the include files are named "*.S.inc".
> The corresponding build script is placed at make/modules/jdk.incubator.vector/Lib.gmk.
> Changes are made to the build system to support dependency tracking for assembly files with includes.
> The built native libraries (libsvml.so/svml.dll) are placed in the bin directory of the JDK on Windows and the lib directory of the JDK on Linux.
> The C2 JIT uses dll_load and dll_lookup to get the addresses of the optimized methods from this library.
> > Build system changes and module library build scripts are contributed by Magnus (magnus.ihse.bursie at oracle.com). > > Looking forward to your review and feedback. > > Performance: > Micro benchmark Base Optimized Unit Gain(Optimized/Base) > Double128Vector.ACOS 45.91 87.34 ops/ms 1.90 > Double128Vector.ASIN 45.06 92.36 ops/ms 2.05 > Double128Vector.ATAN 19.92 118.36 ops/ms 5.94 > Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79 > Double128Vector.CBRT 45.77 208.36 ops/ms 4.55 > Double128Vector.COS 49.94 245.89 ops/ms 4.92 > Double128Vector.COSH 26.91 126.00 ops/ms 4.68 > Double128Vector.EXP 71.64 379.65 ops/ms 5.30 > Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18 > Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44 > Double128Vector.LOG 61.95 279.84 ops/ms 4.52 > Double128Vector.LOG10 59.34 239.05 ops/ms 4.03 > Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79 > Double128Vector.SIN 49.36 240.79 ops/ms 4.88 > Double128Vector.SINH 26.59 103.75 ops/ms 3.90 > Double128Vector.TAN 41.05 152.39 ops/ms 3.71 > Double128Vector.TANH 45.29 169.53 ops/ms 3.74 > Double256Vector.ACOS 54.21 106.39 ops/ms 1.96 > Double256Vector.ASIN 53.60 107.99 ops/ms 2.01 > Double256Vector.ATAN 21.53 189.11 ops/ms 8.78 > Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44 > Double256Vector.CBRT 56.45 397.13 ops/ms 7.04 > Double256Vector.COS 58.26 389.77 ops/ms 6.69 > Double256Vector.COSH 29.44 151.11 ops/ms 5.13 > Double256Vector.EXP 86.67 564.68 ops/ms 6.52 > Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80 > Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62 > Double256Vector.LOG 71.52 394.90 ops/ms 5.52 > Double256Vector.LOG10 65.43 362.32 ops/ms 5.54 > Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05 > Double256Vector.SIN 57.06 380.98 ops/ms 6.68 > Double256Vector.SINH 29.40 117.37 ops/ms 3.99 > Double256Vector.TAN 44.90 279.90 ops/ms 6.23 > Double256Vector.TANH 54.08 274.71 ops/ms 5.08 > Double512Vector.ACOS 55.65 687.54 ops/ms 12.35 > Double512Vector.ASIN 57.31 777.72 ops/ms 13.57 > Double512Vector.ATAN 
21.42 729.21 ops/ms 34.04 > Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32 > Double512Vector.CBRT 56.78 834.38 ops/ms 14.69 > Double512Vector.COS 59.88 837.04 ops/ms 13.98 > Double512Vector.COSH 30.34 172.76 ops/ms 5.70 > Double512Vector.EXP 99.66 1608.12 ops/ms 16.14 > Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34 > Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34 > Double512Vector.LOG 74.84 996.00 ops/ms 13.31 > Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72 > Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34 > Double512Vector.POW 37.42 384.13 ops/ms 10.26 > Double512Vector.SIN 59.74 728.45 ops/ms 12.19 > Double512Vector.SINH 29.47 143.38 ops/ms 4.87 > Double512Vector.TAN 46.20 587.21 ops/ms 12.71 > Double512Vector.TANH 57.36 495.42 ops/ms 8.64 > Double64Vector.ACOS 24.04 73.67 ops/ms 3.06 > Double64Vector.ASIN 23.78 75.11 ops/ms 3.16 > Double64Vector.ATAN 14.14 62.81 ops/ms 4.44 > Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28 > Double64Vector.CBRT 16.47 107.50 ops/ms 6.53 > Double64Vector.COS 23.42 152.01 ops/ms 6.49 > Double64Vector.COSH 17.34 113.34 ops/ms 6.54 > Double64Vector.EXP 27.08 203.53 ops/ms 7.52 > Double64Vector.EXPM1 18.77 96.73 ops/ms 5.15 > Double64Vector.HYPOT 18.54 103.62 ops/ms 5.59 > Double64Vector.LOG 26.75 142.63 ops/ms 5.33 > Double64Vector.LOG10 25.85 139.71 ops/ms 5.40 > Double64Vector.LOG1P 13.26 97.94 ops/ms 7.38 > Double64Vector.SIN 23.28 146.91 ops/ms 6.31 > Double64Vector.SINH 17.62 88.59 ops/ms 5.03 > Double64Vector.TAN 21.00 86.43 ops/ms 4.12 > Double64Vector.TANH 23.75 111.35 ops/ms 4.69 > Float128Vector.ACOS 57.52 110.65 ops/ms 1.92 > Float128Vector.ASIN 57.15 117.95 ops/ms 2.06 > Float128Vector.ATAN 22.52 318.74 ops/ms 14.15 > Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42 > Float128Vector.CBRT 29.72 443.74 ops/ms 14.93 > Float128Vector.COS 42.82 803.02 ops/ms 18.75 > Float128Vector.COSH 31.44 118.34 ops/ms 3.76 > Float128Vector.EXP 72.43 855.33 ops/ms 11.81 > Float128Vector.EXPM1 37.82 127.85 ops/ms 3.38 > 
Float128Vector.HYPOT 53.20 591.68 ops/ms 11.12 > Float128Vector.LOG 52.95 877.94 ops/ms 16.58 > Float128Vector.LOG10 49.26 603.72 ops/ms 12.26 > Float128Vector.LOG1P 20.89 430.59 ops/ms 20.61 > Float128Vector.SIN 43.38 745.31 ops/ms 17.18 > Float128Vector.SINH 31.11 112.91 ops/ms 3.63 > Float128Vector.TAN 37.25 332.13 ops/ms 8.92 > Float128Vector.TANH 57.63 453.77 ops/ms 7.87 > Float256Vector.ACOS 65.23 123.73 ops/ms 1.90 > Float256Vector.ASIN 63.41 132.86 ops/ms 2.10 > Float256Vector.ATAN 23.51 649.02 ops/ms 27.61 > Float256Vector.ATAN2 18.19 455.95 ops/ms 25.07 > Float256Vector.CBRT 45.99 594.81 ops/ms 12.93 > Float256Vector.COS 43.75 926.69 ops/ms 21.18 > Float256Vector.COSH 33.52 130.46 ops/ms 3.89 > Float256Vector.EXP 75.70 1366.72 ops/ms 18.05 > Float256Vector.EXPM1 39.00 149.72 ops/ms 3.84 > Float256Vector.HYPOT 52.91 1023.18 ops/ms 19.34 > Float256Vector.LOG 53.31 1545.77 ops/ms 29.00 > Float256Vector.LOG10 50.31 863.80 ops/ms 17.17 > Float256Vector.LOG1P 21.51 616.59 ops/ms 28.66 > Float256Vector.SIN 44.07 911.04 ops/ms 20.67 > Float256Vector.SINH 33.16 122.50 ops/ms 3.69 > Float256Vector.TAN 37.85 497.75 ops/ms 13.15 > Float256Vector.TANH 64.27 537.20 ops/ms 8.36 > Float512Vector.ACOS 67.33 1718.00 ops/ms 25.52 > Float512Vector.ASIN 66.12 1780.85 ops/ms 26.93 > Float512Vector.ATAN 22.63 1780.31 ops/ms 78.69 > Float512Vector.ATAN2 17.52 1113.93 ops/ms 63.57 > Float512Vector.CBRT 54.78 2087.58 ops/ms 38.11 > Float512Vector.COS 40.92 1567.93 ops/ms 38.32 > Float512Vector.COSH 33.42 138.36 ops/ms 4.14 > Float512Vector.EXP 70.51 3835.97 ops/ms 54.41 > Float512Vector.EXPM1 38.06 279.80 ops/ms 7.35 > Float512Vector.HYPOT 50.99 3287.55 ops/ms 64.47 > Float512Vector.LOG 49.61 3156.99 ops/ms 63.64 > Float512Vector.LOG10 46.94 2489.16 ops/ms 53.02 > Float512Vector.LOG1P 20.66 1689.86 ops/ms 81.81 > Float512Vector.POW 32.73 1015.85 ops/ms 31.04 > Float512Vector.SIN 41.17 1587.71 ops/ms 38.56 > Float512Vector.SINH 33.05 129.39 ops/ms 3.91 > Float512Vector.TAN 35.60 
1336.11 ops/ms 37.53 > Float512Vector.TANH 65.77 2295.28 ops/ms 34.90 > Float64Vector.ACOS 48.41 89.34 ops/ms 1.85 > Float64Vector.ASIN 47.30 95.72 ops/ms 2.02 > Float64Vector.ATAN 20.62 49.45 ops/ms 2.40 > Float64Vector.ATAN2 15.95 112.35 ops/ms 7.04 > Float64Vector.CBRT 24.03 134.57 ops/ms 5.60 > Float64Vector.COS 44.28 394.33 ops/ms 8.91 > Float64Vector.COSH 28.35 95.27 ops/ms 3.36 > Float64Vector.EXP 65.80 486.37 ops/ms 7.39 > Float64Vector.EXPM1 34.61 85.99 ops/ms 2.48 > Float64Vector.HYPOT 50.40 147.82 ops/ms 2.93 > Float64Vector.LOG 51.93 163.25 ops/ms 3.14 > Float64Vector.LOG10 49.53 147.98 ops/ms 2.99 > Float64Vector.LOG1P 19.20 206.81 ops/ms 10.77 > Float64Vector.SIN 44.41 382.09 ops/ms 8.60 > Float64Vector.SINH 28.20 90.68 ops/ms 3.22 > Float64Vector.TAN 36.29 160.89 ops/ms 4.43 > Float64Vector.TANH 47.65 214.04 ops/ms 4.49 Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: fix 32-bit build ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/3638/files - new: https://git.openjdk.java.net/jdk/pull/3638/files/45f20a34..f7e39913 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=08 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=07-08 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/3638.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3638/head:pull/3638 PR: https://git.openjdk.java.net/jdk/pull/3638 From xgong at openjdk.java.net Wed May 19 03:39:38 2021 From: xgong at openjdk.java.net (Xiaohong Gong) Date: Wed, 19 May 2021 03:39:38 GMT Subject: RFR: 8266962: Add arch supporting check for "Op_VectorLoadConst" before creating the node In-Reply-To: <_c7Ik2rZymkq9p0DqcHNeEWqbjs1ToH6WVmq_jR0f7U=.63a3cd81-b760-44d1-85b7-d654b0fd6240@github.com> References: <_c7Ik2rZymkq9p0DqcHNeEWqbjs1ToH6WVmq_jR0f7U=.63a3cd81-b760-44d1-85b7-d654b0fd6240@github.com> Message-ID: On Tue, 18 May 2021 
11:23:32 GMT, Vladimir Ivanov wrote:

> > As far as I know the VectorLoadConst is used here to get the initial shuffle iota of the vector. I'm not so clear about what the iota vector constant materialization you mean.
>
> `VectorLoadConst` is backed by a constant in `StubRoutines`. Instead, the constant can be materialized as an on-heap ByteVector instance, cached in a static final field, and passed into the intrinsic.
>
> An alternative approach would be to replace `VectorLoadConst` with a `LoadVector` which performs raw vector access at `StubRoutines::_vector_iota_indices` address.
>
> All in all, I don't see `VectorLoadConst` well-justified.
>
> ```
> src/hotspot/cpu/aarch64/aarch64_neon.ad:
>
> 3369 instruct loadcon8B(vecD dst, immI0 src)
> 3374 match(Set dst (VectorLoadConst src));
> 3378 __ lea(rscratch1, ExternalAddress(StubRoutines::aarch64::vector_iota_indices()));
>
>
> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp:
>
> 7055 StubRoutines::aarch64::_vector_iota_indices = generate_iota_indices("iota_indices");
>
> 618 // Generate indices for iota vector.
> 619 address generate_iota_indices(const char *stub_name) {
> 620 __ align(CodeEntryAlignment);
> 621 StubCodeMark mark(this, "StubRoutines", stub_name);
> 622 address start = __ pc();
> 623 __ emit_data64(0x0706050403020100, relocInfo::none);
> 624 __ emit_data64(0x0F0E0D0C0B0A0908, relocInfo::none);
> 625 return start;
> 626 }
> ```

Thanks for your explanation! Yes, currently `VectorLoadConst` is implemented by loading the constant in StubRoutines. The alternative of passing a static final instance (like `ByteMaxShuffle.IOTA`) to the intrinsic makes sense to me. Maybe we can revisit it as a kind of optimization in the future? Thanks!
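To see what the two `emit_data64` constants in the quoted stub encode, here is a minimal plain-Java sketch. The class and method names are hypothetical and chosen only for illustration; the code uses neither the Vector API nor any HotSpot internals. It packs the byte lane indices 0..15 in little-endian order and reproduces the two 64-bit values from `generate_iota_indices`:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class IotaIndices {
    // Packs the lane indices first, first+1, ..., first+7 into one
    // little-endian 64-bit word (lane 'first' ends up in the lowest byte).
    static long packIota(int first) {
        ByteBuffer buf = ByteBuffer.allocate(Long.BYTES).order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i < Long.BYTES; i++) {
            buf.put((byte) (first + i));
        }
        buf.flip();
        return buf.getLong();
    }

    public static void main(String[] args) {
        System.out.println(String.format("0x%016X", packIota(0))); // 0x0706050403020100
        System.out.println(String.format("0x%016X", packIota(8))); // 0x0F0E0D0C0B0A0908
    }
}
```

Whichever way the constant is materialized (stub data as today, or a cached on-heap instance as suggested in the quoted reply), the payload would presumably be these same sixteen lane indices.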
-------------

PR: https://git.openjdk.java.net/jdk/pull/4023

From jbhateja at openjdk.java.net Wed May 19 05:15:16 2021
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Wed, 19 May 2021 05:15:16 GMT
Subject: RFR: 8256973: Intrinsic creation for VectorMask query (lastTrue, firstTrue, trueCount) APIs [v6]
In-Reply-To: <73lFD51hzmiF_KrQyPyE5c7lbf-Bp6V5vptzGo7JioY=.f34509d0-04c1-4c6d-878f-baa433b315a7@github.com>
References: <73lFD51hzmiF_KrQyPyE5c7lbf-Bp6V5vptzGo7JioY=.f34509d0-04c1-4c6d-878f-baa433b315a7@github.com>
Message-ID:

> This patch intrinsifies the following mask query APIs using an optimal instruction sequence for the X86 target.
> 1) VectorMask.firstTrue.
> 2) VectorMask.lastTrue.
> 3) VectorMask.trueCount.
>
> The current implementations of the above APIs iterate over the underlying boolean array encapsulated in a mask instance to ascertain the count/position index of true bits.
> X86 AVX2 and AVX512 targets offer direct instructions to transfer the mask held in a byte vector into a GP or an opmask register, thereby accelerating further querying.
>
> Intrinsification is not performed for vector species containing fewer than two vector lanes.
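For reference, the difference between scanning a boolean array and querying a mask held in a general-purpose register can be sketched in plain Java as follows. This is an illustrative model only, not the JDK's implementation and not the code generated by the patch; the class and method names are invented for the example. It assumes the documented VectorMask conventions that firstTrue returns the lane count and lastTrue returns -1 when no lane is set:

```java
final class MaskQueries {
    // Array-scanning versions: one loop iteration per lane.
    static int firstTrue(boolean[] lanes) {
        for (int i = 0; i < lanes.length; i++) {
            if (lanes[i]) return i;
        }
        return lanes.length; // no lane set
    }

    static int lastTrue(boolean[] lanes) {
        for (int i = lanes.length - 1; i >= 0; i--) {
            if (lanes[i]) return i;
        }
        return -1; // no lane set
    }

    static int trueCount(boolean[] lanes) {
        int count = 0;
        for (boolean lane : lanes) {
            if (lane) count++;
        }
        return count;
    }

    // Packs up to 64 lanes into a long, lane i -> bit i.
    static long toBits(boolean[] lanes) {
        long bits = 0;
        for (int i = 0; i < lanes.length; i++) {
            if (lanes[i]) bits |= 1L << i;
        }
        return bits;
    }

    // Bitmask versions: a single bit-scan or population-count operation.
    static int firstTrueBits(long bits, int laneCount) {
        return bits == 0 ? laneCount : Long.numberOfTrailingZeros(bits);
    }

    static int lastTrueBits(long bits) {
        return 63 - Long.numberOfLeadingZeros(bits); // yields -1 for bits == 0
    }

    static int trueCountBits(long bits) {
        return Long.bitCount(bits);
    }

    public static void main(String[] args) {
        boolean[] lanes = {false, true, false, true, false, false, false, false};
        long bits = toBits(lanes);
        System.out.println(firstTrue(lanes) + " " + firstTrueBits(bits, lanes.length)); // 1 1
        System.out.println(lastTrue(lanes) + " " + lastTrueBits(bits));                 // 3 3
        System.out.println(trueCount(lanes) + " " + trueCountBits(bits));               // 2 2
    }
}
```

Once the mask is available as an integer, each query collapses to one bit-manipulation operation, which is the kind of saving the description above attributes to moving the mask into a GP or opmask register.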
> > Please find below the performance number for benchmark included in the patch: > Machine: Cascade Lake server (Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz 28C) > > > VectorMask.trueCount | VECTOR SIZE | ALGO | BASELINE AVX3 | WITH OPT AVX3 | GAIN > -- | -- | -- | -- | -- | -- > MaskQueryOperationsBenchmark.testFirstTrueByte | 128 | 1 | 338396.436 | 362711.622 | 1.071854143 > MaskQueryOperationsBenchmark.testFirstTrueByte | 128 | 2 | 205477.472 | 362668.035 | 1.765001445 > MaskQueryOperationsBenchmark.testFirstTrueByte | 128 | 3 | 185613.377 | 362518.206 | 1.953082326 > MaskQueryOperationsBenchmark.testFirstTrueByte | 256 | 1 | 338522.114 | 328751.231 | 0.971136648 > MaskQueryOperationsBenchmark.testFirstTrueByte | 256 | 2 | 148825.341 | 328783.35 | 2.209189294 > MaskQueryOperationsBenchmark.testFirstTrueByte | 256 | 3 | 200854.856 | 328784.24 | 1.636924526 > MaskQueryOperationsBenchmark.testFirstTrueByte | 512 | 1 | 338551.089 | 319908.361 | 0.944933782 > MaskQueryOperationsBenchmark.testFirstTrueByte | 512 | 2 | 116338.756 | 320026.839 | 2.750818816 > MaskQueryOperationsBenchmark.testFirstTrueByte | 512 | 3 | 200871.692 | 320008.208 | 1.593097588 > MaskQueryOperationsBenchmark.testFirstTrueInt | 128 | 1 | 338489.157 | 190221.57 | 0.561972418 > MaskQueryOperationsBenchmark.testFirstTrueInt | 128 | 2 | 205140.903 | 362387.766 | 1.766531007 > MaskQueryOperationsBenchmark.testFirstTrueInt | 128 | 3 | 185508.994 | 362566.265 | 1.95444036 > MaskQueryOperationsBenchmark.testFirstTrueInt | 256 | 1 | 338403.999 | 328829.751 | 0.971707639 > MaskQueryOperationsBenchmark.testFirstTrueInt | 256 | 2 | 148988.857 | 328835.479 | 2.207114583 > MaskQueryOperationsBenchmark.testFirstTrueInt | 256 | 3 | 200815.907 | 328778.266 | 1.637212265 > MaskQueryOperationsBenchmark.testFirstTrueInt | 512 | 1 | 338462.403 | 328796.84 | 0.971442728 > MaskQueryOperationsBenchmark.testFirstTrueInt | 512 | 2 | 116355.623 | 328811.386 | 2.825917455 > MaskQueryOperationsBenchmark.testFirstTrueInt 
| 512 | 3 | 200856.08 | 328773.859 | 1.636862867 > MaskQueryOperationsBenchmark.testFirstTrueLong | 128 | 1 | 338451.783 | 204432.394 | 0.60402221 > MaskQueryOperationsBenchmark.testFirstTrueLong | 128 | 2 | 204443.049 | 155670.633 | 0.761437641 > MaskQueryOperationsBenchmark.testFirstTrueLong | 128 | 3 | 207254.769 | 155672.842 | 0.751118263 > MaskQueryOperationsBenchmark.testFirstTrueLong | 256 | 1 | 338520.255 | 328789.176 | 0.971254072 > MaskQueryOperationsBenchmark.testFirstTrueLong | 256 | 2 | 205883.123 | 328742.103 | 1.596741385 > MaskQueryOperationsBenchmark.testFirstTrueLong | 256 | 3 | 185519.176 | 328733.537 | 1.771965271 > MaskQueryOperationsBenchmark.testFirstTrueLong | 512 | 1 | 338605.11 | 328694.935 | 0.970732353 > MaskQueryOperationsBenchmark.testFirstTrueLong | 512 | 2 | 148444.7 | 328352.346 | 2.211950619 > MaskQueryOperationsBenchmark.testFirstTrueLong | 512 | 3 | 200884.874 | 328814.376 | 1.636829939 > MaskQueryOperationsBenchmark.testFirstTrueShort | 128 | 1 | 338529.326 | 362293.877 | 1.070199387 > MaskQueryOperationsBenchmark.testFirstTrueShort | 128 | 2 | 204676.583 | 362428.992 | 1.770739899 > MaskQueryOperationsBenchmark.testFirstTrueShort | 128 | 3 | 185495.663 | 362422.835 | 1.953807594 > MaskQueryOperationsBenchmark.testFirstTrueShort | 256 | 1 | 338533.82 | 328635.479 | 0.970761146 > MaskQueryOperationsBenchmark.testFirstTrueShort | 256 | 2 | 148822.446 | 328803.55 | 2.209368001 > MaskQueryOperationsBenchmark.testFirstTrueShort | 256 | 3 | 200752.028 | 328805.974 | 1.637871245 > MaskQueryOperationsBenchmark.testFirstTrueShort | 512 | 1 | 338464.548 | 320054.91 | 0.945608371 > MaskQueryOperationsBenchmark.testFirstTrueShort | 512 | 2 | 116329.063 | 328763.508 | 2.826151088 > MaskQueryOperationsBenchmark.testFirstTrueShort | 512 | 3 | 199971.049 | 328819.066 | 1.644333355 > MaskQueryOperationsBenchmark.testLastTrueByte | 128 | 1 | 325618.244 | 337629.441 | 1.036887359 > MaskQueryOperationsBenchmark.testLastTrueByte | 128 | 2 | 
197655.729 | 337544.012 | 1.707737052 > MaskQueryOperationsBenchmark.testLastTrueByte | 128 | 3 | 325600.645 | 337256.796 | 1.035798919 > MaskQueryOperationsBenchmark.testLastTrueByte | 256 | 1 | 325677.144 | 308312.588 | 0.946681687 > MaskQueryOperationsBenchmark.testLastTrueByte | 256 | 2 | 138177.514 | 308293.997 | 2.231144476 > MaskQueryOperationsBenchmark.testLastTrueByte | 256 | 3 | 201281.142 | 308353.239 | 1.531952949 > MaskQueryOperationsBenchmark.testLastTrueByte | 512 | 1 | 325499.635 | 305103.491 | 0.937338965 > MaskQueryOperationsBenchmark.testLastTrueByte | 512 | 2 | 98267.327 | 304803.64 | 3.101780106 > MaskQueryOperationsBenchmark.testLastTrueByte | 512 | 3 | 201072.661 | 304969.972 | 1.516715253 > MaskQueryOperationsBenchmark.testLastTrueInt | 128 | 1 | 325286.171 | 337337.209 | 1.037047496 > MaskQueryOperationsBenchmark.testLastTrueInt | 128 | 2 | 197351.915 | 331432.723 | 1.679399579 > MaskQueryOperationsBenchmark.testLastTrueInt | 128 | 3 | 325173.097 | 337518.586 | 1.037965899 > MaskQueryOperationsBenchmark.testLastTrueInt | 256 | 1 | 325199.786 | 308436.805 | 0.948453284 > MaskQueryOperationsBenchmark.testLastTrueInt | 256 | 2 | 138200.527 | 308405.442 | 2.231579348 > MaskQueryOperationsBenchmark.testLastTrueInt | 256 | 3 | 201240.625 | 308234.527 | 1.531671485 > MaskQueryOperationsBenchmark.testLastTrueInt | 512 | 1 | 325590.639 | 308381.757 | 0.947145649 > MaskQueryOperationsBenchmark.testLastTrueInt | 512 | 2 | 98334.197 | 308440.373 | 3.13665421 > MaskQueryOperationsBenchmark.testLastTrueInt | 512 | 3 | 200832.953 | 308431.355 | 1.535760693 > MaskQueryOperationsBenchmark.testLastTrueLong | 128 | 1 | 325564.887 | 193981.861 | 0.595831641 > MaskQueryOperationsBenchmark.testLastTrueLong | 128 | 2 | 214005.351 | 153667.869 | 0.718056199 > MaskQueryOperationsBenchmark.testLastTrueLong | 128 | 3 | 214061.493 | 156337.24 | 0.730337988 > MaskQueryOperationsBenchmark.testLastTrueLong | 256 | 1 | 325601.502 | 308291.032 | 0.946835411 > 
MaskQueryOperationsBenchmark.testLastTrueLong | 256 | 2 | 197911.182 | 308292.149 | 1.557729815 > MaskQueryOperationsBenchmark.testLastTrueLong | 256 | 3 | 325608.187 | 308405.393 | 0.947167195 > MaskQueryOperationsBenchmark.testLastTrueLong | 512 | 1 | 325734.897 | 308321.619 | 0.946541564 > MaskQueryOperationsBenchmark.testLastTrueLong | 512 | 2 | 137974.465 | 308131.475 | 2.233250008 > MaskQueryOperationsBenchmark.testLastTrueLong | 512 | 3 | 205479.182 | 308311.636 | 1.500451934 > MaskQueryOperationsBenchmark.testLastTrueShort | 128 | 1 | 325681.411 | 337663.377 | 1.036790451 > MaskQueryOperationsBenchmark.testLastTrueShort | 128 | 2 | 198127.51 | 337287.453 | 1.702375672 > MaskQueryOperationsBenchmark.testLastTrueShort | 128 | 3 | 325519.01 | 337453.387 | 1.036662612 > MaskQueryOperationsBenchmark.testLastTrueShort | 256 | 1 | 325647.378 | 308266.5 | 0.946626691 > MaskQueryOperationsBenchmark.testLastTrueShort | 256 | 2 | 138287.837 | 308402.656 | 2.230150263 > MaskQueryOperationsBenchmark.testLastTrueShort | 256 | 3 | 205375.864 | 308418.101 | 1.501725154 > MaskQueryOperationsBenchmark.testLastTrueShort | 512 | 1 | 325548.631 | 308137.064 | 0.946516233 > MaskQueryOperationsBenchmark.testLastTrueShort | 512 | 2 | 98424.074 | 308145.17 | 3.130790644 > MaskQueryOperationsBenchmark.testLastTrueShort | 512 | 3 | 205381.622 | 308345.763 | 1.50133084 > MaskQueryOperationsBenchmark.testTrueCountByte | 128 | 1 | 197488.249 | 340490.471 | 1.724104967 > MaskQueryOperationsBenchmark.testTrueCountByte | 128 | 2 | 191307.785 | 354400.26 | 1.852513529 > MaskQueryOperationsBenchmark.testTrueCountByte | 128 | 3 | 181206.7 | 354512.75 | 1.956399791 > MaskQueryOperationsBenchmark.testTrueCountByte | 256 | 1 | 144485.784 | 328347.7 | 2.272525995 > MaskQueryOperationsBenchmark.testTrueCountByte | 256 | 2 | 136709.938 | 328318.229 | 2.401568122 > MaskQueryOperationsBenchmark.testTrueCountByte | 256 | 3 | 141501.903 | 328274.337 | 2.319928779 > 
MaskQueryOperationsBenchmark.testTrueCountByte | 512 | 1 | 108395.25 | 318599.11 | 2.939234976 > MaskQueryOperationsBenchmark.testTrueCountByte | 512 | 2 | 98731.287 | 318651.791 | 3.22746518 > MaskQueryOperationsBenchmark.testTrueCountByte | 512 | 3 | 106344.335 | 318657.098 | 2.99646519 > MaskQueryOperationsBenchmark.testTrueCountInt | 128 | 1 | 124691.716 | 354457.62 | 2.842671762 > MaskQueryOperationsBenchmark.testTrueCountInt | 128 | 2 | 191325.138 | 354360.523 | 1.852137815 > MaskQueryOperationsBenchmark.testTrueCountInt | 128 | 3 | 181480.334 | 353746.697 | 1.949228818 > MaskQueryOperationsBenchmark.testTrueCountInt | 256 | 1 | 144513.076 | 328404.916 | 2.27249274 > MaskQueryOperationsBenchmark.testTrueCountInt | 256 | 2 | 136710.717 | 328516.92 | 2.403007805 > MaskQueryOperationsBenchmark.testTrueCountInt | 256 | 3 | 141631.832 | 328432.841 | 2.318919669 > MaskQueryOperationsBenchmark.testTrueCountInt | 512 | 1 | 108479.473 | 328405.877 | 3.027355019 > MaskQueryOperationsBenchmark.testTrueCountInt | 512 | 2 | 98747.682 | 328300.378 | 3.324638831 > MaskQueryOperationsBenchmark.testTrueCountInt | 512 | 3 | 106378.04 | 328384.537 | 3.086957957 > MaskQueryOperationsBenchmark.testTrueCountLong | 128 | 1 | 213646.579 | 159098.437 | 0.74468048 > MaskQueryOperationsBenchmark.testTrueCountLong | 128 | 2 | 212671.379 | 162528.924 | 0.764225655 > MaskQueryOperationsBenchmark.testTrueCountLong | 128 | 3 | 212649.052 | 162530.898 | 0.764315178 > MaskQueryOperationsBenchmark.testTrueCountLong | 256 | 1 | 197350.819 | 328365.924 | 1.663869072 > MaskQueryOperationsBenchmark.testTrueCountLong | 256 | 2 | 191473.127 | 328501.883 | 1.715655289 > MaskQueryOperationsBenchmark.testTrueCountLong | 256 | 3 | 185529.513 | 328428.64 | 1.770223156 > MaskQueryOperationsBenchmark.testTrueCountLong | 512 | 1 | 144516.188 | 328334.76 | 2.27195835 > MaskQueryOperationsBenchmark.testTrueCountLong | 512 | 2 | 136752.367 | 328505.571 | 2.402192943 > 
MaskQueryOperationsBenchmark.testTrueCountLong | 512 | 3 | 141445.742 | 328392.887 | 2.321688036 > MaskQueryOperationsBenchmark.testTrueCountShort | 128 | 1 | 197863.202 | 354533.342 | 1.791810394 > MaskQueryOperationsBenchmark.testTrueCountShort | 128 | 2 | 191802.914 | 354377.939 | 1.84761499 > MaskQueryOperationsBenchmark.testTrueCountShort | 128 | 3 | 181773.298 | 354374.525 | 1.949541153 > MaskQueryOperationsBenchmark.testTrueCountShort | 256 | 1 | 144414.679 | 328435.088 | 2.27425003 > MaskQueryOperationsBenchmark.testTrueCountShort | 256 | 2 | 136923.991 | 328267.898 | 2.397446171 > MaskQueryOperationsBenchmark.testTrueCountShort | 256 | 3 | 141545.957 | 328308.681 | 2.319449371 > MaskQueryOperationsBenchmark.testTrueCountShort | 512 | 1 | 108420.143 | 328282.998 | 3.027878297 > MaskQueryOperationsBenchmark.testTrueCountShort | 512 | 2 | 98736.441 | 328420.616 | 3.326235103 > MaskQueryOperationsBenchmark.testTrueCountShort | 512 | 3 | 106432.386 | 328245.585 | 3.084076166 > > ALGO (1=bestcase, 2=worstcast,3=avgcase) Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: 8256973: Final synthetic comments resolution. 
-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3916/files
  - new: https://git.openjdk.java.net/jdk/pull/3916/files/0f420eac..9f31b55e

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3916&range=05
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3916&range=04-05

  Stats: 24 lines in 1 file changed: 4 ins; 12 del; 8 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3916.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3916/head:pull/3916

PR: https://git.openjdk.java.net/jdk/pull/3916

From jbhateja at openjdk.java.net Wed May 19 05:22:39 2021
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Wed, 19 May 2021 05:22:39 GMT
Subject: Integrated: 8265126: [REDO] unified handling for VectorMask object re-materialization during de-optimization
In-Reply-To: <2KlypmIXfPwTn6dp0AG9W0LMqvu6griQnWjSEQBwn2o=.6b09a942-e241-4ff6-b581-1810aa6bb124@github.com>
References: <2KlypmIXfPwTn6dp0AG9W0LMqvu6griQnWjSEQBwn2o=.6b09a942-e241-4ff6-b581-1810aa6bb124@github.com>
Message-ID: <8-NS3-09tBEXVPw2JVTQYgG_0EBCmc-RGiae5nGreQM=.d2e8cea5-1c6c-4b7f-8a6e-715b6cb61e0c@github.com>

On Tue, 27 Apr 2021 18:01:17 GMT, Jatin Bhateja wrote:

> The following flow describes object reconstruction for de-optimization:
>
> 1. PhaseVector::scalarize_vbox_node() creates a SafePointScalarObjectNode to capture the box type information; it also connects to the node holding the boxed value.
> 2. During the code emission phase (PhaseOutput), C2 processes the above information to dump an ObjectValue holding the box information and a LocationValue holding the value information into the ScopeDescriptor corresponding to the Safepoint PC.
> 3. De-optimization blobs dump the values held in registers to stack locations using RegisterSaver::save_live_registers(), and a mapping between each register and its stack location is added to the RegisterMap.
> 4. During de-optimization, compiled frame objects are re-allocated using the identity information held in the ObjectValue, and their fields are initialized using values held in the stack locations accessed through the register-stack mappings.
>
> By inserting a VectorStoreMaskNode before stitching the mask-holding node to the Safepoint, we make sure that the value held in an opmask/vector register is transferred to a byte vector. Thus the rest of the flow works as it is: the stack location will hold the value in the form of a byte array irrespective of the box shape.
>
> tier1-tier3 regressions are clean with UseAVX=2/3.

This pull request has now been integrated.

Changeset: 65a8bf58
Author: Jatin Bhateja
URL: https://git.openjdk.java.net/jdk/commit/65a8bf58bef1a3c50d434b0b351247b5a3a426cb
Stats: 74 lines in 3 files changed: 26 ins; 27 del; 21 mod

8265126: [REDO] unified handling for VectorMask object re-materialization during de-optimization

Reviewed-by: vlivanov

-------------

PR: https://git.openjdk.java.net/jdk/pull/3721

From jbhateja at openjdk.java.net Wed May 19 05:26:47 2021
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Wed, 19 May 2021 05:26:47 GMT
Subject: Integrated: 8256973: Intrinsic creation for VectorMask query (lastTrue, firstTrue, trueCount) APIs
In-Reply-To: <73lFD51hzmiF_KrQyPyE5c7lbf-Bp6V5vptzGo7JioY=.f34509d0-04c1-4c6d-878f-baa433b315a7@github.com>
References: <73lFD51hzmiF_KrQyPyE5c7lbf-Bp6V5vptzGo7JioY=.f34509d0-04c1-4c6d-878f-baa433b315a7@github.com>
Message-ID:

On Fri, 7 May 2021 14:23:38 GMT, Jatin Bhateja wrote:

> This patch intrinsifies the following mask query APIs using an optimal instruction sequence for the X86 target.
> 1) VectorMask.firstTrue.
> 2) VectorMask.lastTrue.
> 3) VectorMask.trueCount.
>
> The current implementations of the above APIs iterate over the underlying boolean array encapsulated in a mask instance to ascertain the count/position index of true bits.
> X86 AVX2 and AVX512 targets offer direct instructions to transfer the masks held in a byte vector to a GP or an opmask register, thereby accelerating further querying.
>
> Intrinsification is not performed for vector species containing fewer than two vector lanes.
>
> Please find below the performance numbers for the benchmark included in the patch:
> Machine: Cascade Lake server (Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz 28C)
>
> BENCHMARK | VECTOR SIZE | ALGO | BASELINE AVX3 | WITH OPT AVX3 | GAIN
> -- | -- | -- | -- | -- | --
> MaskQueryOperationsBenchmark.testFirstTrueByte | 128 | 1 | 338396.436 | 362711.622 | 1.071854143
> MaskQueryOperationsBenchmark.testFirstTrueByte | 128 | 2 | 205477.472 | 362668.035 | 1.765001445
> MaskQueryOperationsBenchmark.testFirstTrueByte | 128 | 3 | 185613.377 | 362518.206 | 1.953082326
> MaskQueryOperationsBenchmark.testFirstTrueByte | 256 | 1 | 338522.114 | 328751.231 | 0.971136648
> MaskQueryOperationsBenchmark.testFirstTrueByte | 256 | 2 | 148825.341 | 328783.35 | 2.209189294
> MaskQueryOperationsBenchmark.testFirstTrueByte | 256 | 3 | 200854.856 | 328784.24 | 1.636924526
> MaskQueryOperationsBenchmark.testFirstTrueByte | 512 | 1 | 338551.089 | 319908.361 | 0.944933782
> MaskQueryOperationsBenchmark.testFirstTrueByte | 512 | 2 | 116338.756 | 320026.839 | 2.750818816
> MaskQueryOperationsBenchmark.testFirstTrueByte | 512 | 3 | 200871.692 | 320008.208 | 1.593097588
> MaskQueryOperationsBenchmark.testFirstTrueInt | 128 | 1 | 338489.157 | 190221.57 | 0.561972418
> MaskQueryOperationsBenchmark.testFirstTrueInt | 128 | 2 | 205140.903 | 362387.766 | 1.766531007
> MaskQueryOperationsBenchmark.testFirstTrueInt | 128 | 3 | 185508.994 | 362566.265 | 1.95444036
> MaskQueryOperationsBenchmark.testFirstTrueInt | 256 | 1 | 338403.999 | 328829.751 | 0.971707639
> MaskQueryOperationsBenchmark.testFirstTrueInt | 256 | 2 | 148988.857 | 328835.479 | 2.207114583
> MaskQueryOperationsBenchmark.testFirstTrueInt | 256 | 3 | 200815.907 | 328778.266 | 1.637212265
> MaskQueryOperationsBenchmark.testFirstTrueInt | 512 | 1 | 338462.403 | 328796.84 | 0.971442728
> MaskQueryOperationsBenchmark.testFirstTrueInt | 512 | 2 | 116355.623 | 328811.386 | 2.825917455
> MaskQueryOperationsBenchmark.testFirstTrueInt | 512 | 3 | 200856.08 | 328773.859 | 1.636862867
> MaskQueryOperationsBenchmark.testFirstTrueLong | 128 | 1 | 338451.783 | 204432.394 | 0.60402221
> MaskQueryOperationsBenchmark.testFirstTrueLong | 128 | 2 | 204443.049 | 155670.633 | 0.761437641
> MaskQueryOperationsBenchmark.testFirstTrueLong | 128 | 3 | 207254.769 | 155672.842 | 0.751118263
> MaskQueryOperationsBenchmark.testFirstTrueLong | 256 | 1 | 338520.255 | 328789.176 | 0.971254072
> MaskQueryOperationsBenchmark.testFirstTrueLong | 256 | 2 | 205883.123 | 328742.103 | 1.596741385
> MaskQueryOperationsBenchmark.testFirstTrueLong | 256 | 3 | 185519.176 | 328733.537 | 1.771965271
> MaskQueryOperationsBenchmark.testFirstTrueLong | 512 | 1 | 338605.11 | 328694.935 | 0.970732353
> MaskQueryOperationsBenchmark.testFirstTrueLong | 512 | 2 | 148444.7 | 328352.346 | 2.211950619
> MaskQueryOperationsBenchmark.testFirstTrueLong | 512 | 3 | 200884.874 | 328814.376 | 1.636829939
> MaskQueryOperationsBenchmark.testFirstTrueShort | 128 | 1 | 338529.326 | 362293.877 | 1.070199387
> MaskQueryOperationsBenchmark.testFirstTrueShort | 128 | 2 | 204676.583 | 362428.992 | 1.770739899
> MaskQueryOperationsBenchmark.testFirstTrueShort | 128 | 3 | 185495.663 | 362422.835 | 1.953807594
> MaskQueryOperationsBenchmark.testFirstTrueShort | 256 | 1 | 338533.82 | 328635.479 | 0.970761146
> MaskQueryOperationsBenchmark.testFirstTrueShort | 256 | 2 | 148822.446 | 328803.55 | 2.209368001
> MaskQueryOperationsBenchmark.testFirstTrueShort | 256 | 3 | 200752.028 | 328805.974 | 1.637871245
> MaskQueryOperationsBenchmark.testFirstTrueShort | 512 | 1 | 338464.548 | 320054.91 | 0.945608371
> MaskQueryOperationsBenchmark.testFirstTrueShort | 512 | 2 | 116329.063 | 328763.508 | 2.826151088
> MaskQueryOperationsBenchmark.testFirstTrueShort | 512 | 3 | 199971.049 | 328819.066 | 1.644333355
> MaskQueryOperationsBenchmark.testLastTrueByte | 128 | 1 | 325618.244 | 337629.441 | 1.036887359
> MaskQueryOperationsBenchmark.testLastTrueByte | 128 | 2 | 197655.729 | 337544.012 | 1.707737052
> MaskQueryOperationsBenchmark.testLastTrueByte | 128 | 3 | 325600.645 | 337256.796 | 1.035798919
> MaskQueryOperationsBenchmark.testLastTrueByte | 256 | 1 | 325677.144 | 308312.588 | 0.946681687
> MaskQueryOperationsBenchmark.testLastTrueByte | 256 | 2 | 138177.514 | 308293.997 | 2.231144476
> MaskQueryOperationsBenchmark.testLastTrueByte | 256 | 3 | 201281.142 | 308353.239 | 1.531952949
> MaskQueryOperationsBenchmark.testLastTrueByte | 512 | 1 | 325499.635 | 305103.491 | 0.937338965
> MaskQueryOperationsBenchmark.testLastTrueByte | 512 | 2 | 98267.327 | 304803.64 | 3.101780106
> MaskQueryOperationsBenchmark.testLastTrueByte | 512 | 3 | 201072.661 | 304969.972 | 1.516715253
> MaskQueryOperationsBenchmark.testLastTrueInt | 128 | 1 | 325286.171 | 337337.209 | 1.037047496
> MaskQueryOperationsBenchmark.testLastTrueInt | 128 | 2 | 197351.915 | 331432.723 | 1.679399579
> MaskQueryOperationsBenchmark.testLastTrueInt | 128 | 3 | 325173.097 | 337518.586 | 1.037965899
> MaskQueryOperationsBenchmark.testLastTrueInt | 256 | 1 | 325199.786 | 308436.805 | 0.948453284
> MaskQueryOperationsBenchmark.testLastTrueInt | 256 | 2 | 138200.527 | 308405.442 | 2.231579348
> MaskQueryOperationsBenchmark.testLastTrueInt | 256 | 3 | 201240.625 | 308234.527 | 1.531671485
> MaskQueryOperationsBenchmark.testLastTrueInt | 512 | 1 | 325590.639 | 308381.757 | 0.947145649
> MaskQueryOperationsBenchmark.testLastTrueInt | 512 | 2 | 98334.197 | 308440.373 | 3.13665421
> MaskQueryOperationsBenchmark.testLastTrueInt | 512 | 3 | 200832.953 | 308431.355 | 1.535760693
> MaskQueryOperationsBenchmark.testLastTrueLong | 128 | 1 | 325564.887 | 193981.861 | 0.595831641
> MaskQueryOperationsBenchmark.testLastTrueLong | 128 | 2 | 214005.351 | 153667.869 | 0.718056199
> MaskQueryOperationsBenchmark.testLastTrueLong | 128 | 3 | 214061.493 | 156337.24 | 0.730337988
> MaskQueryOperationsBenchmark.testLastTrueLong | 256 | 1 | 325601.502 | 308291.032 | 0.946835411
> MaskQueryOperationsBenchmark.testLastTrueLong | 256 | 2 | 197911.182 | 308292.149 | 1.557729815
> MaskQueryOperationsBenchmark.testLastTrueLong | 256 | 3 | 325608.187 | 308405.393 | 0.947167195
> MaskQueryOperationsBenchmark.testLastTrueLong | 512 | 1 | 325734.897 | 308321.619 | 0.946541564
> MaskQueryOperationsBenchmark.testLastTrueLong | 512 | 2 | 137974.465 | 308131.475 | 2.233250008
> MaskQueryOperationsBenchmark.testLastTrueLong | 512 | 3 | 205479.182 | 308311.636 | 1.500451934
> MaskQueryOperationsBenchmark.testLastTrueShort | 128 | 1 | 325681.411 | 337663.377 | 1.036790451
> MaskQueryOperationsBenchmark.testLastTrueShort | 128 | 2 | 198127.51 | 337287.453 | 1.702375672
> MaskQueryOperationsBenchmark.testLastTrueShort | 128 | 3 | 325519.01 | 337453.387 | 1.036662612
> MaskQueryOperationsBenchmark.testLastTrueShort | 256 | 1 | 325647.378 | 308266.5 | 0.946626691
> MaskQueryOperationsBenchmark.testLastTrueShort | 256 | 2 | 138287.837 | 308402.656 | 2.230150263
> MaskQueryOperationsBenchmark.testLastTrueShort | 256 | 3 | 205375.864 | 308418.101 | 1.501725154
> MaskQueryOperationsBenchmark.testLastTrueShort | 512 | 1 | 325548.631 | 308137.064 | 0.946516233
> MaskQueryOperationsBenchmark.testLastTrueShort | 512 | 2 | 98424.074 | 308145.17 | 3.130790644
> MaskQueryOperationsBenchmark.testLastTrueShort | 512 | 3 | 205381.622 | 308345.763 | 1.50133084
> MaskQueryOperationsBenchmark.testTrueCountByte | 128 | 1 | 197488.249 | 340490.471 | 1.724104967
> MaskQueryOperationsBenchmark.testTrueCountByte | 128 | 2 | 191307.785 | 354400.26 | 1.852513529
> MaskQueryOperationsBenchmark.testTrueCountByte | 128 | 3 | 181206.7 | 354512.75 | 1.956399791
> MaskQueryOperationsBenchmark.testTrueCountByte | 256 | 1 | 144485.784 | 328347.7 | 2.272525995
> MaskQueryOperationsBenchmark.testTrueCountByte | 256 | 2 | 136709.938 | 328318.229 | 2.401568122
> MaskQueryOperationsBenchmark.testTrueCountByte | 256 | 3 | 141501.903 | 328274.337 | 2.319928779
> MaskQueryOperationsBenchmark.testTrueCountByte | 512 | 1 | 108395.25 | 318599.11 | 2.939234976
> MaskQueryOperationsBenchmark.testTrueCountByte | 512 | 2 | 98731.287 | 318651.791 | 3.22746518
> MaskQueryOperationsBenchmark.testTrueCountByte | 512 | 3 | 106344.335 | 318657.098 | 2.99646519
> MaskQueryOperationsBenchmark.testTrueCountInt | 128 | 1 | 124691.716 | 354457.62 | 2.842671762
> MaskQueryOperationsBenchmark.testTrueCountInt | 128 | 2 | 191325.138 | 354360.523 | 1.852137815
> MaskQueryOperationsBenchmark.testTrueCountInt | 128 | 3 | 181480.334 | 353746.697 | 1.949228818
> MaskQueryOperationsBenchmark.testTrueCountInt | 256 | 1 | 144513.076 | 328404.916 | 2.27249274
> MaskQueryOperationsBenchmark.testTrueCountInt | 256 | 2 | 136710.717 | 328516.92 | 2.403007805
> MaskQueryOperationsBenchmark.testTrueCountInt | 256 | 3 | 141631.832 | 328432.841 | 2.318919669
> MaskQueryOperationsBenchmark.testTrueCountInt | 512 | 1 | 108479.473 | 328405.877 | 3.027355019
> MaskQueryOperationsBenchmark.testTrueCountInt | 512 | 2 | 98747.682 | 328300.378 | 3.324638831
> MaskQueryOperationsBenchmark.testTrueCountInt | 512 | 3 | 106378.04 | 328384.537 | 3.086957957
> MaskQueryOperationsBenchmark.testTrueCountLong | 128 | 1 | 213646.579 | 159098.437 | 0.74468048
> MaskQueryOperationsBenchmark.testTrueCountLong | 128 | 2 | 212671.379 | 162528.924 | 0.764225655
> MaskQueryOperationsBenchmark.testTrueCountLong | 128 | 3 | 212649.052 | 162530.898 | 0.764315178
> MaskQueryOperationsBenchmark.testTrueCountLong | 256 | 1 | 197350.819 | 328365.924 | 1.663869072
> MaskQueryOperationsBenchmark.testTrueCountLong | 256 | 2 | 191473.127 | 328501.883 | 1.715655289
> MaskQueryOperationsBenchmark.testTrueCountLong | 256 | 3 | 185529.513 | 328428.64 | 1.770223156
> MaskQueryOperationsBenchmark.testTrueCountLong | 512 | 1 | 144516.188 | 328334.76 | 2.27195835
> MaskQueryOperationsBenchmark.testTrueCountLong | 512 | 2 | 136752.367 | 328505.571 | 2.402192943
> MaskQueryOperationsBenchmark.testTrueCountLong | 512 | 3 | 141445.742 | 328392.887 | 2.321688036
> MaskQueryOperationsBenchmark.testTrueCountShort | 128 | 1 | 197863.202 | 354533.342 | 1.791810394
> MaskQueryOperationsBenchmark.testTrueCountShort | 128 | 2 | 191802.914 | 354377.939 | 1.84761499
> MaskQueryOperationsBenchmark.testTrueCountShort | 128 | 3 | 181773.298 | 354374.525 | 1.949541153
> MaskQueryOperationsBenchmark.testTrueCountShort | 256 | 1 | 144414.679 | 328435.088 | 2.27425003
> MaskQueryOperationsBenchmark.testTrueCountShort | 256 | 2 | 136923.991 | 328267.898 | 2.397446171
> MaskQueryOperationsBenchmark.testTrueCountShort | 256 | 3 | 141545.957 | 328308.681 | 2.319449371
> MaskQueryOperationsBenchmark.testTrueCountShort | 512 | 1 | 108420.143 | 328282.998 | 3.027878297
> MaskQueryOperationsBenchmark.testTrueCountShort | 512 | 2 | 98736.441 | 328420.616 | 3.326235103
> MaskQueryOperationsBenchmark.testTrueCountShort | 512 | 3 | 106432.386 | 328245.585 | 3.084076166
>
> ALGO (1=best case, 2=worst case, 3=average case)

This pull request has now been integrated.
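For readers following the description above, here is a minimal sketch of the idea in plain Java: a per-lane scan next to a version that first packs the lanes into one integer and then answers each query with a single bit operation. It is not the code from this patch. The class name `MaskQuerySketch` and all method names are hypothetical, the 64-lane limit is an artifact of using a `long`, and the "no lane set" return values (lane count for firstTrue, -1 for lastTrue) are my reading of the `VectorMask` Javadoc.

```java
public class MaskQuerySketch {
    // Scalar baseline: walk the boolean lanes one by one.
    static int firstTrueScan(boolean[] lanes) {
        for (int i = 0; i < lanes.length; i++) {
            if (lanes[i]) return i;
        }
        return lanes.length; // assumed sentinel: lane count when no lane is set
    }

    static int lastTrueScan(boolean[] lanes) {
        for (int i = lanes.length - 1; i >= 0; i--) {
            if (lanes[i]) return i;
        }
        return -1; // assumed sentinel: -1 when no lane is set
    }

    static int trueCountScan(boolean[] lanes) {
        int count = 0;
        for (boolean lane : lanes) {
            if (lane) count++;
        }
        return count;
    }

    // Pack up to 64 lanes into a long, lane i -> bit i. This stands in for
    // moving a mask into a general-purpose register.
    static long toBits(boolean[] lanes) {
        if (lanes.length > 64) {
            throw new IllegalArgumentException("at most 64 lanes in this sketch");
        }
        long bits = 0L;
        for (int i = 0; i < lanes.length; i++) {
            if (lanes[i]) bits |= 1L << i;
        }
        return bits;
    }

    // Once the mask is an integer, each query is a single bit operation.
    static int firstTrueBits(long bits, int laneCount) {
        return bits == 0 ? laneCount : Long.numberOfTrailingZeros(bits);
    }

    static int lastTrueBits(long bits) {
        return 63 - Long.numberOfLeadingZeros(bits); // yields -1 when bits == 0
    }

    static int trueCountBits(long bits) {
        return Long.bitCount(bits);
    }

    public static void main(String[] args) {
        boolean[] lanes = {false, false, true, false, true, false, false, false};
        long bits = toBits(lanes);
        System.out.println(firstTrueBits(bits, lanes.length));
        System.out.println(lastTrueBits(bits));
        System.out.println(trueCountBits(bits));
    }
}
```

The scan versions do work proportional to the lane count, while the bit versions do not depend on where the set lanes are, which is at least consistent with the worst-case rows in the table showing the larger gains.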
Changeset: 7aa65685 Author: Jatin Bhateja URL: https://git.openjdk.java.net/jdk/commit/7aa65685b8ce047f075c45cc16bec5c79b8eef27 Stats: 1847 lines in 81 files changed: 1814 ins; 30 del; 3 mod 8256973: Intrinsic creation for VectorMask query (lastTrue,firstTrue,trueCount) APIs Reviewed-by: psandoz, vlivanov ------------- PR: https://git.openjdk.java.net/jdk/pull/3916 From github.com+25214855+casparcwang at openjdk.java.net Wed May 19 06:47:42 2021 From: github.com+25214855+casparcwang at openjdk.java.net (王超) Date: Wed, 19 May 2021 06:47:42 GMT Subject: RFR: 8261152: Refine the compiler/vectorapi/VectorRebracket128Test.java test In-Reply-To: References: Message-ID: On Fri, 5 Feb 2021 04:42:30 GMT, 王超 wrote: > Refine test VectorRebracket128Test.java as discussed here https://github.com/openjdk/jdk16/pull/139#discussion_r567796847 > > 1. Explicitly trigger GC in the test > 2. Remove redundant imports > 3. Remove weird java options This patch is trivial and does not show any benefit, so I am closing it. ------------- PR: https://git.openjdk.java.net/jdk/pull/2422 From github.com+25214855+casparcwang at openjdk.java.net Wed May 19 06:47:42 2021 From: github.com+25214855+casparcwang at openjdk.java.net (王超) Date: Wed, 19 May 2021 06:47:42 GMT Subject: Withdrawn: 8261152: Refine the compiler/vectorapi/VectorRebracket128Test.java test In-Reply-To: References: Message-ID: On Fri, 5 Feb 2021 04:42:30 GMT, 王超 wrote: > Refine test VectorRebracket128Test.java as discussed here https://github.com/openjdk/jdk16/pull/139#discussion_r567796847 > > 1. Explicitly trigger GC in the test > 2. Remove redundant imports > 3. Remove weird java options This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.java.net/jdk/pull/2422 From vlivanov at openjdk.java.net Wed May 19 07:26:45 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Wed, 19 May 2021 07:26:45 GMT Subject: RFR: 8256973: Intrinsic creation for VectorMask query (lastTrue, firstTrue, trueCount) APIs [v6] In-Reply-To: References: <73lFD51hzmiF_KrQyPyE5c7lbf-Bp6V5vptzGo7JioY=.f34509d0-04c1-4c6d-878f-baa433b315a7@github.com> Message-ID: On Wed, 19 May 2021 05:15:16 GMT, Jatin Bhateja wrote: >> This patch intrinsifies the following mask query APIs using an optimal instruction sequence for the X86 target. >> 1) VectorMask.firstTrue. >> 2) VectorMask.lastTrue. >> 3) VectorMask.trueCount. >> >> The current implementations of the above APIs iterate over the underlying boolean array encapsulated in a mask instance to ascertain the count/position index of true bits. >> X86 AVX2 and AVX512 targets offer direct instructions to transfer the masks held in a byte vector to a GP or an opmask register, thereby accelerating further querying. >> >> Intrinsification is not performed for vector species containing fewer than two vector lanes. 
> > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > 8256973: Final synthetic comments resolution. Jatin, the final commit erroneously contains `mask.incr` file. Please, remove it. ------------- PR: https://git.openjdk.java.net/jdk/pull/3916 From thartmann at openjdk.java.net Wed May 19 07:27:39 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 19 May 2021 07:27:39 GMT Subject: RFR: 8266480: Implicit null check optimization does not update control of hoisted memory operation In-Reply-To: References: Message-ID: On Tue, 18 May 2021 13:40:05 GMT, Tobias Hartmann wrote: > C2 replaces explicit null checks by hoisting a nearby memory operation to the null check and using it as implicit null check. 
In some cases, control of that memory operation is not updated correctly, leading to assert failures during `PhaseCFG::verify()` because a use is no longer dominated by its definition. > > After matching, the graph looks like this: > > > > `64 testP_reg` is an explicit null check and `78 loadD`, `73 storeD` and `77 storeImmI` are candidates for an implicit null check because they are operating on the same oop. `PhaseCFG::implicit_null_check` decides to hoist the `77 storeImmI` from the `not_null_block` B12 to the null check in B11/B13: > > > > Now the problem is that control of `77 storeImmI` was not updated and still points into the non-dominating block B15. The following code is supposed to fix this: > https://github.com/openjdk/jdk/blob/9d168e25d1e2e8b662dc7aa6cda7516c423cef7d/src/hotspot/share/opto/lcm.cpp#L413-L418 > > However, it does not trigger because control is not the `not_null_block->head()` but `59 MachProj` which is the control projection from `60 CallLeafDirect` emitted by a `drem`. The fix is to simply check `get_block_for_node(ctrl)` instead. > > This is an old issue that was only caught by the assert recently introduced by [JDK-8263227](https://bugs.openjdk.java.net/browse/JDK-8263227). > > Thanks, > Tobias Thanks, Vladimir! ------------- PR: https://git.openjdk.java.net/jdk/pull/4093 From thartmann at openjdk.java.net Wed May 19 07:30:41 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 19 May 2021 07:30:41 GMT Subject: Integrated: 8266480: Implicit null check optimization does not update control of hoisted memory operation In-Reply-To: References: Message-ID: On Tue, 18 May 2021 13:40:05 GMT, Tobias Hartmann wrote: > C2 replaces explicit null checks by hoisting a nearby memory operation to the null check and using it as implicit null check. In some cases, control of that memory operation is not updated correctly, leading to assert failures during `PhaseCFG::verify()` because a use is no longer dominated by its definition. 
> > After matching, the graph looks like this: > > > > `64 testP_reg` is an explicit null check and `78 loadD`, `73 storeD` and `77 storeImmI` are candidates for an implicit null check because they are operating on the same oop. `PhaseCFG::implicit_null_check` decides to hoist the `77 storeImmI` from the `not_null_block` B12 to the null check in B11/B13: > > > > Now the problem is that control of `77 storeImmI` was not updated and still points into the non-dominating block B15. The following code is supposed to fix this: > https://github.com/openjdk/jdk/blob/9d168e25d1e2e8b662dc7aa6cda7516c423cef7d/src/hotspot/share/opto/lcm.cpp#L413-L418 > > However, it does not trigger because control is not the `not_null_block->head()` but `59 MachProj` which is the control projection from `60 CallLeafDirect` emitted by a `drem`. The fix is to simply check `get_block_for_node(ctrl)` instead. > > This is an old issue that was only caught by the assert recently introduced by [JDK-8263227](https://bugs.openjdk.java.net/browse/JDK-8263227). > > Thanks, > Tobias This pull request has now been integrated. Changeset: c2b50f93 Author: Tobias Hartmann URL: https://git.openjdk.java.net/jdk/commit/c2b50f93ac36cdfd96d3ed09ec80ee5255a10200 Stats: 64 lines in 2 files changed: 63 ins; 0 del; 1 mod 8266480: Implicit null check optimization does not update control of hoisted memory operation Reviewed-by: neliasso, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/4093 From vlivanov at openjdk.java.net Wed May 19 07:31:38 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Wed, 19 May 2021 07:31:38 GMT Subject: RFR: 8266962: Add arch supporting check for "Op_VectorLoadConst" before creating the node In-Reply-To: References: <_c7Ik2rZymkq9p0DqcHNeEWqbjs1ToH6WVmq_jR0f7U=.63a3cd81-b760-44d1-85b7-d654b0fd6240@github.com> Message-ID: On Wed, 19 May 2021 03:36:40 GMT, Xiaohong Gong wrote: > Maybe we can revisit it as a kind of optimization in future? Sure. Please, file an RFE. 
------------- PR: https://git.openjdk.java.net/jdk/pull/4023 From thartmann at openjdk.java.net Wed May 19 07:37:47 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 19 May 2021 07:37:47 GMT Subject: RFR: 8256973: Intrinsic creation for VectorMask query (lastTrue, firstTrue, trueCount) APIs [v6] In-Reply-To: References: <73lFD51hzmiF_KrQyPyE5c7lbf-Bp6V5vptzGo7JioY=.f34509d0-04c1-4c6d-878f-baa433b315a7@github.com> Message-ID: On Wed, 19 May 2021 05:15:16 GMT, Jatin Bhateja wrote: >> This patch intrinsifies the following mask query APIs using an optimal instruction sequence for the X86 target. >> 1) VectorMask.firstTrue. >> 2) VectorMask.lastTrue. >> 3) VectorMask.trueCount. >> >> The current implementations of the above APIs iterate over the underlying boolean array encapsulated in a mask instance to ascertain the count/position index of true bits. >> X86 AVX2 and AVX512 targets offer direct instructions to transfer the masks held in a byte vector to a GP or an opmask register, thereby accelerating further querying. >> >> Intrinsification is not performed for vector species containing fewer than two vector lanes. 
>>
>> Please find below the performance numbers for the benchmark included in the patch:
>> Machine: Cascade Lake server (Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz 28C)
>>
>>
>> BENCHMARK | VECTOR SIZE | ALGO | BASELINE AVX3 | WITH OPT AVX3 | GAIN
>> -- | -- | -- | -- | -- | --
>> MaskQueryOperationsBenchmark.testFirstTrueByte | 128 | 1 | 338396.436 | 362711.622 | 1.071854143
>> MaskQueryOperationsBenchmark.testFirstTrueByte | 128 | 2 | 205477.472 | 362668.035 | 1.765001445
>> MaskQueryOperationsBenchmark.testFirstTrueByte | 128 | 3 | 185613.377 | 362518.206 | 1.953082326
>> MaskQueryOperationsBenchmark.testFirstTrueByte | 256 | 1 | 338522.114 | 328751.231 | 0.971136648
>> MaskQueryOperationsBenchmark.testFirstTrueByte | 256 | 2 | 148825.341 | 328783.35 | 2.209189294
>> MaskQueryOperationsBenchmark.testFirstTrueByte | 256 | 3 | 200854.856 | 328784.24 | 1.636924526
>> MaskQueryOperationsBenchmark.testFirstTrueByte | 512 | 1 | 338551.089 | 319908.361 | 0.944933782
>> MaskQueryOperationsBenchmark.testFirstTrueByte | 512 | 2 | 116338.756 | 320026.839 | 2.750818816
>> MaskQueryOperationsBenchmark.testFirstTrueByte | 512 | 3 | 200871.692 | 320008.208 | 1.593097588
>> MaskQueryOperationsBenchmark.testFirstTrueInt | 128 | 1 | 338489.157 | 190221.57 | 0.561972418
>> MaskQueryOperationsBenchmark.testFirstTrueInt | 128 | 2 | 205140.903 | 362387.766 | 1.766531007
>> MaskQueryOperationsBenchmark.testFirstTrueInt | 128 | 3 | 185508.994 | 362566.265 | 1.95444036
>> MaskQueryOperationsBenchmark.testFirstTrueInt | 256 | 1 | 338403.999 | 328829.751 | 0.971707639
>> MaskQueryOperationsBenchmark.testFirstTrueInt | 256 | 2 | 148988.857 | 328835.479 | 2.207114583
>> MaskQueryOperationsBenchmark.testFirstTrueInt | 256 | 3 | 200815.907 | 328778.266 | 1.637212265
>> MaskQueryOperationsBenchmark.testFirstTrueInt | 512 | 1 | 338462.403 | 328796.84 | 0.971442728
>> MaskQueryOperationsBenchmark.testFirstTrueInt | 512 | 2 | 116355.623 | 328811.386 | 2.825917455
>> MaskQueryOperationsBenchmark.testFirstTrueInt | 512 | 3 | 200856.08 | 328773.859 | 1.636862867
>> MaskQueryOperationsBenchmark.testFirstTrueLong | 128 | 1 | 338451.783 | 204432.394 | 0.60402221
>> MaskQueryOperationsBenchmark.testFirstTrueLong | 128 | 2 | 204443.049 | 155670.633 | 0.761437641
>> MaskQueryOperationsBenchmark.testFirstTrueLong | 128 | 3 | 207254.769 | 155672.842 | 0.751118263
>> MaskQueryOperationsBenchmark.testFirstTrueLong | 256 | 1 | 338520.255 | 328789.176 | 0.971254072
>> MaskQueryOperationsBenchmark.testFirstTrueLong | 256 | 2 | 205883.123 | 328742.103 | 1.596741385
>> MaskQueryOperationsBenchmark.testFirstTrueLong | 256 | 3 | 185519.176 | 328733.537 | 1.771965271
>> MaskQueryOperationsBenchmark.testFirstTrueLong | 512 | 1 | 338605.11 | 328694.935 | 0.970732353
>> MaskQueryOperationsBenchmark.testFirstTrueLong | 512 | 2 | 148444.7 | 328352.346 | 2.211950619
>> MaskQueryOperationsBenchmark.testFirstTrueLong | 512 | 3 | 200884.874 | 328814.376 | 1.636829939
>> MaskQueryOperationsBenchmark.testFirstTrueShort | 128 | 1 | 338529.326 | 362293.877 | 1.070199387
>> MaskQueryOperationsBenchmark.testFirstTrueShort | 128 | 2 | 204676.583 | 362428.992 | 1.770739899
>> MaskQueryOperationsBenchmark.testFirstTrueShort | 128 | 3 | 185495.663 | 362422.835 | 1.953807594
>> MaskQueryOperationsBenchmark.testFirstTrueShort | 256 | 1 | 338533.82 | 328635.479 | 0.970761146
>> MaskQueryOperationsBenchmark.testFirstTrueShort | 256 | 2 | 148822.446 | 328803.55 | 2.209368001
>> MaskQueryOperationsBenchmark.testFirstTrueShort | 256 | 3 | 200752.028 | 328805.974 | 1.637871245
>> MaskQueryOperationsBenchmark.testFirstTrueShort | 512 | 1 | 338464.548 | 320054.91 | 0.945608371
>> MaskQueryOperationsBenchmark.testFirstTrueShort | 512 | 2 | 116329.063 | 328763.508 | 2.826151088
>> MaskQueryOperationsBenchmark.testFirstTrueShort | 512 | 3 | 199971.049 | 328819.066 | 1.644333355
>> MaskQueryOperationsBenchmark.testLastTrueByte | 128 | 1 | 325618.244 | 337629.441 | 1.036887359
>> MaskQueryOperationsBenchmark.testLastTrueByte | 128 | 2 | 197655.729 | 337544.012 | 1.707737052
>> MaskQueryOperationsBenchmark.testLastTrueByte | 128 | 3 | 325600.645 | 337256.796 | 1.035798919
>> MaskQueryOperationsBenchmark.testLastTrueByte | 256 | 1 | 325677.144 | 308312.588 | 0.946681687
>> MaskQueryOperationsBenchmark.testLastTrueByte | 256 | 2 | 138177.514 | 308293.997 | 2.231144476
>> MaskQueryOperationsBenchmark.testLastTrueByte | 256 | 3 | 201281.142 | 308353.239 | 1.531952949
>> MaskQueryOperationsBenchmark.testLastTrueByte | 512 | 1 | 325499.635 | 305103.491 | 0.937338965
>> MaskQueryOperationsBenchmark.testLastTrueByte | 512 | 2 | 98267.327 | 304803.64 | 3.101780106
>> MaskQueryOperationsBenchmark.testLastTrueByte | 512 | 3 | 201072.661 | 304969.972 | 1.516715253
>> MaskQueryOperationsBenchmark.testLastTrueInt | 128 | 1 | 325286.171 | 337337.209 | 1.037047496
>> MaskQueryOperationsBenchmark.testLastTrueInt | 128 | 2 | 197351.915 | 331432.723 | 1.679399579
>> MaskQueryOperationsBenchmark.testLastTrueInt | 128 | 3 | 325173.097 | 337518.586 | 1.037965899
>> MaskQueryOperationsBenchmark.testLastTrueInt | 256 | 1 | 325199.786 | 308436.805 | 0.948453284
>> MaskQueryOperationsBenchmark.testLastTrueInt | 256 | 2 | 138200.527 | 308405.442 | 2.231579348
>> MaskQueryOperationsBenchmark.testLastTrueInt | 256 | 3 | 201240.625 | 308234.527 | 1.531671485
>> MaskQueryOperationsBenchmark.testLastTrueInt | 512 | 1 | 325590.639 | 308381.757 | 0.947145649
>> MaskQueryOperationsBenchmark.testLastTrueInt | 512 | 2 | 98334.197 | 308440.373 | 3.13665421
>> MaskQueryOperationsBenchmark.testLastTrueInt | 512 | 3 | 200832.953 | 308431.355 | 1.535760693
>> MaskQueryOperationsBenchmark.testLastTrueLong | 128 | 1 | 325564.887 | 193981.861 | 0.595831641
>> MaskQueryOperationsBenchmark.testLastTrueLong | 128 | 2 | 214005.351 | 153667.869 | 0.718056199
>> MaskQueryOperationsBenchmark.testLastTrueLong | 128 | 3 | 214061.493 | 156337.24 | 0.730337988
>> MaskQueryOperationsBenchmark.testLastTrueLong | 256 | 1 | 325601.502 | 308291.032 | 0.946835411
>> MaskQueryOperationsBenchmark.testLastTrueLong | 256 | 2 | 197911.182 | 308292.149 | 1.557729815
>> MaskQueryOperationsBenchmark.testLastTrueLong | 256 | 3 | 325608.187 | 308405.393 | 0.947167195
>> MaskQueryOperationsBenchmark.testLastTrueLong | 512 | 1 | 325734.897 | 308321.619 | 0.946541564
>> MaskQueryOperationsBenchmark.testLastTrueLong | 512 | 2 | 137974.465 | 308131.475 | 2.233250008
>> MaskQueryOperationsBenchmark.testLastTrueLong | 512 | 3 | 205479.182 | 308311.636 | 1.500451934
>> MaskQueryOperationsBenchmark.testLastTrueShort | 128 | 1 | 325681.411 | 337663.377 | 1.036790451
>> MaskQueryOperationsBenchmark.testLastTrueShort | 128 | 2 | 198127.51 | 337287.453 | 1.702375672
>> MaskQueryOperationsBenchmark.testLastTrueShort | 128 | 3 | 325519.01 | 337453.387 | 1.036662612
>> MaskQueryOperationsBenchmark.testLastTrueShort | 256 | 1 | 325647.378 | 308266.5 | 0.946626691
>> MaskQueryOperationsBenchmark.testLastTrueShort | 256 | 2 | 138287.837 | 308402.656 | 2.230150263
>> MaskQueryOperationsBenchmark.testLastTrueShort | 256 | 3 | 205375.864 | 308418.101 | 1.501725154
>> MaskQueryOperationsBenchmark.testLastTrueShort | 512 | 1 | 325548.631 | 308137.064 | 0.946516233
>> MaskQueryOperationsBenchmark.testLastTrueShort | 512 | 2 | 98424.074 | 308145.17 | 3.130790644
>> MaskQueryOperationsBenchmark.testLastTrueShort | 512 | 3 | 205381.622 | 308345.763 | 1.50133084
>> MaskQueryOperationsBenchmark.testTrueCountByte | 128 | 1 | 197488.249 | 340490.471 | 1.724104967
>> MaskQueryOperationsBenchmark.testTrueCountByte | 128 | 2 | 191307.785 | 354400.26 | 1.852513529
>> MaskQueryOperationsBenchmark.testTrueCountByte | 128 | 3 | 181206.7 | 354512.75 | 1.956399791
>> MaskQueryOperationsBenchmark.testTrueCountByte | 256 | 1 | 144485.784 | 328347.7 | 2.272525995
>> MaskQueryOperationsBenchmark.testTrueCountByte | 256 | 2 | 136709.938 | 328318.229 | 2.401568122
>> MaskQueryOperationsBenchmark.testTrueCountByte | 256 | 3 | 141501.903 | 328274.337 | 2.319928779
>> MaskQueryOperationsBenchmark.testTrueCountByte | 512 | 1 | 108395.25 | 318599.11 | 2.939234976
>> MaskQueryOperationsBenchmark.testTrueCountByte | 512 | 2 | 98731.287 | 318651.791 | 3.22746518
>> MaskQueryOperationsBenchmark.testTrueCountByte | 512 | 3 | 106344.335 | 318657.098 | 2.99646519
>> MaskQueryOperationsBenchmark.testTrueCountInt | 128 | 1 | 124691.716 | 354457.62 | 2.842671762
>> MaskQueryOperationsBenchmark.testTrueCountInt | 128 | 2 | 191325.138 | 354360.523 | 1.852137815
>> MaskQueryOperationsBenchmark.testTrueCountInt | 128 | 3 | 181480.334 | 353746.697 | 1.949228818
>> MaskQueryOperationsBenchmark.testTrueCountInt | 256 | 1 | 144513.076 | 328404.916 | 2.27249274
>> MaskQueryOperationsBenchmark.testTrueCountInt | 256 | 2 | 136710.717 | 328516.92 | 2.403007805
>> MaskQueryOperationsBenchmark.testTrueCountInt | 256 | 3 | 141631.832 | 328432.841 | 2.318919669
>> MaskQueryOperationsBenchmark.testTrueCountInt | 512 | 1 | 108479.473 | 328405.877 | 3.027355019
>> MaskQueryOperationsBenchmark.testTrueCountInt | 512 | 2 | 98747.682 | 328300.378 | 3.324638831
>> MaskQueryOperationsBenchmark.testTrueCountInt | 512 | 3 | 106378.04 | 328384.537 | 3.086957957
>> MaskQueryOperationsBenchmark.testTrueCountLong | 128 | 1 | 213646.579 | 159098.437 | 0.74468048
>> MaskQueryOperationsBenchmark.testTrueCountLong | 128 | 2 | 212671.379 | 162528.924 | 0.764225655
>> MaskQueryOperationsBenchmark.testTrueCountLong | 128 | 3 | 212649.052 | 162530.898 | 0.764315178
>> MaskQueryOperationsBenchmark.testTrueCountLong | 256 | 1 | 197350.819 | 328365.924 | 1.663869072
>> MaskQueryOperationsBenchmark.testTrueCountLong | 256 | 2 | 191473.127 | 328501.883 | 1.715655289
>> MaskQueryOperationsBenchmark.testTrueCountLong | 256 | 3 | 185529.513 | 328428.64 | 1.770223156
>> MaskQueryOperationsBenchmark.testTrueCountLong | 512 | 1 | 144516.188 | 328334.76 | 2.27195835
>> MaskQueryOperationsBenchmark.testTrueCountLong | 512 | 2 | 136752.367 | 328505.571 | 2.402192943
>> MaskQueryOperationsBenchmark.testTrueCountLong | 512 | 3 | 141445.742 | 328392.887 | 2.321688036
>> MaskQueryOperationsBenchmark.testTrueCountShort | 128 | 1 | 197863.202 | 354533.342 | 1.791810394
>> MaskQueryOperationsBenchmark.testTrueCountShort | 128 | 2 | 191802.914 | 354377.939 | 1.84761499
>> MaskQueryOperationsBenchmark.testTrueCountShort | 128 | 3 | 181773.298 | 354374.525 | 1.949541153
>> MaskQueryOperationsBenchmark.testTrueCountShort | 256 | 1 | 144414.679 | 328435.088 | 2.27425003
>> MaskQueryOperationsBenchmark.testTrueCountShort | 256 | 2 | 136923.991 | 328267.898 | 2.397446171
>> MaskQueryOperationsBenchmark.testTrueCountShort | 256 | 3 | 141545.957 | 328308.681 | 2.319449371
>> MaskQueryOperationsBenchmark.testTrueCountShort | 512 | 1 | 108420.143 | 328282.998 | 3.027878297
>> MaskQueryOperationsBenchmark.testTrueCountShort | 512 | 2 | 98736.441 | 328420.616 | 3.326235103
>> MaskQueryOperationsBenchmark.testTrueCountShort | 512 | 3 | 106432.386 | 328245.585 | 3.084076166
>>
>> ALGO (1=bestcase, 2=worstcase, 3=avgcase)
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
>   8256973: Final synthetic comments resolution.

https://bugs.openjdk.java.net/browse/JDK-8267357

-------------

PR: https://git.openjdk.java.net/jdk/pull/3916

From jiefu at openjdk.java.net Wed May 19 07:37:47 2021
From: jiefu at openjdk.java.net (Jie Fu)
Date: Wed, 19 May 2021 07:37:47 GMT
Subject: RFR: 8256973: Intrinsic creation for VectorMask query (lastTrue, firstTrue, trueCount) APIs [v6]
In-Reply-To:
References: <73lFD51hzmiF_KrQyPyE5c7lbf-Bp6V5vptzGo7JioY=.f34509d0-04c1-4c6d-878f-baa433b315a7@github.com>
Message-ID:

On Wed, 19 May 2021 07:23:28 GMT, Vladimir Ivanov wrote:

> Jatin, the final commit erroneously contains `mask.incr` file. Please, remove it.

PR: https://github.com/openjdk/jdk/pull/4107

-------------

PR: https://git.openjdk.java.net/jdk/pull/3916

From xgong at openjdk.java.net Wed May 19 07:39:50 2021
From: xgong at openjdk.java.net (Xiaohong Gong)
Date: Wed, 19 May 2021 07:39:50 GMT
Subject: RFR: 8266962: Add arch supporting check for "Op_VectorLoadConst" before creating the node
In-Reply-To:
References: <_c7Ik2rZymkq9p0DqcHNeEWqbjs1ToH6WVmq_jR0f7U=.63a3cd81-b760-44d1-85b7-d654b0fd6240@github.com>
Message-ID:

On Wed, 19 May 2021 07:28:26 GMT, Vladimir Ivanov wrote:

> > Maybe we can revisit it as a kind of optimization in future?
>
> Sure. Please, file an RFE.

Sure, please see: https://bugs.openjdk.java.net/browse/JDK-8267366. Thanks!

-------------

PR: https://git.openjdk.java.net/jdk/pull/4023

From thartmann at openjdk.java.net Wed May 19 07:40:03 2021
From: thartmann at openjdk.java.net (Tobias Hartmann)
Date: Wed, 19 May 2021 07:40:03 GMT
Subject: RFR: 8252372: Check if cloning is required to move loads out of loops in PhaseIdealLoop::split_if_with_blocks_post() [v3]
In-Reply-To:
References:
Message-ID:

On Mon, 17 May 2021 08:12:46 GMT, Roland Westrelin wrote:

>> Sinking data nodes out of a loop when all uses are out of a loop has
>> several issues that this attempts to fix.
>>
>> 1- Only non control uses are considered which makes little sense (why
>> not sink if the data node is an argument to a call or a returned
>> value?)
>>
>> 2- Sinking of Loads is broken because of the handling of
>> anti-dependence: the get_late_ctrl(n, n_ctrl) call returns a control
>> in the loop because it takes all uses into account.
>>
>> 3- For data nodes for which a control edge can't be set, commoning of
>> clones back in the loop is prevented with:
>>   _igvn._worklist.yank(x);
>> which gives no guarantee
>>
>> This patch tries to address all issues:
>>
>> 1- it looks at all uses, not only non control uses
>>
>> 2- anti-dependences are computed for each use independently
>>
>> 3- Cast nodes are used to pin clones out of loop
>>
>>
>> 2- requires refactoring of the PhaseIdealLoop::get_late_ctrl()
>> logic. While working on this, I noticed a bug in anti-dependence
>> analysis: when the use is a cfg node, the code sometimes looks at uses
>> of the memory state of the cfg. The logic uses the use of the cfg
>> which is a projection of adr_type identical to the cfg. It should
>> instead look at the use of the memory projection.
>>
>> The existing logic for sinking loads calls clear_dom_lca_tags() for
>> every load which seems like quite a waste. I added a
>> _dom_lca_tags_round variable that's or'ed with the tag_node's _idx. By
>> incrementing _dom_lca_tags_round, new tags that don't conflict with
>> existing ones are produced and there's no need for
>> clear_dom_lca_tags().
>>
>> For anti-dependence analysis to return a correct result, early control
>> of the load is needed. The only way to get it at this stage, AFAICT,
>> is to compute it by following the load's input until a pinned node is
>> reached.
>>
>> The existing logic pins cloned nodes next to their use. The logic I
>> propose pins them right out of the loop. This could possibly avoid
>> some redundant clones. It also makes some special handling for corner
>> cases with loop strip mining useless.
>>
>> For 3-, I added extra Cast nodes for float types. If a chain of data
>> nodes are sunk, the new logic tries to keep a single Cast for the
>> entire chain rather than one Cast per node.
>
> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision:
>
>  - Tobias' review
>  - Merge branch 'master' into JDK-8252372
>  - CastVV
>  - Merge branch 'master' into JDK-8252372
>  - extra comments
>  - fix

Yes, I've linked it to the bug.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3689

From jiefu at openjdk.java.net Wed May 19 07:40:12 2021
From: jiefu at openjdk.java.net (Jie Fu)
Date: Wed, 19 May 2021 07:40:12 GMT
Subject: RFR: 8267364: Remove mask.incr which is introduced by JDK-8256973
Message-ID:

Hi all,

Please review the trivial change which removes the useless mask.incr.

Thanks.
Best regards,
Jie

-------------

Commit messages:
 - 8267364: Remove mask.incr which is introduced by JDK-8256973

Changes: https://git.openjdk.java.net/jdk/pull/4107/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4107&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8267364
Stats: 43 lines in 1 file changed: 0 ins; 43 del; 0 mod
Patch: https://git.openjdk.java.net/jdk/pull/4107.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/4107/head:pull/4107

PR: https://git.openjdk.java.net/jdk/pull/4107

From vlivanov at openjdk.java.net Wed May 19 07:44:43 2021
From: vlivanov at openjdk.java.net (Vladimir Ivanov)
Date: Wed, 19 May 2021 07:44:43 GMT
Subject: RFR: 8256973: Intrinsic creation for VectorMask query (lastTrue, firstTrue, trueCount) APIs [v6]
In-Reply-To:
References: <73lFD51hzmiF_KrQyPyE5c7lbf-Bp6V5vptzGo7JioY=.f34509d0-04c1-4c6d-878f-baa433b315a7@github.com>
Message-ID:

On Wed, 19 May 2021 05:15:16 GMT, Jatin Bhateja wrote:

>> This patch intrinsifies following mask query APIs using optimal instruction sequence for X86 target.
>> 1) VectorMask.firstTrue.
>> 2) VectorMask.lastTrue.
>> 3) VectorMask.trueCount.
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
>   8256973: Final synthetic comments resolution.

> PR: #4107

Thanks, Jie. Reviewed.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3916

From vlivanov at openjdk.java.net Wed May 19 07:47:44 2021
From: vlivanov at openjdk.java.net (Vladimir Ivanov)
Date: Wed, 19 May 2021 07:47:44 GMT
Subject: RFR: 8267364: Remove mask.incr which is introduced by JDK-8256973
In-Reply-To:
References:
Message-ID:

On Wed, 19 May 2021 07:32:45 GMT, Jie Fu wrote:

> Hi all,
>
> Please review the trivial change which removes the useless mask.incr.
>
> Thanks.
> Best regards,
> Jie

Looks good and trivial.

-------------

Marked as reviewed by vlivanov (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/4107

From thartmann at openjdk.java.net Wed May 19 07:47:44 2021
From: thartmann at openjdk.java.net (Tobias Hartmann)
Date: Wed, 19 May 2021 07:47:44 GMT
Subject: RFR: 8267364: Remove mask.incr which is introduced by JDK-8256973
In-Reply-To:
References:
Message-ID:

On Wed, 19 May 2021 07:32:45 GMT, Jie Fu wrote:

> Hi all,
>
> Please review the trivial change which removes the useless mask.incr.
>
> Thanks.
> Best regards,
> Jie

Looks good and trivial.

-------------

Marked as reviewed by thartmann (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/4107

From jiefu at openjdk.java.net Wed May 19 07:47:44 2021
From: jiefu at openjdk.java.net (Jie Fu)
Date: Wed, 19 May 2021 07:47:44 GMT
Subject: RFR: 8267364: Remove mask.incr which is introduced by JDK-8256973
In-Reply-To:
References:
Message-ID:

On Wed, 19 May 2021 07:42:08 GMT, Tobias Hartmann wrote:

> Looks good and trivial.

Thanks @iwanowww .

-------------

PR: https://git.openjdk.java.net/jdk/pull/4107

From jiefu at openjdk.java.net Wed May 19 07:47:44 2021
From: jiefu at openjdk.java.net (Jie Fu)
Date: Wed, 19 May 2021 07:47:44 GMT
Subject: RFR: 8267364: Remove mask.incr which is introduced by JDK-8256973
In-Reply-To:
References:
Message-ID:

On Wed, 19 May 2021 07:42:22 GMT, Jie Fu wrote:

> Looks good and trivial.

Thanks @TobiHartmann .

-------------

PR: https://git.openjdk.java.net/jdk/pull/4107

From jiefu at openjdk.java.net Wed May 19 07:47:45 2021
From: jiefu at openjdk.java.net (Jie Fu)
Date: Wed, 19 May 2021 07:47:45 GMT
Subject: Integrated: 8267364: Remove mask.incr which is introduced by JDK-8256973
In-Reply-To:
References:
Message-ID:

On Wed, 19 May 2021 07:32:45 GMT, Jie Fu wrote:

> Hi all,
>
> Please review the trivial change which removes the useless mask.incr.
>
> Thanks.
> Best regards,
> Jie

This pull request has now been integrated.

Changeset: 49543831
Author: Jie Fu
URL: https://git.openjdk.java.net/jdk/commit/4954383168422a6ba2be8cda5535f90829d97ef8
Stats: 43 lines in 1 file changed: 0 ins; 43 del; 0 mod

8267364: Remove mask.incr which is introduced by JDK-8256973

Reviewed-by: vlivanov, thartmann

-------------

PR: https://git.openjdk.java.net/jdk/pull/4107

From xgong at openjdk.java.net Wed May 19 07:52:38 2021
From: xgong at openjdk.java.net (Xiaohong Gong)
Date: Wed, 19 May 2021 07:52:38 GMT
Subject: Integrated: 8266962: Add arch supporting check for "Op_VectorLoadConst" before creating the node
In-Reply-To:
References:
Message-ID:

On Fri, 14 May 2021 06:04:45 GMT, Xiaohong Gong wrote:

> When creating the vector shuffle, the `"VectorLoadConstNode"` will be created to get an initial index vector. Before creating it, the compiler should check whether the current platform supports this opcode in case the jvm crashes with `"bad ad file"`. The compiler should finish the intrinsification and go back to the default java implementation if the backend doesn't support it.
>
> Tested tier1 and jdk::tier3.

This pull request has now been integrated.

Changeset: 2563a6a9
Author: Xiaohong Gong
Committer: Ningsheng Jian
URL: https://git.openjdk.java.net/jdk/commit/2563a6a9b5e81b4624704da4e8a2f24a6c5e8a5b
Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod

8266962: Add arch supporting check for "Op_VectorLoadConst" before creating the node

Reviewed-by: vlivanov, neliasso

-------------

PR: https://git.openjdk.java.net/jdk/pull/4023

From dnsimon at openjdk.java.net Wed May 19 08:02:08 2021
From: dnsimon at openjdk.java.net (Doug Simon)
Date: Wed, 19 May 2021 08:02:08 GMT
Subject: RFR: 8267338: [JVMCI] revive JVMCI API removed by JDK-8243287 [v3]
In-Reply-To:
References:
Message-ID:

> This PR revives ResolvedJavaType.getHostClass to preserve JVMCI compatibility. The revived method just returns `null`.

Doug Simon has updated the pull request incrementally with one additional commit since the last revision:

  fixed failing test

-------------

Changes:
 - all: https://git.openjdk.java.net/jdk/pull/4099/files
 - new: https://git.openjdk.java.net/jdk/pull/4099/files/5bffb99a..9f8a21ef

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4099&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4099&range=01-02

Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod
Patch: https://git.openjdk.java.net/jdk/pull/4099.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/4099/head:pull/4099

PR: https://git.openjdk.java.net/jdk/pull/4099

From vlivanov at openjdk.java.net Wed May 19 08:10:43 2021
From: vlivanov at openjdk.java.net (Vladimir Ivanov)
Date: Wed, 19 May 2021 08:10:43 GMT
Subject: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v9]
In-Reply-To:
References:
Message-ID:

On Wed, 19 May 2021 03:37:11 GMT, Sandhya Viswanathan wrote:

>> This PR contains Short Vector Math Library support related changes for [JEP-414 Vector API (Second Incubator)](https://openjdk.java.net/jeps/414), in preparation for when targeted.
>> >> Intel Short Vector Math Library (SVML) based intrinsics in native x86 assembly provide optimized implementation for Vector API transcendental and trigonometric methods. >> These methods are built into a separate library instead of being part of libjvm.so or jvm.dll. >> >> The following changes are made: >> The source for these methods is placed in the jdk.incubator.vector module under src/jdk.incubator.vector/linux/native/libsvml and src/jdk.incubator.vector/windows/native/libsvml. >> The assembly source files are named as "*.S" and include files are named as "*.S.inc". >> The corresponding build script is placed at make/modules/jdk.incubator.vector/Lib.gmk. >> Changes are made to build system to support dependency tracking for assembly files with includes. >> The built native libraries (libsvml.so/svml.dll) are placed in bin directory of JDK on Windows and lib directory of JDK on Linux. >> The C2 JIT uses the dll_load and dll_lookup to get the addresses of optimized methods from this library. >> >> Build system changes and module library build scripts are contributed by Magnus (magnus.ihse.bursie at oracle.com). >> >> Looking forward to your review and feedback.
>> >> Performance: >> Micro benchmark Base Optimized Unit Gain(Optimized/Base) >> Double128Vector.ACOS 45.91 87.34 ops/ms 1.90 >> Double128Vector.ASIN 45.06 92.36 ops/ms 2.05 >> Double128Vector.ATAN 19.92 118.36 ops/ms 5.94 >> Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79 >> Double128Vector.CBRT 45.77 208.36 ops/ms 4.55 >> Double128Vector.COS 49.94 245.89 ops/ms 4.92 >> Double128Vector.COSH 26.91 126.00 ops/ms 4.68 >> Double128Vector.EXP 71.64 379.65 ops/ms 5.30 >> Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18 >> Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44 >> Double128Vector.LOG 61.95 279.84 ops/ms 4.52 >> Double128Vector.LOG10 59.34 239.05 ops/ms 4.03 >> Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79 >> Double128Vector.SIN 49.36 240.79 ops/ms 4.88 >> Double128Vector.SINH 26.59 103.75 ops/ms 3.90 >> Double128Vector.TAN 41.05 152.39 ops/ms 3.71 >> Double128Vector.TANH 45.29 169.53 ops/ms 3.74 >> Double256Vector.ACOS 54.21 106.39 ops/ms 1.96 >> Double256Vector.ASIN 53.60 107.99 ops/ms 2.01 >> Double256Vector.ATAN 21.53 189.11 ops/ms 8.78 >> Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44 >> Double256Vector.CBRT 56.45 397.13 ops/ms 7.04 >> Double256Vector.COS 58.26 389.77 ops/ms 6.69 >> Double256Vector.COSH 29.44 151.11 ops/ms 5.13 >> Double256Vector.EXP 86.67 564.68 ops/ms 6.52 >> Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80 >> Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62 >> Double256Vector.LOG 71.52 394.90 ops/ms 5.52 >> Double256Vector.LOG10 65.43 362.32 ops/ms 5.54 >> Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05 >> Double256Vector.SIN 57.06 380.98 ops/ms 6.68 >> Double256Vector.SINH 29.40 117.37 ops/ms 3.99 >> Double256Vector.TAN 44.90 279.90 ops/ms 6.23 >> Double256Vector.TANH 54.08 274.71 ops/ms 5.08 >> Double512Vector.ACOS 55.65 687.54 ops/ms 12.35 >> Double512Vector.ASIN 57.31 777.72 ops/ms 13.57 >> Double512Vector.ATAN 21.42 729.21 ops/ms 34.04 >> Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32 >> Double512Vector.CBRT 56.78 834.38 ops/ms 14.69 >> 
Double512Vector.COS 59.88 837.04 ops/ms 13.98 >> Double512Vector.COSH 30.34 172.76 ops/ms 5.70 >> Double512Vector.EXP 99.66 1608.12 ops/ms 16.14 >> Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34 >> Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34 >> Double512Vector.LOG 74.84 996.00 ops/ms 13.31 >> Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72 >> Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34 >> Double512Vector.POW 37.42 384.13 ops/ms 10.26 >> Double512Vector.SIN 59.74 728.45 ops/ms 12.19 >> Double512Vector.SINH 29.47 143.38 ops/ms 4.87 >> Double512Vector.TAN 46.20 587.21 ops/ms 12.71 >> Double512Vector.TANH 57.36 495.42 ops/ms 8.64 >> Double64Vector.ACOS 24.04 73.67 ops/ms 3.06 >> Double64Vector.ASIN 23.78 75.11 ops/ms 3.16 >> Double64Vector.ATAN 14.14 62.81 ops/ms 4.44 >> Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28 >> Double64Vector.CBRT 16.47 107.50 ops/ms 6.53 >> Double64Vector.COS 23.42 152.01 ops/ms 6.49 >> Double64Vector.COSH 17.34 113.34 ops/ms 6.54 >> Double64Vector.EXP 27.08 203.53 ops/ms 7.52 >> Double64Vector.EXPM1 18.77 96.73 ops/ms 5.15 >> Double64Vector.HYPOT 18.54 103.62 ops/ms 5.59 >> Double64Vector.LOG 26.75 142.63 ops/ms 5.33 >> Double64Vector.LOG10 25.85 139.71 ops/ms 5.40 >> Double64Vector.LOG1P 13.26 97.94 ops/ms 7.38 >> Double64Vector.SIN 23.28 146.91 ops/ms 6.31 >> Double64Vector.SINH 17.62 88.59 ops/ms 5.03 >> Double64Vector.TAN 21.00 86.43 ops/ms 4.12 >> Double64Vector.TANH 23.75 111.35 ops/ms 4.69 >> Float128Vector.ACOS 57.52 110.65 ops/ms 1.92 >> Float128Vector.ASIN 57.15 117.95 ops/ms 2.06 >> Float128Vector.ATAN 22.52 318.74 ops/ms 14.15 >> Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42 >> Float128Vector.CBRT 29.72 443.74 ops/ms 14.93 >> Float128Vector.COS 42.82 803.02 ops/ms 18.75 >> Float128Vector.COSH 31.44 118.34 ops/ms 3.76 >> Float128Vector.EXP 72.43 855.33 ops/ms 11.81 >> Float128Vector.EXPM1 37.82 127.85 ops/ms 3.38 >> Float128Vector.HYPOT 53.20 591.68 ops/ms 11.12 >> Float128Vector.LOG 52.95 877.94 ops/ms 16.58 >> 
Float128Vector.LOG10 49.26 603.72 ops/ms 12.26 >> Float128Vector.LOG1P 20.89 430.59 ops/ms 20.61 >> Float128Vector.SIN 43.38 745.31 ops/ms 17.18 >> Float128Vector.SINH 31.11 112.91 ops/ms 3.63 >> Float128Vector.TAN 37.25 332.13 ops/ms 8.92 >> Float128Vector.TANH 57.63 453.77 ops/ms 7.87 >> Float256Vector.ACOS 65.23 123.73 ops/ms 1.90 >> Float256Vector.ASIN 63.41 132.86 ops/ms 2.10 >> Float256Vector.ATAN 23.51 649.02 ops/ms 27.61 >> Float256Vector.ATAN2 18.19 455.95 ops/ms 25.07 >> Float256Vector.CBRT 45.99 594.81 ops/ms 12.93 >> Float256Vector.COS 43.75 926.69 ops/ms 21.18 >> Float256Vector.COSH 33.52 130.46 ops/ms 3.89 >> Float256Vector.EXP 75.70 1366.72 ops/ms 18.05 >> Float256Vector.EXPM1 39.00 149.72 ops/ms 3.84 >> Float256Vector.HYPOT 52.91 1023.18 ops/ms 19.34 >> Float256Vector.LOG 53.31 1545.77 ops/ms 29.00 >> Float256Vector.LOG10 50.31 863.80 ops/ms 17.17 >> Float256Vector.LOG1P 21.51 616.59 ops/ms 28.66 >> Float256Vector.SIN 44.07 911.04 ops/ms 20.67 >> Float256Vector.SINH 33.16 122.50 ops/ms 3.69 >> Float256Vector.TAN 37.85 497.75 ops/ms 13.15 >> Float256Vector.TANH 64.27 537.20 ops/ms 8.36 >> Float512Vector.ACOS 67.33 1718.00 ops/ms 25.52 >> Float512Vector.ASIN 66.12 1780.85 ops/ms 26.93 >> Float512Vector.ATAN 22.63 1780.31 ops/ms 78.69 >> Float512Vector.ATAN2 17.52 1113.93 ops/ms 63.57 >> Float512Vector.CBRT 54.78 2087.58 ops/ms 38.11 >> Float512Vector.COS 40.92 1567.93 ops/ms 38.32 >> Float512Vector.COSH 33.42 138.36 ops/ms 4.14 >> Float512Vector.EXP 70.51 3835.97 ops/ms 54.41 >> Float512Vector.EXPM1 38.06 279.80 ops/ms 7.35 >> Float512Vector.HYPOT 50.99 3287.55 ops/ms 64.47 >> Float512Vector.LOG 49.61 3156.99 ops/ms 63.64 >> Float512Vector.LOG10 46.94 2489.16 ops/ms 53.02 >> Float512Vector.LOG1P 20.66 1689.86 ops/ms 81.81 >> Float512Vector.POW 32.73 1015.85 ops/ms 31.04 >> Float512Vector.SIN 41.17 1587.71 ops/ms 38.56 >> Float512Vector.SINH 33.05 129.39 ops/ms 3.91 >> Float512Vector.TAN 35.60 1336.11 ops/ms 37.53 >> Float512Vector.TANH 65.77 2295.28 
ops/ms 34.90 >> Float64Vector.ACOS 48.41 89.34 ops/ms 1.85 >> Float64Vector.ASIN 47.30 95.72 ops/ms 2.02 >> Float64Vector.ATAN 20.62 49.45 ops/ms 2.40 >> Float64Vector.ATAN2 15.95 112.35 ops/ms 7.04 >> Float64Vector.CBRT 24.03 134.57 ops/ms 5.60 >> Float64Vector.COS 44.28 394.33 ops/ms 8.91 >> Float64Vector.COSH 28.35 95.27 ops/ms 3.36 >> Float64Vector.EXP 65.80 486.37 ops/ms 7.39 >> Float64Vector.EXPM1 34.61 85.99 ops/ms 2.48 >> Float64Vector.HYPOT 50.40 147.82 ops/ms 2.93 >> Float64Vector.LOG 51.93 163.25 ops/ms 3.14 >> Float64Vector.LOG10 49.53 147.98 ops/ms 2.99 >> Float64Vector.LOG1P 19.20 206.81 ops/ms 10.77 >> Float64Vector.SIN 44.41 382.09 ops/ms 8.60 >> Float64Vector.SINH 28.20 90.68 ops/ms 3.22 >> Float64Vector.TAN 36.29 160.89 ops/ms 4.43 >> Float64Vector.TANH 47.65 214.04 ops/ms 4.49 > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > fix 32-bit build Overall, looks fine. src/hotspot/cpu/x86/x86_64.ad line 1704: > 1702: } > 1703: > 1704: void Matcher::vector_calling_convention(VMRegPair *regs, uint num_bits, uint total_args_passed) { You can just remove `Matcher::vector_calling_convention()` and call `SharedRuntime::vector_calling_convention()` directly. src/hotspot/share/opto/matcher.cpp line 1370: > 1368: VMReg first = parm_regs[i].first(); > 1369: VMReg second = parm_regs[i].second(); > 1370: if( !first->is_valid() && Please, fix formatting. src/hotspot/share/opto/vectorIntrinsics.cpp line 1355: > 1353: > 1354: // Get address for svml method. > 1355: get_svml_address(vector_api_op_id, vt->length_in_bytes() * BitsPerByte, bt, name, 100, &addr); Any particular reason to return the address as an out argument and not directly (`address addr = get_svml_address(...)`)? 
src/hotspot/share/utilities/globalDefinitions_vecApi.hpp line 34: > 32: // VS2017 required to build .s files for math intrinsics > 33: #if defined(_WIN64) && (defined(_MSC_VER) && (_MSC_VER >= 1910)) > 34: #define __VECTOR_API_MATH_INTRINSICS_COMMON Considering the stubs are not part of JVM anymore, the macros can go away. The stubs are dynamically linked now and if there's no library built/present the linking will fail. And then `globalDefinitions_vecApi.hpp` becomes empty and can be removed. ------------- PR: https://git.openjdk.java.net/jdk/pull/3638 From thartmann at openjdk.java.net Wed May 19 08:12:42 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 19 May 2021 08:12:42 GMT Subject: RFR: 8267151: C2: Don't create dummy Opaque1Node for outmost unswitched IfNode [v2] In-Reply-To: <9lr_DC8R0J8wu-Z3Xb74vWCbxcjtUUkzarCeHDbnpz0=.eb9054c5-a14d-4844-98a6-af45ba8c5914@github.com> References: <52F-48LXl0cjM3C9rcAYKp_VdOsUMVetxr5ydepHB2Y=.d361fb09-0b4e-4a4e-b5d2-f44254fb8816@github.com> <9lr_DC8R0J8wu-Z3Xb74vWCbxcjtUUkzarCeHDbnpz0=.eb9054c5-a14d-4844-98a6-af45ba8c5914@github.com> Message-ID: On Wed, 19 May 2021 02:39:05 GMT, Yi Yang wrote: >> In create_slow_version_of_loop(), C2 creates the outmost unswitched IfNode(i.e. **if(xx)**{ for{} }else{ for{} }) with a dummy opaque bool node as its condition input. >> >> https://github.com/openjdk/jdk/blob/cd1c17c0a6416a8d16cf2035f3e97dba95b6b8af/src/hotspot/share/opto/loopUnswitch.cpp#L265-L271 >> >> After that, it sets the _prob(missing _fcnt?) of the outmost unswitched IfNode in do_unswitching(). >> >> https://github.com/openjdk/jdk/blob/cd1c17c0a6416a8d16cf2035f3e97dba95b6b8af/src/hotspot/share/opto/loopUnswitch.cpp#L186-L191 >> >> I think we can merge these two steps into a single step, that is, create the outmost unswitched IfNode meanwhile setting its condition input, _prob and _fcnt w/ creating the dummy opaque bool node. 
>> >> Testing: >> - hotspot/jtreg/compiler(slowdebug) > > Yi Yang has updated the pull request incrementally with one additional commit since the last revision: > > unused head->is_CountedLoop() Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/4079 From thartmann at openjdk.java.net Wed May 19 08:17:42 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 19 May 2021 08:17:42 GMT Subject: RFR: 8267239: C1: RangeCheckElimination for % operator if divisor is IntConstant [v3] In-Reply-To: <40GCUvFwUxi08a4inydWtopQ9thvWsCOsFz7_-v0QzM=.7255306b-b11d-4fc4-b839-37410bea37a9@github.com> References: <40GCUvFwUxi08a4inydWtopQ9thvWsCOsFz7_-v0QzM=.7255306b-b11d-4fc4-b839-37410bea37a9@github.com> Message-ID: On Wed, 19 May 2021 02:42:03 GMT, Yi Yang wrote: >> The % operator follows the rule that the result of the remainder operation can be negative only if the dividend is negative, and can be positive only if the dividend is positive. Moreover, the magnitude of the result is always less than the magnitude of the divisor (see [JLS 15.17.3](https://docs.oracle.com/javase/specs/jls/se8/html/jls-15.html#jls-15.17.3)).
>> >> Testing: >> - test/hotspot/jtreg/compiler/c1/(slowdebug) > > Yi Yang has updated the pull request incrementally with one additional commit since the last revision: > > missing whitespace; more comment Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/4083 From neliasso at openjdk.java.net Wed May 19 08:28:43 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Wed, 19 May 2021 08:28:43 GMT Subject: RFR: 8267239: C1: RangeCheckElimination for % operator if divisor is IntConstant [v3] In-Reply-To: <40GCUvFwUxi08a4inydWtopQ9thvWsCOsFz7_-v0QzM=.7255306b-b11d-4fc4-b839-37410bea37a9@github.com> References: <40GCUvFwUxi08a4inydWtopQ9thvWsCOsFz7_-v0QzM=.7255306b-b11d-4fc4-b839-37410bea37a9@github.com> Message-ID: On Wed, 19 May 2021 02:42:03 GMT, Yi Yang wrote: >> % operator follows from this rule that the result of the remainder operation can be negative only if the dividend is negative, and can be positive only if the dividend is positive. Moreover, the magnitude of the result is always less than the magnitude of the divisor(See [LS 15.17.3](https://docs.oracle.com/javase/specs/jls/se8/html/jls-15.html#jls-15.17.3)). >> >> So if `y` is a constant integer and not equal to 0, then we can deduce the bound of remainder operation: >> - x % -y ==> [0, y - 1] RCE >> - x % y ==> [0, y - 1] RCE >> - -x % y ==> [-y + 1, 0] >> - -x % -y ==> [-y + 1, 0] >> >> Based on above rationale, we can apply RCE for the remainder operations whose dividend is constant integer and >= 0, e.g.: >> >> >> for(int i=0;i<1000;i++){ >> int top5 = arr[i%5]; // Apply RCE if arr is a loop invariant >> .... >> } >> >> >> For more detailed RCE results, please check out the attachment on JBS, it was generated by ArithmeticRemRCE with additional flags -XX:+TraceRangeCheckElimination -XX:+PrintIR. 
>> >> Testing: >> - test/hotspot/jtreg/compiler/c1/(slowdebug) > > Yi Yang has updated the pull request incrementally with one additional commit since the last revision: > > missing whitespace; more comment Looks good. ------------- Marked as reviewed by neliasso (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/4083 From neliasso at openjdk.java.net Wed May 19 08:29:39 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Wed, 19 May 2021 08:29:39 GMT Subject: RFR: 8267151: C2: Don't create dummy Opaque1Node for outmost unswitched IfNode [v2] In-Reply-To: <9lr_DC8R0J8wu-Z3Xb74vWCbxcjtUUkzarCeHDbnpz0=.eb9054c5-a14d-4844-98a6-af45ba8c5914@github.com> References: <52F-48LXl0cjM3C9rcAYKp_VdOsUMVetxr5ydepHB2Y=.d361fb09-0b4e-4a4e-b5d2-f44254fb8816@github.com> <9lr_DC8R0J8wu-Z3Xb74vWCbxcjtUUkzarCeHDbnpz0=.eb9054c5-a14d-4844-98a6-af45ba8c5914@github.com> Message-ID: On Wed, 19 May 2021 02:39:05 GMT, Yi Yang wrote: >> In create_slow_version_of_loop(), C2 creates the outmost unswitched IfNode(i.e. **if(xx)**{ for{} }else{ for{} }) with a dummy opaque bool node as its condition input. >> >> https://github.com/openjdk/jdk/blob/cd1c17c0a6416a8d16cf2035f3e97dba95b6b8af/src/hotspot/share/opto/loopUnswitch.cpp#L265-L271 >> >> After that, it sets the _prob(missing _fcnt?) of the outmost unswitched IfNode in do_unswitching(). >> >> https://github.com/openjdk/jdk/blob/cd1c17c0a6416a8d16cf2035f3e97dba95b6b8af/src/hotspot/share/opto/loopUnswitch.cpp#L186-L191 >> >> I think we can merge these two steps into a single step, that is, create the outmost unswitched IfNode meanwhile setting its condition input, _prob and _fcnt w/ creating the dummy opaque bool node. >> >> Testing: >> - hotspot/jtreg/compiler(slowdebug) > > Yi Yang has updated the pull request incrementally with one additional commit since the last revision: > > unused head->is_CountedLoop() Looks good. ------------- Marked as reviewed by neliasso (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/4079 From neliasso at openjdk.java.net Wed May 19 08:33:41 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Wed, 19 May 2021 08:33:41 GMT Subject: RFR: 8265262: CITime - 'other' incorrectly calculated In-Reply-To: References: Message-ID: On Mon, 17 May 2021 16:36:26 GMT, Nils Eliasson wrote: > This CR fixes a few issues with the CITIme output for C2: > > 1) The other category for _t_optimize is not removing time spent in _t_vector > > 2) Some of the _t_incrInline sub counters is called from different contexts - calculating 'other' from total time spent in _t_incrInline expects that the counter usage is strictly hierarchical. > > 3) I've placed the non-hierarchical counters in braces. > > 4) Code Installation is a part of Code Emission (_t_output). Indentation fixed. > > 5) Moved "renumber live" after "Vector" so that they appear in order. > > 6) Added sub counters "shorten branches" and "fill buffer" to "Code Emission" phase, and added an other category. Before more than 50% of time in Code Emission was unaccounted for, now it's less than 25%. > > Please review, > Best regards, > Nils Eliasson Thanks for the reviews, Tobias and Vladimir. ------------- PR: https://git.openjdk.java.net/jdk/pull/4065 From jbhateja at openjdk.java.net Wed May 19 08:33:55 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Wed, 19 May 2021 08:33:55 GMT Subject: RFR: 8267357: build breaks with -Werror option on micro benchmark added for JDK-8256973 Message-ID: Relevant declarations modified and tested with -Werror, no longer see unchecked conversion warnings. Kindly review and approve. 
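For context, an "unchecked conversion" warning typically comes from a raw generic declaration. The following is a generic sketch of that kind of fix, not the actual change in the micro benchmark: the class `UncheckedConversionSketch` and its contents are hypothetical, and the exact lint settings used by the JDK build are assumed rather than taken from this thread.

```java
import java.util.ArrayList;
import java.util.List;

public class UncheckedConversionSketch {
    // A raw declaration such as
    //     List names = new ArrayList();
    //     List<String> typed = names;   // warning: [unchecked] unchecked conversion
    // compiles, but javac reports an unchecked conversion. With -Werror and the
    // corresponding lint category enabled, that warning fails the build.

    // Declaring the type argument up front avoids the warning entirely.
    static List<String> typedNames() {
        List<String> names = new ArrayList<>();
        names.add("mask");
        names.add("vector");
        return names;
    }

    public static void main(String[] args) {
        System.out.println(typedNames());
    }
}
```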
------------- Commit messages: - 8267357: build breaks with -Werror option on micro benchmark added for JDK-8256973 Changes: https://git.openjdk.java.net/jdk/pull/4108/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4108&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8267357 Stats: 10 lines in 1 file changed: 0 ins; 1 del; 9 mod Patch: https://git.openjdk.java.net/jdk/pull/4108.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4108/head:pull/4108 PR: https://git.openjdk.java.net/jdk/pull/4108 From jiefu at openjdk.java.net Wed May 19 08:37:22 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Wed, 19 May 2021 08:37:22 GMT Subject: RFR: 8267370: [Vector API] Fix several crashes after JDK-8256973 Message-ID: Hi all, Several vector tests fail with UseAVX=1 after JDK-8256973. The reason is that `vpmovmskb` [1] can be only used with UseAVX > 1 [2]. The fix just disables the intrinsics when UseAVX < 2. Testing: - jdk/incubator/vector with UseAVX={0/1/2/3} on Linux/x64 Thanks. 
Best regards, Jie [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp#L3785 [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/assembler_x86.cpp#L4127 ------------- Commit messages: - 8267370: [Vector API] Fix several crashes after JDK-8256973 Changes: https://git.openjdk.java.net/jdk/pull/4109/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4109&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8267370 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/4109.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4109/head:pull/4109 PR: https://git.openjdk.java.net/jdk/pull/4109 From jiefu at openjdk.java.net Wed May 19 08:46:40 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Wed, 19 May 2021 08:46:40 GMT Subject: RFR: 8267357: build breaks with -Werror option on micro benchmark added for JDK-8256973 In-Reply-To: References: Message-ID: <_jD3PtGlJL7eljQuJk5V_Q-sZk3GMjXJiFIF30_T0N0=.d048b12f-b22c-4c04-9900-95fe7679fcc5@github.com> On Wed, 19 May 2021 08:20:13 GMT, Jatin Bhateja wrote: > Relevant declarations modified and tested with -Werror, no longer see unchecked conversion warnings. > > Kindly review and approve. LGTM ------------- Marked as reviewed by jiefu (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/4108 From neliasso at openjdk.java.net Wed May 19 08:46:43 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Wed, 19 May 2021 08:46:43 GMT Subject: RFR: 8266332: Adler32 intrinsic for x86 64-bit platforms [v11] In-Reply-To: <0mKzVE9RTWU0ZxjILDLkFx6EW-skdsp3lshNPbucuik=.4e19c3b0-7a3a-47f0-a008-21d999f24c15@github.com> References: <0mKzVE9RTWU0ZxjILDLkFx6EW-skdsp3lshNPbucuik=.4e19c3b0-7a3a-47f0-a008-21d999f24c15@github.com> Message-ID: On Tue, 18 May 2021 00:18:08 GMT, Xubo Zhang wrote: >> Implement Adler32 intrinsic for x86 64-bit platform using vector instructions. 
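For readers unfamiliar with the checksum, a scalar reference version can be sketched as follows. This is not the intrinsic and not code from the PR: `Adler32Sketch` and `scalarAdler32` are hypothetical names, and the result is cross-checked against `java.util.zip.Adler32`, which is assumed here to be the class the intrinsic is meant to accelerate.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Adler32;

public class Adler32Sketch {
    private static final int MOD_ADLER = 65521; // largest prime below 2^16

    // Byte-at-a-time Adler-32: a is the running sum of bytes (starting at 1),
    // b is the running sum of the successive values of a, both modulo 65521.
    static long scalarAdler32(byte[] data) {
        int a = 1;
        int b = 0;
        for (byte value : data) {
            a = (a + (value & 0xff)) % MOD_ADLER;
            b = (b + a) % MOD_ADLER;
        }
        return ((long) b << 16) | a;
    }

    // The library implementation, used here only as a cross-check.
    static long libraryAdler32(byte[] data) {
        Adler32 adler = new Adler32();
        adler.update(data, 0, data.length);
        return adler.getValue();
    }

    public static void main(String[] args) {
        byte[] data = "Wikipedia".getBytes(StandardCharsets.US_ASCII);
        System.out.printf("scalar:  0x%08X%n", scalarAdler32(data));
        System.out.printf("library: 0x%08X%n", libraryAdler32(data));
    }
}
```

A vectorized version processes many bytes per step and defers the modulo, which is where a speedup of the kind reported in the benchmark would come from; the sketch above makes no attempt at that.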
>> >> The benchmark test/micro/org/openjdk/bench/java/util/TestAdler32.java is contributed by Pengfei Li (pli, Pengfei.Li at arm.com). >> >> For this benchmark, the optimization shows ~5x improvement. >> >> Base: >> Benchmark (count) Mode Cnt Score Error Units >> TestAdler32Perf.testAdler32Update 64 avgt 25 0.084 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 128 avgt 25 0.104 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 256 avgt 25 0.146 ± 0.002 us/op >> TestAdler32Perf.testAdler32Update 512 avgt 25 0.226 ± 0.002 us/op >> TestAdler32Perf.testAdler32Update 1024 avgt 25 0.390 ± 0.005 us/op >> TestAdler32Perf.testAdler32Update 2048 avgt 25 0.714 ± 0.007 us/op >> TestAdler32Perf.testAdler32Update 4096 avgt 25 1.359 ± 0.014 us/op >> TestAdler32Perf.testAdler32Update 8192 avgt 25 2.751 ± 0.023 us/op >> TestAdler32Perf.testAdler32Update 16384 avgt 25 5.494 ± 0.077 us/op >> TestAdler32Perf.testAdler32Update 32768 avgt 25 11.058 ± 0.160 us/op >> TestAdler32Perf.testAdler32Update 65536 avgt 25 22.198 ± 0.319 us/op >> >> >> With patch: >> Benchmark (count) Mode Cnt Score Error Units >> TestAdler32Perf.testAdler32Update 64 avgt 25 0.020 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 128 avgt 25 0.025 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 256 avgt 25 0.031 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 512 avgt 25 0.048 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 1024 avgt 25 0.078 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 2048 avgt 25 0.139 ± 0.002 us/op >> TestAdler32Perf.testAdler32Update 4096 avgt 25 0.262 ± 0.004 us/op >> TestAdler32Perf.testAdler32Update 8192 avgt 25 0.524 ± 0.010 us/op >> TestAdler32Perf.testAdler32Update 16384 avgt 25 1.017 ± 0.022 us/op >> TestAdler32Perf.testAdler32Update 32768 avgt 25 2.058 ± 0.052 us/op >> TestAdler32Perf.testAdler32Update 65536 avgt 25 3.994 ±
0.013 us/op > > Xubo Zhang has updated the pull request incrementally with one additional commit since the last revision: > > remove scratch register from vpmulld Looks good. ------------- Marked as reviewed by neliasso (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/3806 From aph at redhat.com Wed May 19 08:48:25 2021 From: aph at redhat.com (Andrew Haley) Date: Wed, 19 May 2021 09:48:25 +0100 Subject: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics In-Reply-To: <8346BF97-D8F6-4521-8589-66C679618DB7@oracle.com> References: <72216fcc-67e7-c700-8fee-2d8c752a0f0c@redhat.com> <8346BF97-D8F6-4521-8589-66C679618DB7@oracle.com> Message-ID: On 5/17/21 6:51 PM, Paul Sandoz wrote: > I'll let Sandhya talk more about the provenance and numerical accuracy. I think we can add more comments/details in that respect. > > IMO this is a reasonable compromise, at least for incubation with follow on investigation to determine if we can leverage possible enhancements to Panama FFM (see JEP 414 section on SVML). We would like to encourage experimentation of numerical data-parallel algorithms. The performance gains using SVML are compelling in that regard. I understand the argument from utility here, but it's a very substantial precedent to take. In the licence we use for OpenJDK, the "source code" for a work means the preferred form of the work for making modifications to it. This is not the preferred form. These assembly-code files are not much more use than binary blobs would be. They are a black box. (I'm aware that we've had a few of these from Intel before, but ISTM that this is a much bigger deal.) I understand that permission to include these files was probably granted as a result of negotiations with Intel. And it's great to have this code in OpenJDK. However, I am sure that no-one on this project looks forward to a future in which part of our "source code" consists of what are in effect unfixable binary blobs.
We should at least have the conversation about whether this is the way OpenJDK should be going. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From jbhateja at openjdk.java.net Wed May 19 08:51:43 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Wed, 19 May 2021 08:51:43 GMT Subject: RFR: 8267364: Remove mask.incr which is introduced by JDK-8256973 In-Reply-To: References: Message-ID: On Wed, 19 May 2021 07:32:45 GMT, Jie Fu wrote: > Hi all, > > Please review the trivial change which removes the useless mask.incr. > > Thanks. > Best regards, > Jie Thanks Jie. ------------- PR: https://git.openjdk.java.net/jdk/pull/4107 From yyang at openjdk.java.net Wed May 19 08:55:44 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Wed, 19 May 2021 08:55:44 GMT Subject: RFR: 8267239: C1: RangeCheckElimination for % operator if divisor is IntConstant [v3] In-Reply-To: <40GCUvFwUxi08a4inydWtopQ9thvWsCOsFz7_-v0QzM=.7255306b-b11d-4fc4-b839-37410bea37a9@github.com> References: <40GCUvFwUxi08a4inydWtopQ9thvWsCOsFz7_-v0QzM=.7255306b-b11d-4fc4-b839-37410bea37a9@github.com> Message-ID: On Wed, 19 May 2021 02:42:03 GMT, Yi Yang wrote: >> The % operator follows the rule that the result of the remainder operation can be negative only if the dividend is negative, and can be positive only if the dividend is positive. Moreover, the magnitude of the result is always less than the magnitude of the divisor (see [JLS 15.17.3](https://docs.oracle.com/javase/specs/jls/se8/html/jls-15.html#jls-15.17.3)).
>> >> So if `y` is a constant integer and not equal to 0, then we can deduce the bound of remainder operation: >> - x % -y ==> [0, y - 1] RCE >> - x % y ==> [0, y - 1] RCE >> - -x % y ==> [-y + 1, 0] >> - -x % -y ==> [-y + 1, 0] >> >> Based on above rationale, we can apply RCE for the remainder operations whose dividend is constant integer and >= 0, e.g.: >> >> >> for(int i=0;i<1000;i++){ >> int top5 = arr[i%5]; // Apply RCE if arr is a loop invariant >> .... >> } >> >> >> For more detailed RCE results, please check out the attachment on JBS, it was generated by ArithmeticRemRCE with additional flags -XX:+TraceRangeCheckElimination -XX:+PrintIR. >> >> Testing: >> - test/hotspot/jtreg/compiler/c1/(slowdebug) > > Yi Yang has updated the pull request incrementally with one additional commit since the last revision: > > missing whitespace; more comment Thank you Nils and Tobias for taking the time to review this patch! I've filed https://bugs.openjdk.java.net/browse/JDK-8267376 for investigating if it's possible to apply C2 RCE for % operator when divisor != 0. ------------- PR: https://git.openjdk.java.net/jdk/pull/4083 From yyang at openjdk.java.net Wed May 19 08:57:39 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Wed, 19 May 2021 08:57:39 GMT Subject: RFR: 8267151: C2: Don't create dummy Opaque1Node for outmost unswitched IfNode [v2] In-Reply-To: <9lr_DC8R0J8wu-Z3Xb74vWCbxcjtUUkzarCeHDbnpz0=.eb9054c5-a14d-4844-98a6-af45ba8c5914@github.com> References: <52F-48LXl0cjM3C9rcAYKp_VdOsUMVetxr5ydepHB2Y=.d361fb09-0b4e-4a4e-b5d2-f44254fb8816@github.com> <9lr_DC8R0J8wu-Z3Xb74vWCbxcjtUUkzarCeHDbnpz0=.eb9054c5-a14d-4844-98a6-af45ba8c5914@github.com> Message-ID: On Wed, 19 May 2021 02:39:05 GMT, Yi Yang wrote: >> In create_slow_version_of_loop(), C2 creates the outmost unswitched IfNode(i.e. **if(xx)**{ for{} }else{ for{} }) with a dummy opaque bool node as its condition input. 
>> >> https://github.com/openjdk/jdk/blob/cd1c17c0a6416a8d16cf2035f3e97dba95b6b8af/src/hotspot/share/opto/loopUnswitch.cpp#L265-L271 >> >> After that, it sets the _prob(missing _fcnt?) of the outmost unswitched IfNode in do_unswitching(). >> >> https://github.com/openjdk/jdk/blob/cd1c17c0a6416a8d16cf2035f3e97dba95b6b8af/src/hotspot/share/opto/loopUnswitch.cpp#L186-L191 >> >> I think we can merge these two steps into a single step, that is, create the outmost unswitched IfNode meanwhile setting its condition input, _prob and _fcnt w/ creating the dummy opaque bool node. >> >> Testing: >> - hotspot/jtreg/compiler(slowdebug) > > Yi Yang has updated the pull request incrementally with one additional commit since the last revision: > > unused head->is_CountedLoop() Thank you Nils and Tobias for taking the time to review this patch! ------------- PR: https://git.openjdk.java.net/jdk/pull/4079 From neliasso at openjdk.java.net Wed May 19 08:59:39 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Wed, 19 May 2021 08:59:39 GMT Subject: RFR: 8267370: [Vector API] Fix several crashes after JDK-8256973 In-Reply-To: References: Message-ID: On Wed, 19 May 2021 08:26:24 GMT, Jie Fu wrote: > Hi all, > > Several vector tests fail with UseAVX=1 after JDK-8256973. > The reason is that `vpmovmskb` [1] can be only used with UseAVX > 1 [2]. > The fix just disables the intrinsics when UseAVX < 2. > > Testing: > - jdk/incubator/vector with UseAVX={0/1/2/3} on Linux/x64 > > Thanks. > Best regards, > Jie > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp#L3785 > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/assembler_x86.cpp#L4127 Looks good. ------------- Marked as reviewed by neliasso (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/4109 From thartmann at openjdk.java.net Wed May 19 09:00:43 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 19 May 2021 09:00:43 GMT Subject: RFR: 8266746: C1: Replace UnsafeGetRaw with UnsafeGetObject when setting up OSR entry block [v2] In-Reply-To: References: Message-ID: On Sat, 8 May 2021 09:45:19 GMT, Yi Yang wrote: >> After JDK-8150921, most Unsafe{Get,Put}Raw intrinsic methods can be replaced by Unsafe{Get,Put}Object. >> >> There is the only one occurrence where c1 refers UnsafeGetRaw among GraphBuilder::setup_osr_entry_block() >> >> https://github.com/openjdk/jdk/blob/74fecc070a6462e6a2d061525b53a63de15339f9/src/hotspot/share/c1/c1_GraphBuilder.cpp#L3143-L3157 >> >> We can replace UnsafeGetRaw with UnsafeGetObject when setting up OSR entry block. After that, Unsafe{Get,Put}Raw can be completely removed because no one refers to them. >> >> (This patch actually does two things: >> 1. `Replace UnsafeGetRaw with UnsafeGetObject when setting up OSR entry block` This is the only occurrence where c1 refers UnsafeGetRaw >> 2. `Cleanup unused Unsafe{Get,Put}Raw code` >> They are related so I put it together, but I still want to hear your suggestions, I will separate them into two patches if you think it is more reasonable) >> >> Thanks! >> Yang > > Yi Yang has updated the pull request incrementally with one additional commit since the last revision: > > unaliged_move for ppc/s390 Looks good to me but this should be tested on all platforms. src/hotspot/share/c1/c1_GraphBuilder.cpp line 3153: > 3151: Value off_val = append(new Constant(new IntConstant(offset))); > 3152: get = append(new UnsafeGetObject(as_BasicType(local->type()), e, > 3153: off_val, Indentation is wrong. src/hotspot/share/c1/c1_Instruction.hpp line 2318: > 2316: > 2317: // accessors > 2318: bool is_raw_get() { return _is_raw_get; } I would rename this to `_is_raw` because we already know it's a get. 
src/hotspot/share/c1/c1_LIRGenerator.cpp line 2108:

> 2106: __ convert(Bytecodes::_i2l, off.result(), offset);
> 2107: #else
> 2108: LIR_Opr offset = off.result();

The indentation is wrong in the else branch.

-------------

Marked as reviewed by thartmann (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/3917

From neliasso at openjdk.java.net Wed May 19 09:02:41 2021
From: neliasso at openjdk.java.net (Nils Eliasson)
Date: Wed, 19 May 2021 09:02:41 GMT
Subject: RFR: 8267357: build breaks with -Werror option on micro benchmark added for JDK-8256973
In-Reply-To: 
References: 
Message-ID: 

On Wed, 19 May 2021 08:20:13 GMT, Jatin Bhateja wrote:

> Relevant declarations modified and tested with -Werror, no longer see unchecked conversion warnings.
>
> Kindly review and approve.

Looks good.

-------------

Marked as reviewed by neliasso (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/4108

From yyang at openjdk.java.net Wed May 19 09:08:48 2021
From: yyang at openjdk.java.net (Yi Yang)
Date: Wed, 19 May 2021 09:08:48 GMT
Subject: Integrated: 8267239: C1: RangeCheckElimination for % operator if divisor is IntConstant
In-Reply-To: 
References: 
Message-ID: 

On Tue, 18 May 2021 08:20:09 GMT, Yi Yang wrote:

> The % operator follows the rule that the result of the remainder operation can be negative only if the dividend is negative, and can be positive only if the dividend is positive. Moreover, the magnitude of the result is always less than the magnitude of the divisor (see [JLS 15.17.3](https://docs.oracle.com/javase/specs/jls/se8/html/jls-15.html#jls-15.17.3)).
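The remainder rule quoted above can be checked directly in Java. The following is only an illustrative sketch: the class `RemainderBounds` and the helper `withinBounds` are made-up names, and the code exercises the language semantics of `%`, not C1's range check elimination itself.

```java
class RemainderBounds {
    // Returns true if x % y obeys the JLS 15.17.3 rule for a non-zero divisor y:
    // the result has the sign of the dividend (or is zero), and its magnitude
    // is strictly less than the magnitude of the divisor.
    static boolean withinBounds(int x, int y) {
        int r = x % y;
        long bound = Math.abs((long) y) - 1; // long avoids overflow for Integer.MIN_VALUE
        if (x >= 0) {
            return 0 <= r && r <= bound;     // non-negative dividend: [0, |y| - 1]
        }
        return -bound <= r && r <= 0;        // negative dividend: [-(|y| - 1), 0]
    }

    public static void main(String[] args) {
        int[] dividends = {0, 1, 4, 5, 7, 999, -1, -5, -7, Integer.MAX_VALUE, Integer.MIN_VALUE};
        int[] divisors = {5, -5, 1, -1, 3, Integer.MIN_VALUE};
        for (int x : dividends) {
            for (int y : divisors) {
                if (!withinBounds(x, y)) {
                    throw new AssertionError(x + " % " + y);
                }
            }
        }
        // With a non-negative dividend and the constant divisor 5, the result is
        // always a valid index into an array of length >= 5.
        int[] arr = new int[5];
        for (int i = 0; i < 1000; i++) {
            arr[i % 5]++;
        }
        System.out.println(arr[0]); // each of the 5 slots is hit 200 times
    }
}
```

The second loop is the shape of code the change targets: the index can never leave `[0, 4]`, so a bounds check against an array of at least five elements is redundant.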
>
> So if `y` is a constant integer and not equal to 0, then we can deduce the bounds of the remainder operation:
> - x % -y ==> [0, y - 1] RCE
> - x % y ==> [0, y - 1] RCE
> - -x % y ==> [-y + 1, 0]
> - -x % -y ==> [-y + 1, 0]
>
> Based on the above rationale, we can apply RCE to remainder operations whose divisor is a constant integer and whose dividend is >= 0, e.g.:
>
>
>     for (int i = 0; i < 1000; i++) {
>         int top5 = arr[i % 5]; // Apply RCE if arr is a loop invariant
>         ....
>     }
>
>
> For more detailed RCE results, please check out the attachment on JBS; it was generated by ArithmeticRemRCE with the additional flags -XX:+TraceRangeCheckElimination -XX:+PrintIR.
>
> Testing:
> - test/hotspot/jtreg/compiler/c1/(slowdebug)

This pull request has now been integrated.

Changeset: 0cf7e578
Author: Yi Yang
Committer: Tobias Hartmann
URL: https://git.openjdk.java.net/jdk/commit/0cf7e5784b4ddb70c8674a814527d3e0c315a1ec
Stats: 109 lines in 3 files changed: 99 ins; 10 del; 0 mod

8267239: C1: RangeCheckElimination for % operator if divisor is IntConstant

Reviewed-by: thartmann, neliasso

-------------

PR: https://git.openjdk.java.net/jdk/pull/4083

From yyang at openjdk.java.net Wed May 19 09:10:42 2021
From: yyang at openjdk.java.net (Yi Yang)
Date: Wed, 19 May 2021 09:10:42 GMT
Subject: Integrated: 8267151: C2: Don't create dummy Opaque1Node for outmost unswitched IfNode
In-Reply-To: <52F-48LXl0cjM3C9rcAYKp_VdOsUMVetxr5ydepHB2Y=.d361fb09-0b4e-4a4e-b5d2-f44254fb8816@github.com>
References: <52F-48LXl0cjM3C9rcAYKp_VdOsUMVetxr5ydepHB2Y=.d361fb09-0b4e-4a4e-b5d2-f44254fb8816@github.com>
Message-ID: 

On Tue, 18 May 2021 03:12:23 GMT, Yi Yang wrote:

> In create_slow_version_of_loop(), C2 creates the outmost unswitched IfNode(i.e. **if(xx)**{ for{} }else{ for{} }) with a dummy opaque bool node as its condition input.
>
> https://github.com/openjdk/jdk/blob/cd1c17c0a6416a8d16cf2035f3e97dba95b6b8af/src/hotspot/share/opto/loopUnswitch.cpp#L265-L271
>
> After that, it sets the _prob(missing _fcnt?)
of the outmost unswitched IfNode in do_unswitching(). > > https://github.com/openjdk/jdk/blob/cd1c17c0a6416a8d16cf2035f3e97dba95b6b8af/src/hotspot/share/opto/loopUnswitch.cpp#L186-L191 > > I think we can merge these two steps into a single step, that is, create the outmost unswitched IfNode meanwhile setting its condition input, _prob and _fcnt w/ creating the dummy opaque bool node. > > Testing: > - hotspot/jtreg/compiler(slowdebug) This pull request has now been integrated. Changeset: 392f962e Author: Yi Yang Committer: Tobias Hartmann URL: https://git.openjdk.java.net/jdk/commit/392f962e0e87de1b5183505c86a967cc9999e04c Stats: 30 lines in 2 files changed: 3 ins; 16 del; 11 mod 8267151: C2: Don't create dummy Opaque1Node for outmost unswitched IfNode Reviewed-by: thartmann, neliasso ------------- PR: https://git.openjdk.java.net/jdk/pull/4079 From neliasso at openjdk.java.net Wed May 19 09:11:17 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Wed, 19 May 2021 09:11:17 GMT Subject: RFR: 8265262: CITime - 'other' incorrectly calculated [v2] In-Reply-To: References: Message-ID: > This CR fixes a few issues with the CITIme output for C2: > > 1) The other category for _t_optimize is not removing time spent in _t_vector > > 2) Some of the _t_incrInline sub counters is called from different contexts - calculating 'other' from total time spent in _t_incrInline expects that the counter usage is strictly hierarchical. > > 3) I've placed the non-hierarchical counters in braces. > > 4) Code Installation is a part of Code Emission (_t_output). Indentation fixed. > > 5) Moved "renumber live" after "Vector" so that they appear in order. > > 6) Added sub counters "shorten branches" and "fill buffer" to "Code Emission" phase, and added an other category. Before more than 50% of time in Code Emission was unaccounted for, now it's less than 25%. 
>
> Please review,
> Best regards,
> Nils Eliasson

Nils Eliasson has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:

 - Merge branch 'master' into fix_citime
 - removed whitespace
 - Add timers
 - fix_counters

-------------

Changes:
 - all: https://git.openjdk.java.net/jdk/pull/4065/files
 - new: https://git.openjdk.java.net/jdk/pull/4065/files/c7ccc63a..50245c2c

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4065&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4065&range=00-01

Stats: 4519 lines in 213 files changed: 3329 ins; 736 del; 454 mod
Patch: https://git.openjdk.java.net/jdk/pull/4065.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/4065/head:pull/4065

PR: https://git.openjdk.java.net/jdk/pull/4065

From mgronlun at openjdk.java.net Wed May 19 09:27:53 2021
From: mgronlun at openjdk.java.net (Markus Grönlund)
Date: Wed, 19 May 2021 09:27:53 GMT
Subject: RFR: 8265129: Add intrinsic support for JVM.getClassId [v6]
In-Reply-To: 
References: 
Message-ID: 

On Wed, 19 May 2021 03:07:36 GMT, Denghui Dong wrote:

>> Denghui Dong has updated the pull request incrementally with one additional commit since the last revision:
>>
>> fix crash problem
>
> Hi Vladimir,
>
> Thanks for your comment.
>
> Yes, the native implementation for `getClassIdNonIntrinsic`/`getClassId` is located in `jfrTraceId.cpp#L178` just as you said, more specifically, there are two paths, one(JfrTraceId::load) for normal class and one(load_primitive) for primitive class (including void.class).
>
> My pseudo-code(the comment of `LibraryCallKit::inline_native_classID`) is consistent with the implementation of these two paths.
> > And in the normal class implementation path, there are fast path and slow path(see JfrTraceIdLoadBarrier::load), only some comparison and shift operations are needed to obtain the class ID in the fast path, and that's where I think intrinsic can bring performance improvements, I saw about 20x improvement from my microbenchmark. > > Judging from the current JFR implementation, there are already some events that need to rely on this API, such as `ExceptionThrownEvent` and `ErrorThrownEvent` use `thrownClass` to record the type of exception, and I also noticed that there is a new PR(https://github.com/openjdk/jdk/pull/4101) to add `FinalizerEvent` which include a field named `finalizedClass` to record the type information. Therefore, I have reason to believe that this API will be frequently used during the JFR activation process. > > As far as the current implementation is concerned, it is indeed a bit complicated, I think some simplifications can be made, for example, only the fast path for the normal class is retained, and other paths are directly implemented by calling the native function. What do you think? > > @egahlin @mgronlun > And I hope JFR's folks could give some suggestions on this PR:) > > Best, > Denghui Hi @D-D-H, sorry for the late reply. I am currently a bit busy but hope to get around taking a look at this soon. Thanks Markus ------------- PR: https://git.openjdk.java.net/jdk/pull/3470 From jbhateja at openjdk.java.net Wed May 19 09:48:46 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Wed, 19 May 2021 09:48:46 GMT Subject: RFR: 8267370: [Vector API] Fix several crashes after JDK-8256973 In-Reply-To: References: Message-ID: <3BOYC9OLXPEbvzpFtYjXk0qST2xiawoxrpDss8Iyra4=.9c4948ef-1063-4888-bfb9-7fbc43e2c0f0@github.com> On Wed, 19 May 2021 08:26:24 GMT, Jie Fu wrote: > Hi all, > > Several vector tests fail with UseAVX=1 after JDK-8256973. > The reason is that `vpmovmskb` [1] can be only used with UseAVX > 1 [2]. 
> The fix just disables the intrinsics when UseAVX < 2.
>
> Testing:
> - jdk/incubator/vector with UseAVX={0/1/2/3} on Linux/x64
>
> Thanks.
> Best regards,
> Jie
>
> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp#L3785
> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/assembler_x86.cpp#L4127

Hi @DamonFool ,

Thanks much for providing a quick fix. The problem here seems to be related to the assert in the following assembler routine, which assumes that the instruction is only supported on AVX2 platforms.

>void Assembler::vpmovmskb(Register dst, XMMRegister src) {
> assert(VM_Version::supports_avx2(), "");

VEX.128.66.0F.WIG D7 /r VPMOVMSKB reg, xmm1 | RM | V/V | AVX | Move a byte mask of xmm1 to reg. The upper bits of r32 or r64 are filled with zeros.
-- | -- | -- | -- | --

But it's OK to limit the patch to AVX2, considering the optimization is mainly aimed at modern server targets (which should support AVX2).

An alternate fix is proposed below which does not restrict the optimization to AVX2 and still keeps the changes minimal, since movmaskb is already used in several places in the context of a 256-bit vector argument.
diff --git a/src/hotspot/cpu/x86/assembler_x86.cpp b/src/hotspot/cpu/x86/assembler_x86.cpp
index d915c846d09..c695da12d42 100644
--- a/src/hotspot/cpu/x86/assembler_x86.cpp
+++ b/src/hotspot/cpu/x86/assembler_x86.cpp
@@ -4123,9 +4123,10 @@ void Assembler::pmovmskb(Register dst, XMMRegister src) {
   emit_int16((unsigned char)0xD7, (0xC0 | encode));
 }
 
-void Assembler::vpmovmskb(Register dst, XMMRegister src) {
-  assert(VM_Version::supports_avx2(), "");
-  InstructionAttr attributes(AVX_256bit, /* rex_w */ false, /* legacy_mode */ true, /* no_mask_reg */ true, /* uses_vl */ false);
+void Assembler::vpmovmskb(Register dst, XMMRegister src, int vec_enc) {
+  assert((VM_Version::supports_avx() && vec_enc == AVX_128bit) ||
+         (VM_Version::supports_avx2() && vec_enc == AVX_256bit), "");
+  InstructionAttr attributes(vec_enc, /* rex_w */ false, /* legacy_mode */ true, /* no_mask_reg */ true, /* uses_vl */ false);
   int encode = vex_prefix_and_encode(dst->encoding(), 0, src->encoding(), VEX_SIMD_66, VEX_OPCODE_0F, &attributes);
   emit_int16((unsigned char)0xD7, (0xC0 | encode));
 }
diff --git a/src/hotspot/cpu/x86/assembler_x86.hpp b/src/hotspot/cpu/x86/assembler_x86.hpp
index 8526785eea1..a02fdf27582 100644
--- a/src/hotspot/cpu/x86/assembler_x86.hpp
+++ b/src/hotspot/cpu/x86/assembler_x86.hpp
@@ -1746,7 +1746,7 @@ private:
   void vpcmpgtq(XMMRegister dst, XMMRegister nds, XMMRegister src, int vector_len);
 
   void pmovmskb(Register dst, XMMRegister src);
-  void vpmovmskb(Register dst, XMMRegister src);
+  void vpmovmskb(Register dst, XMMRegister src, int vec_enc);
 
   // SSE 4.1 extract
   void pextrd(Register dst, XMMRegister src, int imm8);
diff --git a/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp b/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp
index 01212d790fb..310bbbfa150 100644
--- a/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp
+++ b/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp
@@ -3782,7 +3782,7 @@ void C2_MacroAssembler::vector_mask_operation(int opc, Register dst, XMMRegister
   assert(VM_Version::supports_avx(), "");
   vpxor(xtmp, xtmp, xtmp, vec_enc);
   vpsubb(xtmp, xtmp, mask, vec_enc);
-  vpmovmskb(tmp, xtmp);
+  vpmovmskb(tmp, xtmp, vec_enc);
   switch(opc) {
     case Op_VectorMaskTrueCount:
       popcntq(dst, tmp);
diff --git a/src/hotspot/cpu/x86/macroAssembler_x86.cpp b/src/hotspot/cpu/x86/macroAssembler_x86.cpp
index 7c37899b456..3e95aa64a41 100644
--- a/src/hotspot/cpu/x86/macroAssembler_x86.cpp
+++ b/src/hotspot/cpu/x86/macroAssembler_x86.cpp
@@ -3216,9 +3216,9 @@ void MacroAssembler::vpmovzxbw(XMMRegister dst, Address src, int vector_len) {
   Assembler::vpmovzxbw(dst, src, vector_len);
 }
 
-void MacroAssembler::vpmovmskb(Register dst, XMMRegister src) {
+void MacroAssembler::vpmovmskb(Register dst, XMMRegister src, int vector_len) {
   assert((src->encoding() < 16),"XMM register should be 0-15");
-  Assembler::vpmovmskb(dst, src);
+  Assembler::vpmovmskb(dst, src, vector_len);
 }
 
 void MacroAssembler::vpmullw(XMMRegister dst, XMMRegister nds, XMMRegister src, int vector_len) {
diff --git a/src/hotspot/cpu/x86/macroAssembler_x86.hpp b/src/hotspot/cpu/x86/macroAssembler_x86.hpp
index 074d4a61601..64f4d6e157b 100644
--- a/src/hotspot/cpu/x86/macroAssembler_x86.hpp
+++ b/src/hotspot/cpu/x86/macroAssembler_x86.hpp
@@ -1303,7 +1303,7 @@ public:
   void vpmovzxbw(XMMRegister dst, Address src, int vector_len);
   void vpmovzxbw(XMMRegister dst, XMMRegister src, int vector_len) { Assembler::vpmovzxbw(dst, src, vector_len); }
 
-  void vpmovmskb(Register dst, XMMRegister src);
+  void vpmovmskb(Register dst, XMMRegister src, int vector_len = Assembler::AVX_256bit);
 
   void vpmullw(XMMRegister dst, XMMRegister nds, XMMRegister src, int vector_len);
   void vpmullw(XMMRegister dst, XMMRegister nds, Address src, int vector_len);

-------------

PR: https://git.openjdk.java.net/jdk/pull/4109

From thartmann at openjdk.java.net Wed May 19 09:58:49 2021
From: thartmann at openjdk.java.net (Tobias Hartmann)
Date: Wed, 19 May 2021 09:58:49 GMT
Subject: RFR: 8267357: build breaks with -Werror
option on micro benchmark added for JDK-8256973 In-Reply-To: References: Message-ID: On Wed, 19 May 2021 08:20:13 GMT, Jatin Bhateja wrote: > Relevant declarations modified and tested with -Werror, no longer see unchecked conversion warnings. > > Kindly review and approve. Marked as reviewed by thartmann (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/4108 From jbhateja at openjdk.java.net Wed May 19 10:01:45 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Wed, 19 May 2021 10:01:45 GMT Subject: Integrated: 8267357: build breaks with -Werror option on micro benchmark added for JDK-8256973 In-Reply-To: References: Message-ID: On Wed, 19 May 2021 08:20:13 GMT, Jatin Bhateja wrote: > Relevant declarations modified and tested with -Werror, no longer see unchecked conversion warnings. > > Kindly review and approve. This pull request has now been integrated. Changeset: 88b11423 Author: Jatin Bhateja URL: https://git.openjdk.java.net/jdk/commit/88b114235c5716ea43c55a9c4bc886bf5bcf4b42 Stats: 10 lines in 1 file changed: 0 ins; 1 del; 9 mod 8267357: build breaks with -Werror option on micro benchmark added for JDK-8256973 Reviewed-by: jiefu, neliasso, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/4108 From jiefu at openjdk.java.net Wed May 19 11:23:05 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Wed, 19 May 2021 11:23:05 GMT Subject: RFR: 8267370: [Vector API] Fix several crashes after JDK-8256973 [v2] In-Reply-To: References: Message-ID: > Hi all, > > Several vector tests fail with UseAVX=1 after JDK-8256973. > The reason is that `vpmovmskb` [1] can be only used with UseAVX > 1 [2]. > The fix just disables the intrinsics when UseAVX < 2. > > Testing: > - jdk/incubator/vector with UseAVX={0/1/2/3} on Linux/x64 > > Thanks. 
> Best regards, > Jie > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp#L3785 > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/assembler_x86.cpp#L4127 Jie Fu has updated the pull request incrementally with one additional commit since the last revision: Enable vpmovmskb for 128-bit vectors ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4109/files - new: https://git.openjdk.java.net/jdk/pull/4109/files/319fe057..efc23727 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4109&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4109&range=00-01 Stats: 11 lines in 6 files changed: 1 ins; 0 del; 10 mod Patch: https://git.openjdk.java.net/jdk/pull/4109.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4109/head:pull/4109 PR: https://git.openjdk.java.net/jdk/pull/4109 From jiefu at openjdk.java.net Wed May 19 11:23:05 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Wed, 19 May 2021 11:23:05 GMT Subject: RFR: 8267370: [Vector API] Fix several crashes after JDK-8256973 In-Reply-To: <3BOYC9OLXPEbvzpFtYjXk0qST2xiawoxrpDss8Iyra4=.9c4948ef-1063-4888-bfb9-7fbc43e2c0f0@github.com> References: <3BOYC9OLXPEbvzpFtYjXk0qST2xiawoxrpDss8Iyra4=.9c4948ef-1063-4888-bfb9-7fbc43e2c0f0@github.com> Message-ID: On Wed, 19 May 2021 09:46:16 GMT, Jatin Bhateja wrote: > Thanks much for providing a quick fix. Problem here seems to be related to assert in following assembler routines which made an assumption that the instruction is only supported over AVX2 platforms. Good catch! @jatin-bhateja It makes sense to enable 128-bit vectors for vpmovmskb. Updated. More testing is in progress. Will let you know once finished. Thanks. 
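For readers following the thread, a scalar model of what the `vpxor`/`vpsubb`/`vpmovmskb`/`popcntq` sequence in the proposed diff computes may help. This is only an illustrative Java sketch, not HotSpot code: the names `MoveMaskModel`, `moveMask` and `trueCount` are invented here, and it assumes the mask vector holds one byte per lane with value 0 or 1.

```java
class MoveMaskModel {
    // Scalar model of (v)pmovmskb: bit i of the result is the most significant
    // bit of byte i of the source vector.
    static long moveMask(byte[] lanes) {
        long bits = 0;
        for (int i = 0; i < lanes.length; i++) {
            if (lanes[i] < 0) {          // a negative byte has its top bit set
                bits |= 1L << i;
            }
        }
        return bits;
    }

    // Models zeroing a temp, subtracting the mask from it, gathering the top
    // bits and counting them: 0 - 1 = 0xFF sets the top bit, 0 - 0 = 0 does not.
    // Assumption: every mask byte is 0 or 1.
    static int trueCount(byte[] mask) {
        byte[] negated = new byte[mask.length];
        for (int i = 0; i < mask.length; i++) {
            negated[i] = (byte) (0 - mask[i]);
        }
        return Long.bitCount(moveMask(negated));
    }

    public static void main(String[] args) {
        byte[] mask128 = new byte[16];   // 16 byte lanes, i.e. a 128-bit vector
        mask128[0] = 1;
        mask128[3] = 1;
        mask128[15] = 1;
        System.out.println(trueCount(mask128)); // 3 lanes are set
    }
}
```

The model is the same for 16 and 32 lanes, which matches the point made in the review: only the vector length passed to the instruction differs between the 128-bit and 256-bit cases.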
------------- PR: https://git.openjdk.java.net/jdk/pull/4109 From jiefu at openjdk.java.net Wed May 19 11:56:40 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Wed, 19 May 2021 11:56:40 GMT Subject: RFR: 8267370: [Vector API] Fix several crashes after JDK-8256973 [v2] In-Reply-To: References: Message-ID: On Wed, 19 May 2021 11:23:05 GMT, Jie Fu wrote: >> Hi all, >> >> Several vector tests fail with UseAVX=1 after JDK-8256973. >> The reason is that `vpmovmskb` [1] can be only used with UseAVX > 1 [2]. >> The fix just disables the intrinsics when UseAVX < 2. >> >> Testing: >> - jdk/incubator/vector with UseAVX={0/1/2/3} on Linux/x64 >> >> Thanks. >> Best regards, >> Jie >> >> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp#L3785 >> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/assembler_x86.cpp#L4127 > > Jie Fu has updated the pull request incrementally with one additional commit since the last revision: > > Enable vpmovmskb for 128-bit vectors The vector tests passed on our Linux/x64 machine. @neliasso , please let me know if you're also fine with the updated change. Thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/4109 From hseigel at openjdk.java.net Wed May 19 13:01:40 2021 From: hseigel at openjdk.java.net (Harold Seigel) Date: Wed, 19 May 2021 13:01:40 GMT Subject: RFR: 8267338: [JVMCI] revive JVMCI API removed by JDK-8243287 [v3] In-Reply-To: References: Message-ID: <3r2Tj2E-BhF6LmijLjm5qGUpYjDxyWjYuKSbZ79rXOg=.6383c0e4-adc2-4640-8ef1-c48fec0c1c7e@github.com> On Wed, 19 May 2021 08:02:08 GMT, Doug Simon wrote: >> This PR revives ResolvedJavaType.getHostClass to preserve JVMCI compatibility. The revived method just returns `null`. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > fixed failing test LGTM Thanks, Harold ------------- Marked as reviewed by hseigel (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/4099 From dnsimon at openjdk.java.net Wed May 19 14:03:41 2021 From: dnsimon at openjdk.java.net (Doug Simon) Date: Wed, 19 May 2021 14:03:41 GMT Subject: Integrated: 8267338: [JVMCI] revive JVMCI API removed by JDK-8243287 In-Reply-To: References: Message-ID: <1M98uNJvRvZBhL2hj8n0LyihlEU1qipyTWnieXz8UpQ=.fe2f8972-28ba-4847-aeab-5bd80971f27a@github.com> On Tue, 18 May 2021 19:01:38 GMT, Doug Simon wrote: > This PR revives ResolvedJavaType.getHostClass to preserve JVMCI compatibility. The revived method just returns `null`. This pull request has now been integrated. Changeset: fdd03528 Author: Doug Simon URL: https://git.openjdk.java.net/jdk/commit/fdd0352884cdbba8a9cd11c6f92f0c2fbd800e11 Stats: 10 lines in 2 files changed: 10 ins; 0 del; 0 mod 8267338: [JVMCI] revive JVMCI API removed by JDK-8243287 Reviewed-by: mchung, hseigel ------------- PR: https://git.openjdk.java.net/jdk/pull/4099 From duke at openjdk.java.net Wed May 19 15:14:00 2021 From: duke at openjdk.java.net (duke) Date: Wed, 19 May 2021 15:14:00 GMT Subject: Withdrawn: 8261731: shallow copy the internal buffer of a scalar-replaced java.lang.String object In-Reply-To: References: Message-ID: On Mon, 15 Feb 2021 09:22:00 GMT, Xin Liu wrote: > There are 3 nodes involving in the construction of a java.lang.String object. > 1. Allocate of itself, aka. alloc > 2. AllocateArray of a byte array, which is value:byte[], aka. aa > 3. ArrayCopyNode which copys in the contents of value, aka. ac > > Lemma > When a String object `alloc` is scalar replaced, C2 can eliminate `aa` and `ac`. > > Because `alloc` is scalar replaced, it must be non-escaped. The field value:byte[] of j.l.String cannot be seen by external world, therefore it must not be global escaped. Because the buffer is marked as stable, it is safe to assume its contents are whatever ac copies in. Because all public java.lang.String constructors clone the incoming array, the source of `ac` is stable as well. 
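The premise that the public constructors copy their input can be observed from plain Java. A minimal sketch follows; the class name `StringCopyDemo` and the helper `buildThenMutate` are made up for illustration, and the code shows only the visible copying behaviour, not the C2 nodes discussed above.

```java
import java.nio.charset.StandardCharsets;

class StringCopyDemo {
    // Builds a String from a non-empty array, then overwrites the first element
    // of the caller's array and returns the String that was built.
    static String buildThenMutate(byte[] src) {
        String s = new String(src, StandardCharsets.ISO_8859_1);
        src[0] = (byte) 'X'; // change the caller's array after construction
        return s;
    }

    public static void main(String[] args) {
        byte[] src = {'a', 'b', 'c'};
        String s = buildThenMutate(src);
        System.out.println(s);             // still "abc"
        System.out.println((char) src[0]); // 'X': only the caller's array changed
    }
}
```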
>
> It is possible to rewire `aa` to the source of ac with the correct offset. That is to say, we can replace both `aa` and `ac` with a "shallow copy" of the source of `ac`. It's safe if C2 keeps a reference of the source oop for all safepoints.

This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2570

From kvn at openjdk.java.net Wed May 19 15:31:43 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Wed, 19 May 2021 15:31:43 GMT
Subject: RFR: 8265129: Add intrinsic support for JVM.getClassId [v6]
In-Reply-To: 
References: 
Message-ID: 

On Wed, 19 May 2021 03:07:36 GMT, Denghui Dong wrote:

>> Denghui Dong has updated the pull request incrementally with one additional commit since the last revision:
>>
>> fix crash problem
>
> Hi Vladimir,
>
> Thanks for your comment.
>
> Yes, the native implementation for `getClassIdNonIntrinsic`/`getClassId` is located in `jfrTraceId.cpp#L178` just as you said, more specifically, there are two paths, one(JfrTraceId::load) for normal class and one(load_primitive) for primitive class (including void.class).
>
> My pseudo-code(the comment of `LibraryCallKit::inline_native_classID`) is consistent with the implementation of these two paths.
>
> And in the normal class implementation path, there are fast path and slow path(see JfrTraceIdLoadBarrier::load), only some comparison and shift operations are needed to obtain the class ID in the fast path, and that's where I think intrinsic can bring performance improvements, I saw about 20x improvement from my microbenchmark.
>
> Judging from the current JFR implementation, there are already some events that need to rely on this API, such as `ExceptionThrownEvent` and `ErrorThrownEvent` use `thrownClass` to record the type of exception, and I also noticed that there is a new PR(https://github.com/openjdk/jdk/pull/4101) to add `FinalizerEvent` which include a field named `finalizedClass` to record the type information.
Therefore, I have reason to believe that this API will be frequently used during the JFR activation process. > > As far as the current implementation is concerned, it is indeed a bit complicated, I think some simplifications can be made, for example, only the fast path for the normal class is retained, and other paths are directly implemented by calling the native function. What do you think? > > @egahlin @mgronlun > And I hope JFR's folks could give some suggestions on this PR:) > > Best, > Denghui Hi @D-D-H, Can you show pseudo code for your fast/slow path suggestion? What is most frequent (performance critical) path is used? If it simplify code I am for it. I am not sure about implementing it in C1 since in default mode methods will be compiled by C2. So for C1 to call native implementation could be fine and we can remove code there. Or you can file followup RFE to fix it in C1 later. ------------- PR: https://git.openjdk.java.net/jdk/pull/3470 From sviswanathan at openjdk.java.net Wed May 19 16:18:47 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Wed, 19 May 2021 16:18:47 GMT Subject: RFR: 8266332: Adler32 intrinsic for x86 64-bit platforms [v9] In-Reply-To: References: <4KqXJjwOAZfYQGe-guwozZskn8lf2RR7oTBu5aHSUQo=.ca01baa1-601b-4cab-b546-134173ce4ce9@github.com> Message-ID: On Mon, 17 May 2021 19:13:06 GMT, Vladimir Kozlov wrote: >> I'm not a lawyer, but Pengfei, please contribute this benchmark. All you have to do is copy it into cr.openjdk.java.net. That should be enough for someone else to take it from there. And AFAICR files should have a copyright header, which you should do too. > >> I'm not a lawyer, but Pengfei, please contribute this benchmark. All you have to do is copy it into cr.openjdk.java.net. That should be enough for someone else to take it from there. And AFAICR files should have a copyright header, which you should do too. 
>
> @theRealAph micro is already there for long time: https://cr.openjdk.java.net/~pli/rfr/8216259/TestAdler32.java
> It missed copyright header which is added in these changes.

@vnkozlov Looks like automated tests are not run. Could you please help?

-------------

PR: https://git.openjdk.java.net/jdk/pull/3806

From kvn at openjdk.java.net Wed May 19 16:50:51 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Wed, 19 May 2021 16:50:51 GMT
Subject: RFR: 8266332: Adler32 intrinsic for x86 64-bit platforms [v11]
In-Reply-To: <0mKzVE9RTWU0ZxjILDLkFx6EW-skdsp3lshNPbucuik=.4e19c3b0-7a3a-47f0-a008-21d999f24c15@github.com>
References: <0mKzVE9RTWU0ZxjILDLkFx6EW-skdsp3lshNPbucuik=.4e19c3b0-7a3a-47f0-a008-21d999f24c15@github.com>
Message-ID: 

On Tue, 18 May 2021 00:18:08 GMT, Xubo Zhang wrote:

>> Implement Adler32 intrinsic for x86 64-bit platform using vector instructions.
>>
>> The benchmark test/micro/org/openjdk/bench/java/util/TestAdler32.java is contributed by Pengfei Li (pli, Pengfei.Li at arm.com).
>>
>> For this benchmark, the optimization shows ~5x improvement.
>>
>> Base:
>> Benchmark                          (count)  Mode  Cnt   Score    Error  Units
>> TestAdler32Perf.testAdler32Update       64  avgt   25   0.084 ±  0.001  us/op
>> TestAdler32Perf.testAdler32Update      128  avgt   25   0.104 ±  0.001  us/op
>> TestAdler32Perf.testAdler32Update      256  avgt   25   0.146 ±  0.002  us/op
>> TestAdler32Perf.testAdler32Update      512  avgt   25   0.226 ±  0.002  us/op
>> TestAdler32Perf.testAdler32Update     1024  avgt   25   0.390 ±  0.005  us/op
>> TestAdler32Perf.testAdler32Update     2048  avgt   25   0.714 ±  0.007  us/op
>> TestAdler32Perf.testAdler32Update     4096  avgt   25   1.359 ±  0.014  us/op
>> TestAdler32Perf.testAdler32Update     8192  avgt   25   2.751 ±  0.023  us/op
>> TestAdler32Perf.testAdler32Update    16384  avgt   25   5.494 ±  0.077  us/op
>> TestAdler32Perf.testAdler32Update    32768  avgt   25  11.058 ±  0.160  us/op
>> TestAdler32Perf.testAdler32Update    65536  avgt   25  22.198 ±  0.319  us/op
>>
>>
>> With patch:
>> Benchmark                          (count)  Mode  Cnt   Score    Error  Units
>> TestAdler32Perf.testAdler32Update       64  avgt   25   0.020 ±  0.001  us/op
>> TestAdler32Perf.testAdler32Update      128  avgt   25   0.025 ±  0.001  us/op
>> TestAdler32Perf.testAdler32Update      256  avgt   25   0.031 ±  0.001  us/op
>> TestAdler32Perf.testAdler32Update      512  avgt   25   0.048 ±  0.001  us/op
>> TestAdler32Perf.testAdler32Update     1024  avgt   25   0.078 ±  0.001  us/op
>> TestAdler32Perf.testAdler32Update     2048  avgt   25   0.139 ±  0.002  us/op
>> TestAdler32Perf.testAdler32Update     4096  avgt   25   0.262 ±  0.004  us/op
>> TestAdler32Perf.testAdler32Update     8192  avgt   25   0.524 ±  0.010  us/op
>> TestAdler32Perf.testAdler32Update    16384  avgt   25   1.017 ±  0.022  us/op
>> TestAdler32Perf.testAdler32Update    32768  avgt   25   2.058 ±  0.052  us/op
>> TestAdler32Perf.testAdler32Update    65536  avgt   25   3.994 ±  0.013  us/op
>
> Xubo Zhang has updated the pull request incrementally with one additional commit since the last revision:
>
> remove scratch register from vpmulld

I started our internal testing.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3806

From psandoz at openjdk.java.net Wed May 19 16:54:39 2021
From: psandoz at openjdk.java.net (Paul Sandoz)
Date: Wed, 19 May 2021 16:54:39 GMT
Subject: RFR: 8267190: Optimize Vector API test operations
In-Reply-To: 
References: 
Message-ID: 

On Fri, 14 May 2021 23:58:38 GMT, Sandhya Viswanathan wrote:

> Vector API test operations (IS_DEFAULT, IS_FINITE, IS_INFINITE, IS_NAN and IS_NEGATIVE) are computed in three steps:
> 1) reinterpreting the floating point vectors as integral vectors (int/long)
> 2) perform the test in integer domain to get a int/long mask
> 3) reinterpret the int/long mask as float/double mask
> Step 3) currently is very slow. It can be optimized by modifying the Java code to utilize the existing reinterpret intrinsic.
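Steps 1) and 2) can be illustrated on a single lane in scalar Java. This is a hedged sketch rather than the Vector API implementation: the class `FloatTestModel` and its methods are hypothetical names, and the bit masks assume the IEEE 754 single-precision layout that Java's `float` uses.

```java
class FloatTestModel {
    private static final int ABS_MASK = 0x7fffffff; // all bits except the sign bit
    private static final int EXP_MASK = 0x7f800000; // exponent bits; also the bit pattern of +Infinity

    // Step 1: reinterpret the float lane as an int lane without changing any bits.
    static int bits(float f) {
        return Float.floatToRawIntBits(f);
    }

    // Step 2: each test becomes an integer comparison on the reinterpreted bits.
    static boolean isNaN(float f) {
        return (bits(f) & ABS_MASK) > EXP_MASK;  // all-ones exponent, non-zero mantissa
    }

    static boolean isInfinite(float f) {
        return (bits(f) & ABS_MASK) == EXP_MASK; // all-ones exponent, zero mantissa
    }

    static boolean isFinite(float f) {
        return (bits(f) & ABS_MASK) < EXP_MASK;  // exponent is not all ones
    }

    public static void main(String[] args) {
        float[] lanes = {1.0f, -2.5f, Float.NaN, Float.POSITIVE_INFINITY, Float.NEGATIVE_INFINITY, 0.0f};
        for (float f : lanes) {
            // Compare the integer-domain tests with the library predicates.
            boolean agrees = isNaN(f) == Float.isNaN(f)
                    && isInfinite(f) == Float.isInfinite(f)
                    && isFinite(f) == Float.isFinite(f);
            System.out.println(f + " agrees with java.lang.Float: " + agrees);
        }
    }
}
```

In the vector case the same comparisons produce an int/long mask per lane, which is why step 3), turning that mask back into a float/double mask, is the only part left to optimize.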
>
> For the VectorTestPerf attached to the JBS for JDK-8267190, the performance improves as follows:
>
> Base:
> Benchmark                   (size)   Mode  Cnt     Score     Error   Units
> VectorTestPerf.IS_DEFAULT     1024  thrpt    5   223.156 ±  90.452  ops/ms
> VectorTestPerf.IS_FINITE      1024  thrpt    5   223.841 ±  91.685  ops/ms
> VectorTestPerf.IS_INFINITE    1024  thrpt    5   224.561 ±  83.890  ops/ms
> VectorTestPerf.IS_NAN         1024  thrpt    5   223.777 ±  70.629  ops/ms
> VectorTestPerf.IS_NEGATIVE    1024  thrpt    5   218.392 ±  79.806  ops/ms
>
> With patch:
> Benchmark                   (size)   Mode  Cnt     Score     Error   Units
> VectorTestPerf.IS_DEFAULT     1024  thrpt    5  8812.357 ±  40.477  ops/ms
> VectorTestPerf.IS_FINITE      1024  thrpt    5  7425.739 ± 296.622  ops/ms
> VectorTestPerf.IS_INFINITE    1024  thrpt    5  8932.730 ± 269.988  ops/ms
> VectorTestPerf.IS_NAN         1024  thrpt    5  8574.872 ± 498.649  ops/ms
> VectorTestPerf.IS_NEGATIVE    1024  thrpt    5  8838.400 ±  11.849  ops/ms
>
> Best Regards,
> Sandhya

Tier 1 to 3 tests pass on supported platforms

-------------

PR: https://git.openjdk.java.net/jdk/pull/4039

From neliasso at openjdk.java.net Wed May 19 17:53:42 2021
From: neliasso at openjdk.java.net (Nils Eliasson)
Date: Wed, 19 May 2021 17:53:42 GMT
Subject: Integrated: 8265262: CITime - 'other' incorrectly calculated
In-Reply-To: 
References: 
Message-ID: <9L60Vp7msxX9SACZA4-4xJug7pzZE_GDovipc4lNvs4=.94179f16-a938-4a1a-b277-cd58138392cc@github.com>

On Mon, 17 May 2021 16:36:26 GMT, Nils Eliasson wrote:

> This CR fixes a few issues with the CITIme output for C2:
>
> 1) The other category for _t_optimize is not removing time spent in _t_vector
>
> 2) Some of the _t_incrInline sub counters is called from different contexts - calculating 'other' from total time spent in _t_incrInline expects that the counter usage is strictly hierarchical.
>
> 3) I've placed the non-hierarchical counters in braces.
>
> 4) Code Installation is a part of Code Emission (_t_output). Indentation fixed.
>
> 5) Moved "renumber live" after "Vector" so that they appear in order.
> > 6) Added sub counters "shorten branches" and "fill buffer" to "Code Emission" phase, and added an other category. Before more than 50% of time in Code Emission was unaccounted for, now it's less than 25%. > > Please review, > Best regards, > Nils Eliasson This pull request has now been integrated. Changeset: 38d690b3 Author: Nils Eliasson URL: https://git.openjdk.java.net/jdk/commit/38d690b3c347f71b41a34b36c1a232ea766b9a64 Stats: 36 lines in 3 files changed: 24 ins; 3 del; 9 mod 8265262: CITime - 'other' incorrectly calculated Reviewed-by: thartmann, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/4065 From psandoz at openjdk.java.net Wed May 19 19:12:41 2021 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Wed, 19 May 2021 19:12:41 GMT Subject: RFR: 8266951: Partial in-lining for vectorized mismatch operation using AVX512 masked instructions [v4] In-Reply-To: <6phAafS9kz8v8Nwo4XGynTzt8K1KaNv7jhcnMZ59mew=.de2cbb11-261a-4974-bf21-adf47f6e8482@github.com> References: <0YtRuwnVZ-Ejs-22d0JDJeFzXiZ17XNuBT1o5Ma4ZkI=.9dd9e952-d452-4175-8ff5-8f41e990a555@github.com> <6phAafS9kz8v8Nwo4XGynTzt8K1KaNv7jhcnMZ59mew=.de2cbb11-261a-4974-bf21-adf47f6e8482@github.com> Message-ID: <0tu07AIsySXIfRfCNMP-tfLYSooESJWGjSuzqcdQw3E=.d15a6b52-cd03-4efd-ba07-cd034fbf1411@github.com> On Tue, 18 May 2021 05:21:06 GMT, Jatin Bhateja wrote: >> ArraySupport.vectorizedMismatch is a leaf level comparison routine which gets called by various public Java APIs (Arrays.equals, Arrays.mismatch). Hotspot C2 compiler intrinsifies vectorizedMismatch routine and emits a call to a stub routine which uses vector instruction to compare the inputs. >> >> For small compare operation whose size fits in one vector register i.e. < 32 bytes or <= 64 bytes, this patch employ partial in-lining technique to emit the fast path code at the call site which does vector comparison under the influence of a predicate register/mask computed as a function of comparison length. 
>>
>> If the length of the comparison is greater than the vector register size, then the slow path, consisting of the stub call, is emitted.
>>
>> This avoids the call overhead associated with the stub call, which is significant compared to the actual comparison operation for small-sized comparisons.
>>
>> Partial in-lining works under the influence of a run time flag -XX:UsePartialInlineSize=32/64 (default 32 bytes).
>>
>> Following are performance numbers for an existing JMH benchmark (test/micro/org/openjdk/bench/java/util/ArrayMismatch.java) :-
>>
>> Machine : Cascade Lake server (Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz)
>>
>> BENCHMARK | SIZE | Baseline (ops/ms) | PI32 (ops/ms) | Gain | PI64 (ops/ms) | Gain
>> -- | -- | -- | -- | -- | -- | --
>> ArraysMismatchPartialInlining.testByteMatch | 3 | 209915.663 | 209126.291 | 0.996239576 | 209073.888 | 0.995989937
>> ArraysMismatchPartialInlining.testByteMatch | 4 | 157757.866 | 157763.787 | 1.000037532 | 157766.023 | 1.000051706
>> ArraysMismatchPartialInlining.testByteMatch | 5 | 181182.854 | 180450.433 | 0.995957559 | 180465.978 | 0.996043356
>> ArraysMismatchPartialInlining.testByteMatch | 6 | 146279.651 | 146276.69 | 0.999979758 | 146274.73 | 0.999966359
>> ArraysMismatchPartialInlining.testByteMatch | 7 | 139099.287 | 137887.433 | 0.991287849 | 139159.131 | 1.000430225
>> ArraysMismatchPartialInlining.testByteMatch | 15 | 127720.176 | 175732.078 | 1.375914781 | 169252.948 | 1.325185678
>> ArraysMismatchPartialInlining.testByteMatch | 31 | 116472.861 | 176768.126 | 1.517676517 | 169773.326 | 1.457621325
>> ArraysMismatchPartialInlining.testByteMatch | 63 | 104636.064 | 91564.893 | 0.875079676 | 160845.908 | 1.537193792
>> ArraysMismatchPartialInlining.testByteMatch | 95 | 101099.48 | 89657.806 | 0.886827568 | 87334.192 | 0.863844127
>> ArraysMismatchPartialInlining.testByteMatch | 800 | 45022.411 | 47905.179 | 1.064029623 | 47969.355 | 1.065455046
>> ArraysMismatchPartialInlining.testCharMatch | 3 | 219405.496 | 219710.643 | 1.00139079 | 219242.048 | 0.999255041
>> ArraysMismatchPartialInlining.testCharMatch | 4 | 170629.006 | 193121.02 | 1.131818233 | 182593.776 | 1.070121548
>> ArraysMismatchPartialInlining.testCharMatch | 5 | 155518.733 | 169650.324 | 1.090867452 | 159963.097 | 1.028577676
>> ArraysMismatchPartialInlining.testCharMatch | 6 | 154395.07 | 175616.979 | 1.137451986 | 147860.366 | 0.957675436
>> ArraysMismatchPartialInlining.testCharMatch | 7 | 147630.171 | 168639.547 | 1.142310856 | 112467.214 | 0.761817271
>> ArraysMismatchPartialInlining.testCharMatch | 15 | 130251.837 | 171755.645 | 1.318642784 | 159656.911 | 1.225755542
>> ArraysMismatchPartialInlining.testCharMatch | 31 | 115510.532 | 106310.328 | 0.920351817 | 159957.379 | 1.384786099
>> ArraysMismatchPartialInlining.testCharMatch | 63 | 96443.648 | 92545.364 | 0.959579671 | 92850.782 | 0.962746473
>> ArraysMismatchPartialInlining.testCharMatch | 95 | 90001.485 | 81753.152 | 0.908353368 | 83890.742 | 0.932103976
>> ArraysMismatchPartialInlining.testCharMatch | 800 | 22929.764 | 20699.791 | 0.902747669 | 22017.534 | 0.960216337
>> ArraysMismatchPartialInlining.testDoubleMatch | 3 | 137422.911 | 134792.332 | 0.980857784 | 137047.846 | 0.997270724
>> ArraysMismatchPartialInlining.testDoubleMatch | 4 | 140124.192 | 128321.199 | 0.915767628 | 128573.012 | 0.917564699
>> ArraysMismatchPartialInlining.testDoubleMatch | 5 | 132385.81 | 132099.177 | 0.997834866 | 132337.729 | 0.999636812
>> ArraysMismatchPartialInlining.testDoubleMatch | 6 | 122472.829 | 122301.343 | 0.998599804 | 122235.558 | 0.998062664
>> ArraysMismatchPartialInlining.testDoubleMatch | 7 | 123867.736 | 123042.597 | 0.993338548 | 123060.617 | 0.993484026
>> ArraysMismatchPartialInlining.testDoubleMatch | 15 | 102561.684 | 102697.933 | 1.001328459 | 100258.701 | 0.977545386
>> ArraysMismatchPartialInlining.testDoubleMatch | 31 | 87019.261 | 87292.743 | 1.003142775 | 85003.323 | 0.976833428
>> ArraysMismatchPartialInlining.testDoubleMatch | 63 | 62251.609 | 57261.214 | 0.919835084 | 62732.816 | 1.007730033
>> ArraysMismatchPartialInlining.testDoubleMatch | 95 | 50885.381 | 48282.534 | 0.948848826 | 48533.009 | 0.953771163
>> ArraysMismatchPartialInlining.testDoubleMatch | 800 | 7160.957 | 8209.345 | 1.146403337 | 7158.649 | 0.999677697
>> ArraysMismatchPartialInlining.testFloatMatch | 3 | 144215.295 | 141572.656 | 0.981675737 | 117351.089 | 0.81372152
>> ArraysMismatchPartialInlining.testFloatMatch | 4 | 149935.526 | 140116.547 | 0.934511992 | 138351.846 | 0.922742259
>> ArraysMismatchPartialInlining.testFloatMatch | 5 | 134682.06 | 133892.853 | 0.994140222 | 139040.985 | 1.032364555
>> ArraysMismatchPartialInlining.testFloatMatch | 6 | 139176.866 | 139452.984 | 1.001983936 | 158309.784 | 1.13747197
>> ArraysMismatchPartialInlining.testFloatMatch | 7 | 127274.07 | 126137.824 | 0.991072447 | 146418.871 | 1.150421849
>> ArraysMismatchPartialInlining.testFloatMatch | 15 | 115897.616 | 101808.969 | 0.878438854 | 108451.212 | 0.935750154
>> ArraysMismatchPartialInlining.testFloatMatch | 31 | 96568.619 | 101492.986 | 1.05099345 | 88662.187 | 0.918126281
>> ArraysMismatchPartialInlining.testFloatMatch | 63 | 75565.484 | 85526.546 | 1.131820263 | 74575.198 | 0.986894996
>> ArraysMismatchPartialInlining.testFloatMatch | 95 | 69535.621 | 71823.072 | 1.032896104 | 64910.105 | 0.933479907
>> ArraysMismatchPartialInlining.testFloatMatch | 800 | 13959.085 | 12768.069 | 0.914678075 | 12698.311 | 0.909680756
>> ArraysMismatchPartialInlining.testIntMatch | 3 | 151925.753 | 152001.543 | 1.000498862 | 150351.321 | 0.989636833
>> ArraysMismatchPartialInlining.testIntMatch | 4 | 151411.152 | 161021.852 | 1.063474188 | 152115.869 | 1.004654327
>> ArraysMismatchPartialInlining.testIntMatch | 5 | 142305.114 | 134841.275 | 0.947550451 | 122718.584 | 0.862362431
>> ArraysMismatchPartialInlining.testIntMatch | 6 | 144870.73 | 144186.562 | 0.99527739 | 166569.418 | 1.149779655
>> ArraysMismatchPartialInlining.testIntMatch | 7 | 135132.736 | 131937.154 | 0.976352273 | 150670.855 | 1.114984122
>> ArraysMismatchPartialInlining.testIntMatch | 15 | 118831.765 | 119947.806 | 1.009391773 | 161039.149 | 1.35518604
>> ArraysMismatchPartialInlining.testIntMatch | 31 | 97247.157 | 95123.241 | 0.978159608 | 92586.255 | 0.952071586
>> ArraysMismatchPartialInlining.testIntMatch | 63 | 78537.993 | 72904.05 | 0.928264744 | 72075.128 | 0.917710337
>> ArraysMismatchPartialInlining.testIntMatch | 95 | 69356.234 | 69021.893 | 0.995179366 | 67435.202 | 0.972301956
>> ArraysMismatchPartialInlining.testIntMatch | 800 | 14410.374 | 12715.733 | 0.882401317 | 12527.15 | 0.869314703
>> ArraysMismatchPartialInlining.testLongMatch | 3 | 145434.777 | 147236.142 | 1.012386068 | 144269.34 | 0.991986532
>> ArraysMismatchPartialInlining.testLongMatch | 4 | 149850.908 | 117182.939 | 0.781996857 | 116983.308 | 0.780664659
>> ArraysMismatchPartialInlining.testLongMatch | 5 | 140694.62 | 141039.138 | 1.002448693 | 140721.407 | 1.000190391
>> ArraysMismatchPartialInlining.testLongMatch | 6 | 136901.515 | 136215.609 | 0.994989785 | 136216.591 | 0.994996958
>> ArraysMismatchPartialInlining.testLongMatch | 7 | 132233.847 | 131289.142 | 0.9928558 | 131315.326 | 0.993053813
>> ArraysMismatchPartialInlining.testLongMatch | 15 | 108677.77 | 105050.548 | 0.966624067 | 108574.143 | 0.999046475
>> ArraysMismatchPartialInlining.testLongMatch | 31 | 79476.103 | 79391.426 | 0.99893456 | 79519.006 | 1.000539823
>> ArraysMismatchPartialInlining.testLongMatch | 63 | 58949.181 | 59102.766 | 1.00260538 | 59095.306 | 1.00247883
>> ArraysMismatchPartialInlining.testLongMatch | 95 | 49438.419 | 49422.93 | 0.999686701 | 49390.033 | 0.999021287
>> ArraysMismatchPartialInlining.testLongMatch | 800 | 7195.783 | 7201.554 | 1.000801998 | 7186.757 | 0.998745654
>> ArraysMismatchPartialInlining.testShortMatch | 3 | 219642.309 | 219414.684 | 0.998963656 | 219760.127 | 1.000536408
>> ArraysMismatchPartialInlining.testShortMatch | 4 | 169235.371 | 193907.437 | 1.145785517 | 170667.561 | 1.008462711
>> ArraysMismatchPartialInlining.testShortMatch | 5 | 155537.852 | 147014.758 | 0.945202445 | 116770.798 | 0.750754858
>> ArraysMismatchPartialInlining.testShortMatch | 6 | 155059.272 | 173756.546 | 1.120581464 | 152323.759 | 0.982358275
>> ArraysMismatchPartialInlining.testShortMatch | 7 | 147370.359 | 154934.348 | 1.051326393 | 138398.19 | 0.939118225
>> ArraysMismatchPartialInlining.testShortMatch | 15 | 130353.196 | 171653.208 | 1.316831603 | 160047.047 | 1.227795343
>> ArraysMismatchPartialInlining.testShortMatch | 31 | 118458.443 | 106239.301 | 0.896848703 | 159726.936 | 1.348379499
>> ArraysMismatchPartialInlining.testShortMatch | 63 | 97519.691 | 91591.145 | 0.939206678 | 91847.817 | 0.94183868
>> ArraysMismatchPartialInlining.testShortMatch | 95 | 90818.111 | 77626.093 | 0.854742431 | 77653.086 | 0.855039652
>> ArraysMismatchPartialInlining.testShortMatch | 800 | 21382.8 | 22841.791 | 1.06823199 | 22683.388 | 1.060824027
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
>   8266951: Removing the changes to existing benchmark since a separate benchmark has been added to partial in-lining.

Java benchmark code looks good. Needs HS reviewers. Tier 1 to 3 tests pass on supported platforms.

-------------

Marked as reviewed by psandoz (Reviewer).
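
For readers following the 8266951 thread, here is a minimal sketch of the Java-level call pattern the quoted description refers to. It only exercises the public `Arrays.mismatch` API named there; the class `MismatchSketch` and the helpers `firstMismatch` and `filled` are hypothetical names made up for illustration and are not part of the patch or its benchmark. Whether a particular call is compiled to the in-lined masked fast path or to the stub call depends on the JIT, the CPU and the flag mentioned in the description, and cannot be observed from this code.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of the patch or its JMH benchmark.
final class MismatchSketch {

    // Thin wrapper over the public API named in the quoted description.
    // Returns the index of the first differing element, or -1 if the arrays match.
    static int firstMismatch(byte[] a, byte[] b) {
        return Arrays.mismatch(a, b);
    }

    // Builds an array filled with one value, optionally changing a single
    // element (pass -1 to leave the array uniform).
    static byte[] filled(int length, int flipIndex) {
        byte[] data = new byte[length];
        Arrays.fill(data, (byte) 7);
        if (flipIndex >= 0) {
            data[flipIndex] = (byte) 8;
        }
        return data;
    }

    public static void main(String[] args) {
        // 15 bytes: small enough to fit in one 32-byte vector register.
        System.out.println(firstMismatch(filled(15, -1), filled(15, 9)));     // prints 9
        // 800 bytes: larger than a single vector register.
        System.out.println(firstMismatch(filled(800, -1), filled(800, 500))); // prints 500
        // Identical contents.
        System.out.println(firstMismatch(filled(31, -1), filled(31, -1)));    // prints -1
    }
}
```

The sizes 15, 31 and 800 are taken from the SIZE column of the table above; the result of each call is the same regardless of which code path the compiler selects, which is why the change is measured in throughput only.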
PR: https://git.openjdk.java.net/jdk/pull/3999

From psandoz at openjdk.java.net Wed May 19 19:26:44 2021
From: psandoz at openjdk.java.net (Paul Sandoz)
Date: Wed, 19 May 2021 19:26:44 GMT
Subject: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v9]
In-Reply-To:
References:
Message-ID:

On Wed, 19 May 2021 03:37:11 GMT, Sandhya Viswanathan wrote:

>> This PR contains Short Vector Math Library support related changes for [JEP-414 Vector API (Second Incubator)](https://openjdk.java.net/jeps/414), in preparation for when targeted.
>>
>> Intel Short Vector Math Library (SVML) based intrinsics in native x86 assembly provide optimized implementation for Vector API transcendental and trigonometric methods.
>> These methods are built into a separate library instead of being part of libjvm.so or jvm.dll.
>>
>> The following changes are made:
>> The source for these methods is placed in the jdk.incubator.vector module under src/jdk.incubator.vector/linux/native/libsvml and src/jdk.incubator.vector/windows/native/libsvml.
>> The assembly source files are named as "*.S" and include files are named as "*.S.inc".
>> The corresponding build script is placed at make/modules/jdk.incubator.vector/Lib.gmk.
>> Changes are made to build system to support dependency tracking for assembly files with includes.
>> The built native libraries (libsvml.so/svml.dll) are placed in bin directory of JDK on Windows and lib directory of JDK on Linux.
>> The C2 JIT uses the dll_load and dll_lookup to get the addresses of optimized methods from this library.
>>
>> Build system changes and module library build scripts are contributed by Magnus (magnus.ihse.bursie at oracle.com).
>>
>> Looking forward to your review and feedback.
>> >> Performance: >> Micro benchmark Base Optimized Unit Gain(Optimized/Base) >> Double128Vector.ACOS 45.91 87.34 ops/ms 1.90 >> Double128Vector.ASIN 45.06 92.36 ops/ms 2.05 >> Double128Vector.ATAN 19.92 118.36 ops/ms 5.94 >> Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79 >> Double128Vector.CBRT 45.77 208.36 ops/ms 4.55 >> Double128Vector.COS 49.94 245.89 ops/ms 4.92 >> Double128Vector.COSH 26.91 126.00 ops/ms 4.68 >> Double128Vector.EXP 71.64 379.65 ops/ms 5.30 >> Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18 >> Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44 >> Double128Vector.LOG 61.95 279.84 ops/ms 4.52 >> Double128Vector.LOG10 59.34 239.05 ops/ms 4.03 >> Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79 >> Double128Vector.SIN 49.36 240.79 ops/ms 4.88 >> Double128Vector.SINH 26.59 103.75 ops/ms 3.90 >> Double128Vector.TAN 41.05 152.39 ops/ms 3.71 >> Double128Vector.TANH 45.29 169.53 ops/ms 3.74 >> Double256Vector.ACOS 54.21 106.39 ops/ms 1.96 >> Double256Vector.ASIN 53.60 107.99 ops/ms 2.01 >> Double256Vector.ATAN 21.53 189.11 ops/ms 8.78 >> Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44 >> Double256Vector.CBRT 56.45 397.13 ops/ms 7.04 >> Double256Vector.COS 58.26 389.77 ops/ms 6.69 >> Double256Vector.COSH 29.44 151.11 ops/ms 5.13 >> Double256Vector.EXP 86.67 564.68 ops/ms 6.52 >> Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80 >> Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62 >> Double256Vector.LOG 71.52 394.90 ops/ms 5.52 >> Double256Vector.LOG10 65.43 362.32 ops/ms 5.54 >> Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05 >> Double256Vector.SIN 57.06 380.98 ops/ms 6.68 >> Double256Vector.SINH 29.40 117.37 ops/ms 3.99 >> Double256Vector.TAN 44.90 279.90 ops/ms 6.23 >> Double256Vector.TANH 54.08 274.71 ops/ms 5.08 >> Double512Vector.ACOS 55.65 687.54 ops/ms 12.35 >> Double512Vector.ASIN 57.31 777.72 ops/ms 13.57 >> Double512Vector.ATAN 21.42 729.21 ops/ms 34.04 >> Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32 >> Double512Vector.CBRT 56.78 834.38 ops/ms 14.69 >> 
Double512Vector.COS 59.88 837.04 ops/ms 13.98 >> Double512Vector.COSH 30.34 172.76 ops/ms 5.70 >> Double512Vector.EXP 99.66 1608.12 ops/ms 16.14 >> Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34 >> Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34 >> Double512Vector.LOG 74.84 996.00 ops/ms 13.31 >> Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72 >> Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34 >> Double512Vector.POW 37.42 384.13 ops/ms 10.26 >> Double512Vector.SIN 59.74 728.45 ops/ms 12.19 >> Double512Vector.SINH 29.47 143.38 ops/ms 4.87 >> Double512Vector.TAN 46.20 587.21 ops/ms 12.71 >> Double512Vector.TANH 57.36 495.42 ops/ms 8.64 >> Double64Vector.ACOS 24.04 73.67 ops/ms 3.06 >> Double64Vector.ASIN 23.78 75.11 ops/ms 3.16 >> Double64Vector.ATAN 14.14 62.81 ops/ms 4.44 >> Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28 >> Double64Vector.CBRT 16.47 107.50 ops/ms 6.53 >> Double64Vector.COS 23.42 152.01 ops/ms 6.49 >> Double64Vector.COSH 17.34 113.34 ops/ms 6.54 >> Double64Vector.EXP 27.08 203.53 ops/ms 7.52 >> Double64Vector.EXPM1 18.77 96.73 ops/ms 5.15 >> Double64Vector.HYPOT 18.54 103.62 ops/ms 5.59 >> Double64Vector.LOG 26.75 142.63 ops/ms 5.33 >> Double64Vector.LOG10 25.85 139.71 ops/ms 5.40 >> Double64Vector.LOG1P 13.26 97.94 ops/ms 7.38 >> Double64Vector.SIN 23.28 146.91 ops/ms 6.31 >> Double64Vector.SINH 17.62 88.59 ops/ms 5.03 >> Double64Vector.TAN 21.00 86.43 ops/ms 4.12 >> Double64Vector.TANH 23.75 111.35 ops/ms 4.69 >> Float128Vector.ACOS 57.52 110.65 ops/ms 1.92 >> Float128Vector.ASIN 57.15 117.95 ops/ms 2.06 >> Float128Vector.ATAN 22.52 318.74 ops/ms 14.15 >> Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42 >> Float128Vector.CBRT 29.72 443.74 ops/ms 14.93 >> Float128Vector.COS 42.82 803.02 ops/ms 18.75 >> Float128Vector.COSH 31.44 118.34 ops/ms 3.76 >> Float128Vector.EXP 72.43 855.33 ops/ms 11.81 >> Float128Vector.EXPM1 37.82 127.85 ops/ms 3.38 >> Float128Vector.HYPOT 53.20 591.68 ops/ms 11.12 >> Float128Vector.LOG 52.95 877.94 ops/ms 16.58 >> 
Float128Vector.LOG10 49.26 603.72 ops/ms 12.26 >> Float128Vector.LOG1P 20.89 430.59 ops/ms 20.61 >> Float128Vector.SIN 43.38 745.31 ops/ms 17.18 >> Float128Vector.SINH 31.11 112.91 ops/ms 3.63 >> Float128Vector.TAN 37.25 332.13 ops/ms 8.92 >> Float128Vector.TANH 57.63 453.77 ops/ms 7.87 >> Float256Vector.ACOS 65.23 123.73 ops/ms 1.90 >> Float256Vector.ASIN 63.41 132.86 ops/ms 2.10 >> Float256Vector.ATAN 23.51 649.02 ops/ms 27.61 >> Float256Vector.ATAN2 18.19 455.95 ops/ms 25.07 >> Float256Vector.CBRT 45.99 594.81 ops/ms 12.93 >> Float256Vector.COS 43.75 926.69 ops/ms 21.18 >> Float256Vector.COSH 33.52 130.46 ops/ms 3.89 >> Float256Vector.EXP 75.70 1366.72 ops/ms 18.05 >> Float256Vector.EXPM1 39.00 149.72 ops/ms 3.84 >> Float256Vector.HYPOT 52.91 1023.18 ops/ms 19.34 >> Float256Vector.LOG 53.31 1545.77 ops/ms 29.00 >> Float256Vector.LOG10 50.31 863.80 ops/ms 17.17 >> Float256Vector.LOG1P 21.51 616.59 ops/ms 28.66 >> Float256Vector.SIN 44.07 911.04 ops/ms 20.67 >> Float256Vector.SINH 33.16 122.50 ops/ms 3.69 >> Float256Vector.TAN 37.85 497.75 ops/ms 13.15 >> Float256Vector.TANH 64.27 537.20 ops/ms 8.36 >> Float512Vector.ACOS 67.33 1718.00 ops/ms 25.52 >> Float512Vector.ASIN 66.12 1780.85 ops/ms 26.93 >> Float512Vector.ATAN 22.63 1780.31 ops/ms 78.69 >> Float512Vector.ATAN2 17.52 1113.93 ops/ms 63.57 >> Float512Vector.CBRT 54.78 2087.58 ops/ms 38.11 >> Float512Vector.COS 40.92 1567.93 ops/ms 38.32 >> Float512Vector.COSH 33.42 138.36 ops/ms 4.14 >> Float512Vector.EXP 70.51 3835.97 ops/ms 54.41 >> Float512Vector.EXPM1 38.06 279.80 ops/ms 7.35 >> Float512Vector.HYPOT 50.99 3287.55 ops/ms 64.47 >> Float512Vector.LOG 49.61 3156.99 ops/ms 63.64 >> Float512Vector.LOG10 46.94 2489.16 ops/ms 53.02 >> Float512Vector.LOG1P 20.66 1689.86 ops/ms 81.81 >> Float512Vector.POW 32.73 1015.85 ops/ms 31.04 >> Float512Vector.SIN 41.17 1587.71 ops/ms 38.56 >> Float512Vector.SINH 33.05 129.39 ops/ms 3.91 >> Float512Vector.TAN 35.60 1336.11 ops/ms 37.53 >> Float512Vector.TANH 65.77 2295.28 
ops/ms 34.90 >> Float64Vector.ACOS 48.41 89.34 ops/ms 1.85 >> Float64Vector.ASIN 47.30 95.72 ops/ms 2.02 >> Float64Vector.ATAN 20.62 49.45 ops/ms 2.40 >> Float64Vector.ATAN2 15.95 112.35 ops/ms 7.04 >> Float64Vector.CBRT 24.03 134.57 ops/ms 5.60 >> Float64Vector.COS 44.28 394.33 ops/ms 8.91 >> Float64Vector.COSH 28.35 95.27 ops/ms 3.36 >> Float64Vector.EXP 65.80 486.37 ops/ms 7.39 >> Float64Vector.EXPM1 34.61 85.99 ops/ms 2.48 >> Float64Vector.HYPOT 50.40 147.82 ops/ms 2.93 >> Float64Vector.LOG 51.93 163.25 ops/ms 3.14 >> Float64Vector.LOG10 49.53 147.98 ops/ms 2.99 >> Float64Vector.LOG1P 19.20 206.81 ops/ms 10.77 >> Float64Vector.SIN 44.41 382.09 ops/ms 8.60 >> Float64Vector.SINH 28.20 90.68 ops/ms 3.22 >> Float64Vector.TAN 36.29 160.89 ops/ms 4.43 >> Float64Vector.TANH 47.65 214.04 ops/ms 4.49 > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > fix 32-bit build src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template line 723: > 721: #end[BITWISE] > 722: #if[FP] > 723: case VECTOR_OP_OR: return (v0, v1) -> `VECTOR_OP_OR` looks incorrect, since `VectorOperators.OR` is not applicable to FP types. ------------- PR: https://git.openjdk.java.net/jdk/pull/3638 From sviswanathan at openjdk.java.net Wed May 19 21:55:07 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Wed, 19 May 2021 21:55:07 GMT Subject: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v10] In-Reply-To: References: Message-ID: > This PR contains Short Vector Math Library support related changes for [JEP-414 Vector API (Second Incubator)](https://openjdk.java.net/jeps/414), in preparation for when targeted. > > Intel Short Vector Math Library (SVML) based intrinsics in native x86 assembly provide optimized implementation for Vector API transcendental and trigonometric methods. 
> These methods are built into a separate library instead of being part of libjvm.so or jvm.dll. > > The following changes are made: > The source for these methods is placed in the jdk.incubator.vector module under src/jdk.incubator.vector/linux/native/libsvml and src/jdk.incubator.vector/windows/native/libsvml. > The assembly source files are named as "*.S" and include files are named as "*.S.inc". > The corresponding build script is placed at make/modules/jdk.incubator.vector/Lib.gmk. > Changes are made to build system to support dependency tracking for assembly files with includes. > The built native libraries (libsvml.so/svml.dll) are placed in bin directory of JDK on Windows and lib directory of JDK on Linux. > The C2 JIT uses the dll_load and dll_lookup to get the addresses of optimized methods from this library. > > Build system changes and module library build scripts are contributed by Magnus (magnus.ihse.bursie at oracle.com). > > Looking forward to your review and feedback. > > Performance: > Micro benchmark Base Optimized Unit Gain(Optimized/Base) > Double128Vector.ACOS 45.91 87.34 ops/ms 1.90 > Double128Vector.ASIN 45.06 92.36 ops/ms 2.05 > Double128Vector.ATAN 19.92 118.36 ops/ms 5.94 > Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79 > Double128Vector.CBRT 45.77 208.36 ops/ms 4.55 > Double128Vector.COS 49.94 245.89 ops/ms 4.92 > Double128Vector.COSH 26.91 126.00 ops/ms 4.68 > Double128Vector.EXP 71.64 379.65 ops/ms 5.30 > Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18 > Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44 > Double128Vector.LOG 61.95 279.84 ops/ms 4.52 > Double128Vector.LOG10 59.34 239.05 ops/ms 4.03 > Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79 > Double128Vector.SIN 49.36 240.79 ops/ms 4.88 > Double128Vector.SINH 26.59 103.75 ops/ms 3.90 > Double128Vector.TAN 41.05 152.39 ops/ms 3.71 > Double128Vector.TANH 45.29 169.53 ops/ms 3.74 > Double256Vector.ACOS 54.21 106.39 ops/ms 1.96 > Double256Vector.ASIN 53.60 107.99 ops/ms 2.01 >
Double256Vector.ATAN 21.53 189.11 ops/ms 8.78 > Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44 > Double256Vector.CBRT 56.45 397.13 ops/ms 7.04 > Double256Vector.COS 58.26 389.77 ops/ms 6.69 > Double256Vector.COSH 29.44 151.11 ops/ms 5.13 > Double256Vector.EXP 86.67 564.68 ops/ms 6.52 > Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80 > Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62 > Double256Vector.LOG 71.52 394.90 ops/ms 5.52 > Double256Vector.LOG10 65.43 362.32 ops/ms 5.54 > Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05 > Double256Vector.SIN 57.06 380.98 ops/ms 6.68 > Double256Vector.SINH 29.40 117.37 ops/ms 3.99 > Double256Vector.TAN 44.90 279.90 ops/ms 6.23 > Double256Vector.TANH 54.08 274.71 ops/ms 5.08 > Double512Vector.ACOS 55.65 687.54 ops/ms 12.35 > Double512Vector.ASIN 57.31 777.72 ops/ms 13.57 > Double512Vector.ATAN 21.42 729.21 ops/ms 34.04 > Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32 > Double512Vector.CBRT 56.78 834.38 ops/ms 14.69 > Double512Vector.COS 59.88 837.04 ops/ms 13.98 > Double512Vector.COSH 30.34 172.76 ops/ms 5.70 > Double512Vector.EXP 99.66 1608.12 ops/ms 16.14 > Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34 > Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34 > Double512Vector.LOG 74.84 996.00 ops/ms 13.31 > Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72 > Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34 > Double512Vector.POW 37.42 384.13 ops/ms 10.26 > Double512Vector.SIN 59.74 728.45 ops/ms 12.19 > Double512Vector.SINH 29.47 143.38 ops/ms 4.87 > Double512Vector.TAN 46.20 587.21 ops/ms 12.71 > Double512Vector.TANH 57.36 495.42 ops/ms 8.64 > Double64Vector.ACOS 24.04 73.67 ops/ms 3.06 > Double64Vector.ASIN 23.78 75.11 ops/ms 3.16 > Double64Vector.ATAN 14.14 62.81 ops/ms 4.44 > Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28 > Double64Vector.CBRT 16.47 107.50 ops/ms 6.53 > Double64Vector.COS 23.42 152.01 ops/ms 6.49 > Double64Vector.COSH 17.34 113.34 ops/ms 6.54 > Double64Vector.EXP 27.08 203.53 ops/ms 7.52 > Double64Vector.EXPM1 18.77 
96.73 ops/ms 5.15 > Double64Vector.HYPOT 18.54 103.62 ops/ms 5.59 > Double64Vector.LOG 26.75 142.63 ops/ms 5.33 > Double64Vector.LOG10 25.85 139.71 ops/ms 5.40 > Double64Vector.LOG1P 13.26 97.94 ops/ms 7.38 > Double64Vector.SIN 23.28 146.91 ops/ms 6.31 > Double64Vector.SINH 17.62 88.59 ops/ms 5.03 > Double64Vector.TAN 21.00 86.43 ops/ms 4.12 > Double64Vector.TANH 23.75 111.35 ops/ms 4.69 > Float128Vector.ACOS 57.52 110.65 ops/ms 1.92 > Float128Vector.ASIN 57.15 117.95 ops/ms 2.06 > Float128Vector.ATAN 22.52 318.74 ops/ms 14.15 > Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42 > Float128Vector.CBRT 29.72 443.74 ops/ms 14.93 > Float128Vector.COS 42.82 803.02 ops/ms 18.75 > Float128Vector.COSH 31.44 118.34 ops/ms 3.76 > Float128Vector.EXP 72.43 855.33 ops/ms 11.81 > Float128Vector.EXPM1 37.82 127.85 ops/ms 3.38 > Float128Vector.HYPOT 53.20 591.68 ops/ms 11.12 > Float128Vector.LOG 52.95 877.94 ops/ms 16.58 > Float128Vector.LOG10 49.26 603.72 ops/ms 12.26 > Float128Vector.LOG1P 20.89 430.59 ops/ms 20.61 > Float128Vector.SIN 43.38 745.31 ops/ms 17.18 > Float128Vector.SINH 31.11 112.91 ops/ms 3.63 > Float128Vector.TAN 37.25 332.13 ops/ms 8.92 > Float128Vector.TANH 57.63 453.77 ops/ms 7.87 > Float256Vector.ACOS 65.23 123.73 ops/ms 1.90 > Float256Vector.ASIN 63.41 132.86 ops/ms 2.10 > Float256Vector.ATAN 23.51 649.02 ops/ms 27.61 > Float256Vector.ATAN2 18.19 455.95 ops/ms 25.07 > Float256Vector.CBRT 45.99 594.81 ops/ms 12.93 > Float256Vector.COS 43.75 926.69 ops/ms 21.18 > Float256Vector.COSH 33.52 130.46 ops/ms 3.89 > Float256Vector.EXP 75.70 1366.72 ops/ms 18.05 > Float256Vector.EXPM1 39.00 149.72 ops/ms 3.84 > Float256Vector.HYPOT 52.91 1023.18 ops/ms 19.34 > Float256Vector.LOG 53.31 1545.77 ops/ms 29.00 > Float256Vector.LOG10 50.31 863.80 ops/ms 17.17 > Float256Vector.LOG1P 21.51 616.59 ops/ms 28.66 > Float256Vector.SIN 44.07 911.04 ops/ms 20.67 > Float256Vector.SINH 33.16 122.50 ops/ms 3.69 > Float256Vector.TAN 37.85 497.75 ops/ms 13.15 > Float256Vector.TANH 64.27 
537.20 ops/ms 8.36 > Float512Vector.ACOS 67.33 1718.00 ops/ms 25.52 > Float512Vector.ASIN 66.12 1780.85 ops/ms 26.93 > Float512Vector.ATAN 22.63 1780.31 ops/ms 78.69 > Float512Vector.ATAN2 17.52 1113.93 ops/ms 63.57 > Float512Vector.CBRT 54.78 2087.58 ops/ms 38.11 > Float512Vector.COS 40.92 1567.93 ops/ms 38.32 > Float512Vector.COSH 33.42 138.36 ops/ms 4.14 > Float512Vector.EXP 70.51 3835.97 ops/ms 54.41 > Float512Vector.EXPM1 38.06 279.80 ops/ms 7.35 > Float512Vector.HYPOT 50.99 3287.55 ops/ms 64.47 > Float512Vector.LOG 49.61 3156.99 ops/ms 63.64 > Float512Vector.LOG10 46.94 2489.16 ops/ms 53.02 > Float512Vector.LOG1P 20.66 1689.86 ops/ms 81.81 > Float512Vector.POW 32.73 1015.85 ops/ms 31.04 > Float512Vector.SIN 41.17 1587.71 ops/ms 38.56 > Float512Vector.SINH 33.05 129.39 ops/ms 3.91 > Float512Vector.TAN 35.60 1336.11 ops/ms 37.53 > Float512Vector.TANH 65.77 2295.28 ops/ms 34.90 > Float64Vector.ACOS 48.41 89.34 ops/ms 1.85 > Float64Vector.ASIN 47.30 95.72 ops/ms 2.02 > Float64Vector.ATAN 20.62 49.45 ops/ms 2.40 > Float64Vector.ATAN2 15.95 112.35 ops/ms 7.04 > Float64Vector.CBRT 24.03 134.57 ops/ms 5.60 > Float64Vector.COS 44.28 394.33 ops/ms 8.91 > Float64Vector.COSH 28.35 95.27 ops/ms 3.36 > Float64Vector.EXP 65.80 486.37 ops/ms 7.39 > Float64Vector.EXPM1 34.61 85.99 ops/ms 2.48 > Float64Vector.HYPOT 50.40 147.82 ops/ms 2.93 > Float64Vector.LOG 51.93 163.25 ops/ms 3.14 > Float64Vector.LOG10 49.53 147.98 ops/ms 2.99 > Float64Vector.LOG1P 19.20 206.81 ops/ms 10.77 > Float64Vector.SIN 44.41 382.09 ops/ms 8.60 > Float64Vector.SINH 28.20 90.68 ops/ms 3.22 > Float64Vector.TAN 36.29 160.89 ops/ms 4.43 > Float64Vector.TANH 47.65 214.04 ops/ms 4.49 Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: Implement Vladimir Ivanov and Paul Sandoz review comments ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/3638/files - new: https://git.openjdk.java.net/jdk/pull/3638/files/f7e39913..0b4a1c9c 
Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=09
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=08-09

  Stats: 45 lines in 1 file changed: 0 ins; 45 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3638.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3638/head:pull/3638

PR: https://git.openjdk.java.net/jdk/pull/3638

From sviswanathan at openjdk.java.net Wed May 19 21:58:37 2021
From: sviswanathan at openjdk.java.net (Sandhya Viswanathan)
Date: Wed, 19 May 2021 21:58:37 GMT
Subject: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v2]
In-Reply-To:
References:
Message-ID:

On Mon, 3 May 2021 21:41:26 GMT, Paul Sandoz wrote:

>> Sandhya Viswanathan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits:
>>
>> - Merge master
>> - remove whitespace
>> - Merge master
>> - Small fix
>> - cleanup
>> - x86 short vector math optimization for Vector API
>
> Tier 1 to 3 tests pass for the default set of build profiles.

Thanks a lot for the review @PaulSandoz @iwanowww @erikj79.
Paul and Vladimir, I have implemented your review comments. Please take a look.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3638

From psandoz at openjdk.java.net Wed May 19 22:05:59 2021
From: psandoz at openjdk.java.net (Paul Sandoz)
Date: Wed, 19 May 2021 22:05:59 GMT
Subject: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v2]
In-Reply-To:
References:
Message-ID:

On Mon, 3 May 2021 21:41:26 GMT, Paul Sandoz wrote:

>> Sandhya Viswanathan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits:
>>
>> - Merge master
>> - remove whitespace
>> - Merge master
>> - Small fix
>> - cleanup
>> - x86 short vector math optimization for Vector API
>
> Tier 1 to 3 tests pass for the default set of build profiles.

> Thanks a lot for the review @PaulSandoz @iwanowww @erikj79.
> Paul and Vladimir, I have implemented your review comments. Please take a look.

`case VECTOR_OP_OR` is still present.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3638

From sviswanathan at openjdk.java.net Wed May 19 22:16:18 2021
From: sviswanathan at openjdk.java.net (Sandhya Viswanathan)
Date: Wed, 19 May 2021 22:16:18 GMT
Subject: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v11]
In-Reply-To:
References:
Message-ID:

> This PR contains Short Vector Math Library support related changes for [JEP-414 Vector API (Second Incubator)](https://openjdk.java.net/jeps/414), in preparation for when targeted.
>
> Intel Short Vector Math Library (SVML) based intrinsics in native x86 assembly provide optimized implementation for Vector API transcendental and trigonometric methods.
> These methods are built into a separate library instead of being part of libjvm.so or jvm.dll.
>
> The following changes are made:
> The source for these methods is placed in the jdk.incubator.vector module under src/jdk.incubator.vector/linux/native/libsvml and src/jdk.incubator.vector/windows/native/libsvml.
> The assembly source files are named as "*.S" and include files are named as "*.S.inc".
> The corresponding build script is placed at make/modules/jdk.incubator.vector/Lib.gmk.
> Changes are made to build system to support dependency tracking for assembly files with includes.
> The built native libraries (libsvml.so/svml.dll) are placed in bin directory of JDK on Windows and lib directory of JDK on Linux.
> The C2 JIT uses the dll_load and dll_lookup to get the addresses of optimized methods from this library.
>
> Build system changes and module library build scripts are contributed by Magnus (magnus.ihse.bursie at oracle.com).
>
> Looking forward to your review and feedback.
>
> Performance:
> Micro benchmark         Base  Optimized  Unit    Gain(Optimized/Base)
> Double128Vector.ACOS   45.91      87.34  ops/ms   1.90
> Double128Vector.ASIN   45.06      92.36  ops/ms   2.05
> Double128Vector.ATAN   19.92     118.36  ops/ms   5.94
> Double128Vector.ATAN2  15.24      88.17  ops/ms   5.79
> Double128Vector.CBRT   45.77     208.36  ops/ms   4.55
> Double128Vector.COS    49.94     245.89  ops/ms   4.92
> Double128Vector.COSH   26.91     126.00  ops/ms   4.68
> Double128Vector.EXP    71.64     379.65  ops/ms   5.30
> Double128Vector.EXPM1  35.95     150.37  ops/ms   4.18
> Double128Vector.HYPOT  50.67     174.10  ops/ms   3.44
> Double128Vector.LOG    61.95     279.84  ops/ms   4.52
> Double128Vector.LOG10  59.34     239.05  ops/ms   4.03
> Double128Vector.LOG1P  18.56     200.32  ops/ms  10.79
> Double128Vector.SIN    49.36     240.79  ops/ms   4.88
> Double128Vector.SINH   26.59     103.75  ops/ms   3.90
> Double128Vector.TAN    41.05     152.39  ops/ms   3.71
> Double128Vector.TANH   45.29     169.53  ops/ms   3.74
> Double256Vector.ACOS   54.21     106.39  ops/ms   1.96
> Double256Vector.ASIN   53.60     107.99  ops/ms   2.01
> Double256Vector.ATAN   21.53     189.11  ops/ms   8.78
> Double256Vector.ATAN2  16.67     140.76  ops/ms   8.44
> Double256Vector.CBRT   56.45     397.13  ops/ms   7.04
> Double256Vector.COS    58.26     389.77  ops/ms   6.69
> Double256Vector.COSH   29.44     151.11  ops/ms   5.13
> Double256Vector.EXP    86.67     564.68  ops/ms   6.52
> Double256Vector.EXPM1  41.96     201.28  ops/ms   4.80
> Double256Vector.HYPOT  66.18     305.74  ops/ms   4.62
> Double256Vector.LOG    71.52     394.90  ops/ms   5.52
> Double256Vector.LOG10  65.43     362.32  ops/ms   5.54
> Double256Vector.LOG1P  19.99     300.88  ops/ms  15.05
> Double256Vector.SIN    57.06     380.98  ops/ms   6.68
> Double256Vector.SINH   29.40     117.37  ops/ms   3.99
> Double256Vector.TAN    44.90     279.90  ops/ms   6.23
> Double256Vector.TANH   54.08     274.71  ops/ms   5.08
> Double512Vector.ACOS   55.65     687.54  ops/ms  12.35
> Double512Vector.ASIN   57.31     777.72  ops/ms  13.57
> Double512Vector.ATAN   21.42     729.21  ops/ms  34.04
> Double512Vector.ATAN2  16.37     414.33  ops/ms  25.32
> Double512Vector.CBRT   56.78     834.38  ops/ms  14.69
> Double512Vector.COS    59.88     837.04  ops/ms  13.98
> Double512Vector.COSH   30.34     172.76  ops/ms   5.70
> Double512Vector.EXP    99.66    1608.12  ops/ms  16.14
> Double512Vector.EXPM1  43.39     318.61  ops/ms   7.34
> Double512Vector.HYPOT  73.87    1502.72  ops/ms  20.34
> Double512Vector.LOG    74.84     996.00  ops/ms  13.31
> Double512Vector.LOG10  71.12    1046.52  ops/ms  14.72
> Double512Vector.LOG1P  19.75     776.87  ops/ms  39.34
> Double512Vector.POW    37.42     384.13  ops/ms  10.26
> Double512Vector.SIN    59.74     728.45  ops/ms  12.19
> Double512Vector.SINH   29.47     143.38  ops/ms   4.87
> Double512Vector.TAN    46.20     587.21  ops/ms  12.71
> Double512Vector.TANH   57.36     495.42  ops/ms   8.64
> Double64Vector.ACOS    24.04      73.67  ops/ms   3.06
> Double64Vector.ASIN    23.78      75.11  ops/ms   3.16
> Double64Vector.ATAN    14.14      62.81  ops/ms   4.44
> Double64Vector.ATAN2   10.38      44.43  ops/ms   4.28
> Double64Vector.CBRT    16.47     107.50  ops/ms   6.53
> Double64Vector.COS     23.42     152.01  ops/ms   6.49
> Double64Vector.COSH    17.34     113.34  ops/ms   6.54
> Double64Vector.EXP     27.08     203.53  ops/ms   7.52
> Double64Vector.EXPM1   18.77      96.73  ops/ms   5.15
> Double64Vector.HYPOT   18.54     103.62  ops/ms   5.59
> Double64Vector.LOG     26.75     142.63  ops/ms   5.33
> Double64Vector.LOG10   25.85     139.71  ops/ms   5.40
> Double64Vector.LOG1P   13.26      97.94  ops/ms   7.38
> Double64Vector.SIN     23.28     146.91  ops/ms   6.31
> Double64Vector.SINH    17.62      88.59  ops/ms   5.03
> Double64Vector.TAN     21.00      86.43  ops/ms   4.12
> Double64Vector.TANH    23.75     111.35  ops/ms   4.69
> Float128Vector.ACOS    57.52     110.65  ops/ms   1.92
> Float128Vector.ASIN    57.15     117.95  ops/ms   2.06
> Float128Vector.ATAN    22.52     318.74  ops/ms  14.15
> Float128Vector.ATAN2   17.06     246.07  ops/ms  14.42
> Float128Vector.CBRT    29.72     443.74  ops/ms  14.93
> Float128Vector.COS     42.82     803.02  ops/ms  18.75
> Float128Vector.COSH    31.44     118.34  ops/ms   3.76
> Float128Vector.EXP     72.43     855.33  ops/ms  11.81
> Float128Vector.EXPM1   37.82     127.85  ops/ms   3.38
> Float128Vector.HYPOT   53.20     591.68  ops/ms  11.12
> Float128Vector.LOG     52.95     877.94  ops/ms  16.58
> Float128Vector.LOG10   49.26     603.72  ops/ms  12.26
> Float128Vector.LOG1P   20.89     430.59  ops/ms  20.61
> Float128Vector.SIN     43.38     745.31  ops/ms  17.18
> Float128Vector.SINH    31.11     112.91  ops/ms   3.63
> Float128Vector.TAN     37.25     332.13  ops/ms   8.92
> Float128Vector.TANH    57.63     453.77  ops/ms   7.87
> Float256Vector.ACOS    65.23     123.73  ops/ms   1.90
> Float256Vector.ASIN    63.41     132.86  ops/ms   2.10
> Float256Vector.ATAN    23.51     649.02  ops/ms  27.61
> Float256Vector.ATAN2   18.19     455.95  ops/ms  25.07
> Float256Vector.CBRT    45.99     594.81  ops/ms  12.93
> Float256Vector.COS     43.75     926.69  ops/ms  21.18
> Float256Vector.COSH    33.52     130.46  ops/ms   3.89
> Float256Vector.EXP     75.70    1366.72  ops/ms  18.05
> Float256Vector.EXPM1   39.00     149.72  ops/ms   3.84
> Float256Vector.HYPOT   52.91    1023.18  ops/ms  19.34
> Float256Vector.LOG     53.31    1545.77  ops/ms  29.00
> Float256Vector.LOG10   50.31     863.80  ops/ms  17.17
> Float256Vector.LOG1P   21.51     616.59  ops/ms  28.66
> Float256Vector.SIN     44.07     911.04  ops/ms  20.67
> Float256Vector.SINH    33.16     122.50  ops/ms   3.69
> Float256Vector.TAN     37.85     497.75  ops/ms  13.15
> Float256Vector.TANH    64.27     537.20  ops/ms   8.36
> Float512Vector.ACOS    67.33    1718.00  ops/ms  25.52
> Float512Vector.ASIN    66.12    1780.85  ops/ms  26.93
> Float512Vector.ATAN    22.63    1780.31  ops/ms  78.69
> Float512Vector.ATAN2   17.52    1113.93  ops/ms  63.57
> Float512Vector.CBRT    54.78    2087.58  ops/ms  38.11
> Float512Vector.COS     40.92    1567.93  ops/ms  38.32
> Float512Vector.COSH    33.42     138.36  ops/ms   4.14
> Float512Vector.EXP     70.51    3835.97  ops/ms  54.41
> Float512Vector.EXPM1   38.06     279.80  ops/ms   7.35
> Float512Vector.HYPOT   50.99    3287.55  ops/ms  64.47
> Float512Vector.LOG     49.61    3156.99  ops/ms  63.64
> Float512Vector.LOG10   46.94    2489.16  ops/ms  53.02
> Float512Vector.LOG1P   20.66    1689.86  ops/ms  81.81
> Float512Vector.POW     32.73    1015.85  ops/ms  31.04
> Float512Vector.SIN     41.17    1587.71  ops/ms  38.56
> Float512Vector.SINH    33.05     129.39  ops/ms   3.91
> Float512Vector.TAN     35.60    1336.11  ops/ms  37.53
> Float512Vector.TANH    65.77    2295.28  ops/ms  34.90
> Float64Vector.ACOS     48.41      89.34  ops/ms   1.85
> Float64Vector.ASIN     47.30      95.72  ops/ms   2.02
> Float64Vector.ATAN     20.62      49.45  ops/ms   2.40
> Float64Vector.ATAN2    15.95     112.35  ops/ms   7.04
> Float64Vector.CBRT     24.03     134.57  ops/ms   5.60
> Float64Vector.COS      44.28     394.33  ops/ms   8.91
> Float64Vector.COSH     28.35      95.27  ops/ms   3.36
> Float64Vector.EXP      65.80     486.37  ops/ms   7.39
> Float64Vector.EXPM1    34.61      85.99  ops/ms   2.48
> Float64Vector.HYPOT    50.40     147.82  ops/ms   2.93
> Float64Vector.LOG      51.93     163.25  ops/ms   3.14
> Float64Vector.LOG10    49.53     147.98  ops/ms   2.99
> Float64Vector.LOG1P    19.20     206.81  ops/ms  10.77
> Float64Vector.SIN      44.41     382.09  ops/ms   8.60
> Float64Vector.SINH     28.20      90.68  ops/ms   3.22
> Float64Vector.TAN      36.29     160.89  ops/ms   4.43
> Float64Vector.TANH     47.65     214.04  ops/ms   4.49

Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision:

  Commit missing changes

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3638/files
  - new: https://git.openjdk.java.net/jdk/pull/3638/files/0b4a1c9c..1b0367ac

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=10
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=09-10

  Stats: 55 lines in 16 files changed: 2 ins; 42 del; 11 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3638.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3638/head:pull/3638

PR: https://git.openjdk.java.net/jdk/pull/3638

From sviswanathan at openjdk.java.net Wed May 19 22:16:19 2021
From: sviswanathan at openjdk.java.net (Sandhya Viswanathan)
Date: Wed, 19 May 2021 22:16:19 GMT
Subject: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v2]
In-Reply-To:
References:
Message-ID: <5ncMaHLhmfOv3FBOrYSkKmzVXHwQqFks-RPcjuC02Mo=.31de0845-e60a-413a-8984-0ce6e4eac2ed@github.com>

On Wed, 19 May 2021 22:02:14 GMT, Paul Sandoz wrote:

>> Tier 1 to 3 tests pass for the default set of build profiles.
>
>> Thanks a lot for the review @PaulSandoz @iwanowww @erikj79.
>> Paul and Vladimir, I have implemented your review comments. Please take a look.
>
> `case VECTOR_OP_OR` is still present.

@PaulSandoz Thanks for pointing that out. I had missed git add for some of the files.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3638

From psandoz at openjdk.java.net Wed May 19 22:29:32 2021
From: psandoz at openjdk.java.net (Paul Sandoz)
Date: Wed, 19 May 2021 22:29:32 GMT
Subject: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v11]
In-Reply-To:
References:
Message-ID:

On Wed, 19 May 2021 22:16:18 GMT, Sandhya Viswanathan wrote:

> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision:
>
>   Commit missing changes

> @PaulSandoz Thanks for pointing that out. I had missed git add for some of the files.

Java changes look good.

Please don't integrate when checks pass. I need to work through some JEP details first before we can integrate relevant PRs.

-------------

Marked as reviewed by psandoz (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/3638

From sviswanathan at openjdk.java.net Wed May 19 22:52:09 2021
From: sviswanathan at openjdk.java.net (Sandhya Viswanathan)
Date: Wed, 19 May 2021 22:52:09 GMT
Subject: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v12]
In-Reply-To:
References:
Message-ID:

Sandhya Viswanathan has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 16 commits:

 - Merge master
 - Commit missing changes
 - Implement Vladimir Ivanov and Paul Sandoz review comments
 - fix 32-bit build
 - Add comments explaining naming convention
 - jcheck fixes
 - Print intrinsic fix
 - Implement review comments
 - Add missing Lib.gmk
 - Merge master
 - ... and 6 more: https://git.openjdk.java.net/jdk/compare/b961f253...7b959b67

-------------

Changes: https://git.openjdk.java.net/jdk/pull/3638/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=11
  Stats: 416021 lines in 119 files changed: 415854 ins; 124 del; 43 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3638.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3638/head:pull/3638

PR: https://git.openjdk.java.net/jdk/pull/3638

From sviswanathan at openjdk.java.net Wed May 19 23:01:09 2021
From: sviswanathan at openjdk.java.net (Sandhya Viswanathan)
Date: Wed, 19 May 2021 23:01:09 GMT
Subject: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v13]
In-Reply-To:
References:
Message-ID:

Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision:

  correct ppc.ad

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3638/files
  - new: https://git.openjdk.java.net/jdk/pull/3638/files/7b959b67..4d59af0a

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=12
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=11-12

  Stats: 4 lines in 1 file changed: 0 ins; 4 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3638.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3638/head:pull/3638

PR: https://git.openjdk.java.net/jdk/pull/3638

From paul.sandoz at oracle.com Wed May 19 23:34:45 2021
From: paul.sandoz at oracle.com (Paul Sandoz)
Date: Wed, 19 May 2021 23:34:45 +0000
Subject: RFR:
8265783: Create a separate library for x86 Intel SVML assembly intrinsics In-Reply-To: References: <72216fcc-67e7-c700-8fee-2d8c752a0f0c@redhat.com> <8346BF97-D8F6-4521-8589-66C679618DB7@oracle.com> Message-ID: <6AA63D23-E2C1-4510-8FCC-2B17FFF3465E@oracle.com> I share your concerns about code maintainability. I have every confidence that Intel's contributors to OpenJDK are prepared to maintain the code and provide fixes. In this case I would argue that what is being proposed here is very unique: under incubation for highly specialized implementations of numerical operations from a well regarded library, while exploring alternative bindings during further incubation with enhancements to Panama. I don't think this should be considered a generally acceptable approach for Vector API operations (most code for operations does not and should not follow this approach), nor is it generally acceptable for other kinds of intrinsic in HotSpot (I believe there are a few special cases under os_cpu). Thus we should dissuade the use of .S source for other intrinsic cases. Does this help alleviate some of your concerns? Paul. > On May 19, 2021, at 1:48 AM, Andrew Haley wrote: > > On 5/17/21 6:51 PM, Paul Sandoz wrote: >> I'll let Sandhya talk more about the provenance and numerical accuracy. I think we can add more comments/details in that respect. >> >> IMO this is a reasonable compromise, at least for incubation with follow on investigation to determine if we can leverage possible enhancements to Panama FFM (see JEP 414 section on SVML). We would like to encourage experimentation of numerical data-parallel algorithms. The performance gains using SVML are compelling in that regard. > > I understand the argument from utility here, but it's a very substantial > precedent to take. In the licence we use for OpenJDK, the "source code" for > a work means the preferred form of the work for making modifications to it. > This is not the preferred form.
These assembly-code files are not much more > use than binary blobs would be. They are a black box. > > (I'm aware that we've had a few of these from Intel before, but ISTM that > this is a much bigger deal.) > > I understand that permission to include these files was probably granted as > a result of negotiations with Intel. And it's great to have this code in > OpenJDK. > > However, I am sure that no-one on this project looks forward to a future > in which part of our "source code" consists of what are in effect unfixable > binary blobs. We should at least have the conversation about whether this > is the way OpenJDK should be going. > > -- > Andrew Haley (he/him) > Java Platform Lead Engineer > Red Hat UK Ltd. > https://keybase.io/andrewhaley > EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 > From kvn at openjdk.java.net Wed May 19 23:47:45 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 19 May 2021 23:47:45 GMT Subject: RFR: 8266332: Adler32 intrinsic for x86 64-bit platforms [v11] In-Reply-To: <0mKzVE9RTWU0ZxjILDLkFx6EW-skdsp3lshNPbucuik=.4e19c3b0-7a3a-47f0-a008-21d999f24c15@github.com> References: <0mKzVE9RTWU0ZxjILDLkFx6EW-skdsp3lshNPbucuik=.4e19c3b0-7a3a-47f0-a008-21d999f24c15@github.com> Message-ID: On Tue, 18 May 2021 00:18:08 GMT, Xubo Zhang wrote: >> Implement Adler32 intrinsic for x86 64-bit platform using vector instructions. >> >> The benchmark test/micro/org/openjdk/bench/java/util/TestAdler32.java is contributed by Pengfei Li (pli, Pengfei.Li at arm.com). >> >> For this benchmark, the optimization shows ~5x improvement. >> >> Base: >> Benchmark (count) Mode Cnt Score Error Units >> TestAdler32Perf.testAdler32Update 64 avgt 25 0.084 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 128 avgt 25 0.104 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 256 avgt 25 0.146 ±
0.002 us/op >> TestAdler32Perf.testAdler32Update 512 avgt 25 0.226 ± 0.002 us/op >> TestAdler32Perf.testAdler32Update 1024 avgt 25 0.390 ± 0.005 us/op >> TestAdler32Perf.testAdler32Update 2048 avgt 25 0.714 ± 0.007 us/op >> TestAdler32Perf.testAdler32Update 4096 avgt 25 1.359 ± 0.014 us/op >> TestAdler32Perf.testAdler32Update 8192 avgt 25 2.751 ± 0.023 us/op >> TestAdler32Perf.testAdler32Update 16384 avgt 25 5.494 ± 0.077 us/op >> TestAdler32Perf.testAdler32Update 32768 avgt 25 11.058 ± 0.160 us/op >> TestAdler32Perf.testAdler32Update 65536 avgt 25 22.198 ± 0.319 us/op >> >> >> With patch: >> Benchmark (count) Mode Cnt Score Error Units >> TestAdler32Perf.testAdler32Update 64 avgt 25 0.020 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 128 avgt 25 0.025 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 256 avgt 25 0.031 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 512 avgt 25 0.048 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 1024 avgt 25 0.078 ± 0.001 us/op >> TestAdler32Perf.testAdler32Update 2048 avgt 25 0.139 ± 0.002 us/op >> TestAdler32Perf.testAdler32Update 4096 avgt 25 0.262 ± 0.004 us/op >> TestAdler32Perf.testAdler32Update 8192 avgt 25 0.524 ± 0.010 us/op >> TestAdler32Perf.testAdler32Update 16384 avgt 25 1.017 ± 0.022 us/op >> TestAdler32Perf.testAdler32Update 32768 avgt 25 2.058 ± 0.052 us/op >> TestAdler32Perf.testAdler32Update 65536 avgt 25 3.994 ± 0.013 us/op > > Xubo Zhang has updated the pull request incrementally with one additional commit since the last revision: > > remove scratch register from vpmulld tier1-3 testing is clean. It ran compiler/intrinsics/zip/TestAdler32.java test.
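
[Editorial sketch] For readers unfamiliar with the checksum being intrinsified in 8266332, a minimal scalar version of Adler-32 may help. This is not the vectorized x86 stub from the PR; the class name `Adler32Sketch` and the method `adler32` are illustrative names only, and the result is cross-checked against the JDK's own `java.util.zip.Adler32`.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Adler32;

public class Adler32Sketch {
    private static final int BASE = 65521; // largest prime below 2^16
    // zlib's bound on how many bytes can be summed before 32-bit sums overflow;
    // with long accumulators it only serves to batch the modulo operations.
    private static final int NMAX = 5552;

    // Scalar reference: a = 1 + sum of bytes, b = sum of the running a values, both mod BASE.
    static long adler32(byte[] data) {
        long a = 1;
        long b = 0;
        int i = 0;
        while (i < data.length) {
            int end = Math.min(i + NMAX, data.length);
            for (; i < end; i++) {
                a += data[i] & 0xff; // treat the byte as unsigned
                b += a;
            }
            a %= BASE;
            b %= BASE;
        }
        return (b << 16) | a;
    }

    public static void main(String[] args) {
        byte[] data = "Wikipedia".getBytes(StandardCharsets.US_ASCII);
        Adler32 ref = new Adler32();
        ref.update(data, 0, data.length);
        System.out.println(Long.toHexString(adler32(data)));
        System.out.println(adler32(data) == ref.getValue());
    }
}
```

The inner loop carries a dependency from one byte to the next, which is presumably why a vectorized formulation pays off on the larger buffer sizes in the table above.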
------------- PR: https://git.openjdk.java.net/jdk/pull/3806 From github.com+58006833+xbzhang99 at openjdk.java.net Wed May 19 23:47:49 2021 From: github.com+58006833+xbzhang99 at openjdk.java.net (Xubo Zhang) Date: Wed, 19 May 2021 23:47:49 GMT Subject: Integrated: 8266332: Adler32 intrinsic for x86 64-bit platforms In-Reply-To: References: Message-ID: On Thu, 29 Apr 2021 23:47:17 GMT, Xubo Zhang wrote: > Implement Adler32 intrinsic for x86 64-bit platform using vector instructions. > > The benchmark test/micro/org/openjdk/bench/java/util/TestAdler32.java is contributed by Pengfei Li (pli, Pengfei.Li at arm.com). > > For this benchmark, the optimization shows ~5x improvement. > > Base: > Benchmark (count) Mode Cnt Score Error Units > TestAdler32Perf.testAdler32Update 64 avgt 25 0.084 ± 0.001 us/op > TestAdler32Perf.testAdler32Update 128 avgt 25 0.104 ± 0.001 us/op > TestAdler32Perf.testAdler32Update 256 avgt 25 0.146 ± 0.002 us/op > TestAdler32Perf.testAdler32Update 512 avgt 25 0.226 ± 0.002 us/op > TestAdler32Perf.testAdler32Update 1024 avgt 25 0.390 ± 0.005 us/op > TestAdler32Perf.testAdler32Update 2048 avgt 25 0.714 ± 0.007 us/op > TestAdler32Perf.testAdler32Update 4096 avgt 25 1.359 ± 0.014 us/op > TestAdler32Perf.testAdler32Update 8192 avgt 25 2.751 ± 0.023 us/op > TestAdler32Perf.testAdler32Update 16384 avgt 25 5.494 ± 0.077 us/op > TestAdler32Perf.testAdler32Update 32768 avgt 25 11.058 ± 0.160 us/op > TestAdler32Perf.testAdler32Update 65536 avgt 25 22.198 ± 0.319 us/op > > > With patch: > Benchmark (count) Mode Cnt Score Error Units > TestAdler32Perf.testAdler32Update 64 avgt 25 0.020 ± 0.001 us/op > TestAdler32Perf.testAdler32Update 128 avgt 25 0.025 ± 0.001 us/op > TestAdler32Perf.testAdler32Update 256 avgt 25 0.031 ± 0.001 us/op > TestAdler32Perf.testAdler32Update 512 avgt 25 0.048 ± 0.001 us/op > TestAdler32Perf.testAdler32Update 1024 avgt 25 0.078 ± 0.001 us/op > TestAdler32Perf.testAdler32Update 2048 avgt 25 0.139 ±
0.002 us/op > TestAdler32Perf.testAdler32Update 4096 avgt 25 0.262 ± 0.004 us/op > TestAdler32Perf.testAdler32Update 8192 avgt 25 0.524 ± 0.010 us/op > TestAdler32Perf.testAdler32Update 16384 avgt 25 1.017 ± 0.022 us/op > TestAdler32Perf.testAdler32Update 32768 avgt 25 2.058 ± 0.052 us/op > TestAdler32Perf.testAdler32Update 65536 avgt 25 3.994 ± 0.013 us/op This pull request has now been integrated. Changeset: 8e3549fc Author: Xubo Zhang Committer: Vladimir Kozlov URL: https://git.openjdk.java.net/jdk/commit/8e3549fc736539a45534dfe2b417170b5c991c7d Stats: 399 lines in 13 files changed: 393 ins; 5 del; 1 mod 8266332: Adler32 intrinsic for x86 64-bit platforms Co-authored-by: Xubo Zhang Co-authored-by: Greg B Tucker Co-authored-by: Pengfei Li Reviewed-by: sviswanathan, jbhateja, kvn, neliasso ------------- PR: https://git.openjdk.java.net/jdk/pull/3806 From sandhya.viswanathan at intel.com Wed May 19 23:51:14 2021 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Wed, 19 May 2021 23:51:14 +0000 Subject: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics In-Reply-To: References: <72216fcc-67e7-c700-8fee-2d8c752a0f0c@redhat.com> <8346BF97-D8F6-4521-8589-66C679618DB7@oracle.com> Message-ID: Hi Andrew, Sorry for delay in response. The code contributed here is from Intel SVML which is shipped with Intel C compiler. The origin of SVML goes back to early 2000s and is well tested. The routines contributed to OpenJDK are high accuracy (within 1ulp) routines. The library is written using Intel C Compiler extensions. The generated code is the only way we could bring it in. Our collective goal is to bring SIMD programming to Java application writers. Due to lack of SIMD programming capability in Java, compute intensive applications originally written in Java are looking to move to native or other languages in future.
As part of Vector API incubation, we want to give a platform to application writers where they can experiment writing their data-parallel algorithms in Java itself. We hope that bringing generated assembly like this is one off as you mention and not something to become a practice. Best Regards, Sandhya -----Original Message----- From: hotspot-compiler-dev On Behalf Of Andrew Haley Sent: Wednesday, May 19, 2021 1:48 AM To: Paul Sandoz Cc: hotspot compiler Subject: Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics On 5/17/21 6:51 PM, Paul Sandoz wrote: > I'll let Sandhya talk more about the provenance and numerical accuracy. I think we can add more comments/details in that respect. > > IMO this is a reasonable compromise, at least for incubation with follow on investigation to determine if we can leverage possible enhancements to Panama FFM (see JEP 414 section on SVML). We would like to encourage experimentation of numerical data-parallel algorithms. The performance gains using SVML are compelling in that regard. I understand the argument from utility here, but it's a very substantial precedent to take. In the licence we use for OpenJDK, the "source code" for a work means the preferred form of the work for making modifications to it. This is not the preferred form. These assembly-code files are not much more use than binary blobs would be. They are a black box. (I'm aware that we've had a few of these from Intel before, but ISTM that this is a much bigger deal.) I understand that permission to include these files was probably granted as a result of negotiations with Intel. And it's great to have this code in OpenJDK. However, I am sure that no-one on this project looks forward to a future in which part of our "source code" consists of what are in effect unfixable binary blobs. We should at least have the conversation about whether this is the way OpenJDK should be going.
-- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From sviswanathan at openjdk.java.net Thu May 20 01:27:48 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Thu, 20 May 2021 01:27:48 GMT Subject: RFR: 8267190: Optimize Vector API test operations [v2] In-Reply-To: References: Message-ID: > Vector API test operations (IS_DEFAULT, IS_FINITE, IS_INFINITE, IS_NAN and IS_NEGATIVE) are computed in three steps: > 1) reinterpreting the floating point vectors as integral vectors (int/long) > 2) perform the test in integer domain to get a int/long mask > 3) reinterpret the int/long mask as float/double mask > Step 3) currently is very slow. It can be optimized by modifying the Java code to utilize the existing reinterpret intrinsic. > > For the VectorTestPerf attached to the JBS for JDK-8267190, the performance improves as follows: > > Base: > Benchmark (size) Mode Cnt Score Error Units > VectorTestPerf.IS_DEFAULT 1024 thrpt 5 223.156 ± 90.452 ops/ms > VectorTestPerf.IS_FINITE 1024 thrpt 5 223.841 ± 91.685 ops/ms > VectorTestPerf.IS_INFINITE 1024 thrpt 5 224.561 ± 83.890 ops/ms > VectorTestPerf.IS_NAN 1024 thrpt 5 223.777 ± 70.629 ops/ms > VectorTestPerf.IS_NEGATIVE 1024 thrpt 5 218.392 ± 79.806 ops/ms > > With patch: > Benchmark (size) Mode Cnt Score Error Units > VectorTestPerf.IS_DEFAULT 1024 thrpt 5 8812.357 ± 40.477 ops/ms > VectorTestPerf.IS_FINITE 1024 thrpt 5 7425.739 ± 296.622 ops/ms > VectorTestPerf.IS_INFINITE 1024 thrpt 5 8932.730 ± 269.988 ops/ms > VectorTestPerf.IS_NAN 1024 thrpt 5 8574.872 ± 498.649 ops/ms > VectorTestPerf.IS_NEGATIVE 1024 thrpt 5 8838.400 ±
11.849 ops/ms > > Best Regards, > Sandhya Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: Implement Paul's review comments ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4039/files - new: https://git.openjdk.java.net/jdk/pull/4039/files/bb0d4000..b506fc45 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4039&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4039&range=00-01 Stats: 806 lines in 31 files changed: 0 ins; 310 del; 496 mod Patch: https://git.openjdk.java.net/jdk/pull/4039.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4039/head:pull/4039 PR: https://git.openjdk.java.net/jdk/pull/4039 From sviswanathan at openjdk.java.net Thu May 20 01:27:49 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Thu, 20 May 2021 01:27:49 GMT Subject: RFR: 8267190: Optimize Vector API test operations In-Reply-To: References: Message-ID: On Wed, 19 May 2021 16:51:33 GMT, Paul Sandoz wrote: >> Vector API test operations (IS_DEFAULT, IS_FINITE, IS_INFINITE, IS_NAN and IS_NEGATIVE) are computed in three steps: >> 1) reinterpreting the floating point vectors as integral vectors (int/long) >> 2) perform the test in integer domain to get a int/long mask >> 3) reinterpret the int/long mask as float/double mask >> Step 3) currently is very slow. It can be optimized by modifying the Java code to utilize the existing reinterpret intrinsic. >> >> For the VectorTestPerf attached to the JBS for JDK-8267190, the performance improves as follows: >> >> Base: >> Benchmark (size) Mode Cnt Score Error Units >> VectorTestPerf.IS_DEFAULT 1024 thrpt 5 223.156 ± 90.452 ops/ms >> VectorTestPerf.IS_FINITE 1024 thrpt 5 223.841 ± 91.685 ops/ms >> VectorTestPerf.IS_INFINITE 1024 thrpt 5 224.561 ± 83.890 ops/ms >> VectorTestPerf.IS_NAN 1024 thrpt 5 223.777 ± 70.629 ops/ms >> VectorTestPerf.IS_NEGATIVE 1024 thrpt 5 218.392 ±
79.806 ops/ms >> >> With patch: >> Benchmark (size) Mode Cnt Score Error Units >> VectorTestPerf.IS_DEFAULT 1024 thrpt 5 8812.357 ± 40.477 ops/ms >> VectorTestPerf.IS_FINITE 1024 thrpt 5 7425.739 ± 296.622 ops/ms >> VectorTestPerf.IS_INFINITE 1024 thrpt 5 8932.730 ± 269.988 ops/ms >> VectorTestPerf.IS_NAN 1024 thrpt 5 8574.872 ± 498.649 ops/ms >> VectorTestPerf.IS_NEGATIVE 1024 thrpt 5 8838.400 ± 11.849 ops/ms >> >> Best Regards, >> Sandhya > > Tier 1 to 3 tests pass on supported platforms @PaulSandoz @vnkozlov Thanks a lot for the review. Paul, I have implemented your review comments. I also changed the switch to switch expression. Please take a look. ------------- PR: https://git.openjdk.java.net/jdk/pull/4039 From yyang at openjdk.java.net Thu May 20 02:44:10 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Thu, 20 May 2021 02:44:10 GMT Subject: RFR: 8266746: C1: Replace UnsafeGetRaw with UnsafeGetObject when setting up OSR entry block [v3] In-Reply-To: References: Message-ID: <8bbfrQRWAy3o4XRgMjcHZ6Cp7x-3vWmdq1M3x3T-mvE=.6a933a21-2fd4-4677-b162-a39e7b64084d@github.com> > After JDK-8150921, most Unsafe{Get,Put}Raw intrinsic methods can be replaced by Unsafe{Get,Put}Object. > > There is the only one occurrence where c1 refers UnsafeGetRaw among GraphBuilder::setup_osr_entry_block() > > https://github.com/openjdk/jdk/blob/74fecc070a6462e6a2d061525b53a63de15339f9/src/hotspot/share/c1/c1_GraphBuilder.cpp#L3143-L3157 > > We can replace UnsafeGetRaw with UnsafeGetObject when setting up OSR entry block. After that, Unsafe{Get,Put}Raw can be completely removed because no one refers to them. > > (This patch actually does two things: > 1. `Replace UnsafeGetRaw with UnsafeGetObject when setting up OSR entry block` This is the only occurrence where c1 refers UnsafeGetRaw > 2.
`Cleanup unused Unsafe{Get,Put}Raw code` > They are related so I put it together, but I still want to hear your suggestions, I will separate them into two patches if you think it is more reasonable) > > Thanks! > Yang Yi Yang has updated the pull request incrementally with one additional commit since the last revision: many nit ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/3917/files - new: https://git.openjdk.java.net/jdk/pull/3917/files/8c239e45..be6b9891 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3917&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3917&range=01-02 Stats: 14 lines in 3 files changed: 2 ins; 0 del; 12 mod Patch: https://git.openjdk.java.net/jdk/pull/3917.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3917/head:pull/3917 PR: https://git.openjdk.java.net/jdk/pull/3917 From yyang at openjdk.java.net Thu May 20 02:44:13 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Thu, 20 May 2021 02:44:13 GMT Subject: RFR: 8266746: C1: Replace UnsafeGetRaw with UnsafeGetObject when setting up OSR entry block [v2] In-Reply-To: References: Message-ID: <6GfJNOOeMRnkw7Zfk_btAcgtFuIes7LLc2IXiY16bV8=.f5f83668-4af6-450c-a8c8-fa105a08b12b@github.com> On Wed, 19 May 2021 08:47:12 GMT, Tobias Hartmann wrote: >> Yi Yang has updated the pull request incrementally with one additional commit since the last revision: >> >> unaliged_move for ppc/s390 > > src/hotspot/share/c1/c1_Instruction.hpp line 2318: > >> 2316: >> 2317: // accessors >> 2318: bool is_raw_get() { return _is_raw_get; } > > I would rename this to `_is_raw` because we already know it's a get. Thanks Tobias for the review! All fixed. I will test it on Linux(already tested)/Mac/Windows(aarch64+x86_64) later. But I don't have ppc and s390 machines so I'm not sure how to test it on them... 
------------- PR: https://git.openjdk.java.net/jdk/pull/3917 From ddong at openjdk.java.net Thu May 20 03:44:36 2021 From: ddong at openjdk.java.net (Denghui Dong) Date: Thu, 20 May 2021 03:44:36 GMT Subject: RFR: 8265129: Add intrinsic support for JVM.getClassId [v6] In-Reply-To: References: Message-ID: On Wed, 19 May 2021 15:28:23 GMT, Vladimir Kozlov wrote: >> Hi Vladimir, >> >> Thanks for your comment. >> >> Yes, the native implementation for `getClassIdNonIntrinsic`/`getClassId` is located in `jfrTraceId.cpp#L178` just as you said, more specifically, there are two path, one(JfrTraceId::load) for normal class and one(load_primitive) for primitive class (includeing void.class). >> >> My pseudo-code(the comment of `LibraryCallKit::inline_native_classID`) is consistent with the implementation of these two paths. >> >> And in the normal class implementation path, there are fast path and slow path(see JfrTraceIdLoadBarrier::load), only some comparison and shift operations are needed to obtain the class ID in the fast path, and that's where I think intrinsic can bring performance improvements, I saw about 20x improvement from my microbenchmark. >> >> Judging from the current JFR implementation, there are already some events that need to rely on this API, such as `ExceptionThrownEvent` and `ErrorThrownEvent` use `thrownClass` to record the type of exception, and I also noticed that there is a new PR(https://github.com/openjdk/jdk/pull/4101) to add `FinalizerEvent` which include a field named `finalizedClass` to record the type information. Therefore, I have reason to believe that this API will be frequently used during the JFR activation process. >> >> As far as the current implementation is concerned, it is indeed a bit complicated, I think some simplifications can be made, for example, only the fast path for the normal class is retained, and other paths are directly implemented by calling the native function. What do you think? 
>> >> @egahlin @mgronlun >> And I hope JFR's folks could give some suggestions on this PR:) >> >> Best, >> Denghui > > Hi @D-D-H, > > Can you show pseudo code for your fast/slow path suggestion? > What is most frequent (performance critical) path is used? > If it simplify code I am for it. > > I am not sure about implementing it in C1 since in default mode methods will be compiled by C2. So for C1 to call native implementation could be fine and we can remove code there. Or you can file followup RFE to fix it in C1 later.

Hi @vnkozlov,

Thank you for your reply. I am not sure whether the pseudo-code I describe below is what you expect; if not, please feel free to point it out.

The native implementation of JVM.getClassId is `jfr_class_id`, which directly calls `JfrTraceId::load` with the parameter `raw` set to false. There are two paths in `JfrTraceId::load` (one for normal classes, one for primitive classes). The pseudo-code is as follows:

(a) For the normal classes path:

    epoch = JfrTraceIdEpoch::_epoch_state ? 2 : 1
    if ((oop->klass->trace_id & ((epoch << META_SHIFT) | epoch)) != epoch) {
      // here is the slow path; it only occurs when this klass is first recorded
      // or when the epoch of JFR shifts (this won't happen frequently)
      SET_USED_THIS_EPOCH
      enqueue klass (a function call)
      if (!signaled)    // load_acquire, see JfrSignal::signal
        signaled = true // release_store
    }
    id = oop->klass->trace_id >> TRACE_ID_SHIFT

In short, the fast path does not enter the if branch and uses only a few simple instructions.

(b) For the primitive classes path:

    if oop->array_klass != null
      id = (oop->array_klass->trace_id >> TRACE_ID_SHIFT) + 1 // primitive class path
    else
      id = LAST_TYPE_ID + 1 // void class path, LAST_TYPE_ID is constant
    if (!signaled) // same as before
      signaled = true

This path is straightforward; it involves no complex operations.
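
[Editorial sketch] The normal-class path (a) described above can be modelled in plain Java to make the fast/slow split concrete. This is a toy model, not HotSpot code: the class `ClassIdSketch`, the nested `Klass` stand-in, the counter `enqueued`, and the values chosen for `TRACE_ID_SHIFT` and `META_SHIFT` are all assumptions made for illustration, and the meta bits are never set in this model.

```java
public class ClassIdSketch {
    // Assumed values for illustration only; the real constants live in HotSpot's JFR sources.
    static final int TRACE_ID_SHIFT = 16;
    static final int META_SHIFT = 8;

    // Hypothetical stand-in for a HotSpot Klass: the id sits above TRACE_ID_SHIFT,
    // the low bits hold the per-epoch "used" tags.
    static final class Klass {
        long traceId;
        Klass(long id) { traceId = id << TRACE_ID_SHIFT; }
    }

    static boolean epochState = false; // models JfrTraceIdEpoch::_epoch_state
    static boolean signaled = false;   // models the JfrSignal flag
    static int enqueued = 0;           // counts slow-path "enqueue klass" calls

    static long classId(Klass k) {
        long epoch = epochState ? 2 : 1;
        if ((k.traceId & ((epoch << META_SHIFT) | epoch)) != epoch) {
            // Slow path: first use of this class in the current epoch.
            k.traceId |= epoch; // SET_USED_THIS_EPOCH
            enqueued++;         // enqueue klass
            if (!signaled) {
                signaled = true;
            }
        }
        // Fast path: nothing but the test above and this shift.
        return k.traceId >>> TRACE_ID_SHIFT;
    }

    public static void main(String[] args) {
        Klass k = new Klass(42);
        System.out.println(classId(k) + " enqueued=" + enqueued); // slow path
        System.out.println(classId(k) + " enqueued=" + enqueued); // fast path
        epochState = true;                                         // epoch shift
        System.out.println(classId(k) + " enqueued=" + enqueued); // slow path again
    }
}
```

In this model a class is enqueued once per epoch, and every later lookup in the same epoch costs one mask, one compare and one shift, which matches the claim that the fast path "only uses some simple instructions".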
------------- PR: https://git.openjdk.java.net/jdk/pull/3470 From github.com+10835776+stsypanov at openjdk.java.net Thu May 20 06:45:43 2021 From: github.com+10835776+stsypanov at openjdk.java.net (Сергей Цыпанов) Date: Thu, 20 May 2021 06:45:43 GMT Subject: RFR: 8261880: Change nested classes in java.base to static nested classes where possible [v2] In-Reply-To: References: Message-ID: On Wed, 17 Feb 2021 16:38:03 GMT, Сергей Цыпанов wrote: >> Non-static classes hold a link to their parent classes, which in many cases can be avoided. > > Сергей Цыпанов has updated the pull request incrementally with one additional commit since the last revision: > > 8261880: Remove static from declarations of Holder nested classes Any more comments? ------------- PR: https://git.openjdk.java.net/jdk/pull/2589 From ningsheng.jian at arm.com Thu May 20 10:42:44 2021 From: ningsheng.jian at arm.com (Ningsheng Jian) Date: Thu, 20 May 2021 18:42:44 +0800 Subject: RFR: 8267356: AArch64: Vector API SVE codegen support In-Reply-To: <996105ee-d838-79ad-7536-590a814bf8d6@redhat.com> References: <04_lDZDCcLLfXx6XmrbVdGXkPmqNrYzQuHBJzQ9Oa5k=.f3ef1398-c8b4-4162-9631-3dbfb6594250@github.com> <996105ee-d838-79ad-7536-590a814bf8d6@redhat.com> Message-ID: <3661bb09-5de6-8b8b-1d38-da94a2731cf4@arm.com> Hi Andrew, On 5/20/21 4:15 PM, Andrew Haley wrote: > On 5/20/21 8:48 AM, Ningsheng Jian wrote: >> Note: our original plan was making this work part of JEP 414 Vector API (Second Incubator) [1], but we realized that it's now close to 17 release cycle and the JEP process may take time. Adding more features could delay the whole review process for the JEP. So we separate this work out as a standalone patch. > > Does this code do anything in the JDK itself? Can it be tested outside > the incubator project?
> Without this patch, running current Vector API code, landed in jdk16, on SVE systems, will simply fallback to the API's java implementation without intrinsification. This patch has been tested with existing Vector API code in current jdk/jdk project, with expected SVE code generated. Thanks, Ningsheng From redestad at openjdk.java.net Thu May 20 10:45:37 2021 From: redestad at openjdk.java.net (Claes Redestad) Date: Thu, 20 May 2021 10:45:37 GMT Subject: RFR: 8261880: Change nested classes in java.base to static nested classes where possible [v2] In-Reply-To: References: Message-ID: On Wed, 17 Feb 2021 16:38:03 GMT, Сергей Цыпанов wrote: >> Non-static classes hold a link to their parent classes, which in many cases can be avoided. > > Сергей Цыпанов has updated the pull request incrementally with one additional commit since the last revision: > > 8261880: Remove static from declarations of Holder nested classes Marked as reviewed by redestad (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/2589 From adinn at redhat.com Thu May 20 11:02:40 2021 From: adinn at redhat.com (Andrew Dinn) Date: Thu, 20 May 2021 12:02:40 +0100 Subject: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics In-Reply-To: References: <72216fcc-67e7-c700-8fee-2d8c752a0f0c@redhat.com> <8346BF97-D8F6-4521-8589-66C679618DB7@oracle.com> Message-ID: <49d52c95-782c-1178-436a-bd3848ba2a25@redhat.com> Hi Sandhya, On 20/05/2021 00:51, Viswanathan, Sandhya wrote: > Sorry for delay in response. > The code contributed here is from Intel SVML which is shipped with Intel C compiler. > The origin of SVML goes back to early 2000s and is well tested. > The routines contributed to OpenJDK are high accuracy (within 1ulp) routines. > The library is written using Intel C Compiler extensions. > The generated code is the only way we could bring it in. > Our collective goal is to bring SIMD programming to Java application writers.
> Due to lack of SIMD programming capability in Java, compute intensive applications originally written in Java are looking to move to native or other languages in future. > As part of Vector API incubation, we want to give a platform to application writers where they can experiment writing their data-parallel algorithms in Java itself. > We hope that bringing generated assembly like this is one off as you mention and not something to become a practice. I appreciate the above explanation and am largely swayed by the argument that this is a useful, albeit expedient, way forward. Thank you for pinpointing the goal here. Could you clarify one important detail. Is the original C code (with compiler-specific vector extensions) able to be made available either as open source or using some other form of licensing. I ask because I think it would significantly lower the risk I feel is present in importing this code if OpenJDK devs were able to see the source functions from which the various machine code routines have been derived and understand the algorithms that they embody. Without that sort of understanding I can see this becoming a bug trap where a report of a vector floating point computation that goes awry may well leave whoever is left to debug the problem with no clear way of ruling out a problem in the vector code, most especially wasting a lot of time trying to do so before looking elsewhere for a more likely/obvious error. Of course, the situation is not necessarily a lot better with the source available; generated machine code is clearly not going to be easily mapped back to C code. Still, I'd prefer it if OpenJDK devs had a fighting chance than little or none. regards, Andrew Dinn ----------- Red Hat Distinguished Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 
03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From github.com+10835776+stsypanov at openjdk.java.net Thu May 20 13:07:34 2021 From: github.com+10835776+stsypanov at openjdk.java.net (Сергей Цыпанов) Date: Thu, 20 May 2021 13:07:34 GMT Subject: RFR: 8261880: Change nested classes in java.base to static nested classes where possible [v2] In-Reply-To: References: Message-ID: <-6K32RgKOwmXJztestBAA8OWC406eWVuqTRIjvfzcoQ=.59f5b0be-a1d9-48a8-89bd-5b7c51222bd7@github.com> On Thu, 20 May 2021 10:42:49 GMT, Claes Redestad wrote: >> Сергей Цыпанов has updated the pull request incrementally with one additional commit since the last revision: >> >> 8261880: Remove static from declarations of Holder nested classes > > Marked as reviewed by redestad (Reviewer). @cl4es now you can sponsor :) ------------- PR: https://git.openjdk.java.net/jdk/pull/2589 From github.com+1324414+desruisseaux at openjdk.java.net Thu May 20 13:18:36 2021 From: github.com+1324414+desruisseaux at openjdk.java.net (Martin Desruisseaux) Date: Thu, 20 May 2021 13:18:36 GMT Subject: RFR: 8261880: Change nested classes in java.base to static nested classes where possible [v2] In-Reply-To: References: Message-ID: On Wed, 17 Feb 2021 16:38:03 GMT, Сергей Цыпанов wrote: >> Non-static classes hold a link to their parent classes, which in many cases can be avoided. > > Сергей Цыпанов has updated the pull request incrementally with one additional commit since the last revision: > > 8261880: Remove static from declarations of Holder nested classes Just for information, there are similar issues in the `javax.imageio.metadata.IIOMetadataFormatImpl` class in the `java.desktop` module.
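
[Editorial sketch] The motivation for 8261880 ("Non-static classes hold a link to their parent classes") can be observed with a few lines of reflection. The class `NestedSketch` and its members are hypothetical names made up for this illustration; nothing here is taken from the patch itself.

```java
public class NestedSketch {
    private final int value = 7;

    // Inner (non-static) class: each instance is created against an enclosing NestedSketch.
    class Inner {
        int read() { return value; } // reads a field of the enclosing instance
    }

    // Static nested class: no enclosing instance is involved.
    static class Nested {
        int read() { return 7; }
    }

    // Number of parameters of the (single, default) constructor, including the
    // implicit enclosing-instance parameter that inner classes receive.
    static int ctorParams(Class<?> c) {
        return c.getDeclaredConstructors()[0].getParameterCount();
    }

    public static void main(String[] args) {
        System.out.println("Inner ctor params:  " + ctorParams(Inner.class));
        System.out.println("Nested ctor params: " + ctorParams(Nested.class));
        NestedSketch outer = new NestedSketch();
        System.out.println(outer.new Inner().read() + " " + new Nested().read());
    }
}
```

The inner class's constructor takes the enclosing instance as a hidden argument, whereas the static nested class's constructor takes none; when the nested class never touches the outer instance, declaring it `static` drops that link.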
------------- PR: https://git.openjdk.java.net/jdk/pull/2589 From github.com+10835776+stsypanov at openjdk.java.net Thu May 20 13:59:41 2021 From: github.com+10835776+stsypanov at openjdk.java.net (Сергей Цыпанов) Date: Thu, 20 May 2021 13:59:41 GMT Subject: Integrated: 8261880: Change nested classes in java.base to static nested classes where possible In-Reply-To: References: Message-ID: On Tue, 16 Feb 2021 14:30:58 GMT, Сергей Цыпанов wrote: > Non-static classes hold a link to their parent classes, which in many cases can be avoided. This pull request has now been integrated. Changeset: 9425d3de Author: Sergey Tsypanov Committer: Claes Redestad URL: https://git.openjdk.java.net/jdk/commit/9425d3de83fe8f4caddef03ffa3f4dd4de58f236 Stats: 15 lines in 11 files changed: 0 ins; 0 del; 15 mod 8261880: Change nested classes in java.base to static nested classes where possible Reviewed-by: redestad ------------- PR: https://git.openjdk.java.net/jdk/pull/2589 From sandhya.viswanathan at intel.com Thu May 20 15:03:57 2021 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Thu, 20 May 2021 15:03:57 +0000 Subject: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics In-Reply-To: <49d52c95-782c-1178-436a-bd3848ba2a25@redhat.com> References: <72216fcc-67e7-c700-8fee-2d8c752a0f0c@redhat.com> <8346BF97-D8F6-4521-8589-66C679618DB7@oracle.com> <49d52c95-782c-1178-436a-bd3848ba2a25@redhat.com> Message-ID: Hi Andrew, Intel has a long history of contributing to Java and being part of OpenJDK. We have always responded to bug reports and supported the code that we contribute. This shouldn't be any different.
Best Regards, Sandhya -----Original Message----- From: Andrew Dinn Sent: Thursday, May 20, 2021 4:03 AM To: Viswanathan, Sandhya ; Andrew Haley ; Paul Sandoz Cc: hotspot compiler Subject: Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics Hi Sandhya, On 20/05/2021 00:51, Viswanathan, Sandhya wrote: > Sorry for delay in response. > The code contributed here is from Intel SVML which is shipped with Intel C compiler. > The origin of SVML goes back to early 2000s and is well tested. > The routines contributed to OpenJDK are high accuracy (within 1ulp) routines. > The library is written using Intel C Compiler extensions. > The generated code is the only way we could bring it in. > Our collective goal is to bring SIMD programming to Java application writers. > Due to lack of SIMD programming capability in Java, compute intensive applications originally written in Java are looking to move to native or other languages in future. > As part of Vector API incubation, we want to give a platform to application writers where they can experiment writing their data-parallel algorithms in Java itself. > We hope that bringing generated assembly like this is one off as you mention and not something to become a practice. I appreciate the above explanation and am largely swayed by the argument that this is a useful, albeit expedient, way forward. Thank you for pinpointing the goal here. Could you clarify one important detail. Is the original C code (with compiler-specific vector extensions) able to be made available either as open source or using some other form of licensing. I ask because I think it would significantly lower the risk I feel is present in importing this code if OpenJDK devs were able to see the source functions from which the various machine code routines have been derived and understand the algorithms that they embody. 
Without that sort of understanding I can see this becoming a bug trap where a report of a vector floating point computation that goes awry may well leave whoever is left to debug the problem with no clear way of ruling out a problem in the vector code, most especially wasting a lot of time trying to do so before looking elsewhere for a more likely/obvious error. Of course, the situation is not necessarily a lot better with the source available; generated machine code is clearly not going to be easily mapped back to C code. Still, I'd prefer it if OpenJDK devs had a fighting chance than little or none. regards, Andrew Dinn ----------- Red Hat Distinguished Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From aph at redhat.com Thu May 20 15:27:48 2021 From: aph at redhat.com (Andrew Haley) Date: Thu, 20 May 2021 16:27:48 +0100 Subject: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics In-Reply-To: References: <72216fcc-67e7-c700-8fee-2d8c752a0f0c@redhat.com> <8346BF97-D8F6-4521-8589-66C679618DB7@oracle.com> Message-ID: Hi, thanks. One or two points inline. On 5/20/21 12:51 AM, Viswanathan, Sandhya wrote: > The routines contributed to OpenJDK are high accuracy (within 1ulp) routines. Do the routines meet the (semi-) monotonicity requirements for Java.lang.Math? "Therefore, most methods with more than 0.5 ulp errors are required to be semi-monotonic: whenever the mathematical function is non-decreasing, so is the floating-point approximation, likewise, whenever the mathematical function is non-increasing, so is the floating-point approximation. Not all approximations that have 1 ulp accuracy will automatically meet the monotonicity requirements." > The library is written using Intel C Compiler extensions. > The generated code is the only way we could bring it in. Sure, I understand. However, you could bring in the source code as well. 
Sure, you would need the Intel C compiler to compile it, but it would be a valuable reference for maintainers, just to help understand. > As part of Vector API incubation, we want to give a platform to > application writers where they can experiment writing their > data-parallel algorithms in Java itself. > We hope that bringing generated assembly like this is one off as you > mention and not something to become a practice. I hope not. Thank you for the reply. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Thu May 20 15:31:36 2021 From: aph at redhat.com (Andrew Haley) Date: Thu, 20 May 2021 16:31:36 +0100 Subject: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics In-Reply-To: <6AA63D23-E2C1-4510-8FCC-2B17FFF3465E@oracle.com> References: <72216fcc-67e7-c700-8fee-2d8c752a0f0c@redhat.com> <8346BF97-D8F6-4521-8589-66C679618DB7@oracle.com> <6AA63D23-E2C1-4510-8FCC-2B17FFF3465E@oracle.com> Message-ID: <2e4b6fb0-e53e-43d3-b680-b50a53cfe04a@redhat.com> On 5/20/21 12:34 AM, Paul Sandoz wrote: > I don't think this should be considered a generally acceptable approach for Vector API operations (most code for operations does not and should not follow this approach), nor is it generally acceptable for other kinds of intrinsic in HotSpot (I believe there are a few special cases under os_cpu). Thus we should dissuade the use of .S source for other intrinsic cases. I've got nothing at all against .S files, as long as they are the real preferred form. That is to say, they should be the actual source code, as written by someone. > Does this help alleviate some of your concerns? Somewhat, but I wonder if this, as a matter of policy, is an area in which the Governing Board should get involved. I don't want to hold up progress, of course, but this is potentially a very important issue.
I guess I wouldn't mind as long as we had a "This far, and no further" policy, with some hope that the library could be replaced by readable and maintainable code. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From john.r.rose at oracle.com Thu May 20 18:54:21 2021 From: john.r.rose at oracle.com (John Rose) Date: Thu, 20 May 2021 18:54:21 +0000 Subject: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics In-Reply-To: <2e4b6fb0-e53e-43d3-b680-b50a53cfe04a@redhat.com> References: <72216fcc-67e7-c700-8fee-2d8c752a0f0c@redhat.com> <8346BF97-D8F6-4521-8589-66C679618DB7@oracle.com> <6AA63D23-E2C1-4510-8FCC-2B17FFF3465E@oracle.com> <2e4b6fb0-e53e-43d3-b680-b50a53cfe04a@redhat.com> Message-ID: On May 20, 2021, at 8:31 AM, Andrew Haley wrote: > > On 5/20/21 12:34 AM, Paul Sandoz wrote: >> I don't think this should be considered a generally acceptable approach for Vector API operations (most code for operations does not and should not follow this approach), nor is it generally acceptable for other kinds of intrinsic in HotSpot (I believe there are a few special cases under os_cpu). Thus we should dissuade the use of .S source for other intrinsic cases. > > I've got nothing at all against .S files, as long as they are the real > preferred form. That is to say, they should be the actual source code, > as written by someone. Yes, these .S files are a somewhat painful compromise which we are committed to improve. Intel is contributing them as a one-time artifact which we are, in fact, responsible to maintain. By hand, as the preferred form of the source. (Preferred to what? Well, preferred to nothing at all.) The only reason we are doing this is that we are inside the incubation process. We know we will change the code moving forward. But for now we want a timely release of an unfinished work for evaluation by the community.
And that's pretty much what incubation is for, right? So I'll characterize the current sources as just-barely-workable by hand, just enough to do very light maintenance and vetting, although totally impractical for meaningful improvement, such as dealing with issues like 0.5ULP behavior and monotonicity. These are not the sources you are looking for, for that future work. Exiting incubation, any of the following options seem allowable to me, in the way that the current sources are not allowable: - using a very slow element-wise loop over JDK math methods - using well-written assembly code (which does not exist now) - using well-written C/asm code from some open source project - bundling a library from appropriate sources with appropriate permissions - using the Vector API itself to write portable numerics That last option seems very desirable, and getting the Vector API into incubation, with well-performing math primitives, is a giant step forward in that direction. And we want to go in that general direction anyway (whether we use special math libraries or not). > >> Does this help alleviate some of your concerns? > > Somewhat, but I wonder if this, as a matter of policy, is an area in > which the Governing Board should get involved. I don't want to hold up > progress, of course, but this is potentially a very important issue. I think this could rise to the GB level if we needed to make a strong policy change, but as I've said above, I think we are in policy here. (Just barely.) For any conceivable issue of maintainability, surely the open review process is enough, without asking the GB to weigh in on change set reviews. And I think this is about maintainability. > I guess I wouldn't mind as long as we had a "This far, and no further" > policy, with some hope that the library could be replaced by readable > and maintainable code. Well in this case, we have two things: 1. Temporary expedient only for incubation, to gain public feedback. 2.
Clear call for a plausible alternative, to be answered before incubation exit. That's probably enough "case law" to help clarify the relevant policy. What do you think? -- John From mgronlun at openjdk.java.net Thu May 20 19:36:39 2021 From: mgronlun at openjdk.java.net (Markus Grönlund) Date: Thu, 20 May 2021 19:36:39 GMT Subject: RFR: 8265129: Add intrinsic support for JVM.getClassId [v6] In-Reply-To: References: Message-ID: On Mon, 17 May 2021 07:20:18 GMT, Denghui Dong wrote: >> 8265129: Add intrinsic support for JVM.getClassId > > Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: > > fix crash problem Hi Denghui, overall looks good, thanks a lot for doing this work. I have suggested some small changes, but overall I think the class id tag semantics are preserved. Thank you Markus src/hotspot/share/c1/c1_Compiler.cpp line 225: > 223: case vmIntrinsics::_counterTime: > 224: case vmIntrinsics::_getEventWriter: > 225: // TODO: temporarily not implement getClassId in c1 I think we need not worry about an intrinsic for C1, so please remove. src/hotspot/share/jfr/jni/jfrJniMethodRegistration.cpp line 48: > 46: (char*)"getAllEventClasses", (char*)"()Ljava/util/List;", (void*)jfr_get_all_event_classes, > 47: (char*)"getClassId", (char*)"(Ljava/lang/Class;)J", (void*)jfr_class_id, > 48: (char*)"getClassIdNonIntrinsic", (char*)"(Ljava/lang/Class;)J", (void*)jfr_class_id, Please remove the getClassIdNonIntrinsic entry, thanks. In addition, can you also remove the now abandoned entry point on the Java side, jdk.jfr.internal.JVM.getClassIdNonIntrinsic. And the entry point in jfr/jni/jniMethod.hpp | .cpp. Thanks. src/hotspot/share/jfr/recorder/checkpoint/types/traceid/jfrTraceIdLoadBarrier.hpp line 71: > 69: class JfrTraceIdLoadBarrier : AllStatic { > 70: friend class JfrCheckpointManager; > 71: friend class SharedRuntime; Don't think we need to involve SharedRuntime.
src/hotspot/share/opto/library_call.cpp line 61: > 59: > 60: #ifdef JFR_HAVE_INTRINSICS > 61: #include "jfr/recorder/checkpoint/types/traceid/jfrTraceIdEpoch.hpp" Can you move all entry points you need to expose to "jfr/jfr.hpp" and include that instead? I prefer to avoid exposing a lot of the internal impl. details if possible. You can then move the SharedRuntime::trace_id_load_barrier() routines, into jfr/jfr.cpp and avoid touching the runtime/sharedRuntime.hpp | .cpp. Thanks. src/hotspot/share/opto/library_call.cpp line 2775: > 2773: */ > 2774: bool LibraryCallKit::inline_native_classID() { > 2775: Node* cls = null_check(argument(0), T_OBJECT); We can remove the null check, The mirror passed is null checked at the callsite (EventWriter). src/hotspot/share/opto/library_call.cpp line 2784: > 2782: TypeRawPtr::BOTTOM, TypeKlassPtr::OBJECT_OR_NULL)); > 2783: > 2784: Node* signaled_flag_address = makecon(TypeRawPtr::make(JfrTraceIdEpoch::signal_address())); Can you move this expression to become dependent if you actually need it? It will not be needed in the majority of cases (the InstanceKlass will already be tagged). Thanks. src/hotspot/share/opto/runtime.cpp line 1502: > 1500: } > 1501: > 1502: const TypeFunc *OptoRuntime::trace_id_load_barrier_Type() { Perhaps inside #ifdef INCLUDE_JFR src/hotspot/share/opto/runtime.hpp line 307: > 305: static const TypeFunc* register_finalizer_Type(); > 306: > 307: static const TypeFunc* trace_id_load_barrier_Type(); JFR_ONLY(static const TypeFunc* trace_id_load_barrier_Type();) src/hotspot/share/runtime/sharedRuntime.cpp line 1884: > 1882: > 1883: #ifdef JFR_HAVE_INTRINSICS > 1884: JRT_LEAF(void, SharedRuntime::trace_id_load_barrier(Klass * klass)) This moves to "jfr/jfr.cpp" instead, thanks. src/hotspot/share/runtime/sharedRuntime.hpp line 526: > 524: > 525: #ifdef JFR_HAVE_INTRINSICS > 526: static void trace_id_load_barrier(Klass* klass); Move to "jfr/jfr.hpp", thanks. ------------- Changes requested by mgronlun (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/3470 From psandoz at openjdk.java.net Thu May 20 21:19:31 2021 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Thu, 20 May 2021 21:19:31 GMT Subject: RFR: 8267190: Optimize Vector API test operations [v2] In-Reply-To: References: Message-ID: On Thu, 20 May 2021 01:27:48 GMT, Sandhya Viswanathan wrote: >> Vector API test operations (IS_DEFAULT, IS_FINITE, IS_INFINITE, IS_NAN and IS_NEGATIVE) are computed in three steps: >> 1) reinterpreting the floating point vectors as integral vectors (int/long) >> 2) perform the test in integer domain to get a int/long mask >> 3) reinterpret the int/long mask as float/double mask >> Step 3) currently is very slow. It can be optimized by modifying the Java code to utilize the existing reinterpret intrinsic. >> >> For the VectorTestPerf attached to the JBS for JDK-8267190, the performance improves as follows: >> >> Base: >> Benchmark (size) Mode Cnt Score Error Units >> VectorTestPerf.IS_DEFAULT 1024 thrpt 5 223.156 ± 90.452 ops/ms >> VectorTestPerf.IS_FINITE 1024 thrpt 5 223.841 ± 91.685 ops/ms >> VectorTestPerf.IS_INFINITE 1024 thrpt 5 224.561 ± 83.890 ops/ms >> VectorTestPerf.IS_NAN 1024 thrpt 5 223.777 ± 70.629 ops/ms >> VectorTestPerf.IS_NEGATIVE 1024 thrpt 5 218.392 ± 79.806 ops/ms >> >> With patch: >> Benchmark (size) Mode Cnt Score Error Units >> VectorTestPerf.IS_DEFAULT 1024 thrpt 5 8812.357 ± 40.477 ops/ms >> VectorTestPerf.IS_FINITE 1024 thrpt 5 7425.739 ± 296.622 ops/ms >> VectorTestPerf.IS_INFINITE 1024 thrpt 5 8932.730 ± 269.988 ops/ms >> VectorTestPerf.IS_NAN 1024 thrpt 5 8574.872 ± 498.649 ops/ms >> VectorTestPerf.IS_NEGATIVE 1024 thrpt 5 8838.400 ± 11.849 ops/ms >> >> Best Regards, >> Sandhya > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > Implement Paul's review comments Java changes are good, some minor comments if you choose to accept them, no need for me to review further.
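[Editorial illustration, not part of the thread or the patch.] The three steps quoted above (reinterpret floats as ints, test in the integer domain, turn the result back into a mask) can be sketched in scalar Java. This is only a sketch under stated assumptions: the class `FloatBitTests`, its methods, and the use of a `boolean[]` as the "mask" are hypothetical names chosen for illustration, it covers float lanes only, and it does not use the `jdk.incubator.vector` API or reproduce the code under review.

```java
import java.util.Arrays;
import java.util.function.IntPredicate;

public class FloatBitTests {
    // Constants for IEEE 754 single precision, applied to the raw int bits.
    private static final int ABS_MASK = 0x7fffffff; // clears the sign bit
    private static final int INF_BITS = 0x7f800000; // exponent all ones, mantissa zero

    // Integer-domain tests on the raw bits of one lane (hypothetical helpers).
    public static boolean isDefault(int bits)  { return bits == 0; }
    public static boolean isNaN(int bits)      { return (bits & ABS_MASK) > INF_BITS; }
    public static boolean isInfinite(int bits) { return (bits & ABS_MASK) == INF_BITS; }
    public static boolean isFinite(int bits)   { return (bits & ABS_MASK) < INF_BITS; }

    // Step 1: reinterpret each float lane as an int.
    // Step 2: run the test on the int bits.
    // Step 3: the boolean[] stands in for the resulting float mask.
    public static boolean[] testLanes(float[] lanes, IntPredicate test) {
        boolean[] mask = new boolean[lanes.length];
        for (int i = 0; i < lanes.length; i++) {
            mask[i] = test.test(Float.floatToRawIntBits(lanes[i]));
        }
        return mask;
    }

    public static void main(String[] args) {
        float[] v = {0.0f, -0.0f, 1.5f, Float.NaN,
                     Float.POSITIVE_INFINITY, Float.NEGATIVE_INFINITY};
        System.out.println("NaN:      " + Arrays.toString(testLanes(v, FloatBitTests::isNaN)));
        System.out.println("infinite: " + Arrays.toString(testLanes(v, FloatBitTests::isInfinite)));
        System.out.println("finite:   " + Arrays.toString(testLanes(v, FloatBitTests::isFinite)));
        System.out.println("default:  " + Arrays.toString(testLanes(v, FloatBitTests::isDefault)));
    }
}
```

Note that a bit-level "default" test treats -0.0f as non-default, because its sign bit is set; whether that matches the intended semantics of a given operator should be checked against the API documentation rather than this sketch.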
src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-VectorBits.java.template line 852: > 850: private final > 851: VectorMask defaultMaskCast(AbstractSpecies dsp) { > 852: boolean[] maskArray = toArray(); Can you add an `assert length() != species.laneCount()`? src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-VectorBits.java.template line 854: > 852: boolean[] maskArray = toArray(); > 853: // enum-switches don't optimize properly JDK-8161245 > 854: return ( Minor syntactic quibble: you don't need the '(` and `)` surrounding the switch expressions e.g.: return switch (dsp.laneType.switchKey) { case ... } ------------- Marked as reviewed by psandoz (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/4039 From neliasso at openjdk.java.net Thu May 20 21:27:59 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Thu, 20 May 2021 21:27:59 GMT Subject: RFR: 8267332: xor value should handle bounded values Message-ID: In the discussion of https://github.com/openjdk/jdk/pull/3938 a limitation in C2 was found. C2 fails to eliminate obvious bound checks for indexes that are masked with xor. The Xor for two values that have a lower bound of zero or more, the resulting lower bound is zero. The Xor for two values that have a upperbound above zero, the resulting upper bound is the max of the next_power_of_2-1. Test supplied. 
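[Editorial illustration, not part of the thread or the patch.] The bound described above can be checked with a small Java sketch. It assumes both operands are known to be non-negative, takes the larger of the two upper bounds, and uses the smallest all-ones mask covering it as the upper bound of the xor; the lower bound is 0 because neither operand has the sign bit set. The class `XorBounds` and its method names are hypothetical, and the actual C2 change in addnode.cpp may be structured differently.

```java
public class XorBounds {
    // Smallest all-ones bit mask that covers every set bit of a non-negative hi.
    // Equivalent to (smallest power of two strictly greater than hi) - 1.
    public static int coveringMask(int hi) {
        if (hi < 0) {
            throw new IllegalArgumentException("hi must be non-negative");
        }
        // -1 >>> 32 would be a no-op shift in Java, so zero is handled separately.
        return hi == 0 ? 0 : -1 >>> Integer.numberOfLeadingZeros(hi);
    }

    // Upper bound for a ^ b, assuming 0 <= a <= hiA and 0 <= b <= hiB.
    public static int xorUpperBound(int hiA, int hiB) {
        return coveringMask(Math.max(hiA, hiB));
    }

    // Exhaustive maximum of a ^ b over the two ranges; only sensible for small bounds.
    public static int bruteForceMax(int hiA, int hiB) {
        int max = 0;
        for (int a = 0; a <= hiA; a++) {
            for (int b = 0; b <= hiB; b++) {
                max = Math.max(max, a ^ b);
            }
        }
        return max;
    }

    public static void main(String[] args) {
        int[][] cases = {{5, 9}, {7, 8}, {0, 0}, {1, 100}};
        for (int[] c : cases) {
            System.out.println("hiA=" + c[0] + " hiB=" + c[1]
                    + " bound=" + xorUpperBound(c[0], c[1])
                    + " actual max=" + bruteForceMax(c[0], c[1]));
        }
    }
}
```

The bound is conservative rather than exact: the brute-force maximum never exceeds it, but for some ranges it is strictly smaller.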
Please review, Best regards, Nils Eliasson ------------- Commit messages: - fix xor value Changes: https://git.openjdk.java.net/jdk/pull/4136/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4136&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8267332 Stats: 165 lines in 2 files changed: 165 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/4136.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4136/head:pull/4136 PR: https://git.openjdk.java.net/jdk/pull/4136 From neliasso at openjdk.java.net Thu May 20 21:53:57 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Thu, 20 May 2021 21:53:57 GMT Subject: RFR: 8267332: xor value should handle bounded values [v2] In-Reply-To: References: Message-ID: > In the discussion of https://github.com/openjdk/jdk/pull/3938 a limitation in C2 was found. C2 fails to eliminate obvious bound checks for indexes that are masked with xor. > > The Xor for two values that have a lower bound of zero or more, the resulting lower bound is zero. > The Xor for two values that have a upperbound above zero, the resulting upper bound is the max of the next_power_of_2-1. > > Test supplied. 
> > Please review, > Best regards, > Nils Eliasson Nils Eliasson has updated the pull request incrementally with one additional commit since the last revision: Fix bounds check ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4136/files - new: https://git.openjdk.java.net/jdk/pull/4136/files/ebeddef9..5f975338 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4136&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4136&range=00-01 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/4136.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4136/head:pull/4136 PR: https://git.openjdk.java.net/jdk/pull/4136 From redestad at openjdk.java.net Thu May 20 21:53:58 2021 From: redestad at openjdk.java.net (Claes Redestad) Date: Thu, 20 May 2021 21:53:58 GMT Subject: RFR: 8267332: xor value should handle bounded values [v2] In-Reply-To: References: Message-ID: On Thu, 20 May 2021 21:50:46 GMT, Nils Eliasson wrote: >> In the discussion of https://github.com/openjdk/jdk/pull/3938 a limitation in C2 was found. C2 fails to eliminate obvious bound checks for indexes that are masked with xor. >> >> The Xor for two values that have a lower bound of zero or more, the resulting lower bound is zero. >> The Xor for two values that have a upperbound above zero, the resulting upper bound is the max of the next_power_of_2-1. >> >> Test supplied. >> >> Please review, >> Best regards, >> Nils Eliasson > > Nils Eliasson has updated the pull request incrementally with one additional commit since the last revision: > > Fix bounds check src/hotspot/share/opto/addnode.cpp line 929: > 927: if ((t1i->_lo >= 0) && > 928: (t1i->_hi > 0) && > 929: (t1i->_hi <= max_power_of_2()) && I think you could use `<= std::numeric_limits::max()` as the upper bound condition, and use `(jint)(next_power_of_2((uint)t1i->_hi) - 1)` to produce the mask in an overflow-conscious way. 
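[Editorial illustration, not part of the thread or the patch.] The overflow concern in the comment above can be shown in Java, with the caveat that Java has no unsigned `int`, so `long` stands in for the `(uint)` widening suggested for the C++ code. The class `OverflowSafeMask` and its methods are hypothetical; `Math.multiplyExact` is used only to make the int overflow visible, since plain Java int arithmetic would wrap silently.

```java
public class OverflowSafeMask {
    // Smallest power of two strictly greater than hi, computed in int arithmetic.
    // Throws ArithmeticException when that power of two does not fit in an int,
    // which happens for any hi >= 2^30.
    public static int nextPowerOfTwoChecked(int hi) {
        if (hi < 0) {
            throw new IllegalArgumentException("hi must be non-negative");
        }
        if (hi == 0) {
            return 1;
        }
        return Math.multiplyExact(Integer.highestOneBit(hi), 2);
    }

    // Same mask computed in a wider type: take the next power of two as a long,
    // subtract 1, then narrow back to int. The final value always fits in an int.
    public static int maskWidened(int hi) {
        if (hi < 0) {
            throw new IllegalArgumentException("hi must be non-negative");
        }
        long next = hi == 0 ? 1L : Long.highestOneBit(hi) << 1;
        return (int) (next - 1);
    }

    public static void main(String[] args) {
        System.out.println(maskWidened(5));
        System.out.println(maskWidened(Integer.MAX_VALUE) == Integer.MAX_VALUE);
        try {
            nextPowerOfTwoChecked(1 << 30);
        } catch (ArithmeticException e) {
            System.out.println("int computation overflows for hi = 2^30");
        }
    }
}
```

The design point is that the intermediate power of two may exceed the int range even though the final mask never does, so the subtraction has to happen before narrowing.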
------------- PR: https://git.openjdk.java.net/jdk/pull/4136 From neliasso at openjdk.java.net Thu May 20 21:53:59 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Thu, 20 May 2021 21:53:59 GMT Subject: RFR: 8267332: xor value should handle bounded values [v2] In-Reply-To: References: Message-ID: On Thu, 20 May 2021 21:44:36 GMT, Claes Redestad wrote: >> Nils Eliasson has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix bounds check > > src/hotspot/share/opto/addnode.cpp line 929: > >> 927: if ((t1i->_lo >= 0) && >> 928: (t1i->_hi > 0) && >> 929: (t1i->_hi <= max_power_of_2()) && > > I think you could use `<= std::numeric_limits::max()` as the upper bound condition, and use `(jint)(next_power_of_2((uint)t1i->_hi) - 1)` to produce the mask in an overflow-conscious way. max_power_of_2() is less than std::numeric_limits::max(). But I noticed I got the bounds wrong - is should be strictly less: t1i->_hi < max_power_of_2() ------------- PR: https://git.openjdk.java.net/jdk/pull/4136 From redestad at openjdk.java.net Thu May 20 22:01:11 2021 From: redestad at openjdk.java.net (Claes Redestad) Date: Thu, 20 May 2021 22:01:11 GMT Subject: RFR: 8267332: xor value should handle bounded values [v3] In-Reply-To: References: Message-ID: On Thu, 20 May 2021 21:50:46 GMT, Nils Eliasson wrote: >> src/hotspot/share/opto/addnode.cpp line 929: >> >>> 927: if ((t1i->_lo >= 0) && >>> 928: (t1i->_hi > 0) && >>> 929: (t1i->_hi <= max_power_of_2()) && >> >> I think you could use `<= std::numeric_limits::max()` as the upper bound condition, and use `(jint)(next_power_of_2((uint)t1i->_hi) - 1)` to produce the mask in an overflow-conscious way. > > max_power_of_2() is less than std::numeric_limits::max(). 
> > But I noticed I got the bounds wrong - is should be strictly less: > t1i->_hi < max_power_of_2() Right, that's why I widened to an `uint` (since `next_power_of_2((uint)jint::max())` is well-defined) then cast back to a `jint` after subtracting 1. ------------- PR: https://git.openjdk.java.net/jdk/pull/4136 From neliasso at openjdk.java.net Thu May 20 22:01:09 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Thu, 20 May 2021 22:01:09 GMT Subject: RFR: 8267332: xor value should handle bounded values [v3] In-Reply-To: References: Message-ID: > In the discussion of https://github.com/openjdk/jdk/pull/3938 a limitation in C2 was found. C2 fails to eliminate obvious bound checks for indexes that are masked with xor. > > The Xor for two values that have a lower bound of zero or more, the resulting lower bound is zero. > The Xor for two values that have a upperbound above zero, the resulting upper bound is the max of the next_power_of_2-1. > > Test supplied. > > Please review, > Best regards, > Nils Eliasson Nils Eliasson has updated the pull request incrementally with one additional commit since the last revision: fixed missing test case ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4136/files - new: https://git.openjdk.java.net/jdk/pull/4136/files/5f975338..c7fecda8 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4136&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4136&range=01-02 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/4136.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4136/head:pull/4136 PR: https://git.openjdk.java.net/jdk/pull/4136 From redestad at openjdk.java.net Thu May 20 22:04:35 2021 From: redestad at openjdk.java.net (Claes Redestad) Date: Thu, 20 May 2021 22:04:35 GMT Subject: RFR: 8267332: xor value should handle bounded values [v3] In-Reply-To: References: Message-ID: On Thu, 20 May 2021 21:57:32 GMT, Claes Redestad 
wrote: >> max_power_of_2() is less than std::numeric_limits::max(). >> >> But I noticed I got the bounds wrong - is should be strictly less: >> t1i->_hi < max_power_of_2() > > Right, that's why I widened to an `uint` (since `next_power_of_2((uint)jint::max())` is well-defined) then cast back to a `jint` after subtracting 1. Another way of expressing the same in an overflow-conscious way (without type conversion) is `round_down_power_of_2(t1i->_hi) + (round_down_power_of_2(t1i->_hi) - 1)` ------------- PR: https://git.openjdk.java.net/jdk/pull/4136 From sviswanathan at openjdk.java.net Thu May 20 23:19:01 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Thu, 20 May 2021 23:19:01 GMT Subject: RFR: 8267190: Optimize Vector API test operations [v3] In-Reply-To: References: Message-ID: > Vector API test operations (IS_DEFAULT, IS_FINITE, IS_INFINITE, IS_NAN and IS_NEGATIVE) are computed in three steps: > 1) reinterpreting the floating point vectors as integral vectors (int/long) > 2) perform the test in integer domain to get a int/long mask > 3) reinterpret the int/long mask as float/double mask > Step 3) currently is very slow. It can be optimized by modifying the Java code to utilize the existing reinterpret intrinsic. > > For the VectorTestPerf attached to the JBS for JDK-8267190, the performance improves as follows: > > Base: > Benchmark (size) Mode Cnt Score Error Units > VectorTestPerf.IS_DEFAULT 1024 thrpt 5 223.156 ± 90.452 ops/ms > VectorTestPerf.IS_FINITE 1024 thrpt 5 223.841 ± 91.685 ops/ms > VectorTestPerf.IS_INFINITE 1024 thrpt 5 224.561 ± 83.890 ops/ms > VectorTestPerf.IS_NAN 1024 thrpt 5 223.777 ± 70.629 ops/ms > VectorTestPerf.IS_NEGATIVE 1024 thrpt 5 218.392 ± 79.806 ops/ms > > With patch: > Benchmark (size) Mode Cnt Score Error Units > VectorTestPerf.IS_DEFAULT 1024 thrpt 5 8812.357 ± 40.477 ops/ms > VectorTestPerf.IS_FINITE 1024 thrpt 5 7425.739 ± 296.622 ops/ms > VectorTestPerf.IS_INFINITE 1024 thrpt 5 8932.730 ±
269.988 ops/ms > VectorTestPerf.IS_NAN 1024 thrpt 5 8574.872 ± 498.649 ops/ms > VectorTestPerf.IS_NEGATIVE 1024 thrpt 5 8838.400 ± 11.849 ops/ms > > Best Regards, > Sandhya Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: Implement review comments ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4039/files - new: https://git.openjdk.java.net/jdk/pull/4039/files/b506fc45..f318c0ee Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4039&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4039&range=01-02 Stats: 372 lines in 31 files changed: 31 ins; 62 del; 279 mod Patch: https://git.openjdk.java.net/jdk/pull/4039.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4039/head:pull/4039 PR: https://git.openjdk.java.net/jdk/pull/4039 From sviswanathan at openjdk.java.net Thu May 20 23:36:37 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Thu, 20 May 2021 23:36:37 GMT Subject: RFR: 8267190: Optimize Vector API test operations [v3] In-Reply-To: References: Message-ID: On Thu, 20 May 2021 23:19:01 GMT, Sandhya Viswanathan wrote: >> Vector API test operations (IS_DEFAULT, IS_FINITE, IS_INFINITE, IS_NAN and IS_NEGATIVE) are computed in three steps: >> 1) reinterpreting the floating point vectors as integral vectors (int/long) >> 2) perform the test in integer domain to get a int/long mask >> 3) reinterpret the int/long mask as float/double mask >> Step 3) currently is very slow. It can be optimized by modifying the Java code to utilize the existing reinterpret intrinsic. >> >> For the VectorTestPerf attached to the JBS for JDK-8267190, the performance improves as follows: >> >> Base: >> Benchmark (size) Mode Cnt Score Error Units >> VectorTestPerf.IS_DEFAULT 1024 thrpt 5 223.156 ± 90.452 ops/ms >> VectorTestPerf.IS_FINITE 1024 thrpt 5 223.841 ± 91.685 ops/ms >> VectorTestPerf.IS_INFINITE 1024 thrpt 5 224.561 ±
83.890 ops/ms >> VectorTestPerf.IS_NAN 1024 thrpt 5 223.777 ± 70.629 ops/ms >> VectorTestPerf.IS_NEGATIVE 1024 thrpt 5 218.392 ± 79.806 ops/ms >> >> With patch: >> Benchmark (size) Mode Cnt Score Error Units >> VectorTestPerf.IS_DEFAULT 1024 thrpt 5 8812.357 ± 40.477 ops/ms >> VectorTestPerf.IS_FINITE 1024 thrpt 5 7425.739 ± 296.622 ops/ms >> VectorTestPerf.IS_INFINITE 1024 thrpt 5 8932.730 ± 269.988 ops/ms >> VectorTestPerf.IS_NAN 1024 thrpt 5 8574.872 ± 498.649 ops/ms >> VectorTestPerf.IS_NEGATIVE 1024 thrpt 5 8838.400 ± 11.849 ops/ms >> >> Best Regards, >> Sandhya > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > Implement review comments Thanks Paul. I have implemented these two suggestions as well. If no objections from any one else, I plan to integrate this tomorrow, Friday May 21. ------------- PR: https://git.openjdk.java.net/jdk/pull/4039 From kvn at openjdk.java.net Thu May 20 23:56:32 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 20 May 2021 23:56:32 GMT Subject: RFR: 8267370: [Vector API] Fix several crashes after JDK-8256973 [v2] In-Reply-To: References: Message-ID: On Wed, 19 May 2021 11:23:05 GMT, Jie Fu wrote: >> Hi all, >> >> Several vector tests fail with UseAVX=1 after JDK-8256973. >> The reason is that `vpmovmskb` [1] can be only used with UseAVX > 1 [2]. >> The fix just disables the intrinsics when UseAVX < 2. >> >> Testing: >> - jdk/incubator/vector with UseAVX={0/1/2/3} on Linux/x64 >> >> Thanks. >> Best regards, >> Jie >> >> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp#L3785 >> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/assembler_x86.cpp#L4127 > > Jie Fu has updated the pull request incrementally with one additional commit since the last revision: > > Enable vpmovmskb for 128-bit vectors Good. ------------- Marked as reviewed by kvn (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/4109 From jiefu at openjdk.java.net Thu May 20 23:56:33 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Thu, 20 May 2021 23:56:33 GMT Subject: RFR: 8267370: [Vector API] Fix several crashes after JDK-8256973 [v2] In-Reply-To: References: Message-ID: On Wed, 19 May 2021 11:23:05 GMT, Jie Fu wrote: >> Hi all, >> >> Several vector tests fail with UseAVX=1 after JDK-8256973. >> The reason is that `vpmovmskb` [1] can be only used with UseAVX > 1 [2]. >> The fix just disables the intrinsics when UseAVX < 2. >> >> Testing: >> - jdk/incubator/vector with UseAVX={0/1/2/3} on Linux/x64 >> >> Thanks. >> Best regards, >> Jie >> >> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp#L3785 >> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/assembler_x86.cpp#L4127 > > Jie Fu has updated the pull request incrementally with one additional commit since the last revision: > > Enable vpmovmskb for 128-bit vectors Could someone help to review this fix? I've noticed that JDK-8267519 was also filed about the same crash just now. Thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/4109 From jiefu at openjdk.java.net Fri May 21 00:02:36 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Fri, 21 May 2021 00:02:36 GMT Subject: RFR: 8267370: [Vector API] Fix several crashes after JDK-8256973 In-Reply-To: References: <3BOYC9OLXPEbvzpFtYjXk0qST2xiawoxrpDss8Iyra4=.9c4948ef-1063-4888-bfb9-7fbc43e2c0f0@github.com> Message-ID: On Wed, 19 May 2021 11:20:13 GMT, Jie Fu wrote: > Good. Thanks @vnkozlov . 
------------- PR: https://git.openjdk.java.net/jdk/pull/4109 From jiefu at openjdk.java.net Fri May 21 00:02:36 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Fri, 21 May 2021 00:02:36 GMT Subject: Integrated: 8267370: [Vector API] Fix several crashes after JDK-8256973 In-Reply-To: References: Message-ID: <02TWvcUpdmPUAE5d1LIVo33eVTXTbBNxWgLl_EJtgDc=.1cb7951d-d342-44c5-b521-93974e78d3b6@github.com> On Wed, 19 May 2021 08:26:24 GMT, Jie Fu wrote: > Hi all, > > Several vector tests fail with UseAVX=1 after JDK-8256973. > The reason is that `vpmovmskb` [1] can be only used with UseAVX > 1 [2]. > The fix just disables the intrinsics when UseAVX < 2. > > Testing: > - jdk/incubator/vector with UseAVX={0/1/2/3} on Linux/x64 > > Thanks. > Best regards, > Jie > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp#L3785 > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/assembler_x86.cpp#L4127 This pull request has now been integrated. Changeset: 7a63ff70 Author: Jie Fu URL: https://git.openjdk.java.net/jdk/commit/7a63ff70c8eed6c5bfad5655f0f4fa2281b4e104 Stats: 9 lines in 5 files changed: 1 ins; 0 del; 8 mod 8267370: [Vector API] Fix several crashes after JDK-8256973 Co-authored-by: Jatin Bhateja Reviewed-by: neliasso, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/4109 From kvn at openjdk.java.net Fri May 21 00:26:33 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 21 May 2021 00:26:33 GMT Subject: RFR: 8265129: Add intrinsic support for JVM.getClassId [v6] In-Reply-To: References: Message-ID: On Mon, 17 May 2021 07:20:18 GMT, Denghui Dong wrote: >> 8265129: Add intrinsic support for JVM.getClassId > > Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: > > fix crash problem So if most frequent path is *the* check then I can understand why inlining it (as intrinsic) will help. 
I suggest, first set `prob` parameter for ` __ if_then()` to move slow code from main path. See: https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp#L241 Second, if possible move all code under ` __ if_then()` into runtime and make call to it. Code for arrays looks simple so we can keep it inlined. And remove code from C1. ------------- PR: https://git.openjdk.java.net/jdk/pull/3470 From dholmes at openjdk.java.net Fri May 21 02:22:37 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Fri, 21 May 2021 02:22:37 GMT Subject: RFR: 8266950: Remove vestigial support for non-strict floating-point execution [v2] In-Reply-To: References: Message-ID: On Tue, 18 May 2021 04:26:00 GMT, David Holmes wrote: >> As part of JEP 306, the vestiges of HotSpot support for non-strict floating-point execution can be removed. All methods implicitly have strictfp semantics so the explicit checks for is_strict() can be replaced by true and the code reformulated accordingly. >> >> There are still some names that include "strict" that could potentially be renamed to remove it, but the fact we have to have strict fp semantics is still important on some platforms, so the names help reinforce that IMO. >> >> Testing: tiers 1-3 >> >> Thanks, >> David > > David Holmes has updated the pull request incrementally with one additional commit since the last revision: > > lir_div_strictfp and lir_mul_strictfp Can I please get re-reviews for this. 
Thanks,
David

-------------

PR: https://git.openjdk.java.net/jdk/pull/3991

From aph at redhat.com Fri May 21 07:40:19 2021
From: aph at redhat.com (Andrew Haley)
Date: Fri, 21 May 2021 08:40:19 +0100
Subject: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics
In-Reply-To:
References: <72216fcc-67e7-c700-8fee-2d8c752a0f0c@redhat.com> <8346BF97-D8F6-4521-8589-66C679618DB7@oracle.com> <6AA63D23-E2C1-4510-8FCC-2B17FFF3465E@oracle.com> <2e4b6fb0-e53e-43d3-b680-b50a53cfe04a@redhat.com>
Message-ID: <0d8f2f93-0f83-5771-ff94-3f6283d25c21@redhat.com>

On 5/20/21 7:54 PM, John Rose wrote:
> On May 20, 2021, at 8:31 AM, Andrew Haley wrote:
>>
>> On 5/20/21 12:34 AM, Paul Sandoz wrote:
>>
>>> Does this help alleviate some of your concerns?
>>
>> Somewhat, but I wonder if this, as a matter of policy, is an area in
>> which the Governing Board should get involved. I don't want to hold up
>> progress, of course, but this is potentially a very important issue.
>
> I think this could rise to the GB level if we needed to make a strong
> policy change, but as I've said above, I think we are in policy here.
> (Just barely.) For any conceivable issue of maintainability, surely the
> open review process is enough, without asking the GB to weigh in
> on change set reviews. And I think this is about maintainability.

It is, but that's not entirely what I'm worried about. The four (software) freedoms are:

    The freedom to run the program as you wish, for any purpose (freedom 0).
    The freedom to study how the program works, and change it so it does your computing as you wish (freedom 1). Access to the source code is a precondition for this.
    The freedom to redistribute copies so you can help your neighbor (freedom 2).
    The freedom to distribute copies of your modified versions to others (freedom 3). By doing this you can give the whole community a chance to benefit from your changes. Access to the source code is a precondition for this.

but not entirely.
In this case we have 0, 2, and 3, but not 1. So, this issue is about more than mere utility, but something more fundamental. It's about the right of our users to understand how OpenJDK works. My question is, then, (please forgive the paraphrase), are we giving up essential freedom to purchase a little temporary utility?

> Intel is contributing them as a one-time artifact which we are,
> in fact, responsible to maintain. By hand, as the preferred
> form of the source. (Preferred to what? Well, preferred to
> nothing at all.)

NB: "preferred form" is a term used (but not fully defined) in GPLv2. It's not easy to define, but we know it when we see it: it's the form a programmer prefers to edit, the original source code.

> Well in this case, we have two things:
>
> 1. Temporary expedient only for incubation, to gain public feedback.
> 2. Clear call for a plausible alternative, to be answered before incubation exit.

OK, but I don't hold out much hope of 2 actually succeeding before incubation exit.

> That's probably enough "case law" to help clarify the relevant policy.
>
> What do you think?

I think that's OK, as long as it's well-enough understood.

By the way, slightly off topic: being rather conflict averse I did wonder whether I should object to this commit, but I reasoned that this kind of issue is exactly the reason that we have a governing board with community representatives. It's literally my duty.

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From thartmann at openjdk.java.net Fri May 21 07:53:29 2021
From: thartmann at openjdk.java.net (Tobias Hartmann)
Date: Fri, 21 May 2021 07:53:29 GMT
Subject: RFR: 8267332: xor value should handle bounded values [v3]
In-Reply-To:
References:
Message-ID:

On Thu, 20 May 2021 22:01:09 GMT, Nils Eliasson wrote:

>> In the discussion of https://github.com/openjdk/jdk/pull/3938 a limitation in C2 was found.
C2 fails to eliminate obvious bound checks for indexes that are masked with xor. >> >> The Xor for two values that have a lower bound of zero or more, the resulting lower bound is zero. >> The Xor for two values that have a upperbound above zero, the resulting upper bound is the max of the next_power_of_2-1. >> >> Test supplied. >> >> Please review, >> Best regards, >> Nils Eliasson > > Nils Eliasson has updated the pull request incrementally with one additional commit since the last revision: > > fixed missing test case Very nice. I've added some minor comments to the test. test/hotspot/jtreg/compiler/types/TestMeetXor.java line 25: > 23: > 24: /* > 25: * @test This test looks like a good candidate for IR verification once the framework is integrated. Maybe file a follow-up RFE. test/hotspot/jtreg/compiler/types/TestMeetXor.java line 28: > 26: * @bug 8267332 > 27: * @summary Test meet on xor > 28: * @library /test/lib / Test should have `@key randomness` test/hotspot/jtreg/compiler/types/TestMeetXor.java line 39: > 37: public class TestMeetXor { > 38: public static void main(String[] args) throws Exception { > 39: for (int i = 0; i < 10000; i++) { Maybe increase number of iterations to make sure C2 compilation is triggered (and maybe also add `-Xbatch`). test/hotspot/jtreg/compiler/types/TestMeetXor.java line 49: > 47: > 48: static int[] count = new int[256]; > 49: static Random r = new Random(5); You should use `jdk.test.lib.Utils.getRandomInstance()` here which also prints the seed. ------------- Marked as reviewed by thartmann (Reviewer). 
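
[Editorial illustration, not part of the original message.] The bound rule in the quoted PR description can be checked with a small stand-alone Java sketch. It assumes both inputs are non-negative and uses the overflow-safe formulation "largest power of two not above the bound, plus that value minus one", which is one way the upper bound is expressed in this review thread. The class and method names (`XorBoundSketch`, `xorUpperBound`) are hypothetical and do not come from the patch.

```java
// Hypothetical stand-alone sketch; it does not reproduce the C2 code under review.
final class XorBoundSketch {

    // Largest power of two that is <= value. Assumes value > 0.
    public static int roundDownPowerOf2(int value) {
        return Integer.highestOneBit(value);
    }

    // Upper bound for (a ^ b) when 0 <= a <= hiA and 0 <= b <= hiB.
    public static int xorUpperBound(int hiA, int hiB) {
        int hi = Math.max(hiA, hiB);
        if (hi == 0) {
            return 0; // both operands must be zero
        }
        int p = roundDownPowerOf2(hi);
        // Every bit of a and b lies below 2 * p, so a ^ b <= 2 * p - 1.
        // Writing it as p + (p - 1) avoids overflow when hi == Integer.MAX_VALUE.
        return p + (p - 1);
    }

    public static void main(String[] args) {
        int hiA = 100;
        int hiB = 37;
        int bound = xorUpperBound(hiA, hiB);
        // Exhaustively confirm the bound for this small range.
        for (int a = 0; a <= hiA; a++) {
            for (int b = 0; b <= hiB; b++) {
                int x = a ^ b;
                if (x < 0 || x > bound) {
                    throw new AssertionError(a + " ^ " + b + " = " + x);
                }
            }
        }
        System.out.println("bound = " + bound);
    }
}
```

With such a bound, an index of the form `a ^ b` into an array of length 256 is provably in range whenever both operands are known to lie in 0..255, which is the kind of range check the change aims to let C2 remove.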
PR: https://git.openjdk.java.net/jdk/pull/4136 From thartmann at openjdk.java.net Fri May 21 07:56:29 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Fri, 21 May 2021 07:56:29 GMT Subject: RFR: 8266746: C1: Replace UnsafeGetRaw with UnsafeGetObject when setting up OSR entry block [v2] In-Reply-To: <6GfJNOOeMRnkw7Zfk_btAcgtFuIes7LLc2IXiY16bV8=.f5f83668-4af6-450c-a8c8-fa105a08b12b@github.com> References: <6GfJNOOeMRnkw7Zfk_btAcgtFuIes7LLc2IXiY16bV8=.f5f83668-4af6-450c-a8c8-fa105a08b12b@github.com> Message-ID: On Thu, 20 May 2021 02:40:27 GMT, Yi Yang wrote: >> src/hotspot/share/c1/c1_Instruction.hpp line 2318: >> >>> 2316: >>> 2317: // accessors >>> 2318: bool is_raw_get() { return _is_raw_get; } >> >> I would rename this to `_is_raw` because we already know it's a get. > > Thanks Tobias for the review! All fixed. I will test it on Linux(already tested)/Mac/Windows(aarch64+x86_64) later. But I don't have ppc and s390 machines so I'm not sure how to test it on them... Looks good. Maybe someone from the OpenJDK community can run some sanity testing on ppc/s390. ------------- PR: https://git.openjdk.java.net/jdk/pull/3917 From thartmann at openjdk.java.net Fri May 21 08:05:48 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Fri, 21 May 2021 08:05:48 GMT Subject: RFR: 8266528: Optimize C2 VerifyIterativeGVN execution time [v3] In-Reply-To: References: Message-ID: <182RYuy2ZX_C2cMYPB59-EC9ZzQ7Dx6nMOfqiTS5WLM=.83209a60-414c-4d4f-b1d4-d9c37fc001ab@github.com> On Wed, 19 May 2021 01:42:07 GMT, Hui Shi wrote: >> Please help review this enhancement for VerifyIterativeGVN, reduce about 3x - 200x executime time when VerifyIterativeGVN is on. >> >> In simple test "-Xcomp -XX:+VerifyIterativeGVN -XX:-TieredCompilation -version", time reduced from 8.67s to 2.4s. >> In extreme case hotspot/test/jtreg/compiler/escapeAnalysis/Test6689060.java, time reduced from 20000s to 95s. 
>> >> Test with "-Xbatch -XX:+VerifyIterativeGVN -XX:-TieredCompilation", tier1/2/3 with fastdebug and no regression. >> >> 1. Remove node_arena()->contains checking for verifing nodes. _verify_window is reset before every PhaseIterGVN::optimize. Searching from root or nodes in _verify_window will not meet nodes whose _idx is not unique (PhaseIterGVN::optimize is not triggered in the middle of PhaseRenumberLive ). Assertion every node is in current node_arena() in Node::verify, passes tier1/2/3 checks (with -Xbatch -XX:+VerifyIterativeGVN -XX:-TieredCompilation), no assertion failure happens. >> >> 2. Combine verification for nodes in _verify_window into one worklist and skipping redundant nodes in _verify_window. >> >> 3. Optimize duplicate checking for same input nodes, skipping if current input index is not its first occurence. >> >> 4. Optimize field access: Replace "n->in(j)" with "n->_in[j]", same with outcnt calucation for input node x. > > Hui Shi has updated the pull request incrementally with one additional commit since the last revision: > > Add comments for duplicated input processing in Node::Verify Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/4045 From neliasso at openjdk.java.net Fri May 21 08:14:36 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Fri, 21 May 2021 08:14:36 GMT Subject: RFR: 8267332: xor value should handle bounded values [v3] In-Reply-To: References: Message-ID: On Fri, 21 May 2021 07:29:24 GMT, Tobias Hartmann wrote: >> Nils Eliasson has updated the pull request incrementally with one additional commit since the last revision: >> >> fixed missing test case > > test/hotspot/jtreg/compiler/types/TestMeetXor.java line 25: > >> 23: >> 24: /* >> 25: * @test > > This test looks like a good candidate for IR verification once the framework is integrated. Maybe file a follow-up RFE. Yes - that's the good stuff I'm waiting for. 
I choose to not over-engineer this test with log-parsing and stuff.

I filed https://bugs.openjdk.java.net/browse/JDK-8267527

-------------

PR: https://git.openjdk.java.net/jdk/pull/4136

From ddong at openjdk.java.net Fri May 21 08:19:00 2021
From: ddong at openjdk.java.net (Denghui Dong)
Date: Fri, 21 May 2021 08:19:00 GMT
Subject: RFR: 8265129: Add intrinsic support for JVM.getClassId [v6]
In-Reply-To:
References:
Message-ID: <9qx6EjSOWt9KIQlUoR2YzBeLMrBXK63SMVHAIpmOMS0=.b8b8b9a2-caa1-4d33-b904-76d2bc9bc4a4@github.com>

On Thu, 20 May 2021 19:14:07 GMT, Markus Grönlund wrote:

>> Denghui Dong has updated the pull request incrementally with one additional commit since the last revision:
>>
>> fix crash problem
>
> src/hotspot/share/c1/c1_Compiler.cpp line 225:
>
>> 223: case vmIntrinsics::_counterTime:
>> 224: case vmIntrinsics::_getEventWriter:
>> 225: // TODO: temporarily not implement getClassId in c1
>
> I think we need not worry about an intrinsic for C1, so please remove.

removed.

> src/hotspot/share/jfr/jni/jfrJniMethodRegistration.cpp line 48:
>
>> 46: (char*)"getAllEventClasses", (char*)"()Ljava/util/List;", (void*)jfr_get_all_event_classes,
>> 47: (char*)"getClassId", (char*)"(Ljava/lang/Class;)J", (void*)jfr_class_id,
>> 48: (char*)"getClassIdNonIntrinsic", (char*)"(Ljava/lang/Class;)J", (void*)jfr_class_id,
>
> Please remove the getClassIdNonIntrinsic entry, thanks. In addition, can you also remove the now abandoned entry point on the Java side, jdk.jfr.internal.JVM.getClassIdNonIntrinsic. And the entry point in jfr/jni/jniMethod.hpp | .cpp. Thanks.

removed.

> src/hotspot/share/jfr/recorder/checkpoint/types/traceid/jfrTraceIdLoadBarrier.hpp line 71:
>
>> 69: class JfrTraceIdLoadBarrier : AllStatic {
>> 70: friend class JfrCheckpointManager;
>> 71: friend class SharedRuntime;
>
> Don't think we need to involve SharedRuntime.

removed.
> src/hotspot/share/opto/library_call.cpp line 61: > >> 59: >> 60: #ifdef JFR_HAVE_INTRINSICS >> 61: #include "jfr/recorder/checkpoint/types/traceid/jfrTraceIdEpoch.hpp" > > Can you move all entry points you need to expose to "jfr/jfr.hpp" and include that instead? I prefer to avoid exposing a lot of the internal impl. details if possible. You can then move the SharedRuntime::trace_id_load_barrier() routines, into jfr/jfr.cpp and avoid touching the runtime/sharedRuntime.hpp | .cpp. Thanks. removed. > src/hotspot/share/opto/library_call.cpp line 2775: > >> 2773: */ >> 2774: bool LibraryCallKit::inline_native_classID() { >> 2775: Node* cls = null_check(argument(0), T_OBJECT); > > We can remove the null check, The mirror passed is null checked at the callsite (EventWriter). good catch! Fixed. > src/hotspot/share/opto/library_call.cpp line 2784: > >> 2782: TypeRawPtr::BOTTOM, TypeKlassPtr::OBJECT_OR_NULL)); >> 2783: >> 2784: Node* signaled_flag_address = makecon(TypeRawPtr::make(JfrTraceIdEpoch::signal_address())); > > Can you move this expression to become dependent if you actually need it? It will not be needed in the majority of cases (the InstanceKlass will already be tagged). Thanks. fixed. > src/hotspot/share/opto/runtime.cpp line 1502: > >> 1500: } >> 1501: >> 1502: const TypeFunc *OptoRuntime::trace_id_load_barrier_Type() { > > Perhaps inside #if INCLUDE_JFR fixed > src/hotspot/share/opto/runtime.hpp line 307: > >> 305: static const TypeFunc* register_finalizer_Type(); >> 306: >> 307: static const TypeFunc* trace_id_load_barrier_Type(); > > JFR_ONLY(static const TypeFunc* trace_id_load_barrier_Type();) fixed > src/hotspot/share/runtime/sharedRuntime.cpp line 1884: > >> 1882: >> 1883: #ifdef JFR_HAVE_INTRINSICS >> 1884: JRT_LEAF(void, SharedRuntime::trace_id_load_barrier(Klass * klass)) > > This moves to "jfr/jfr.cpp" instead, thanks. fixed. 
> src/hotspot/share/runtime/sharedRuntime.hpp line 526: > >> 524: >> 525: #ifdef JFR_HAVE_INTRINSICS >> 526: static void trace_id_load_barrier(Klass* klass); > > Move to "jfr/jfr.hpp", thanks. fixed. ------------- PR: https://git.openjdk.java.net/jdk/pull/3470 From ddong at openjdk.java.net Fri May 21 08:18:55 2021 From: ddong at openjdk.java.net (Denghui Dong) Date: Fri, 21 May 2021 08:18:55 GMT Subject: RFR: 8265129: Add intrinsic support for JVM.getClassId [v7] In-Reply-To: References: Message-ID: > 8265129: Add intrinsic support for JVM.getClassId Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: update ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/3470/files - new: https://git.openjdk.java.net/jdk/pull/3470/files/9fd0550b..24ce77b0 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3470&range=06 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3470&range=05-06 Stats: 83 lines in 12 files changed: 32 ins; 42 del; 9 mod Patch: https://git.openjdk.java.net/jdk/pull/3470.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3470/head:pull/3470 PR: https://git.openjdk.java.net/jdk/pull/3470 From ddong at openjdk.java.net Fri May 21 08:22:33 2021 From: ddong at openjdk.java.net (Denghui Dong) Date: Fri, 21 May 2021 08:22:33 GMT Subject: RFR: 8265129: Add intrinsic support for JVM.getClassId [v6] In-Reply-To: References: Message-ID: On Wed, 19 May 2021 09:25:07 GMT, Markus Gr?nlund wrote: >> Hi Vladimir, >> >> Thanks for your comment. >> >> Yes, the native implementation for `getClassIdNonIntrinsic`/`getClassId` is located in `jfrTraceId.cpp#L178` just as you said, more specifically, there are two path, one(JfrTraceId::load) for normal class and one(load_primitive) for primitive class (includeing void.class). >> >> My pseudo-code(the comment of `LibraryCallKit::inline_native_classID`) is consistent with the implementation of these two paths. 
>> >> And in the normal class implementation path, there are fast path and slow path(see JfrTraceIdLoadBarrier::load), only some comparison and shift operations are needed to obtain the class ID in the fast path, and that's where I think intrinsic can bring performance improvements, I saw about 20x improvement from my microbenchmark. >> >> Judging from the current JFR implementation, there are already some events that need to rely on this API, such as `ExceptionThrownEvent` and `ErrorThrownEvent` use `thrownClass` to record the type of exception, and I also noticed that there is a new PR(https://github.com/openjdk/jdk/pull/4101) to add `FinalizerEvent` which include a field named `finalizedClass` to record the type information. Therefore, I have reason to believe that this API will be frequently used during the JFR activation process. >> >> As far as the current implementation is concerned, it is indeed a bit complicated, I think some simplifications can be made, for example, only the fast path for the normal class is retained, and other paths are directly implemented by calling the native function. What do you think? >> >> @egahlin @mgronlun >> And I hope JFR's folks could give some suggestions on this PR:) >> >> Best, >> Denghui > > Hi @D-D-H, sorry for the late reply. > > I am currently a bit busy but hope to get around taking a look at this soon. > > Thanks > Markus Hi @mgronlun @vnkozlov , Thank you very much for your comment, I have updated my patch, please help to review it again when you have time. 
Best, Denghui Dong ------------- PR: https://git.openjdk.java.net/jdk/pull/3470 From ddong at openjdk.java.net Fri May 21 08:31:12 2021 From: ddong at openjdk.java.net (Denghui Dong) Date: Fri, 21 May 2021 08:31:12 GMT Subject: RFR: 8265129: Add intrinsic support for JVM.getClassId [v8] In-Reply-To: References: Message-ID: > 8265129: Add intrinsic support for JVM.getClassId Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: fix test ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/3470/files - new: https://git.openjdk.java.net/jdk/pull/3470/files/24ce77b0..9cc49b93 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3470&range=07 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3470&range=06-07 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/3470.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3470/head:pull/3470 PR: https://git.openjdk.java.net/jdk/pull/3470 From ddong at openjdk.java.net Fri May 21 09:07:13 2021 From: ddong at openjdk.java.net (Denghui Dong) Date: Fri, 21 May 2021 09:07:13 GMT Subject: RFR: 8265129: Add intrinsic support for JVM.getClassId [v9] In-Reply-To: References: Message-ID: > 8265129: Add intrinsic support for JVM.getClassId Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: update copyright ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/3470/files - new: https://git.openjdk.java.net/jdk/pull/3470/files/9cc49b93..a32ef9e2 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3470&range=08 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3470&range=07-08 Stats: 12 lines in 12 files changed: 0 ins; 0 del; 12 mod Patch: https://git.openjdk.java.net/jdk/pull/3470.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3470/head:pull/3470 PR: https://git.openjdk.java.net/jdk/pull/3470 From neliasso 
at openjdk.java.net Fri May 21 09:27:33 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Fri, 21 May 2021 09:27:33 GMT Subject: RFR: 8267332: xor value should handle bounded values [v3] In-Reply-To: References: Message-ID: On Thu, 20 May 2021 22:01:09 GMT, Claes Redestad wrote: >> Right, that's why I widened to an `uint` (since `next_power_of_2((uint)jint::max())` is well-defined) then cast back to a `jint` after subtracting 1. > > Another way of expressing the same in an overflow-conscious way (without type conversion) is `round_down_power_of_2(t1i->_hi) + (round_down_power_of_2(t1i->_hi) - 1)` I agree - that is a much better solution! Thanks Claes! ------------- PR: https://git.openjdk.java.net/jdk/pull/4136 From neliasso at openjdk.java.net Fri May 21 09:40:09 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Fri, 21 May 2021 09:40:09 GMT Subject: RFR: 8267332: xor value should handle bounded values [v3] In-Reply-To: References: Message-ID: On Fri, 21 May 2021 07:28:23 GMT, Tobias Hartmann wrote: >> Nils Eliasson has updated the pull request incrementally with one additional commit since the last revision: >> >> fixed missing test case > > test/hotspot/jtreg/compiler/types/TestMeetXor.java line 39: > >> 37: public class TestMeetXor { >> 38: public static void main(String[] args) throws Exception { >> 39: for (int i = 0; i < 10000; i++) { > > Maybe increase number of iterations to make sure C2 compilation is triggered (and maybe also add `-Xbatch`). Thanks Tobias! Fixed all your comments. ------------- PR: https://git.openjdk.java.net/jdk/pull/4136 From neliasso at openjdk.java.net Fri May 21 09:40:06 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Fri, 21 May 2021 09:40:06 GMT Subject: RFR: 8267332: xor value should handle bounded values [v4] In-Reply-To: References: Message-ID: > In the discussion of https://github.com/openjdk/jdk/pull/3938 a limitation in C2 was found. 
C2 fails to eliminate obvious bound checks for indexes that are masked with xor. > > The Xor for two values that have a lower bound of zero or more, the resulting lower bound is zero. > The Xor for two values that have a upperbound above zero, the resulting upper bound is the max of the next_power_of_2-1. > > Test supplied. > > Please review, > Best regards, > Nils Eliasson Nils Eliasson has updated the pull request incrementally with one additional commit since the last revision: Updated test, fixed bound ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4136/files - new: https://git.openjdk.java.net/jdk/pull/4136/files/c7fecda8..4d856ec1 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4136&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4136&range=02-03 Stats: 11 lines in 2 files changed: 4 ins; 0 del; 7 mod Patch: https://git.openjdk.java.net/jdk/pull/4136.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4136/head:pull/4136 PR: https://git.openjdk.java.net/jdk/pull/4136 From neliasso at openjdk.java.net Fri May 21 09:52:27 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Fri, 21 May 2021 09:52:27 GMT Subject: RFR: 8267332: xor value should handle bounded values [v5] In-Reply-To: References: Message-ID: > In the discussion of https://github.com/openjdk/jdk/pull/3938 a limitation in C2 was found. C2 fails to eliminate obvious bound checks for indexes that are masked with xor. > > The Xor for two values that have a lower bound of zero or more, the resulting lower bound is zero. > The Xor for two values that have a upperbound above zero, the resulting upper bound is the max of the next_power_of_2-1. > > Test supplied. 
> > Please review, > Best regards, > Nils Eliasson Nils Eliasson has updated the pull request incrementally with one additional commit since the last revision: Removed unnecessary check ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4136/files - new: https://git.openjdk.java.net/jdk/pull/4136/files/4d856ec1..9be3b003 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4136&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4136&range=03-04 Stats: 12 lines in 1 file changed: 0 ins; 4 del; 8 mod Patch: https://git.openjdk.java.net/jdk/pull/4136.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4136/head:pull/4136 PR: https://git.openjdk.java.net/jdk/pull/4136 From thartmann at openjdk.java.net Fri May 21 09:52:28 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Fri, 21 May 2021 09:52:28 GMT Subject: RFR: 8267332: xor value should handle bounded values [v5] In-Reply-To: References: Message-ID: On Fri, 21 May 2021 09:48:35 GMT, Nils Eliasson wrote: >> In the discussion of https://github.com/openjdk/jdk/pull/3938 a limitation in C2 was found. C2 fails to eliminate obvious bound checks for indexes that are masked with xor. >> >> The Xor for two values that have a lower bound of zero or more, the resulting lower bound is zero. >> The Xor for two values that have a upperbound above zero, the resulting upper bound is the max of the next_power_of_2-1. >> >> Test supplied. >> >> Please review, >> Best regards, >> Nils Eliasson > > Nils Eliasson has updated the pull request incrementally with one additional commit since the last revision: > > Removed unnecessary check Looks good. ------------- Marked as reviewed by thartmann (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/4136 From redestad at openjdk.java.net Fri May 21 09:52:28 2021 From: redestad at openjdk.java.net (Claes Redestad) Date: Fri, 21 May 2021 09:52:28 GMT Subject: RFR: 8267332: xor value should handle bounded values [v5] In-Reply-To: References: Message-ID: On Fri, 21 May 2021 09:48:35 GMT, Nils Eliasson wrote: >> In the discussion of https://github.com/openjdk/jdk/pull/3938 a limitation in C2 was found. C2 fails to eliminate obvious bound checks for indexes that are masked with xor. >> >> The Xor for two values that have a lower bound of zero or more, the resulting lower bound is zero. >> The Xor for two values that have a upperbound above zero, the resulting upper bound is the max of the next_power_of_2-1. >> >> Test supplied. >> >> Please review, >> Best regards, >> Nils Eliasson > > Nils Eliasson has updated the pull request incrementally with one additional commit since the last revision: > > Removed unnecessary check Looks good! ------------- Marked as reviewed by redestad (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/4136 From neliasso at openjdk.java.net Fri May 21 09:59:07 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Fri, 21 May 2021 09:59:07 GMT Subject: RFR: 8267332: xor value should handle bounded values [v6] In-Reply-To: References: Message-ID: > In the discussion of https://github.com/openjdk/jdk/pull/3938 a limitation in C2 was found. C2 fails to eliminate obvious bound checks for indexes that are masked with xor. > > The Xor for two values that have a lower bound of zero or more, the resulting lower bound is zero. > The Xor for two values that have a upperbound above zero, the resulting upper bound is the max of the next_power_of_2-1. > > Test supplied. > > Please review, > Best regards, > Nils Eliasson Nils Eliasson has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains six additional commits since the last revision:

 - Merge branch 'master' into xor_bound
 - Removed unnecessary check
 - Updated test, fixed bound
 - fixed missing test case
 - Fix bounds check
 - fix xor value

-------------

Changes:
 - all: https://git.openjdk.java.net/jdk/pull/4136/files
 - new: https://git.openjdk.java.net/jdk/pull/4136/files/9be3b003..c3a3a612

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4136&range=05
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4136&range=04-05

Stats: 4915 lines in 203 files changed: 2431 ins; 2058 del; 426 mod
Patch: https://git.openjdk.java.net/jdk/pull/4136.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/4136/head:pull/4136

PR: https://git.openjdk.java.net/jdk/pull/4136

From mgronlun at openjdk.java.net Fri May 21 10:28:35 2021
From: mgronlun at openjdk.java.net (Markus Grönlund)
Date: Fri, 21 May 2021 10:28:35 GMT
Subject: RFR: 8265129: Add intrinsic support for JVM.getClassId [v7]
In-Reply-To:
References:
Message-ID:

On Fri, 21 May 2021 08:18:55 GMT, Denghui Dong wrote:

>> 8265129: Add intrinsic support for JVM.getClassId
>
> Denghui Dong has updated the pull request incrementally with one additional commit since the last revision:
>
> update

Thanks, Denghui - I saw a few more simplifications. Thank you for accommodating.

src/hotspot/share/jfr/jfr.cpp line 115:

> 113: }
> 114:
> 115: #ifdef JFR_HAVE_INTRINSICS

Can make this unconditional from #ifdef JFR_HAVE_INTRINSICS, to simplify a bit.

src/hotspot/share/jfr/jfr.hpp line 31:

> 29: #include "memory/allocation.hpp"
> 30:
> 31: #include "jfr/support/jfrIntrinsics.hpp"

Don't think we need to include jfrIntrinsics.hpp (not even in .cpp) as we can declare this entry point unconditionally.
src/hotspot/share/jfr/jfr.hpp line 33: > 31: #include "jfr/support/jfrIntrinsics.hpp" > 32: #ifdef JFR_HAVE_INTRINSICS > 33: #include "jfr/recorder/checkpoint/types/traceid/jfrTraceIdEpoch.hpp" Please create a Jfr::epoch_address() in the .hpp and move includes to the .cpp - thanks. src/hotspot/share/jfr/jfr.hpp line 63: > 61: static void include_thread(Thread* thread); > 62: > 63: #ifdef JFR_HAVE_INTRINSICS can be declared unconditionally to #ifdef JFR_HAVE_INTRINSICS (call site will have the conditional). src/hotspot/share/jfr/recorder/checkpoint/types/traceid/jfrTraceIdLoadBarrier.inline.hpp line 71: > 69: SET_USED_THIS_EPOCH(klass); > 70: enqueue(klass); > 71: JfrTraceIdEpoch::set_changed_tag_state(); JfrTraceIdEpoch::set_changed_tag_state(); here now obviates the need to program signal into the intrinsic so that can be removed. And signal_address() need not be exposed. src/hotspot/share/opto/library_call.cpp line 32: > 30: #include "compiler/compileLog.hpp" > 31: #include "gc/shared/barrierSet.hpp" > 32: #include "jfr/jfr.hpp" this needs #if INCLUDE_JFR guard src/hotspot/share/opto/library_call.cpp line 2820: > 2818: } __ end_if(); > 2819: > 2820: Node* signaled_flag_address = makecon(TypeRawPtr::make(JfrTraceIdEpoch::signal_address())); The signal stuff can now be removed. Please also remove JfrTraceIdEpoch::signal_address() exposure. ------------- Changes requested by mgronlun (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/3470

From ddong at openjdk.java.net Fri May 21 11:10:55 2021
From: ddong at openjdk.java.net (Denghui Dong)
Date: Fri, 21 May 2021 11:10:55 GMT
Subject: RFR: 8265129: Add intrinsic support for JVM.getClassId [v10]
In-Reply-To:
References:
Message-ID:

> 8265129: Add intrinsic support for JVM.getClassId

Denghui Dong has updated the pull request incrementally with one additional commit since the last revision:

update

-------------

Changes:
 - all: https://git.openjdk.java.net/jdk/pull/3470/files
 - new: https://git.openjdk.java.net/jdk/pull/3470/files/a32ef9e2..d8516033

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3470&range=09
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3470&range=08-09

Stats: 21 lines in 3 files changed: 10 ins; 6 del; 5 mod
Patch: https://git.openjdk.java.net/jdk/pull/3470.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/3470/head:pull/3470

PR: https://git.openjdk.java.net/jdk/pull/3470

From ddong at openjdk.java.net Fri May 21 11:16:35 2021
From: ddong at openjdk.java.net (Denghui Dong)
Date: Fri, 21 May 2021 11:16:35 GMT
Subject: RFR: 8265129: Add intrinsic support for JVM.getClassId [v7]
In-Reply-To:
References:
Message-ID: <4XQkuIw0jvKke9TbeyFYYu2VzdRvwKH12WA3zBjf7nE=.3f1d910b-d208-4ecb-b605-3af6fb4c9ef5@github.com>

On Fri, 21 May 2021 10:17:52 GMT, Markus Grönlund wrote:

>> Denghui Dong has updated the pull request incrementally with one additional commit since the last revision:
>>
>> update
>
> src/hotspot/share/jfr/jfr.cpp line 115:
>
>> 113: }
>> 114:
>> 115: #ifdef JFR_HAVE_INTRINSICS
>
> Can make this unconditional from #ifdef JFR_HAVE_INTRINSICS, to simplify a bit.

fixed

> src/hotspot/share/jfr/jfr.hpp line 31:
>
>> 29: #include "memory/allocation.hpp"
>> 30:
>> 31: #include "jfr/support/jfrIntrinsics.hpp"
>
> Don't think we need to include jfrIntrinsics.hpp (not even in .cpp) as we can declare this entry point unconditionally.

removed, thanks.
> src/hotspot/share/jfr/jfr.hpp line 33: > >> 31: #include "jfr/support/jfrIntrinsics.hpp" >> 32: #ifdef JFR_HAVE_INTRINSICS >> 33: #include "jfr/recorder/checkpoint/types/traceid/jfrTraceIdEpoch.hpp" > > Please create a Jfr::epoch_address() in the .hpp and move includes to the .cpp - thanks. fixed. > src/hotspot/share/jfr/jfr.hpp line 63: > >> 61: static void include_thread(Thread* thread); >> 62: >> 63: #ifdef JFR_HAVE_INTRINSICS > > can be declared unconditionally to #ifdef JFR_HAVE_INTRINSICS (call site will have the conditional). fixed. > src/hotspot/share/jfr/recorder/checkpoint/types/traceid/jfrTraceIdLoadBarrier.inline.hpp line 71: > >> 69: SET_USED_THIS_EPOCH(klass); >> 70: enqueue(klass); >> 71: JfrTraceIdEpoch::set_changed_tag_state(); > > JfrTraceIdEpoch::set_changed_tag_state(); here now obviates the need to program signal into the intrinsic so that can be removed. And signal_address() need not be exposed. `load_barrier` is used by the path for normal class and `JfrTraceIdLoadBarrier::load(const Klass* klass)` the path for the primitive class still needs to update the signal. > src/hotspot/share/opto/library_call.cpp line 32: > >> 30: #include "compiler/compileLog.hpp" >> 31: #include "gc/shared/barrierSet.hpp" >> 32: #include "jfr/jfr.hpp" > > this needs #if INCLUDE_JFR guard fixed > src/hotspot/share/opto/library_call.cpp line 2820: > >> 2818: } __ end_if(); >> 2819: >> 2820: Node* signaled_flag_address = makecon(TypeRawPtr::make(JfrTraceIdEpoch::signal_address())); > > The signal stuff can now be removed. Please also remove JfrTraceIdEpoch::signal_address() exposure. 
the path for the primitive class needs it, and I add Jfr::signal_address() to wrapper it ------------- PR: https://git.openjdk.java.net/jdk/pull/3470 From mgronlun at openjdk.java.net Fri May 21 11:27:35 2021 From: mgronlun at openjdk.java.net (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Fri, 21 May 2021 11:27:35 GMT Subject: RFR: 8265129: Add intrinsic support for JVM.getClassId [v7] In-Reply-To: <4XQkuIw0jvKke9TbeyFYYu2VzdRvwKH12WA3zBjf7nE=.3f1d910b-d208-4ecb-b605-3af6fb4c9ef5@github.com> References: <4XQkuIw0jvKke9TbeyFYYu2VzdRvwKH12WA3zBjf7nE=.3f1d910b-d208-4ecb-b605-3af6fb4c9ef5@github.com> Message-ID: On Fri, 21 May 2021 11:12:59 GMT, Denghui Dong wrote: >> src/hotspot/share/jfr/recorder/checkpoint/types/traceid/jfrTraceIdLoadBarrier.inline.hpp line 71: >> >>> 69: SET_USED_THIS_EPOCH(klass); >>> 70: enqueue(klass); >>> 71: JfrTraceIdEpoch::set_changed_tag_state(); >> >> JfrTraceIdEpoch::set_changed_tag_state(); here now obviates the need to program signal into the intrinsic so that can be removed. And signal_address() need not be exposed. > > `load_barrier` is used by the path for normal class and `JfrTraceIdLoadBarrier::load(const Klass* klass)` > the path for the primitive class still needs to update the signal. ah..ok... 
------------- PR: https://git.openjdk.java.net/jdk/pull/3470 From mgronlun at openjdk.java.net Fri May 21 11:45:38 2021 From: mgronlun at openjdk.java.net (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Fri, 21 May 2021 11:45:38 GMT Subject: RFR: 8265129: Add intrinsic support for JVM.getClassId [v10] In-Reply-To: References: Message-ID: <-P5JzXRzVa3DcregmK43XXEzfRe9QN4Pjn2PlfKaa80=.0efe7869-808b-4d79-98de-0d82220536c9@github.com> On Fri, 21 May 2021 11:10:55 GMT, Denghui Dong wrote: >> 8265129: Add intrinsic support for JVM.getClassId > > Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: > > update src/hotspot/share/jfr/jfr.cpp line 115: > 113: } > 114: > 115: JRT_LEAF(void, Jfr::trace_id_load_barrier(Klass * klass)) please declare as const, as in Jfr::trace_id_load_barrier(const Klass* klass);" src/hotspot/share/jfr/jfr.hpp line 58: > 56: static void include_thread(Thread* thread); > 57: > 58: // get_class_id intrinsic support "Klass" needs a fwd declaration. src/hotspot/share/jfr/jfr.hpp line 59: > 57: > 58: // get_class_id intrinsic support > 59: static void trace_id_load_barrier(Klass* klass); I think we can make this general, because it is not necessarily specialized for the intrinsic use. Can we rename this to "get_class_id(const Klass* klass): instead? Then we can remove the comment // get_class_id intrinsic support., thanks src/hotspot/share/opto/library_call.cpp line 32: > 30: #include "compiler/compileLog.hpp" > 31: #include "gc/shared/barrierSet.hpp" > 32: #if INCLUDE_JFR It is custom to put the conditional includes at the end, not inlined (please see other files for conditional includes). Thanks. 
------------- PR: https://git.openjdk.java.net/jdk/pull/3470 From mgronlun at openjdk.java.net Fri May 21 11:55:35 2021 From: mgronlun at openjdk.java.net (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Fri, 21 May 2021 11:55:35 GMT Subject: RFR: 8265129: Add intrinsic support for JVM.getClassId [v10] In-Reply-To: <-P5JzXRzVa3DcregmK43XXEzfRe9QN4Pjn2PlfKaa80=.0efe7869-808b-4d79-98de-0d82220536c9@github.com> References: <-P5JzXRzVa3DcregmK43XXEzfRe9QN4Pjn2PlfKaa80=.0efe7869-808b-4d79-98de-0d82220536c9@github.com> Message-ID: On Fri, 21 May 2021 11:39:50 GMT, Markus Grönlund wrote: >> Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: >> >> update > > src/hotspot/share/jfr/jfr.hpp line 59: > >> 57: >> 58: // get_class_id intrinsic support >> 59: static void trace_id_load_barrier(Klass* klass); > > I think we can make this general, because it is not necessarily specialized for the intrinsic use. Can we rename this to "get_class_id(const Klass* klass): instead? Then we can remove the comment // get_class_id intrinsic support., thanks Hmm, perhaps that was not such a good idea after all - it would require that you have tested for tagging at the call site. But "get_class_id_intrinsic(const Klass* klass):" might perhaps be ok. ------------- PR: https://git.openjdk.java.net/jdk/pull/3470 From neliasso at openjdk.java.net Fri May 21 14:02:38 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Fri, 21 May 2021 14:02:38 GMT Subject: RFR: 8267332: xor value should handle bounded values [v6] In-Reply-To: References: Message-ID: On Fri, 21 May 2021 09:59:07 GMT, Nils Eliasson wrote: >> In the discussion of https://github.com/openjdk/jdk/pull/3938 a limitation in C2 was found. C2 fails to eliminate obvious bound checks for indexes that are masked with xor. >> >> The Xor for two values that have a lower bound of zero or more, the resulting lower bound is zero.
>> The Xor for two values that have a upperbound above zero, the resulting upper bound is the max of the next_power_of_2-1. >> >> Test supplied. >> >> Please review, >> Best regards, >> Nils Eliasson > > Nils Eliasson has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - Merge branch 'master' into xor_bound > - Removed unnecessary check > - Updated test, fixed bound > - fixed missing test case > - Fix bounds check > - fix xor value Thanks for the reviews Tobias and Claes! ------------- PR: https://git.openjdk.java.net/jdk/pull/4136 From thartmann at openjdk.java.net Fri May 21 14:06:32 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Fri, 21 May 2021 14:06:32 GMT Subject: RFR: 8254129: IR Test Framework to support regex-based matching on the IR in JTreg compiler tests [v9] In-Reply-To: References: <2iYQOJ5yeu7SvGcScLPBOWCPMLv69e1ksOL1vW3ytL8=.0c27621d-ef3d-422c-9d8c-922078ca3160@github.com> Message-ID: On Tue, 4 May 2021 15:53:25 GMT, Christian Hagedorn wrote: >> This RFE provides an IR test framework to perform regex-based checks on the C2 IR shape of test methods emitted by the VM flags `-XX:+PrintIdeal` and `-XX:+PrintOptoAssembly`. The framework can also be used for other non-IR matching (and non-compiler) tests by providing easy to use annotations for commonly used testing patterns and compiler control flags. >> >> The framework is based on the ideas of the currently present IR test framework in [Valhalla](https://github.com/openjdk/valhalla/blob/e9c78ce4fcfd01361c35883e0d68f9ae5a80d079/test/hotspot/jtreg/compiler/valhalla/inlinetypes/InlineTypeTest.java) (mainly implemented by @TobiHartmann) which is being used with great success. This new framework aims to replace the old one in Valhalla at some point. 
>> >> A detailed description about how this new IR test framework works and how it is used is provided in the [README.md](https://github.com/chhagedorn/jdk/blob/aa005f384a4567c6c0b5f08f7c5df57f705dc540/test/lib/jdk/test/lib/hotspot/ir_framework/README.md) file and in the [Javadocs](https://github.com/chhagedorn/jdk/blob/aa005f384a4567c6c0b5f08f7c5df57f705dc540/test/lib/jdk/test/lib/hotspot/ir_framework/doc/jdk/test/lib/hotspot/ir_framework/package-summary.html) written for the framework classes. >> >> To finish a first version of this framework for JDK 17, I decided to leave some improvement possibilities and ideas to be followed up on in additional RFEs. Some ideas are mentioned in "Future Work" in [README.md](https://github.com/chhagedorn/jdk/blob/aa005f384a4567c6c0b5f08f7c5df57f705dc540/test/lib/jdk/test/lib/hotspot/ir_framework/README.md) and were also created as subtasks of this RFE. >> >> Testing (also described in "Internal Framework Tests in [README.md](https://github.com/chhagedorn/jdk/blob/aa005f384a4567c6c0b5f08f7c5df57f705dc540/test/lib/jdk/test/lib/hotspot/ir_framework/README.md)): >> There are various tests to verify the correctness of the test framework which can be found as JTreg tests in the [tests](https://github.com/chhagedorn/jdk/tree/aa005f384a4567c6c0b5f08f7c5df57f705dc540/test/lib/jdk/test/lib/hotspot/ir_framework/tests) folder. Additional testing was performed by converting all compiler Inline Types test of project Valhalla (done by @katyapav in [JDK-8263024](https://bugs.openjdk.java.net/browse/JDK-8263024)) that used the old framework to the new framework. This provided additional testing for the framework itself. We ran the converted tests with all the flag settings used in hs-tier1-9 and hs-precheckin-comp. For sanity checking, this was also done with a sample IR test in mainline. 
>> >> Some stats about the framework code added to [ir_framework](https://github.com/chhagedorn/jdk/tree/aa005f384a4567c6c0b5f08f7c5df57f705dc540/test/lib/jdk/test/lib/hotspot/ir_framework): >> >> - without the [Javadocs files](https://github.com/chhagedorn/jdk/tree/aa005f384a4567c6c0b5f08f7c5df57f705dc540/test/lib/jdk/test/lib/hotspot/ir_framework/doc) : 60 changed files, 13212 insertions, 0 deletions. >> - without the [tests](https://github.com/chhagedorn/jdk/tree/aa005f384a4567c6c0b5f08f7c5df57f705dc540/test/lib/jdk/test/lib/hotspot/ir_framework/tests) and [examples](https://github.com/chhagedorn/jdk/tree/aa005f384a4567c6c0b5f08f7c5df57f705dc540/test/lib/jdk/test/lib/hotspot/ir_framework/examples) folder: 40 files changed, 6781 insertions >> - comments: 2399 insertions (calculated with `git diff --cached !(tests|examples) | grep -c -E "(^[+-]\s*(/)?*)|(^[+-]\s*//)"`) >> - which leaves 4382 lines of code inserted >> >> Big thanks to: >> - @TobiHartmann for all his help by discussing the new framework and for providing insights from his IR test framework in Valhalla. >> - @katyapav for converting the Valhalla tests to use the new framework which found some harder to catch bugs in the framework and also some actual C2 bugs. >> - @iignatev for helping to simplify the framework usage with JTreg and with the framework internal VM calling structure. >> - and others who provided valuable feedback. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with three additional commits since the last revision: > > - Splitting classes into subpackages and updating README accordingly, fix bug with new line matching in lookbehind on Windows > - Fix package names and fixing internal tests, examples and README file accordingly > - Move framework to test/hotspot/jtreg/compiler/lib and tests to test/hotspot/jtreg/testlibrary_tests/compiler/lib/ir_framework Changes requested by thartmann (Reviewer). 
test/hotspot/jtreg/compiler/lib/ir_framework/CompLevel.java line 70: > 68: * > 69: */ > 70: ANY(-2), After the removal of AOT, this should be `-1`. test/hotspot/jtreg/compiler/lib/ir_framework/Compiler.java line 37: > 35: * Selecting both the C1 and C2 compiler. This must be in sync with hotspot/share/compiler/compilerDefinitions.hpp. > 36: */ > 37: ANY(-2), After the removal of AOT, this should be `-1`. ------------- PR: https://git.openjdk.java.net/jdk/pull/3508 From neliasso at openjdk.java.net Fri May 21 14:11:34 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Fri, 21 May 2021 14:11:34 GMT Subject: Integrated: 8267332: xor value should handle bounded values In-Reply-To: References: Message-ID: On Thu, 20 May 2021 21:14:14 GMT, Nils Eliasson wrote: > In the discussion of https://github.com/openjdk/jdk/pull/3938 a limitation in C2 was found. C2 fails to eliminate obvious bound checks for indexes that are masked with xor. > > The Xor for two values that have a lower bound of zero or more, the resulting lower bound is zero. > The Xor for two values that have a upperbound above zero, the resulting upper bound is the max of the next_power_of_2-1. > > Test supplied. > > Please review, > Best regards, > Nils Eliasson This pull request has now been integrated. Changeset: 4ba76138 Author: Nils Eliasson URL: https://git.openjdk.java.net/jdk/commit/4ba761381c60197be08d34580b92b5203fa9b189 Stats: 166 lines in 2 files changed: 166 ins; 0 del; 0 mod 8267332: xor value should handle bounded values Reviewed-by: thartmann, redestad ------------- PR: https://git.openjdk.java.net/jdk/pull/4136 From shade at openjdk.java.net Fri May 21 14:15:27 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 21 May 2021 14:15:27 GMT Subject: RFR: 8267531: [x86] Assembler::andb(Address, Register) encoding is incorrect Message-ID: See the bug report to see the way we arrived here. 
I looked through the [breakage changeset](https://github.com/openjdk/jdk/commit/de784312c340b4a4f4c4d11854bfbe9e9e826ea3), and I think that is the only "*b" case that is missing. Attention @JohnTortugo. Additional testing: - [x] Failing fuzzer test (now passes) - [ ] Linux x86_64 fastdebug `tier1` ------------- Commit messages: - 8267531: [x86] Assembler::andb(Address,Register) encoding is incorrect Changes: https://git.openjdk.java.net/jdk/pull/4145/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4145&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8267531 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/4145.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4145/head:pull/4145 PR: https://git.openjdk.java.net/jdk/pull/4145 From ddong at openjdk.java.net Fri May 21 14:23:09 2021 From: ddong at openjdk.java.net (Denghui Dong) Date: Fri, 21 May 2021 14:23:09 GMT Subject: RFR: 8265129: Add intrinsic support for JVM.getClassId [v10] In-Reply-To: <-P5JzXRzVa3DcregmK43XXEzfRe9QN4Pjn2PlfKaa80=.0efe7869-808b-4d79-98de-0d82220536c9@github.com> References: <-P5JzXRzVa3DcregmK43XXEzfRe9QN4Pjn2PlfKaa80=.0efe7869-808b-4d79-98de-0d82220536c9@github.com> Message-ID: On Fri, 21 May 2021 11:33:06 GMT, Markus Grönlund wrote: >> Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: >> >> update > > src/hotspot/share/jfr/jfr.cpp line 115: > >> 113: } >> 114: >> 115: JRT_LEAF(void, Jfr::trace_id_load_barrier(Klass * klass)) > > please declare as const, as in Jfr::trace_id_load_barrier(const Klass* klass);" fixed > src/hotspot/share/jfr/jfr.hpp line 58: > >> 56: static void include_thread(Thread* thread); >> 57: >> 58: // get_class_id intrinsic support > > "Klass" needs a fwd declaration.
added > src/hotspot/share/opto/library_call.cpp line 32: > >> 30: #include "compiler/compileLog.hpp" >> 31: #include "gc/shared/barrierSet.hpp" >> 32: #if INCLUDE_JFR > > It is custom to put the conditional includes at the end, not inlined (please see other files for conditional includes). Thanks. fixed ------------- PR: https://git.openjdk.java.net/jdk/pull/3470 From ddong at openjdk.java.net Fri May 21 14:23:00 2021 From: ddong at openjdk.java.net (Denghui Dong) Date: Fri, 21 May 2021 14:23:00 GMT Subject: RFR: 8265129: Add intrinsic support for JVM.getClassId [v11] In-Reply-To: References: Message-ID: > 8265129: Add intrinsic support for JVM.getClassId Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: update ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/3470/files - new: https://git.openjdk.java.net/jdk/pull/3470/files/d8516033..f99186b9 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3470&range=10 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3470&range=09-10 Stats: 17 lines in 6 files changed: 5 ins; 3 del; 9 mod Patch: https://git.openjdk.java.net/jdk/pull/3470.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3470/head:pull/3470 PR: https://git.openjdk.java.net/jdk/pull/3470 From ddong at openjdk.java.net Fri May 21 14:23:11 2021 From: ddong at openjdk.java.net (Denghui Dong) Date: Fri, 21 May 2021 14:23:11 GMT Subject: RFR: 8265129: Add intrinsic support for JVM.getClassId [v10] In-Reply-To: References: <-P5JzXRzVa3DcregmK43XXEzfRe9QN4Pjn2PlfKaa80=.0efe7869-808b-4d79-98de-0d82220536c9@github.com> Message-ID: On Fri, 21 May 2021 11:52:50 GMT, Markus Grönlund wrote: >> src/hotspot/share/jfr/jfr.hpp line 59: >> >>> 57: >>> 58: // get_class_id intrinsic support >>> 59: static void trace_id_load_barrier(Klass* klass); >> >> I think we can make this general, because it is not necessarily specialized for the intrinsic use.
Can we rename this to "get_class_id(const Klass* klass): instead? Then we can remove the comment // get_class_id intrinsic support., thanks > > Hmm, perhaps that was not such a good idea after all - it would require that you have tested for tagging at the call site. But "get_class_id_intrinsic(const Klass* klass):" might perhaps be ok. Good idea, updated. ------------- PR: https://git.openjdk.java.net/jdk/pull/3470 From azeemj at openjdk.java.net Fri May 21 14:24:45 2021 From: azeemj at openjdk.java.net (Azeem Jiva) Date: Fri, 21 May 2021 14:24:45 GMT Subject: RFR: 8267531: [x86] Assembler::andb(Address, Register) encoding is incorrect In-Reply-To: References: Message-ID: <1drhE4BIb8ljgSBxDUIt0ipvwQmVBfTOf2NKcIiDzlU=.cb6b3103-01bc-491f-820c-0b1b50f15ef7@github.com> On Fri, 21 May 2021 12:49:10 GMT, Aleksey Shipilev wrote: > See the bug report to see the way we arrived here. I looked through the [breakage changeset](https://github.com/openjdk/jdk/commit/de784312c340b4a4f4c4d11854bfbe9e9e826ea3), and I think that is the only "*b" case that is missing. > > Attention @JohnTortugo. > > Additional testing: > - [x] Failing fuzzer test (now passes) > - [x] Linux x86_64 fastdebug `tier1` Not a reviewer, but this makes sense. ------------- Marked as reviewed by azeemj (Author). PR: https://git.openjdk.java.net/jdk/pull/4145 From jiefu at openjdk.java.net Fri May 21 15:36:00 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Fri, 21 May 2021 15:36:00 GMT Subject: RFR: 8267531: [x86] Assembler::andb(Address, Register) encoding is incorrect In-Reply-To: References: Message-ID: On Fri, 21 May 2021 12:49:10 GMT, Aleksey Shipilev wrote: > See the bug report to see the way we arrived here. I looked through the [breakage changeset](https://github.com/openjdk/jdk/commit/de784312c340b4a4f4c4d11854bfbe9e9e826ea3), and I think that is the only "*b" case that is missing. > > Attention @JohnTortugo. 
> > Additional testing: > - [x] Failing fuzzer test (now passes) > - [x] Linux x86_64 fastdebug `tier1` Is it possible to add a jtreg test for this fix? Thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/4145 From shade at openjdk.java.net Fri May 21 15:40:03 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 21 May 2021 15:40:03 GMT Subject: RFR: 8267531: [x86] Assembler::andb(Address, Register) encoding is incorrect In-Reply-To: References: Message-ID: On Fri, 21 May 2021 15:33:15 GMT, Jie Fu wrote: > Is it possible to add a jtreg test for this fix? I guess it is possible to minimize the fuzzer test and/or create a fully synthetic regression test. But given how it looks like a copy-paste omission in the recent patch, I would prefer to invest time somewhere else. ------------- PR: https://git.openjdk.java.net/jdk/pull/4145 From mgronlun at openjdk.java.net Fri May 21 15:55:06 2021 From: mgronlun at openjdk.java.net (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Fri, 21 May 2021 15:55:06 GMT Subject: RFR: 8265129: Add intrinsic support for JVM.getClassId [v11] In-Reply-To: References: Message-ID: On Fri, 21 May 2021 14:23:00 GMT, Denghui Dong wrote: >> 8265129: Add intrinsic support for JVM.getClassId > > Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: > > update Thanks Denghui, good work. Markus ------------- Marked as reviewed by mgronlun (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/3470 From shade at openjdk.java.net Fri May 21 16:37:08 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 21 May 2021 16:37:08 GMT Subject: RFR: 8267531: [x86] Assembler::andb(Address, Register) encoding is incorrect In-Reply-To: References: Message-ID: On Fri, 21 May 2021 15:36:51 GMT, Aleksey Shipilev wrote: > > Is it possible to add a jtreg test for this fix? > > I guess it is possible to minimize the fuzzer test and/or create a fully synthetic regression test. 
But given how it looks like a copy-paste omission in the recent patch, I would prefer to invest time somewhere else.

I tried a few things, without success. Something like:

/**
 * @test
 * @bug 8267531
 * @summary [x86] Assembler::andb(Address,Register) encoding is incorrect
 *
 * @run main/othervm -XX:-TieredCompilation -XX:+PrintAssembly -XX:-UseNewCode
 *                   -XX:CompileCommand=compileonly,compiler.c2.Test8267531::test
 *                   compiler.c2.Test8267531
 * @run main/othervm -XX:-TieredCompilation -XX:+PrintAssembly -XX:+UseNewCode
 *                   -XX:CompileCommand=compileonly,compiler.c2.Test8267531::test
 *                   compiler.c2.Test8267531
 */

package compiler.c2;

import java.util.Arrays;

public class Test8267531 {
    static byte[] array = new byte[] { -1, -1, -1, -1, -1, -1, -1 };
    static int idx = 3;
    static int mask;

    public static void test() {
        // match(Set dst (StoreB dst (AndI (LoadB dst) src)));
        array[idx] &= mask;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000000; i++) {
            test();
            if (array[4] != -1 || array[3] != 0 || array[2] != -1) {
                throw new IllegalStateException("Error: " + Arrays.toString(array));
            }
        }
    }
}

It would seem we need to get lucky that both dst-addr-idx and src regs are fitting the branch path in `Assembler::prefix`. Let's mark the bug `noreg-hard` and move on?

-------------

PR: https://git.openjdk.java.net/jdk/pull/4145

From github.com+2249648+johntortugo at openjdk.java.net Fri May 21 17:45:58 2021
From: github.com+2249648+johntortugo at openjdk.java.net (John Tortugo)
Date: Fri, 21 May 2021 17:45:58 GMT
Subject: RFR: 8267531: [x86] Assembler::andb(Address, Register) encoding is incorrect
In-Reply-To:
References:
Message-ID:

On Fri, 21 May 2021 12:49:10 GMT, Aleksey Shipilev wrote:

> See the bug report to see the way we arrived here. I looked through the [breakage changeset](https://github.com/openjdk/jdk/commit/de784312c340b4a4f4c4d11854bfbe9e9e826ea3), and I think that is the only "*b" case that is missing.
>
> Attention @JohnTortugo.
> > Additional testing: > - [x] Failing fuzzer test (now passes) > - [x] Linux x86_64 fastdebug `tier1` Not a reviewer but LGTM. Thanks for fixing my mistake! ------------- Marked as reviewed by JohnTortugo at github.com (no known OpenJDK username). PR: https://git.openjdk.java.net/jdk/pull/4145 From vlivanov at openjdk.java.net Fri May 21 18:03:57 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Fri, 21 May 2021 18:03:57 GMT Subject: RFR: 8266950: Remove vestigial support for non-strict floating-point execution [v2] In-Reply-To: References: Message-ID: On Tue, 18 May 2021 04:26:00 GMT, David Holmes wrote: >> As part of JEP 306, the vestiges of HotSpot support for non-strict floating-point execution can be removed. All methods implicitly have strictfp semantics so the explicit checks for is_strict() can be replaced by true and the code reformulated accordingly. >> >> There are still some names that include "strict" that could potentially be renamed to remove it, but the fact we have to have strict fp semantics is still important on some platforms, so the names help reinforce that IMO. >> >> Testing: tiers 1-3 >> >> Thanks, >> David > > David Holmes has updated the pull request incrementally with one additional commit since the last revision: > > lir_div_strictfp and lir_mul_strictfp Marked as reviewed by vlivanov (Reviewer). 
-------------

PR: https://git.openjdk.java.net/jdk/pull/3991

From sviswanathan at openjdk.java.net Fri May 21 18:18:30 2021
From: sviswanathan at openjdk.java.net (Sandhya Viswanathan)
Date: Fri, 21 May 2021 18:18:30 GMT
Subject: Integrated: 8267190: Optimize Vector API test operations
In-Reply-To:
References:
Message-ID:

On Fri, 14 May 2021 23:58:38 GMT, Sandhya Viswanathan wrote:

> Vector API test operations (IS_DEFAULT, IS_FINITE, IS_INFINITE, IS_NAN and IS_NEGATIVE) are computed in three steps:
> 1) reinterpreting the floating point vectors as integral vectors (int/long)
> 2) perform the test in integer domain to get a int/long mask
> 3) reinterpret the int/long mask as float/double mask
> Step 3) currently is very slow. It can be optimized by modifying the Java code to utilize the existing reinterpret intrinsic.
>
> For the VectorTestPerf attached to the JBS for JDK-8267190, the performance improves as follows:
>
> Base:
> Benchmark                   (size)   Mode  Cnt     Score     Error  Units
> VectorTestPerf.IS_DEFAULT     1024  thrpt    5   223.156 ±  90.452  ops/ms
> VectorTestPerf.IS_FINITE      1024  thrpt    5   223.841 ±  91.685  ops/ms
> VectorTestPerf.IS_INFINITE    1024  thrpt    5   224.561 ±  83.890  ops/ms
> VectorTestPerf.IS_NAN         1024  thrpt    5   223.777 ±  70.629  ops/ms
> VectorTestPerf.IS_NEGATIVE    1024  thrpt    5   218.392 ±  79.806  ops/ms
>
> With patch:
> Benchmark                   (size)   Mode  Cnt     Score     Error  Units
> VectorTestPerf.IS_DEFAULT     1024  thrpt    5  8812.357 ±  40.477  ops/ms
> VectorTestPerf.IS_FINITE      1024  thrpt    5  7425.739 ± 296.622  ops/ms
> VectorTestPerf.IS_INFINITE    1024  thrpt    5  8932.730 ± 269.988  ops/ms
> VectorTestPerf.IS_NAN         1024  thrpt    5  8574.872 ± 498.649  ops/ms
> VectorTestPerf.IS_NEGATIVE    1024  thrpt    5  8838.400 ±  11.849  ops/ms
>
> Best Regards,
> Sandhya

This pull request has now been integrated.
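[Editorial illustration, not part of the original message.] The first two steps in the description quoted above (reinterpret the floating-point lane as an integer, then test in the integer domain) can be sketched for a single `float` lane in plain Java. This is only a scalar sketch, not the Vector API implementation or the patch under review: the class and method names are hypothetical, step 3 (reinterpreting the mask) has no scalar counterpart, and the reading of IS_DEFAULT as "raw bits equal zero" and IS_NEGATIVE as "raw bits below zero" is an assumption about the operators' intended meaning.

```java
// Hypothetical scalar sketch of "reinterpret as int, then test in the integer domain".
final class ScalarLaneTests {
    private static final int EXP_MASK  = 0x7F80_0000; // 8 exponent bits of IEEE 754 binary32
    private static final int MANT_MASK = 0x007F_FFFF; // 23 mantissa bits
    private static final int ABS_MASK  = 0x7FFF_FFFF; // everything except the sign bit

    // Step 1: reinterpret the float as an int without any value conversion.
    static int bits(float f) {
        return Float.floatToRawIntBits(f);
    }

    // Step 2: every test below is integer-only work on the reinterpreted bits.
    static boolean isDefault(float f) {   // assumed meaning: all bits zero (so -0.0f is not default)
        return bits(f) == 0;
    }

    static boolean isNegative(float f) {  // assumed meaning: sign bit set (so -0.0f is negative)
        return bits(f) < 0;
    }

    static boolean isFinite(float f) {    // exponent field is not all ones
        return (bits(f) & EXP_MASK) != EXP_MASK;
    }

    static boolean isInfinite(float f) {  // exponent all ones, mantissa zero
        return (bits(f) & ABS_MASK) == EXP_MASK;
    }

    static boolean isNaN(float f) {       // exponent all ones, mantissa non-zero
        int b = bits(f);
        return (b & EXP_MASK) == EXP_MASK && (b & MANT_MASK) != 0;
    }

    public static void main(String[] args) {
        float[] lanes = { 0.0f, -0.0f, 1.5f, Float.NaN, Float.NEGATIVE_INFINITY };
        for (float f : lanes) {
            System.out.printf("%s: default=%b negative=%b finite=%b infinite=%b nan=%b%n",
                    f, isDefault(f), isNegative(f), isFinite(f), isInfinite(f), isNaN(f));
        }
    }
}
```

In the vector case the same comparisons would produce an int/long mask per lane, which is why the description treats the final mask reinterpretation as a separate step.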
Changeset: 8f10c5a8 Author: Sandhya Viswanathan URL: https://git.openjdk.java.net/jdk/commit/8f10c5a8900517cfa04256eab909e18535086b98 Stats: 1274 lines in 32 files changed: 652 ins; 279 del; 343 mod 8267190: Optimize Vector API test operations Reviewed-by: psandoz, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/4039 From vlivanov at openjdk.java.net Fri May 21 18:19:00 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Fri, 21 May 2021 18:19:00 GMT Subject: RFR: 8266950: Remove vestigial support for non-strict floating-point execution [v2] In-Reply-To: References: Message-ID: On Tue, 18 May 2021 04:26:00 GMT, David Holmes wrote: >> As part of JEP 306, the vestiges of HotSpot support for non-strict floating-point execution can be removed. All methods implicitly have strictfp semantics so the explicit checks for is_strict() can be replaced by true and the code reformulated accordingly. >> >> There are still some names that include "strict" that could potentially be renamed to remove it, but the fact we have to have strict fp semantics is still important on some platforms, so the names help reinforce that IMO. >> >> Testing: tiers 1-3 >> >> Thanks, >> David > > David Holmes has updated the pull request incrementally with one additional commit since the last revision: > > lir_div_strictfp and lir_mul_strictfp There are some suspicious failures on linux-x86 in pre-submit testing results: - compiler/c1/Test6855215.java - compiler/intrinsics/string/TestStringLatin1IndexOfChar.java The tests explicitly specify `-XX:UseSSE=0`, so it may be related to the patch. Anybody interested in linux-x86 want to take a look? 
@shade @DamonFool ------------- PR: https://git.openjdk.java.net/jdk/pull/3991 From vlivanov at openjdk.java.net Fri May 21 18:27:00 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Fri, 21 May 2021 18:27:00 GMT Subject: RFR: 8267531: [x86] Assembler::andb(Address, Register) encoding is incorrect In-Reply-To: References: Message-ID: On Fri, 21 May 2021 12:49:10 GMT, Aleksey Shipilev wrote: > See the bug report to see the way we arrived here. I looked through the [breakage changeset](https://github.com/openjdk/jdk/commit/de784312c340b4a4f4c4d11854bfbe9e9e826ea3), and I think that is the only "*b" case that is missing. > > Attention @JohnTortugo. > > Additional testing: > - [x] Failing fuzzer test (now passes) > - [x] Linux x86_64 fastdebug `tier1` Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/4145 From sandhya.viswanathan at intel.com Fri May 21 18:34:06 2021 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Fri, 21 May 2021 18:34:06 +0000 Subject: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics In-Reply-To: <0d8f2f93-0f83-5771-ff94-3f6283d25c21@redhat.com> References: <72216fcc-67e7-c700-8fee-2d8c752a0f0c@redhat.com> <8346BF97-D8F6-4521-8589-66C679618DB7@oracle.com> <6AA63D23-E2C1-4510-8FCC-2B17FFF3465E@oracle.com> <2e4b6fb0-e53e-43d3-b680-b50a53cfe04a@redhat.com> <0d8f2f93-0f83-5771-ff94-3f6283d25c21@redhat.com> Message-ID: Hi Andrew/John, We made this contribution with the goal to help Vector API and its evaluation during incubation. This is the best we could do currently towards JDK 17. Please advice if you think that PR be withdrawn instead of integration at this point. We will go with your expert advice. 
Best Regards,
Sandhya

-----Original Message-----
From: hotspot-compiler-dev On Behalf Of Andrew Haley
Sent: Friday, May 21, 2021 12:40 AM
To: John Rose
Cc: Paul Sandoz ; hotspot compiler
Subject: Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics

On 5/20/21 7:54 PM, John Rose wrote:
> On May 20, 2021, at 8:31 AM, Andrew Haley wrote:
>>
>> On 5/20/21 12:34 AM, Paul Sandoz wrote:
>>
>>> Does this help alleviate some of your concerns?
>>
>> Somewhat, but I wonder if this, as a matter of policy, is an area in
>> which the Governing Board should get involved. I don't want to hold
>> up progress, of course, but this is potentially a very important issue.
>
> I think this could rise to the GB level if we needed to make a strong
> policy change, but as I've said above, I think we are in policy here.
> (Just barely.) For any conceivable issue of maintainability, surely
> the open review process is enough, without asking the GB to weigh in
> on change set reviews. And I think this is about maintainability.

It is, but that's not entirely what I'm worried about.

The four (software) freedoms are:

    The freedom to run the program as you wish, for any purpose (freedom 0).
    The freedom to study how the program works, and change it so it does your computing as you wish (freedom 1). Access to the source code is a precondition for this.
    The freedom to redistribute copies so you can help your neighbor (freedom 2).
    The freedom to distribute copies of your modified versions to others (freedom 3). By doing this you can give the whole community a chance to benefit from your changes. Access to the source code is a precondition for this.

but not entirely. In this case we have 0, 2, and 3, but not 1. So, this issue is about more than mere utility, but something more fundamental. It's about the right of our users to understand how OpenJDK works.
My question is, then, (please forgive the paraphrase), are we giving up essential freedom to purchase a little temporary utility?

> Intel is contributing them as a one-time artifact which we are, in
> fact, responsible to maintain. By hand, as the preferred form of the
> source. (Preferred to what?? Well, preferred to nothing at all.)

NB: "preferred form" is a term used (but not fully defined) in GPLv2. It's not easy to define, but we know it when we see it: it's the form a programmer prefers to edit, the original source code.

> Well in this case, we have two things:
>
> 1. Temporary expedient only for incubation, to gain public feedback.
> 2. Clear call for a plausible alternative, to be answered before incubation exit.

OK, but I don't hold out much hope of 2 actually succeeding before incubation exit.

> That's probably enough "case law" to help clarify the relevant policy.
>
> What do you think?

I think that's OK, as long as it's well-enough understood.

By the way, slightly off topic: being rather conflict averse I did wonder whether I should object to this commit, but I reasoned that this kind of issue is exactly the reason that we have a governing board with community representatives. It's literally my duty.

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From kvn at openjdk.java.net Fri May 21 19:40:09 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Fri, 21 May 2021 19:40:09 GMT
Subject: RFR: 8265129: Add intrinsic support for JVM.getClassId [v11]
In-Reply-To:
References:
Message-ID:

On Fri, 21 May 2021 14:23:00 GMT, Denghui Dong wrote:

>> 8265129: Add intrinsic support for JVM.getClassId
>
> Denghui Dong has updated the pull request incrementally with one additional commit since the last revision:
>
> update

Good. What testing you did?

-------------

Marked as reviewed by kvn (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/3470 From jiefu at openjdk.java.net Fri May 21 22:58:59 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Fri, 21 May 2021 22:58:59 GMT Subject: RFR: 8267531: [x86] Assembler::andb(Address, Register) encoding is incorrect In-Reply-To: References: Message-ID: <6AliYiCVgff5S70ytHu1N106yycSAxG9d0cEBiXYzAI=.af335126-25df-44dc-97c1-0029306ef7a0@github.com> On Fri, 21 May 2021 12:49:10 GMT, Aleksey Shipilev wrote: > See the bug report to see the way we arrived here. I looked through the [breakage changeset](https://github.com/openjdk/jdk/commit/de784312c340b4a4f4c4d11854bfbe9e9e826ea3), and I think that is the only "*b" case that is missing. > > Attention @JohnTortugo. > > Additional testing: > - [x] Failing fuzzer test (now passes) > - [x] Linux x86_64 fastdebug `tier1` Marked as reviewed by jiefu (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/4145 From kvn at openjdk.java.net Fri May 21 23:25:56 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 21 May 2021 23:25:56 GMT Subject: RFR: 8266950: Remove vestigial support for non-strict floating-point execution [v2] In-Reply-To: References: Message-ID: On Tue, 18 May 2021 04:26:00 GMT, David Holmes wrote: >> As part of JEP 306, the vestiges of HotSpot support for non-strict floating-point execution can be removed. All methods implicitly have strictfp semantics so the explicit checks for is_strict() can be replaced by true and the code reformulated accordingly. >> >> There are still some names that include "strict" that could potentially be renamed to remove it, but the fact we have to have strict fp semantics is still important on some platforms, so the names help reinforce that IMO. >> >> Testing: tiers 1-3 >> >> Thanks, >> David > > David Holmes has updated the pull request incrementally with one additional commit since the last revision: > > lir_div_strictfp and lir_mul_strictfp Marked as reviewed by kvn (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/3991 From kvn at openjdk.java.net Fri May 21 23:31:13 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 21 May 2021 23:31:13 GMT Subject: RFR: 8252372: Check if cloning is required to move loads out of loops in PhaseIdealLoop::split_if_with_blocks_post() [v3] In-Reply-To: References: Message-ID: On Mon, 17 May 2021 08:12:46 GMT, Roland Westrelin wrote: >> Sinking data nodes out of a loop when all uses are out of a loop has >> several issues that this attempts to fix. >> >> 1- Only non control uses are considered which makes little sense (why >> not sink if the data node is an argument to a call or a returned >> value?) >> >> 2- Sinking of Loads is broken because of the handling of >> anti-dependence: the get_late_ctrl(n, n_ctrl) call returns a control >> in the loop because it takes all uses into account. >> >> 3- For data nodes for which a control edge can't be set, commoning of >> clones back in the loop is prevented with: >> _igvn._worklist.yank(x); >> which gives no guarantee >> >> This patch tries to address all issues: >> >> 1- it looks at all uses, not only non control uses >> >> 2- anti-dependences are computed for each use independently >> >> 3- Cast nodes are used to pin clones out of loop >> >> >> 2- requires refactoring of the PhaseIdealLoop::get_late_ctrl() >> logic. While working on this, I noticed a bug in anti-dependence >> analysis: when the use is a cfg node, the code sometimes looks at uses >> of the memory state of the cfg. The logic uses the use of the cfg >> which is a projection of adr_type identical to the cfg. It should >> instead look at the use of the memory projection. >> >> The existing logic for sinking loads calls clear_dom_lca_tags() for >> every load which seems like quite a waste. I added a >> _dom_lca_tags_round variable that's or'ed with the tag_node's _idx. 
By >> incrementing _dom_lca_tags_round, new tags that don't conflict with >> existing ones are produced and there's no need for >> clear_dom_lca_tags(). >> >> For anti-dependence analysis to return a correct result, early control >> of the load is needed. The only way to get it at this stage, AFAICT, >> is to compute it by following the load's input until a pinned node is >> reached. >> >> The existing logic pins cloned nodes next to their use. The logic I >> propose pins them right out of the loop. This could possibly avoid >> some redundant clones. It also makes some special handling for corner >> cases with loop strip mining useless. >> >> For 3-, I added extra Cast nodes for float types. If a chain of data >> nodes are sunk, the new logic tries to keep a single Cast for the >> entire chain rather than one Cast per node. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - Tobias' review > - Merge branch 'master' into JDK-8252372 > - CastVV > - Merge branch 'master' into JDK-8252372 > - extra comments > - fix Marked as reviewed by kvn (Reviewer). 
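A side note on the `_dom_lca_tags_round` change described in the quoted text above: the general idea of making old tags stale by bumping a counter, rather than clearing them, can be sketched as follows. This is only an illustration under stated assumptions, not the code from the patch. `RoundTaggedSet`, `tag`, `is_tagged` and `next_round` are hypothetical names, and the sketch stores the round number directly in a per-node stamp, whereas the patch is described as or'ing the round into the tag node's `_idx`.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a per-node tag table; this is NOT HotSpot code.
class RoundTaggedSet {
 public:
  explicit RoundTaggedSet(std::size_t node_count)
      : _stamp(node_count, 0), _round(1) {}

  // Tag a node for the current round.
  void tag(std::size_t idx) { _stamp[idx] = _round; }

  // A node counts as tagged only if its stamp matches the current round.
  bool is_tagged(std::size_t idx) const { return _stamp[idx] == _round; }

  // Start a fresh round: all existing tags become stale without touching them.
  void next_round() {
    if (++_round == 0) {
      // The counter wrapped around, so stale stamps could collide with new
      // rounds. Fall back to one real clear and restart at round 1.
      std::fill(_stamp.begin(), _stamp.end(), 0);
      _round = 1;
    }
  }

 private:
  std::vector<std::uint32_t> _stamp;  // 0 means "never tagged"
  std::uint32_t _round;               // current round, never 0 while in use
};
```

The trade-off in a scheme like this is one extra comparison per lookup in exchange for dropping a full clear of the table before each query.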
------------- PR: https://git.openjdk.java.net/jdk/pull/3689 From jiefu at openjdk.java.net Sat May 22 00:02:05 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Sat, 22 May 2021 00:02:05 GMT Subject: RFR: 8266950: Remove vestigial support for non-strict floating-point execution [v2] In-Reply-To: References: Message-ID: <8UuamCdBlg8NiUW9iCluoNTLJAxbgeS38y3C_50ZWyA=.ae70148a-c8cf-4b65-bb3b-ce1f095ea455@github.com> On Fri, 21 May 2021 18:16:23 GMT, Vladimir Ivanov wrote: > There are some suspicious failures on linux-x86 in pre-submit testing results: > > * compiler/c1/Test6855215.java > * compiler/intrinsics/string/TestStringLatin1IndexOfChar.java > > The tests explicitly specify `-XX:UseSSE=0`, so it may be related to the patch. Anybody interested in linux-x86 want to take a look? @shade @DamonFool compiler/c1/Test6855215.java crashes on x86_32 with the latest patch. # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (/home/jdk/src/hotspot/cpu/x86/c1_LinearScan_x86.cpp:683), pid=13254, tid=13270 # assert(op2->tmp1_opr()->is_fpu_register()) failed: strict operations need temporary fpu stack slot # # JRE version: OpenJDK Runtime Environment (17.0) (fastdebug build 17-internal+0-adhoc..jdk) # Java VM: OpenJDK Server VM (fastdebug 17-internal+0-adhoc..jdk, mixed mode, sharing, tiered, g1 gc, linux-x86) # Problematic frame: # V [libjvm.so+0x64fcf0] FpuStackAllocator::handle_op2(LIR_Op2*)+0x120 # Current CompileTask: C1: 109 66 b 3 java.util.HashMap::resize (356 bytes) Stack: [0xa1f7f000,0xa2000000], sp=0xa1ffe540, free space=509k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x64fcf0] FpuStackAllocator::handle_op2(LIR_Op2*)+0x120 V [libjvm.so+0x65232c] FpuStackAllocator::allocate_block(BlockBegin*)+0x28c V [libjvm.so+0x6526af] FpuStackAllocator::allocate()+0xcf V [libjvm.so+0x652a4d] LinearScan::allocate_fpu_stack()+0xcd V [libjvm.so+0x648d7a] LinearScan::do_linear_scan()+0x39a V 
[libjvm.so+0x588f40] Compilation::emit_lir()+0xd10 V [libjvm.so+0x58ba60] Compilation::compile_java_method()+0x720 V [libjvm.so+0x58c287] Compilation::compile_method()+0x247 V [libjvm.so+0x58cbe3] Compilation::Compilation(AbstractCompiler*, ciEnv*, ciMethod*, int, BufferBlob*, bool, DirectiveSet*)+0x3d3 V [libjvm.so+0x58de72] Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x1b2 V [libjvm.so+0x83f5fb] CompileBroker::invoke_compiler_on_method(CompileTask*)+0xc6b V [libjvm.so+0x83ff71] CompileBroker::compiler_thread_loop()+0x491 V [libjvm.so+0x86b8c6] CompilerThread::thread_entry(JavaThread*, JavaThread*)+0x56 V [libjvm.so+0x14b28e2] JavaThread::thread_main_inner()+0x242 V [libjvm.so+0x14b794a] Thread::call_run()+0xfa V [libjvm.so+0x10fc9bf] thread_native_entry(Thread*)+0x11f C [libpthread.so.0+0x63bd] start_thread+0xfd compiler/intrinsics/string/TestStringLatin1IndexOfChar.java seems fine on my x86_32. I didn't take a further look at the crash since it's already weekend now. Thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/3991 From hshi at openjdk.java.net Sat May 22 01:42:02 2021 From: hshi at openjdk.java.net (Hui Shi) Date: Sat, 22 May 2021 01:42:02 GMT Subject: RFR: 8266528: Optimize C2 VerifyIterativeGVN execution time [v3] In-Reply-To: <182RYuy2ZX_C2cMYPB59-EC9ZzQ7Dx6nMOfqiTS5WLM=.83209a60-414c-4d4f-b1d4-d9c37fc001ab@github.com> References: <182RYuy2ZX_C2cMYPB59-EC9ZzQ7Dx6nMOfqiTS5WLM=.83209a60-414c-4d4f-b1d4-d9c37fc001ab@github.com> Message-ID: <6CoGk8SIq66NVkC6tjN1uqBiXHU9g_Rdh_SK204R0_g=.c3861797-d29d-4be8-973c-18437d6fd4dd@github.com> On Fri, 21 May 2021 08:02:38 GMT, Tobias Hartmann wrote: >> Hui Shi has updated the pull request incrementally with one additional commit since the last revision: >> >> Add comments for duplicated input processing in Node::Verify > > Looks good to me too. @TobiHartmann Thanks! 
------------- PR: https://git.openjdk.java.net/jdk/pull/4045 From david.holmes at oracle.com Sat May 22 02:15:34 2021 From: david.holmes at oracle.com (David Holmes) Date: Sat, 22 May 2021 12:15:34 +1000 Subject: RFR: 8266950: Remove vestigial support for non-strict floating-point execution [v2] In-Reply-To: References: Message-ID: <3ba8220c-e21d-7bc3-9a7a-942c424af374@oracle.com> Hi Vladimir, On 22/05/2021 4:19 am, Vladimir Ivanov wrote: > On Tue, 18 May 2021 04:26:00 GMT, David Holmes wrote: > >>> As part of JEP 306, the vestiges of HotSpot support for non-strict floating-point execution can be removed. All methods implicitly have strictfp semantics so the explicit checks for is_strict() can be replaced by true and the code reformulated accordingly. >>> >>> There are still some names that include "strict" that could potentially be renamed to remove it, but the fact we have to have strict fp semantics is still important on some platforms, so the names help reinforce that IMO. >>> >>> Testing: tiers 1-3 >>> >>> Thanks, >>> David >> >> David Holmes has updated the pull request incrementally with one additional commit since the last revision: >> >> lir_div_strictfp and lir_mul_strictfp > > There are some suspicious failures on linux-x86 in pre-submit testing results: > - compiler/c1/Test6855215.java > - compiler/intrinsics/string/TestStringLatin1IndexOfChar.java > > The tests explicitly specify `-XX:UseSSE=0`, so it may be related to the patch. Anybody interested in linux-x86 want to take a look? @shade @DamonFool I'll take a look at the patch again because it is supposed to involve no functional changes. 
Thanks, David > ------------- > > PR: https://git.openjdk.java.net/jdk/pull/3991 > From david.holmes at oracle.com Sat May 22 02:25:16 2021 From: david.holmes at oracle.com (David Holmes) Date: Sat, 22 May 2021 12:25:16 +1000 Subject: RFR: 8266950: Remove vestigial support for non-strict floating-point execution [v2] In-Reply-To: <3ba8220c-e21d-7bc3-9a7a-942c424af374@oracle.com> References: <3ba8220c-e21d-7bc3-9a7a-942c424af374@oracle.com> Message-ID: <7949a2bf-a26d-16c9-ce00-f514dc21edc2@oracle.com> Correction ... On 22/05/2021 12:15 pm, David Holmes wrote: > Hi Vladimir, > > On 22/05/2021 4:19 am, Vladimir Ivanov wrote: >> On Tue, 18 May 2021 04:26:00 GMT, David Holmes >> wrote: >> >>>> As part of JEP 306, the vestiges of HotSpot support for non-strict >>>> floating-point execution can be removed. All methods implicitly have >>>> strictfp semantics so the explicit checks for is_strict() can be >>>> replaced by true and the code reformulated accordingly. >>>> >>>> There are still some names that include "strict" that could >>>> potentially be renamed to remove it, but the fact we have to have >>>> strict fp semantics is still important on some platforms, so the >>>> names help reinforce that IMO. >>>> >>>> Testing: tiers 1-3 >>>> >>>> Thanks, >>>> David >>> >>> David Holmes has updated the pull request incrementally with one >>> additional commit since the last revision: >>> >>> lir_div_strictfp and lir_mul_strictfp >> >> There are some suspicious failures on linux-x86 in pre-submit testing >> results: >> - compiler/c1/Test6855215.java >> - compiler/intrinsics/string/TestStringLatin1IndexOfChar.java >> >> The tests explicitly specify `-XX:UseSSE=0`, so it may be related to >> the patch. Anybody interested in linux-x86 want to take a look? @shade >> @DamonFool > > I'll take a look at the patch again because it is supposed to involve no > functional changes. That isn't actually true. 
We are now forcing 32-bit x86 to always have strict semantics, where previously they could use non-strict. David > Thanks, > David > >> ------------- >> >> PR: https://git.openjdk.java.net/jdk/pull/3991 >> From aph at redhat.com Sat May 22 08:08:16 2021 From: aph at redhat.com (Andrew Haley) Date: Sat, 22 May 2021 09:08:16 +0100 Subject: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics In-Reply-To: References: <72216fcc-67e7-c700-8fee-2d8c752a0f0c@redhat.com> <8346BF97-D8F6-4521-8589-66C679618DB7@oracle.com> <6AA63D23-E2C1-4510-8FCC-2B17FFF3465E@oracle.com> <2e4b6fb0-e53e-43d3-b680-b50a53cfe04a@redhat.com> <0d8f2f93-0f83-5771-ff94-3f6283d25c21@redhat.com> Message-ID: <9f671f39-f5fe-1b04-c46b-67f86c76da5e@redhat.com> On 5/21/21 7:34 PM, Viswanathan, Sandhya wrote: > We made this contribution with the goal to help Vector API and its evaluation during incubation. This is the best we could do currently towards JDK 17. > Please advice if you think that PR be withdrawn instead of integration at this point. We will go with your expert advice. I think the general consensus is to go ahead with this patch. We should take the meta-discussion elsewhere. Thank you for your patience. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From ddong at openjdk.java.net Sat May 22 09:13:05 2021 From: ddong at openjdk.java.net (Denghui Dong) Date: Sat, 22 May 2021 09:13:05 GMT Subject: RFR: 8265129: Add intrinsic support for JVM.getClassId [v11] In-Reply-To: References: Message-ID: On Fri, 21 May 2021 19:36:37 GMT, Vladimir Kozlov wrote: > Good. > What testing you did? 
fastdebug on Linux x86_64 jtreg hotspot/jtreg/compiler jdk/jdk/jfr/ ------------- PR: https://git.openjdk.java.net/jdk/pull/3470 From hshi at openjdk.java.net Sat May 22 11:55:03 2021 From: hshi at openjdk.java.net (Hui Shi) Date: Sat, 22 May 2021 11:55:03 GMT Subject: Integrated: 8266528: Optimize C2 VerifyIterativeGVN execution time In-Reply-To: References: Message-ID: On Mon, 17 May 2021 05:23:12 GMT, Hui Shi wrote: > Please help review this enhancement for VerifyIterativeGVN, reduce about 3x - 200x execution time when VerifyIterativeGVN is on. > > In simple test "-Xcomp -XX:+VerifyIterativeGVN -XX:-TieredCompilation -version", time reduced from 8.67s to 2.4s. > In extreme case hotspot/test/jtreg/compiler/escapeAnalysis/Test6689060.java, time reduced from 20000s to 95s. > > Test with "-Xbatch -XX:+VerifyIterativeGVN -XX:-TieredCompilation", tier1/2/3 with fastdebug and no regression. > > 1. Remove node_arena()->contains checking for verifying nodes. _verify_window is reset before every PhaseIterGVN::optimize. Searching from root or nodes in _verify_window will not meet nodes whose _idx is not unique (PhaseIterGVN::optimize is not triggered in the middle of PhaseRenumberLive ). Assertion every node is in current node_arena() in Node::verify, passes tier1/2/3 checks (with -Xbatch -XX:+VerifyIterativeGVN -XX:-TieredCompilation), no assertion failure happens. > > 2. Combine verification for nodes in _verify_window into one worklist and skipping redundant nodes in _verify_window. > > 3. Optimize duplicate checking for same input nodes, skipping if current input index is not its first occurrence. > > 4. Optimize field access: Replace "n->in(j)" with "n->_in[j]", same with outcnt calculation for input node x. This pull request has now been integrated. 
Changeset: 4023646e Author: Hui Shi Committer: Jie Fu URL: https://git.openjdk.java.net/jdk/commit/4023646ed1bcb821b1d18f7e5104f04995e8171d Stats: 45 lines in 4 files changed: 21 ins; 10 del; 14 mod 8266528: Optimize C2 VerifyIterativeGVN execution time Reviewed-by: kvn, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/4045 From vlivanov at openjdk.java.net Sat May 22 13:51:57 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Sat, 22 May 2021 13:51:57 GMT Subject: RFR: 8266950: Remove vestigial support for non-strict floating-point execution [v2] In-Reply-To: References: Message-ID: On Tue, 18 May 2021 04:26:00 GMT, David Holmes wrote: >> As part of JEP 306, the vestiges of HotSpot support for non-strict floating-point execution can be removed. All methods implicitly have strictfp semantics so the explicit checks for is_strict() can be replaced by true and the code reformulated accordingly. >> >> There are still some names that include "strict" that could potentially be renamed to remove it, but the fact we have to have strict fp semantics is still important on some platforms, so the names help reinforce that IMO. >> >> Testing: tiers 1-3 >> >> Thanks, >> David > > David Holmes has updated the pull request incrementally with one additional commit since the last revision: > > lir_div_strictfp and lir_mul_strictfp Thanks, Jie. I think there's `result->is_double_fpu()` check missing in `FpuStackAllocator::handle_op2`: case lir_mul: case lir_div: { assert(op2->tmp1_opr()->is_fpu_register(), "strict operations need temporary fpu stack slot"); insert_free_if_dead(op2->tmp1_opr()); assert(sim()->stack_size() <= 7, "at least one stack slot must be free"); // fall-through: continue with the normal handling of lir_mul and lir_div } The code should be guarded by `result->is_double_fpu()` since special handling (additional temp operand) is needed only for `mul`/`div` on doubles. 
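To make the suggested guard concrete, here is a minimal self-contained sketch of the condition, assuming a simplified model of a two-operand LIR operation. It is not the actual `FpuStackAllocator::handle_op2` code: `LirCode`, `OprKind`, `Op2`, `needs_fpu_tmp` and `check_op2` are hypothetical stand-ins introduced only for illustration.

```cpp
// Hypothetical, simplified model of a two-operand LIR operation.
// None of these types are the real HotSpot classes.
enum LirCode { lir_add, lir_mul, lir_div };
enum OprKind { int_cpu, single_fpu, double_fpu };

struct Op2 {
  LirCode code;         // which operation this is
  OprKind result_kind;  // kind of the result operand
  bool has_fpu_tmp;     // whether a temporary FPU operand was attached
};

// The special x87 handling (an extra temp stack slot) applies only to
// mul/div whose result is a double, which is the condition the review
// comment asks to make explicit.
bool needs_fpu_tmp(const Op2& op) {
  switch (op.code) {
    case lir_mul:
    case lir_div:
      return op.result_kind == double_fpu;
    default:
      return false;
  }
}

// Stand-in for the assert in the quoted snippet: the temp operand is
// required only when the guard holds, so int/long/float mul/div no
// longer trip the check.
bool check_op2(const Op2& op) {
  if (needs_fpu_tmp(op)) {
    return op.has_fpu_tmp;
  }
  return true;
}
```

With an unguarded check, a float or integer `lir_mul` built without a temp operand would fail; with the guard, only double `mul`/`div` are required to carry one.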
src/hotspot/share/c1/c1_GraphBuilder.cpp line 1126: > 1124: Value y = pop(type); > 1125: Value x = pop(type); > 1126: Value res = new ArithmeticOp(code, x, y,state_before); Please, put a space before `state_before`. ------------- PR: https://git.openjdk.java.net/jdk/pull/3991 From shade at openjdk.java.net Sat May 22 15:19:01 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Sat, 22 May 2021 15:19:01 GMT Subject: RFR: 8267531: [x86] Assembler::andb(Address, Register) encoding is incorrect In-Reply-To: References: Message-ID: On Fri, 21 May 2021 12:49:10 GMT, Aleksey Shipilev wrote: > See the bug report to see the way we arrived here. I looked through the [breakage changeset](https://github.com/openjdk/jdk/commit/de784312c340b4a4f4c4d11854bfbe9e9e826ea3), and I think that is the only "*b" case that is missing. > > Attention @JohnTortugo. > > Additional testing: > - [x] Failing fuzzer test (now passes) > - [x] Linux x86_64 fastdebug `tier1` Thanks all, I'll integrate to hopefully make weekend fuzzer CI runs clean. ------------- PR: https://git.openjdk.java.net/jdk/pull/4145 From shade at openjdk.java.net Sat May 22 15:19:02 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Sat, 22 May 2021 15:19:02 GMT Subject: Integrated: 8267531: [x86] Assembler::andb(Address, Register) encoding is incorrect In-Reply-To: References: Message-ID: On Fri, 21 May 2021 12:49:10 GMT, Aleksey Shipilev wrote: > See the bug report to see the way we arrived here. I looked through the [breakage changeset](https://github.com/openjdk/jdk/commit/de784312c340b4a4f4c4d11854bfbe9e9e826ea3), and I think that is the only "*b" case that is missing. > > Attention @JohnTortugo. > > Additional testing: > - [x] Failing fuzzer test (now passes) > - [x] Linux x86_64 fastdebug `tier1` This pull request has now been integrated. 
Changeset: 71e2fa25 Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/71e2fa25f73b0006a024edb59d79d837227ecd40 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8267531: [x86] Assembler::andb(Address,Register) encoding is incorrect Reviewed-by: azeemj, vlivanov, jiefu ------------- PR: https://git.openjdk.java.net/jdk/pull/4145 From david.holmes at oracle.com Sun May 23 00:02:41 2021 From: david.holmes at oracle.com (David Holmes) Date: Sun, 23 May 2021 10:02:41 +1000 Subject: RFR: 8266950: Remove vestigial support for non-strict floating-point execution [v2] In-Reply-To: References: Message-ID: On 22/05/2021 11:51 pm, Vladimir Ivanov wrote: > On Tue, 18 May 2021 04:26:00 GMT, David Holmes wrote: > >>> As part of JEP 306, the vestiges of HotSpot support for non-strict floating-point execution can be removed. All methods implicitly have strictfp semantics so the explicit checks for is_strict() can be replaced by true and the code reformulated accordingly. >>> >>> There are still some names that include "strict" that could potentially be renamed to remove it, but the fact we have to have strict fp semantics is still important on some platforms, so the names help reinforce that IMO. >>> >>> Testing: tiers 1-3 >>> >>> Thanks, >>> David >> >> David Holmes has updated the pull request incrementally with one additional commit since the last revision: >> >> lir_div_strictfp and lir_mul_strictfp > > Thanks, Jie. 
> > I think there's `result->is_double_fpu()` check missing in `FpuStackAllocator::handle_op2`: > > case lir_mul: > case lir_div: { > assert(op2->tmp1_opr()->is_fpu_register(), "strict operations need temporary fpu stack slot"); > insert_free_if_dead(op2->tmp1_opr()); > assert(sim()->stack_size() <= 7, "at least one stack slot must be free"); > // fall-through: continue with the normal handling of lir_mul and lir_div > } > > The code should be guarded by `result->is_double_fpu()` since special handling (additional temp operand) is needed only for `mul`/`div` on doubles. Where was that guard on the existing strict version? This failure is suggesting to me that there may be a pre-existing bug in the strict/non-strict code on x86. I need to check exactly what cases the failing test is failing on so that I can see which code change is in play. I have a suspicion but can't check it yet. > src/hotspot/share/c1/c1_GraphBuilder.cpp line 1126: > >> 1124: Value y = pop(type); >> 1125: Value x = pop(type); >> 1126: Value res = new ArithmeticOp(code, x, y,state_before); > > Please, put a space before `state_before`. Will do. Thanks, David > ------------- > > PR: https://git.openjdk.java.net/jdk/pull/3991 > From dholmes at openjdk.java.net Sun May 23 00:09:20 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Sun, 23 May 2021 00:09:20 GMT Subject: RFR: 8266950: Remove vestigial support for non-strict floating-point execution [v3] In-Reply-To: References: Message-ID: > As part of JEP 306, the vestiges of HotSpot support for non-strict floating-point execution can be removed. All methods implicitly have strictfp semantics so the explicit checks for is_strict() can be replaced by true and the code reformulated accordingly. > > There are still some names that include "strict" that could potentially be renamed to remove it, but the fact we have to have strict fp semantics is still important on some platforms, so the names help reinforce that IMO. 
> > Testing: tiers 1-3 > > Thanks, > David David Holmes has updated the pull request incrementally with one additional commit since the last revision: Add missing space ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/3991/files - new: https://git.openjdk.java.net/jdk/pull/3991/files/c0c35a77..3ddd6330 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3991&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3991&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/3991.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3991/head:pull/3991 PR: https://git.openjdk.java.net/jdk/pull/3991 From vladimir.x.ivanov at oracle.com Sun May 23 14:13:10 2021 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Sun, 23 May 2021 17:13:10 +0300 Subject: RFR: 8266950: Remove vestigial support for non-strict floating-point execution [v2] In-Reply-To: References: Message-ID: <9f7d558e-8d2d-67d1-a19c-f0c0df7ab3a5@oracle.com> >> I think there's `result->is_double_fpu()` check missing in >> `FpuStackAllocator::handle_op2`: >> >> case lir_mul: >> case lir_div: { >> assert(op2->tmp1_opr()->is_fpu_register(), "strict operations >> need temporary fpu stack slot"); >> insert_free_if_dead(op2->tmp1_opr()); >> assert(sim()->stack_size() <= 7, "at least one stack slot must >> be free"); >> // fall-through: continue with the normal handling of lir_mul >> and lir_div >> } >> >> The code should be guarded by `result->is_double_fpu()` since special >> handling (additional temp operand) is needed only for `mul`/`div` on >> doubles. > > Where was that guard on the existing strict version? > > This failure is suggesting to me that there may be a pre-existing bug in > the strict/non-strict code on x86. I need to check exactly what cases > the failing test is failing on so that I can see which code change is in > play. 
I have a suspicion but can't check it yet. The guard was implicit: `lir_mul_strict`/`lir_div_strict` operations were always instantiated with the temp FPU operand. Since they are removed now, there's aliasing occurring between `lir_mul`/`lir_div` on doubles (`lir_mul_strict`/`lir_div_strict` before) and original operations on ints/longs/floats. So, the check has to be explicit now. Only ddiv/dmul implementations need a temp operand. Best regards, Vladimir Ivanov From david.holmes at oracle.com Sun May 23 22:59:09 2021 From: david.holmes at oracle.com (David Holmes) Date: Mon, 24 May 2021 08:59:09 +1000 Subject: RFR: 8266950: Remove vestigial support for non-strict floating-point execution [v2] In-Reply-To: <9f7d558e-8d2d-67d1-a19c-f0c0df7ab3a5@oracle.com> References: <9f7d558e-8d2d-67d1-a19c-f0c0df7ab3a5@oracle.com> Message-ID: <0e418bab-893f-f4b8-243b-b4e84a621cbb@oracle.com> On 24/05/2021 12:13 am, Vladimir Ivanov wrote: >>> I think there's `result->is_double_fpu()` check missing in >>> `FpuStackAllocator::handle_op2`: >>> >>> case lir_mul: >>> case lir_div: { >>> assert(op2->tmp1_opr()->is_fpu_register(), "strict operations >>> need temporary fpu stack slot"); >>> insert_free_if_dead(op2->tmp1_opr()); >>> assert(sim()->stack_size() <= 7, "at least one stack slot >>> must be free"); >>> // fall-through: continue with the normal handling of lir_mul >>> and lir_div >>> } >>> >>> The code should be guarded by `result->is_double_fpu()` since special >>> handling (additional temp operand) is needed only for `mul`/`div` on >>> doubles. >> >> Where was that guard on the existing strict version? >> >> This failure is suggesting to me that there may be a pre-existing bug >> in the strict/non-strict code on x86. I need to check exactly what >> cases the failing test is failing on so that I can see which code >> change is in play. I have a suspicion but can't check it yet. 
> > The guard was implicit: `lir_mul_strict`/`lir_div_strict` operations > were always instantiated with the temp FPU operand. Since they are > removed now, there's aliasing occurring between `lir_mul`/`lir_div` on > doubles (`lir_mul_strict`/`lir_div_strict` before) and original > operations on ints/longs/floats. Right I just tracked that through before seeing this email. The fmul uses the non-tmp version but later all lir_mul are assumed the same. Thanks, David > So, the check has to be explicit now. Only ddiv/dmul implementations > need a temp operand. > > Best regards, > Vladimir Ivanov From dholmes at openjdk.java.net Sun May 23 23:09:39 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Sun, 23 May 2021 23:09:39 GMT Subject: RFR: 8266950: Remove vestigial support for non-strict floating-point execution [v4] In-Reply-To: References: Message-ID: > As part of JEP 306, the vestiges of HotSpot support for non-strict floating-point execution can be removed. All methods implicitly have strictfp semantics so the explicit checks for is_strict() can be replaced by true and the code reformulated accordingly. > > There are still some names that include "strict" that could potentially be renamed to remove it, but the fact we have to have strict fp semantics is still important on some platforms, so the names help reinforce that IMO. > > Testing: tiers 1-3 > > Thanks, > David David Holmes has updated the pull request incrementally with one additional commit since the last revision: The code for strict handling only applies to doubles. 
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/3991/files - new: https://git.openjdk.java.net/jdk/pull/3991/files/3ddd6330..89e90058 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3991&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3991&range=02-03 Stats: 5 lines in 1 file changed: 2 ins; 0 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/3991.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3991/head:pull/3991 PR: https://git.openjdk.java.net/jdk/pull/3991 From dholmes at openjdk.java.net Sun May 23 23:14:08 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Sun, 23 May 2021 23:14:08 GMT Subject: RFR: 8266950: Remove vestigial support for non-strict floating-point execution [v5] In-Reply-To: References: Message-ID: > As part of JEP 306, the vestiges of HotSpot support for non-strict floating-point execution can be removed. All methods implicitly have strictfp semantics so the explicit checks for is_strict() can be replaced by true and the code reformulated accordingly. > > There are still some names that include "strict" that could potentially be renamed to remove it, but the fact we have to have strict fp semantics is still important on some platforms, so the names help reinforce that IMO. > > Testing: tiers 1-3 > > Thanks, > David David Holmes has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: - Merge branch 'master' into jep-306 - The code for strict handling only applies to doubles. 
- Add missing space - lir_div_strictfp and lir_mul_strictfp - Removed divDPR_reg_round as it has a false predicate and so is now unused - Revert classFileParser changes as they will be handled by JDK-8266530 - 8266530: HotSpot changes for JEP 306 All methods are now implicitly strictfp so all code generation etc uses the strict form. There are still some names that include "strict" that could potentially be renamed to remove it, but the fact we have to have strict fp semantics is still important on some platforms, so the names help reinforce that IMO. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/3991/files - new: https://git.openjdk.java.net/jdk/pull/3991/files/89e90058..4dcab9b7 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3991&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3991&range=03-04 Stats: 29911 lines in 904 files changed: 14007 ins; 12146 del; 3758 mod Patch: https://git.openjdk.java.net/jdk/pull/3991.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3991/head:pull/3991 PR: https://git.openjdk.java.net/jdk/pull/3991 From jiefu at openjdk.java.net Sun May 23 23:42:14 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Sun, 23 May 2021 23:42:14 GMT Subject: RFR: 8266950: Remove vestigial support for non-strict floating-point execution In-Reply-To: <0e418bab-893f-f4b8-243b-b4e84a621cbb@oracle.com> References: <0e418bab-893f-f4b8-243b-b4e84a621cbb@oracle.com> Message-ID: On Sun, 23 May 2021 23:00:34 GMT, David Holmes wrote: > Right I just tracked that through before seeing this email. The fmul > uses the non-tmp version but later all lir_mul are assumed the same. compiler/c1/Test6855215.java passed on x86_32 with the latest version. More testing on x86_32 is in progress. Will let you know once finished. Thanks. 
------------- PR: https://git.openjdk.java.net/jdk/pull/3991 From david.holmes at oracle.com Mon May 24 00:38:12 2021 From: david.holmes at oracle.com (David Holmes) Date: Mon, 24 May 2021 10:38:12 +1000 Subject: RFR: 8266950: Remove vestigial support for non-strict floating-point execution In-Reply-To: References: <0e418bab-893f-f4b8-243b-b4e84a621cbb@oracle.com> Message-ID: <2567ba8c-c37c-89ce-54de-ef4844ca488b@oracle.com> On 24/05/2021 9:42 am, Jie Fu wrote: > On Sun, 23 May 2021 23:00:34 GMT, David Holmes wrote: > >> Right I just tracked that through before seeing this email. The fmul >> uses the non-tmp version but later all lir_mul are assumed the same. > > compiler/c1/Test6855215.java passed on x86_32 with the latest version. > > More testing on x86_32 is in progress. > Will let you know once finished. > Thanks. Thanks Jie - much appreciated! This won't be pushed till early next week, so plenty of time for additional testing. David > ------------- > > PR: https://git.openjdk.java.net/jdk/pull/3991 > From jbhateja at openjdk.java.net Mon May 24 05:50:44 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Mon, 24 May 2021 05:50:44 GMT Subject: RFR: 8266054: VectorAPI rotate operation optimization [v7] In-Reply-To: References: Message-ID: > The current VectorAPI Java-side implementation expresses the rotateLeft and rotateRight operations using the following operations: > > vec1 = lanewise(VectorOperators.LSHL, n) > vec2 = lanewise(VectorOperators.LSHR, n) > res = lanewise(VectorOperators.OR, vec1, vec2) > > This patch moves the above handling from the Java side to the C2 compiler, which facilitates dismantling the rotate operation if the target ISA does not support a direct rotate instruction. > > AVX512 added vector rotate instructions vpro[rl][v][dq] which operate over long and integer type vectors. For other cases (i.e.
sub-word type vectors or for targets which do not support direct rotate operations), an instruction sequence comprising vector SHIFT (LEFT/RIGHT) and vector OR is emitted. > > Please find below the performance data for the included JMH benchmark. > Machine: Cascade Lake Server (Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz) > > > Benchmark | (TESTSIZE) | Shift | Baseline AVX3 (ops/ms) | Withopt AVX3 (ops/ms) | Gain % | Baseline AVX2 (ops/ms) | Withopt AVX2 (ops/ms) | Gain % > -- | -- | -- | -- | -- | -- | -- | -- | -- > RotateBenchmark.testRotateLeftB | 128.00 | 7.00 | 17223.35 | 17094.69 | -0.75 | 17008.32 | 17488.06 | 2.82 > RotateBenchmark.testRotateLeftB | 128.00 | 7.00 | 8944.98 | 8811.34 | -1.49 | 8878.17 | 9218.68 | 3.84 > RotateBenchmark.testRotateLeftB | 128.00 | 15.00 | 17195.75 | 17137.32 | -0.34 | 16789.01 | 17780.34 | 5.90 > RotateBenchmark.testRotateLeftB | 128.00 | 15.00 | 9052.67 | 8838.60 | -2.36 | 8814.62 | 9206.01 | 4.44 > RotateBenchmark.testRotateLeftB | 128.00 | 31.00 | 17100.19 | 16950.64 | -0.87 | 16827.73 | 17720.37 | 5.30 > RotateBenchmark.testRotateLeftB | 128.00 | 31.00 | 9079.95 | 8471.26 | -6.70 | 8888.44 | 9167.68 | 3.14 > RotateBenchmark.testRotateLeftB | 256.00 | 7.00 | 21231.33 | 21513.08 | 1.33 | 21824.51 | 21479.48 | -1.58 > RotateBenchmark.testRotateLeftB | 256.00 | 7.00 | 11103.62 | 11180.16 | 0.69 | 11173.67 | 11529.22 | 3.18 > RotateBenchmark.testRotateLeftB | 256.00 | 15.00 | 21119.14 | 21552.04 | 2.05 | 21693.05 | 21915.37 | 1.02 > RotateBenchmark.testRotateLeftB | 256.00 | 15.00 | 11048.68 | 11094.20 | 0.41 | 11049.90 | 11439.07 | 3.52 > RotateBenchmark.testRotateLeftB | 256.00 | 31.00 | 21506.31 | 21391.41 | -0.53 | 21263.18 | 21986.29 | 3.40 > RotateBenchmark.testRotateLeftB | 256.00 | 31.00 | 11056.12 | 11232.78 | 1.60 | 10941.59 | 11397.09 | 4.16 > RotateBenchmark.testRotateLeftB | 512.00 | 7.00 | 17976.56 | 18180.85 | 1.14 | 1212.26 | 2533.34 | 108.98 > RotateBenchmark.testRotateLeftB |
512.00 | 15.00 | 17553.70 | 18219.07 | 3.79 | 1256.73 | 2537.41 | 101.91 > RotateBenchmark.testRotateLeftB | 512.00 | 31.00 | 17618.03 | 17738.15 | 0.68 | 1214.69 | 2533.83 | 108.60 > RotateBenchmark.testRotateLeftI | 128.00 | 7.00 | 7258.87 | 7468.88 | 2.89 | 7115.12 | 7117.26 | 0.03 > RotateBenchmark.testRotateLeftI | 128.00 | 7.00 | 3586.65 | 3950.85 | 10.15 | 3532.17 | 3595.80 | 1.80 > RotateBenchmark.testRotateLeftI | 128.00 | 7.00 | 1835.07 | 1999.68 | 8.97 | 1789.90 | 1819.93 | 1.68 > RotateBenchmark.testRotateLeftI | 128.00 | 15.00 | 7273.36 | 7410.91 | 1.89 | 7198.60 | 6994.79 | -2.83 > RotateBenchmark.testRotateLeftI | 128.00 | 15.00 | 3674.98 | 3926.27 | 6.84 | 3549.90 | 3755.09 | 5.78 > RotateBenchmark.testRotateLeftI | 128.00 | 15.00 | 1840.94 | 1882.25 | 2.24 | 1801.56 | 1872.89 | 3.96 > RotateBenchmark.testRotateLeftI | 128.00 | 31.00 | 7457.11 | 7361.48 | -1.28 | 6975.33 | 7385.94 | 5.89 > RotateBenchmark.testRotateLeftI | 128.00 | 31.00 | 3570.74 | 3929.30 | 10.04 | 3635.37 | 3736.67 | 2.79 > RotateBenchmark.testRotateLeftI | 128.00 | 31.00 | 1902.32 | 1960.46 | 3.06 | 1812.32 | 1813.88 | 0.09 > RotateBenchmark.testRotateLeftI | 256.00 | 7.00 | 11174.24 | 12044.52 | 7.79 | 11509.87 | 11273.44 | -2.05 > RotateBenchmark.testRotateLeftI | 256.00 | 7.00 | 5981.47 | 6073.70 | 1.54 | 5593.66 | 5661.93 | 1.22 > RotateBenchmark.testRotateLeftI | 256.00 | 7.00 | 2932.49 | 3069.54 | 4.67 | 2950.86 | 2892.42 | -1.98 > RotateBenchmark.testRotateLeftI | 256.00 | 15.00 | 11764.11 | 12098.63 | 2.84 | 11069.52 | 11476.93 | 3.68 > RotateBenchmark.testRotateLeftI | 256.00 | 15.00 | 5855.20 | 6080.40 | 3.85 | 5919.11 | 5607.04 | -5.27 > RotateBenchmark.testRotateLeftI | 256.00 | 15.00 | 2989.05 | 3048.56 | 1.99 | 2902.63 | 2821.83 | -2.78 > RotateBenchmark.testRotateLeftI | 256.00 | 31.00 | 11652.84 | 11965.40 | 2.68 | 11525.62 | 11459.83 | -0.57 > RotateBenchmark.testRotateLeftI | 256.00 | 31.00 | 5851.82 | 6164.94 | 5.35 | 5882.60 | 5842.30 | -0.69 > 
RotateBenchmark.testRotateLeftI | 256.00 | 31.00 | 3015.99 | 3043.79 | 0.92 | 2963.71 | 2947.97 | -0.53 > RotateBenchmark.testRotateLeftI | 512.00 | 7.00 | 16029.15 | 16189.79 | 1.00 | 860.43 | 2339.32 | 171.88 > RotateBenchmark.testRotateLeftI | 512.00 | 7.00 | 8078.25 | 8081.84 | 0.04 | 427.39 | 1147.92 | 168.59 > RotateBenchmark.testRotateLeftI | 512.00 | 7.00 | 4021.49 | 4294.03 | 6.78 | 209.25 | 582.28 | 178.27 > RotateBenchmark.testRotateLeftI | 512.00 | 15.00 | 15912.98 | 16329.03 | 2.61 | 848.23 | 2296.78 | 170.77 > RotateBenchmark.testRotateLeftI | 512.00 | 15.00 | 8054.10 | 8306.37 | 3.13 | 429.93 | 1146.90 | 166.77 > RotateBenchmark.testRotateLeftI | 512.00 | 15.00 | 4102.58 | 4071.08 | -0.77 | 217.86 | 582.20 | 167.24 > RotateBenchmark.testRotateLeftI | 512.00 | 31.00 | 16177.79 | 16287.85 | 0.68 | 857.84 | 2243.15 | 161.49 > RotateBenchmark.testRotateLeftI | 512.00 | 31.00 | 8187.47 | 8410.48 | 2.72 | 434.60 | 1128.20 | 159.60 > RotateBenchmark.testRotateLeftI | 512.00 | 31.00 | 4109.15 | 4233.80 | 3.03 | 208.71 | 572.43 | 174.27 > RotateBenchmark.testRotateLeftL | 128.00 | 7.00 | 3755.09 | 3930.29 | 4.67 | 3604.19 | 3598.47 | -0.16 > RotateBenchmark.testRotateLeftL | 128.00 | 7.00 | 1829.03 | 1957.39 | 7.02 | 1833.95 | 1808.38 | -1.39 > RotateBenchmark.testRotateLeftL | 128.00 | 7.00 | 915.35 | 970.55 | 6.03 | 916.25 | 899.08 | -1.87 > RotateBenchmark.testRotateLeftL | 128.00 | 15.00 | 3664.85 | 3812.26 | 4.02 | 3629.37 | 3579.23 | -1.38 > RotateBenchmark.testRotateLeftL | 128.00 | 15.00 | 1829.51 | 1877.76 | 2.64 | 1781.05 | 1807.57 | 1.49 > RotateBenchmark.testRotateLeftL | 128.00 | 15.00 | 913.37 | 953.42 | 4.38 | 912.26 | 908.73 | -0.39 > RotateBenchmark.testRotateLeftL | 128.00 | 31.00 | 3648.45 | 3899.20 | 6.87 | 3552.67 | 3581.04 | 0.80 > RotateBenchmark.testRotateLeftL | 128.00 | 31.00 | 1816.50 | 1959.68 | 7.88 | 1820.88 | 1819.71 | -0.06 > RotateBenchmark.testRotateLeftL | 128.00 | 31.00 | 901.05 | 955.13 | 6.00 | 913.74 | 907.90 | -0.64 > 
RotateBenchmark.testRotateLeftL | 256.00 | 7.00 | 5850.99 | 6108.64 | 4.40 | 5882.65 | 5755.21 | -2.17 > RotateBenchmark.testRotateLeftL | 256.00 | 7.00 | 2962.21 | 3060.47 | 3.32 | 2955.20 | 2909.18 | -1.56 > RotateBenchmark.testRotateLeftL | 256.00 | 7.00 | 1480.46 | 1534.72 | 3.66 | 1467.78 | 1430.60 | -2.53 > RotateBenchmark.testRotateLeftL | 256.00 | 15.00 | 5858.23 | 6047.51 | 3.23 | 5770.02 | 5773.19 | 0.05 > RotateBenchmark.testRotateLeftL | 256.00 | 15.00 | 2951.49 | 3096.53 | 4.91 | 2885.21 | 2899.31 | 0.49 > RotateBenchmark.testRotateLeftL | 256.00 | 15.00 | 1486.26 | 1527.94 | 2.80 | 1441.93 | 1454.25 | 0.85 > RotateBenchmark.testRotateLeftL | 256.00 | 31.00 | 5873.21 | 6089.75 | 3.69 | 5767.58 | 5664.11 | -1.79 > RotateBenchmark.testRotateLeftL | 256.00 | 31.00 | 2969.67 | 3081.39 | 3.76 | 2878.50 | 2905.86 | 0.95 > RotateBenchmark.testRotateLeftL | 256.00 | 31.00 | 1452.21 | 1520.03 | 4.67 | 1430.30 | 1485.63 | 3.87 > RotateBenchmark.testRotateLeftL | 512.00 | 7.00 | 8088.65 | 8443.63 | 4.39 | 455.67 | 1226.33 | 169.13 > RotateBenchmark.testRotateLeftL | 512.00 | 7.00 | 4011.95 | 4120.25 | 2.70 | 229.77 | 619.87 | 169.77 > RotateBenchmark.testRotateLeftL | 512.00 | 7.00 | 2090.57 | 2109.53 | 0.91 | 115.21 | 310.36 | 169.37 > RotateBenchmark.testRotateLeftL | 512.00 | 15.00 | 8166.84 | 8557.28 | 4.78 | 457.67 | 1242.86 | 171.56 > RotateBenchmark.testRotateLeftL | 512.00 | 15.00 | 4137.02 | 4287.95 | 3.65 | 227.26 | 624.80 | 174.93 > RotateBenchmark.testRotateLeftL | 512.00 | 15.00 | 2095.01 | 2102.86 | 0.37 | 114.26 | 310.83 | 172.03 > RotateBenchmark.testRotateLeftL | 512.00 | 31.00 | 8082.68 | 8400.56 | 3.93 | 459.59 | 1230.07 | 167.64 > RotateBenchmark.testRotateLeftL | 512.00 | 31.00 | 4047.67 | 4147.58 | 2.47 | 229.01 | 606.38 | 164.78 > RotateBenchmark.testRotateLeftL | 512.00 | 31.00 | 2086.83 | 2126.72 | 1.91 | 111.93 | 305.66 | 173.08 > RotateBenchmark.testRotateLeftS | 128.00 | 7.00 | 13597.19 | 13255.09 | -2.52 | 13818.39 | 13242.40 | -4.17 
> RotateBenchmark.testRotateLeftS | 128.00 | 7.00 | 7028.26 | 6826.59 | -2.87 | 6765.15 | 6907.87 | 2.11 > RotateBenchmark.testRotateLeftS | 128.00 | 7.00 | 3570.40 | 3468.01 | -2.87 | 3449.66 | 3533.50 | 2.43 > RotateBenchmark.testRotateLeftS | 128.00 | 15.00 | 13615.99 | 13464.40 | -1.11 | 13330.02 | 13870.57 | 4.06 > RotateBenchmark.testRotateLeftS | 128.00 | 15.00 | 7043.31 | 6763.34 | -3.97 | 6928.88 | 7063.57 | 1.94 > RotateBenchmark.testRotateLeftS | 128.00 | 15.00 | 3495.12 | 3537.62 | 1.22 | 3503.41 | 3457.67 | -1.31 > RotateBenchmark.testRotateLeftS | 128.00 | 31.00 | 13591.66 | 13665.84 | 0.55 | 13773.27 | 13126.08 | -4.70 > RotateBenchmark.testRotateLeftS | 128.00 | 31.00 | 7027.08 | 7011.24 | -0.23 | 6974.98 | 6815.50 | -2.29 > RotateBenchmark.testRotateLeftS | 128.00 | 31.00 | 3568.28 | 3569.62 | 0.04 | 3580.67 | 3463.58 | -3.27 > RotateBenchmark.testRotateLeftS | 256.00 | 7.00 | 21154.03 | 21416.32 | 1.24 | 21187.01 | 21401.61 | 1.01 > RotateBenchmark.testRotateLeftS | 256.00 | 7.00 | 11194.24 | 10865.47 | -2.94 | 11063.19 | 10977.60 | -0.77 > RotateBenchmark.testRotateLeftS | 256.00 | 7.00 | 5797.80 | 5523.94 | -4.72 | 5654.63 | 5468.78 | -3.29 > RotateBenchmark.testRotateLeftS | 256.00 | 15.00 | 21333.89 | 21412.74 | 0.37 | 21610.94 | 20908.96 | -3.25 > RotateBenchmark.testRotateLeftS | 256.00 | 15.00 | 11327.07 | 11113.48 | -1.89 | 11148.25 | 10678.14 | -4.22 > RotateBenchmark.testRotateLeftS | 256.00 | 15.00 | 5810.69 | 5569.72 | -4.15 | 5663.26 | 5618.87 | -0.78 > RotateBenchmark.testRotateLeftS | 256.00 | 31.00 | 21753.20 | 21198.43 | -2.55 | 21567.90 | 21929.81 | 1.68 > RotateBenchmark.testRotateLeftS | 256.00 | 31.00 | 11517.08 | 11039.64 | -4.15 | 11103.08 | 10871.59 | -2.08 > RotateBenchmark.testRotateLeftS | 256.00 | 31.00 | 5897.16 | 5606.75 | -4.92 | 5459.87 | 5604.12 | 2.64 > RotateBenchmark.testRotateLeftS | 512.00 | 7.00 | 29748.53 | 28883.73 | -2.91 | 1549.02 | 3928.53 | 153.61 > RotateBenchmark.testRotateLeftS | 512.00 | 7.00 | 
15197.09 | 15878.19 | 4.48 | 772.59 | 1924.35 | 149.08 > RotateBenchmark.testRotateLeftS | 512.00 | 7.00 | 8046.30 | 8081.19 | 0.43 | 388.11 | 990.28 | 155.16 > RotateBenchmark.testRotateLeftS | 512.00 | 15.00 | 30618.04 | 29419.19 | -3.92 | 1524.22 | 3915.97 | 156.92 > RotateBenchmark.testRotateLeftS | 512.00 | 15.00 | 15854.43 | 15846.37 | -0.05 | 766.09 | 1953.60 | 155.01 > RotateBenchmark.testRotateLeftS | 512.00 | 15.00 | 7814.77 | 7899.30 | 1.08 | 390.82 | 970.37 | 148.29 > RotateBenchmark.testRotateLeftS | 512.00 | 31.00 | 29596.82 | 28538.69 | -3.58 | 1530.45 | 3906.91 | 155.28 > RotateBenchmark.testRotateLeftS | 512.00 | 31.00 | 15662.48 | 15849.25 | 1.19 | 778.08 | 1934.31 | 148.60 > RotateBenchmark.testRotateLeftS | 512.00 | 31.00 | 8121.14 | 7758.59 | -4.46 | 392.78 | 959.73 | 144.34 > RotateBenchmark.testRotateRightB | 128.00 | 7.00 | 17465.84 | 17069.34 | -2.27 | 16849.73 | 17842.08 | 5.89 > RotateBenchmark.testRotateRightB | 128.00 | 7.00 | 9049.19 | 8864.15 | -2.04 | 8786.67 | 9105.34 | 3.63 > RotateBenchmark.testRotateRightB | 128.00 | 15.00 | 17703.38 | 17070.98 | -3.57 | 16595.85 | 17784.68 | 7.16 > RotateBenchmark.testRotateRightB | 128.00 | 15.00 | 9007.68 | 8817.41 | -2.11 | 8704.49 | 9185.87 | 5.53 > RotateBenchmark.testRotateRightB | 128.00 | 31.00 | 17531.05 | 16983.40 | -3.12 | 16947.69 | 17655.40 | 4.18 > RotateBenchmark.testRotateRightB | 128.00 | 31.00 | 8986.30 | 8794.15 | -2.14 | 8816.62 | 9225.95 | 4.64 > RotateBenchmark.testRotateRightB | 256.00 | 7.00 | 21293.95 | 21506.74 | 1.00 | 21163.29 | 21854.03 | 3.26 > RotateBenchmark.testRotateRightB | 256.00 | 7.00 | 11258.47 | 11072.92 | -1.65 | 11118.12 | 11338.96 | 1.99 > RotateBenchmark.testRotateRightB | 256.00 | 15.00 | 21253.36 | 21292.37 | 0.18 | 21224.39 | 21763.88 | 2.54 > RotateBenchmark.testRotateRightB | 256.00 | 15.00 | 11064.80 | 11198.35 | 1.21 | 10960.98 | 11294.14 | 3.04 > RotateBenchmark.testRotateRightB | 256.00 | 31.00 | 21358.14 | 21346.21 | -0.06 | 21487.25 | 
21854.42 | 1.71 > RotateBenchmark.testRotateRightB | 256.00 | 31.00 | 11045.61 | 11208.26 | 1.47 | 10907.03 | 11415.18 | 4.66 > RotateBenchmark.testRotateRightB | 512.00 | 7.00 | 17898.61 | 18307.54 | 2.28 | 1214.65 | 2546.64 | 109.66 > RotateBenchmark.testRotateRightB | 512.00 | 15.00 | 17909.25 | 18242.51 | 1.86 | 1215.05 | 2563.98 | 111.02 > RotateBenchmark.testRotateRightB | 512.00 | 31.00 | 17883.35 | 17928.44 | 0.25 | 1220.77 | 2543.30 | 108.34 > RotateBenchmark.testRotateRightI | 128.00 | 7.00 | 7139.97 | 7626.72 | 6.82 | 6994.86 | 7075.65 | 1.15 > RotateBenchmark.testRotateRightI | 128.00 | 7.00 | 3657.37 | 3898.34 | 6.59 | 3617.06 | 3576.12 | -1.13 > RotateBenchmark.testRotateRightI | 128.00 | 7.00 | 1804.26 | 1969.19 | 9.14 | 1796.62 | 1858.84 | 3.46 > RotateBenchmark.testRotateRightI | 128.00 | 15.00 | 7404.31 | 7760.09 | 4.80 | 7036.77 | 7401.52 | 5.18 > RotateBenchmark.testRotateRightI | 128.00 | 15.00 | 3600.52 | 3956.35 | 9.88 | 3595.28 | 3560.36 | -0.97 > RotateBenchmark.testRotateRightI | 128.00 | 15.00 | 1813.32 | 1966.41 | 8.44 | 1839.95 | 1852.53 | 0.68 > RotateBenchmark.testRotateRightI | 128.00 | 31.00 | 7118.48 | 7724.81 | 8.52 | 7151.56 | 7021.09 | -1.82 > RotateBenchmark.testRotateRightI | 128.00 | 31.00 | 3529.70 | 3881.63 | 9.97 | 3623.08 | 3601.01 | -0.61 > RotateBenchmark.testRotateRightI | 128.00 | 31.00 | 1823.61 | 1961.34 | 7.55 | 1786.86 | 1748.85 | -2.13 > RotateBenchmark.testRotateRightI | 256.00 | 7.00 | 11697.98 | 11835.25 | 1.17 | 11513.16 | 11184.87 | -2.85 > RotateBenchmark.testRotateRightI | 256.00 | 7.00 | 5890.11 | 6102.57 | 3.61 | 5658.79 | 5696.08 | 0.66 > RotateBenchmark.testRotateRightI | 256.00 | 7.00 | 2964.94 | 3070.26 | 3.55 | 2945.00 | 2962.08 | 0.58 > RotateBenchmark.testRotateRightI | 256.00 | 15.00 | 11562.51 | 12151.29 | 5.09 | 11404.17 | 11120.28 | -2.49 > RotateBenchmark.testRotateRightI | 256.00 | 15.00 | 5702.93 | 6130.57 | 7.50 | 5799.54 | 5779.08 | -0.35 > RotateBenchmark.testRotateRightI | 256.00 | 
15.00 | 2861.96 | 3051.44 | 6.62 | 2943.99 | 2860.65 | -2.83 > RotateBenchmark.testRotateRightI | 256.00 | 31.00 | 11203.13 | 11710.59 | 4.53 | 11363.18 | 11112.16 | -2.21 > RotateBenchmark.testRotateRightI | 256.00 | 31.00 | 5893.97 | 6070.71 | 3.00 | 5776.67 | 5648.84 | -2.21 > RotateBenchmark.testRotateRightI | 256.00 | 31.00 | 2971.83 | 3046.76 | 2.52 | 2903.35 | 2833.88 | -2.39 > RotateBenchmark.testRotateRightI | 512.00 | 7.00 | 16064.71 | 15851.35 | -1.33 | 861.93 | 2256.88 | 161.84 > RotateBenchmark.testRotateRightI | 512.00 | 7.00 | 7916.80 | 8462.65 | 6.89 | 430.23 | 1147.30 | 166.67 > RotateBenchmark.testRotateRightI | 512.00 | 7.00 | 4104.64 | 4068.28 | -0.89 | 216.30 | 572.86 | 164.84 > RotateBenchmark.testRotateRightI | 512.00 | 15.00 | 16133.09 | 16281.59 | 0.92 | 856.36 | 2229.58 | 160.35 > RotateBenchmark.testRotateRightI | 512.00 | 15.00 | 8127.26 | 8117.59 | -0.12 | 419.16 | 1176.42 | 180.66 > RotateBenchmark.testRotateRightI | 512.00 | 15.00 | 4080.11 | 4063.26 | -0.41 | 218.32 | 571.93 | 161.97 > RotateBenchmark.testRotateRightI | 512.00 | 31.00 | 15834.26 | 16314.64 | 3.03 | 865.96 | 2297.74 | 165.34 > RotateBenchmark.testRotateRightI | 512.00 | 31.00 | 7965.62 | 8270.48 | 3.83 | 428.55 | 1148.87 | 168.08 > RotateBenchmark.testRotateRightI | 512.00 | 31.00 | 4161.69 | 4034.76 | -3.05 | 215.63 | 570.19 | 164.43 > RotateBenchmark.testRotateRightL | 128.00 | 7.00 | 3556.70 | 3877.08 | 9.01 | 3596.46 | 3558.32 | -1.06 > RotateBenchmark.testRotateRightL | 128.00 | 7.00 | 1772.93 | 1993.86 | 12.46 | 1856.79 | 1783.22 | -3.96 > RotateBenchmark.testRotateRightL | 128.00 | 7.00 | 908.66 | 1000.37 | 10.09 | 944.79 | 922.91 | -2.32 > RotateBenchmark.testRotateRightL | 128.00 | 15.00 | 3742.44 | 3748.41 | 0.16 | 3788.07 | 3570.67 | -5.74 > RotateBenchmark.testRotateRightL | 128.00 | 15.00 | 1817.53 | 1985.69 | 9.25 | 1892.38 | 1833.16 | -3.13 > RotateBenchmark.testRotateRightL | 128.00 | 15.00 | 941.03 | 952.68 | 1.24 | 915.79 | 910.21 | -0.61 > 
RotateBenchmark.testRotateRightL | 128.00 | 31.00 | 3649.48 | 3896.56 | 6.77 | 3637.59 | 3557.53 | -2.20 > RotateBenchmark.testRotateRightL | 128.00 | 31.00 | 1840.12 | 1997.19 | 8.54 | 1821.47 | 1799.82 | -1.19 > RotateBenchmark.testRotateRightL | 128.00 | 31.00 | 901.33 | 995.67 | 10.47 | 909.20 | 902.73 | -0.71 > RotateBenchmark.testRotateRightL | 256.00 | 7.00 | 5789.93 | 5960.54 | 2.95 | 5758.14 | 5736.30 | -0.38 > RotateBenchmark.testRotateRightL | 256.00 | 7.00 | 2963.20 | 3063.30 | 3.38 | 2943.48 | 2833.84 | -3.72 > RotateBenchmark.testRotateRightL | 256.00 | 7.00 | 1501.81 | 1510.23 | 0.56 | 1463.85 | 1462.26 | -0.11 > RotateBenchmark.testRotateRightL | 256.00 | 15.00 | 5870.05 | 5951.43 | 1.39 | 5794.74 | 5604.58 | -3.28 > RotateBenchmark.testRotateRightL | 256.00 | 15.00 | 2971.36 | 3047.00 | 2.55 | 2931.19 | 2907.30 | -0.82 > RotateBenchmark.testRotateRightL | 256.00 | 15.00 | 1473.97 | 1530.54 | 3.84 | 1473.45 | 1442.40 | -2.11 > RotateBenchmark.testRotateRightL | 256.00 | 31.00 | 5858.08 | 6080.49 | 3.80 | 5863.69 | 5549.85 | -5.35 > RotateBenchmark.testRotateRightL | 256.00 | 31.00 | 2916.24 | 3045.77 | 4.44 | 2981.59 | 2815.07 | -5.58 > RotateBenchmark.testRotateRightL | 256.00 | 31.00 | 1441.20 | 1531.56 | 6.27 | 1492.47 | 1473.25 | -1.29 > RotateBenchmark.testRotateRightL | 512.00 | 7.00 | 8147.24 | 8310.05 | 2.00 | 469.45 | 1235.21 | 163.12 > RotateBenchmark.testRotateRightL | 512.00 | 7.00 | 4142.95 | 4258.86 | 2.80 | 234.14 | 615.52 | 162.88 > RotateBenchmark.testRotateRightL | 512.00 | 7.00 | 2095.48 | 2087.20 | -0.40 | 113.55 | 311.19 | 174.05 > RotateBenchmark.testRotateRightL | 512.00 | 15.00 | 8222.94 | 8246.58 | 0.29 | 458.91 | 1244.32 | 171.15 > RotateBenchmark.testRotateRightL | 512.00 | 15.00 | 4160.04 | 4226.46 | 1.60 | 227.78 | 625.38 | 174.56 > RotateBenchmark.testRotateRightL | 512.00 | 15.00 | 2064.63 | 2162.44 | 4.74 | 113.27 | 314.15 | 177.36 > RotateBenchmark.testRotateRightL | 512.00 | 31.00 | 8157.94 | 8466.90 | 3.79 | 450.26 
| 1221.90 | 171.37 > RotateBenchmark.testRotateRightL | 512.00 | 31.00 | 4039.74 | 4283.33 | 6.03 | 224.82 | 612.68 | 172.53 > RotateBenchmark.testRotateRightL | 512.00 | 31.00 | 2066.88 | 2147.51 | 3.90 | 110.97 | 303.43 | 173.42 > RotateBenchmark.testRotateRightS | 128.00 | 7.00 | 13548.39 | 13245.87 | -2.23 | 13490.93 | 13084.76 | -3.01 > RotateBenchmark.testRotateRightS | 128.00 | 7.00 | 7020.16 | 6768.85 | -3.58 | 6991.39 | 7044.32 | 0.76 > RotateBenchmark.testRotateRightS | 128.00 | 7.00 | 3550.50 | 3505.19 | -1.28 | 3507.12 | 3612.86 | 3.01 > RotateBenchmark.testRotateRightS | 128.00 | 15.00 | 13743.43 | 13325.44 | -3.04 | 13696.15 | 13255.80 | -3.22 > RotateBenchmark.testRotateRightS | 128.00 | 15.00 | 6856.02 | 6969.18 | 1.65 | 6886.29 | 6834.12 | -0.76 > RotateBenchmark.testRotateRightS | 128.00 | 15.00 | 3569.53 | 3492.76 | -2.15 | 3539.02 | 3470.02 | -1.95 > RotateBenchmark.testRotateRightS | 128.00 | 31.00 | 13704.18 | 13495.07 | -1.53 | 13649.14 | 13583.87 | -0.48 > RotateBenchmark.testRotateRightS | 128.00 | 31.00 | 7011.77 | 6953.93 | -0.82 | 6978.28 | 6740.30 | -3.41 > RotateBenchmark.testRotateRightS | 128.00 | 31.00 | 3591.62 | 3620.12 | 0.79 | 3502.04 | 3510.05 | 0.23 > RotateBenchmark.testRotateRightS | 256.00 | 7.00 | 21950.71 | 22113.60 | 0.74 | 21484.27 | 21596.64 | 0.52 > RotateBenchmark.testRotateRightS | 256.00 | 7.00 | 11616.88 | 11099.73 | -4.45 | 11188.29 | 10737.68 | -4.03 > RotateBenchmark.testRotateRightS | 256.00 | 7.00 | 5872.72 | 5579.12 | -5.00 | 5784.05 | 5454.57 | -5.70 > RotateBenchmark.testRotateRightS | 256.00 | 15.00 | 22017.83 | 20817.97 | -5.45 | 21934.65 | 21356.90 | -2.63 > RotateBenchmark.testRotateRightS | 256.00 | 15.00 | 11414.27 | 11044.86 | -3.24 | 11454.35 | 11140.34 | -2.74 > RotateBenchmark.testRotateRightS | 256.00 | 15.00 | 5786.64 | 5634.05 | -2.64 | 5724.93 | 5639.99 | -1.48 > RotateBenchmark.testRotateRightS | 256.00 | 31.00 | 21754.77 | 21466.01 | -1.33 | 21140.67 | 21970.03 | 3.92 > 
RotateBenchmark.testRotateRightS | 256.00 | 31.00 | 11676.46 | 11358.64 | -2.72 | 11204.90 | 11213.48 | 0.08 > RotateBenchmark.testRotateRightS | 256.00 | 31.00 | 5728.20 | 5772.49 | 0.77 | 5594.33 | 5544.25 | -0.90 > RotateBenchmark.testRotateRightS | 512.00 | 7.00 | 30247.03 | 30179.41 | -0.22 | 1538.75 | 3975.82 | 158.38 > RotateBenchmark.testRotateRightS | 512.00 | 7.00 | 15988.73 | 15621.42 | -2.30 | 776.04 | 1910.91 | 146.24 > RotateBenchmark.testRotateRightS | 512.00 | 7.00 | 8115.84 | 8025.28 | -1.12 | 389.12 | 984.46 | 152.99 > RotateBenchmark.testRotateRightS | 512.00 | 15.00 | 30110.91 | 30200.69 | 0.30 | 1532.49 | 3983.77 | 159.95 > RotateBenchmark.testRotateRightS | 512.00 | 15.00 | 15957.90 | 15690.73 | -1.67 | 774.90 | 1931.00 | 149.19 > RotateBenchmark.testRotateRightS | 512.00 | 15.00 | 8113.26 | 8037.93 | -0.93 | 391.90 | 965.53 | 146.37 > RotateBenchmark.testRotateRightS | 512.00 | 31.00 | 29816.97 | 29891.54 | 0.25 | 1538.12 | 3881.93 | 152.38 > RotateBenchmark.testRotateRightS | 512.00 | 31.00 | 15405.95 | 15619.17 | 1.38 | 762.49 | 1871.00 | 145.38 > RotateBenchmark.testRotateRightS | 512.00 | 31.00 | 7919.80 | 7957.35 | 0.47 | 393.63 | 972.49 | 147.06 Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: - Merge http://github.com/openjdk/jdk into JDK-8266054 - Merge branch 'JDK-8266054' of http://github.com/jatin-bhateja/jdk into JDK-8266054 - 8266054: Removing redundant teat templates. - 8266054: Code reorganization for efficient sharing of logic to check rotate operation support on a target platform. - 8266054: Removing redundant test templates. - 8266054: Review comments resolution. - 8266054: Review comments resolution. - 8266054: Review comments resolution. - Merge http://github.com/openjdk/jdk into JDK-8266054 - 8266054: Changing gen-src.sh file permissions - ... 
and 1 more: https://git.openjdk.java.net/jdk/compare/4d26f22b...0439e93e ------------- Changes: https://git.openjdk.java.net/jdk/pull/3720/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3720&range=06 Stats: 4392 lines in 52 files changed: 4171 ins; 60 del; 161 mod Patch: https://git.openjdk.java.net/jdk/pull/3720.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3720/head:pull/3720 PR: https://git.openjdk.java.net/jdk/pull/3720 From jbhateja at openjdk.java.net Mon May 24 05:54:24 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Mon, 24 May 2021 05:54:24 GMT Subject: RFR: 8266951: Partial in-lining for vectorized mismatch operation using AVX512 masked instructions [v4] In-Reply-To: <6phAafS9kz8v8Nwo4XGynTzt8K1KaNv7jhcnMZ59mew=.de2cbb11-261a-4974-bf21-adf47f6e8482@github.com> References: <0YtRuwnVZ-Ejs-22d0JDJeFzXiZ17XNuBT1o5Ma4ZkI=.9dd9e952-d452-4175-8ff5-8f41e990a555@github.com> <6phAafS9kz8v8Nwo4XGynTzt8K1KaNv7jhcnMZ59mew=.de2cbb11-261a-4974-bf21-adf47f6e8482@github.com> Message-ID: On Tue, 18 May 2021 05:21:06 GMT, Jatin Bhateja wrote: >> ArraySupport.vectorizedMismatch is a leaf level comparison routine which gets called by various public Java APIs (Arrays.equals, Arrays.mismatch). Hotspot C2 compiler intrinsifies vectorizedMismatch routine and emits a call to a stub routine which uses vector instruction to compare the inputs. >> >> For small compare operation whose size fits in one vector register i.e. < 32 bytes or <= 64 bytes, this patch employ partial in-lining technique to emit the fast path code at the call site which does vector comparison under the influence of a predicate register/mask computed as a function of comparison length. >> >> If the length of comparison is greater than the vector register size then the slow path comprising of stub call is emitted. >> >> This prevents the call overhead associated with stub call which is significant compared to actual comparison operation for small sized comparisons. 
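The partial in-lining idea described above can be modelled in scalar Java. This is only a rough sketch under stated assumptions, not the C2 implementation: the class name `PartialInlineSketch`, the constant `PARTIAL_INLINE_SIZE`, and the use of a `long` bitmask to stand in for an AVX512 predicate register are all hypothetical illustrations, and `Arrays.mismatch` merely stands in for the stub call on the slow path. The sketch assumes `0 <= length` and that both arrays hold at least `length` bytes.

```java
import java.util.Arrays;

final class PartialInlineSketch {
    // Hypothetical threshold, modelled on the 32-byte default mentioned in the mail.
    static final int PARTIAL_INLINE_SIZE = 32;

    // Returns the index of the first mismatching byte in [0, length), or -1 if none.
    static int mismatch(byte[] a, byte[] b, int length) {
        if (length <= PARTIAL_INLINE_SIZE) {
            // Fast path: the mask is computed from the comparison length,
            // one bit per active byte lane (length <= 32, so no overflow in a long).
            long active = (1L << length) - 1;
            long unequal = 0;
            for (int lane = 0; lane < PARTIAL_INLINE_SIZE; lane++) {
                if (((active >>> lane) & 1L) == 0) {
                    continue; // lane is masked off, so it is never loaded or compared
                }
                if (a[lane] != b[lane]) {
                    unequal |= 1L << lane;
                }
            }
            // The lowest set bit identifies the first mismatching lane.
            return unequal == 0 ? -1 : Long.numberOfTrailingZeros(unequal);
        }
        // Slow path: stands in for the call to the vectorized stub routine.
        return Arrays.mismatch(a, 0, length, b, 0, length);
    }

    public static void main(String[] args) {
        byte[] a = new byte[40];
        byte[] b = new byte[40];
        b[5] = 1;
        System.out.println(mismatch(a, b, 16)); // fast path, mismatch inside the mask
        System.out.println(mismatch(a, b, 4));  // fast path, mismatch outside the mask
        System.out.println(mismatch(a, b, 40)); // slow path
    }
}
```

The point of the model is that the small case never leaves the call site: the length only changes the mask, so one compare covers every size up to the threshold, while larger sizes pay the call overhead once.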
>> >> Partial in-lining works under the influence of a run time flag -XX:UsePartialInlineSize=32/64 (default 32 bytes). >> >> Following are performance number for an existing JMH benchmark (test/micro/org/openjdk/bench/java/util//ArrayMismatch.java) :- >> >> Machine : Cascade Lake server (Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz) >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> BENCHMARK | SIZE | Baseline (ops/ms) | PI32 (ops/ms) | Gain | PI64 (ops/ms) | Gain >> -- | -- | -- | -- | -- | -- | -- >> ArraysMismatchPartialInlining.testByteMatch | 3 | 209915.663 | 209126.291 | 0.996239576 | 209073.888 | 0.995989937 >> ArraysMismatchPartialInlining.testByteMatch | 4 | 157757.866 | 157763.787 | 1.000037532 | 157766.023 | 1.000051706 >> ArraysMismatchPartialInlining.testByteMatch | 5 | 181182.854 | 180450.433 | 0.995957559 | 180465.978 | 0.996043356 >> ArraysMismatchPartialInlining.testByteMatch | 6 | 146279.651 | 146276.69 | 0.999979758 | 146274.73 | 0.999966359 >> ArraysMismatchPartialInlining.testByteMatch | 7 | 139099.287 | 137887.433 | 0.991287849 | 139159.131 | 1.000430225 >> ArraysMismatchPartialInlining.testByteMatch | 15 | 127720.176 | 175732.078 | 1.375914781 | 169252.948 | 1.325185678 >> ArraysMismatchPartialInlining.testByteMatch | 31 | 116472.861 | 176768.126 | 1.517676517 | 169773.326 | 1.457621325 >> ArraysMismatchPartialInlining.testByteMatch | 63 | 104636.064 | 91564.893 | 0.875079676 | 160845.908 | 1.537193792 >> ArraysMismatchPartialInlining.testByteMatch | 95 | 101099.48 | 89657.806 | 0.886827568 | 87334.192 | 0.863844127 >> ArraysMismatchPartialInlining.testByteMatch | 800 | 45022.411 | 47905.179 | 1.064029623 | 47969.355 | 1.065455046 >> ArraysMismatchPartialInlining.testCharMatch | 3 | 219405.496 | 219710.643 | 1.00139079 | 219242.048 | 0.999255041 >> ArraysMismatchPartialInlining.testCharMatch | 4 | 170629.006 | 193121.02 | 1.131818233 | 182593.776 | 1.070121548 >> ArraysMismatchPartialInlining.testCharMatch | 5 | 155518.733 | 169650.324 | 
1.090867452 | 159963.097 | 1.028577676 >> ArraysMismatchPartialInlining.testCharMatch | 6 | 154395.07 | 175616.979 | 1.137451986 | 147860.366 | 0.957675436 >> ArraysMismatchPartialInlining.testCharMatch | 7 | 147630.171 | 168639.547 | 1.142310856 | 112467.214 | 0.761817271 >> ArraysMismatchPartialInlining.testCharMatch | 15 | 130251.837 | 171755.645 | 1.318642784 | 159656.911 | 1.225755542 >> ArraysMismatchPartialInlining.testCharMatch | 31 | 115510.532 | 106310.328 | 0.920351817 | 159957.379 | 1.384786099 >> ArraysMismatchPartialInlining.testCharMatch | 63 | 96443.648 | 92545.364 | 0.959579671 | 92850.782 | 0.962746473 >> ArraysMismatchPartialInlining.testCharMatch | 95 | 90001.485 | 81753.152 | 0.908353368 | 83890.742 | 0.932103976 >> ArraysMismatchPartialInlining.testCharMatch | 800 | 22929.764 | 20699.791 | 0.902747669 | 22017.534 | 0.960216337 >> ArraysMismatchPartialInlining.testDoubleMatch | 3 | 137422.911 | 134792.332 | 0.980857784 | 137047.846 | 0.997270724 >> ArraysMismatchPartialInlining.testDoubleMatch | 4 | 140124.192 | 128321.199 | 0.915767628 | 128573.012 | 0.917564699 >> ArraysMismatchPartialInlining.testDoubleMatch | 5 | 132385.81 | 132099.177 | 0.997834866 | 132337.729 | 0.999636812 >> ArraysMismatchPartialInlining.testDoubleMatch | 6 | 122472.829 | 122301.343 | 0.998599804 | 122235.558 | 0.998062664 >> ArraysMismatchPartialInlining.testDoubleMatch | 7 | 123867.736 | 123042.597 | 0.993338548 | 123060.617 | 0.993484026 >> ArraysMismatchPartialInlining.testDoubleMatch | 15 | 102561.684 | 102697.933 | 1.001328459 | 100258.701 | 0.977545386 >> ArraysMismatchPartialInlining.testDoubleMatch | 31 | 87019.261 | 87292.743 | 1.003142775 | 85003.323 | 0.976833428 >> ArraysMismatchPartialInlining.testDoubleMatch | 63 | 62251.609 | 57261.214 | 0.919835084 | 62732.816 | 1.007730033 >> ArraysMismatchPartialInlining.testDoubleMatch | 95 | 50885.381 | 48282.534 | 0.948848826 | 48533.009 | 0.953771163 >> ArraysMismatchPartialInlining.testDoubleMatch | 800 | 
7160.957 | 8209.345 | 1.146403337 | 7158.649 | 0.999677697
>> ArraysMismatchPartialInlining.testFloatMatch | 3 | 144215.295 | 141572.656 | 0.981675737 | 117351.089 | 0.81372152
>> ArraysMismatchPartialInlining.testFloatMatch | 4 | 149935.526 | 140116.547 | 0.934511992 | 138351.846 | 0.922742259
>> ArraysMismatchPartialInlining.testFloatMatch | 5 | 134682.06 | 133892.853 | 0.994140222 | 139040.985 | 1.032364555
>> ArraysMismatchPartialInlining.testFloatMatch | 6 | 139176.866 | 139452.984 | 1.001983936 | 158309.784 | 1.13747197
>> ArraysMismatchPartialInlining.testFloatMatch | 7 | 127274.07 | 126137.824 | 0.991072447 | 146418.871 | 1.150421849
>> ArraysMismatchPartialInlining.testFloatMatch | 15 | 115897.616 | 101808.969 | 0.878438854 | 108451.212 | 0.935750154
>> ArraysMismatchPartialInlining.testFloatMatch | 31 | 96568.619 | 101492.986 | 1.05099345 | 88662.187 | 0.918126281
>> ArraysMismatchPartialInlining.testFloatMatch | 63 | 75565.484 | 85526.546 | 1.131820263 | 74575.198 | 0.986894996
>> ArraysMismatchPartialInlining.testFloatMatch | 95 | 69535.621 | 71823.072 | 1.032896104 | 64910.105 | 0.933479907
>> ArraysMismatchPartialInlining.testFloatMatch | 800 | 13959.085 | 12768.069 | 0.914678075 | 12698.311 | 0.909680756
>> ArraysMismatchPartialInlining.testIntMatch | 3 | 151925.753 | 152001.543 | 1.000498862 | 150351.321 | 0.989636833
>> ArraysMismatchPartialInlining.testIntMatch | 4 | 151411.152 | 161021.852 | 1.063474188 | 152115.869 | 1.004654327
>> ArraysMismatchPartialInlining.testIntMatch | 5 | 142305.114 | 134841.275 | 0.947550451 | 122718.584 | 0.862362431
>> ArraysMismatchPartialInlining.testIntMatch | 6 | 144870.73 | 144186.562 | 0.99527739 | 166569.418 | 1.149779655
>> ArraysMismatchPartialInlining.testIntMatch | 7 | 135132.736 | 131937.154 | 0.976352273 | 150670.855 | 1.114984122
>> ArraysMismatchPartialInlining.testIntMatch | 15 | 118831.765 | 119947.806 | 1.009391773 | 161039.149 | 1.35518604
>> ArraysMismatchPartialInlining.testIntMatch | 31 | 97247.157 | 95123.241 | 0.978159608 | 92586.255 | 0.952071586
>> ArraysMismatchPartialInlining.testIntMatch | 63 | 78537.993 | 72904.05 | 0.928264744 | 72075.128 | 0.917710337
>> ArraysMismatchPartialInlining.testIntMatch | 95 | 69356.234 | 69021.893 | 0.995179366 | 67435.202 | 0.972301956
>> ArraysMismatchPartialInlining.testIntMatch | 800 | 14410.374 | 12715.733 | 0.882401317 | 12527.15 | 0.869314703
>> ArraysMismatchPartialInlining.testLongMatch | 3 | 145434.777 | 147236.142 | 1.012386068 | 144269.34 | 0.991986532
>> ArraysMismatchPartialInlining.testLongMatch | 4 | 149850.908 | 117182.939 | 0.781996857 | 116983.308 | 0.780664659
>> ArraysMismatchPartialInlining.testLongMatch | 5 | 140694.62 | 141039.138 | 1.002448693 | 140721.407 | 1.000190391
>> ArraysMismatchPartialInlining.testLongMatch | 6 | 136901.515 | 136215.609 | 0.994989785 | 136216.591 | 0.994996958
>> ArraysMismatchPartialInlining.testLongMatch | 7 | 132233.847 | 131289.142 | 0.9928558 | 131315.326 | 0.993053813
>> ArraysMismatchPartialInlining.testLongMatch | 15 | 108677.77 | 105050.548 | 0.966624067 | 108574.143 | 0.999046475
>> ArraysMismatchPartialInlining.testLongMatch | 31 | 79476.103 | 79391.426 | 0.99893456 | 79519.006 | 1.000539823
>> ArraysMismatchPartialInlining.testLongMatch | 63 | 58949.181 | 59102.766 | 1.00260538 | 59095.306 | 1.00247883
>> ArraysMismatchPartialInlining.testLongMatch | 95 | 49438.419 | 49422.93 | 0.999686701 | 49390.033 | 0.999021287
>> ArraysMismatchPartialInlining.testLongMatch | 800 | 7195.783 | 7201.554 | 1.000801998 | 7186.757 | 0.998745654
>> ArraysMismatchPartialInlining.testShortMatch | 3 | 219642.309 | 219414.684 | 0.998963656 | 219760.127 | 1.000536408
>> ArraysMismatchPartialInlining.testShortMatch | 4 | 169235.371 | 193907.437 | 1.145785517 | 170667.561 | 1.008462711
>> ArraysMismatchPartialInlining.testShortMatch | 5 | 155537.852 | 147014.758 | 0.945202445 | 116770.798 | 0.750754858
>> ArraysMismatchPartialInlining.testShortMatch | 6 | 155059.272 | 173756.546 | 1.120581464 | 152323.759 | 0.982358275
>> ArraysMismatchPartialInlining.testShortMatch | 7 | 147370.359 | 154934.348 | 1.051326393 | 138398.19 | 0.939118225
>> ArraysMismatchPartialInlining.testShortMatch | 15 | 130353.196 | 171653.208 | 1.316831603 | 160047.047 | 1.227795343
>> ArraysMismatchPartialInlining.testShortMatch | 31 | 118458.443 | 106239.301 | 0.896848703 | 159726.936 | 1.348379499
>> ArraysMismatchPartialInlining.testShortMatch | 63 | 97519.691 | 91591.145 | 0.939206678 | 91847.817 | 0.94183868
>> ArraysMismatchPartialInlining.testShortMatch | 95 | 90818.111 | 77626.093 | 0.854742431 | 77653.086 | 0.855039652
>> ArraysMismatchPartialInlining.testShortMatch | 800 | 21382.8 | 22841.791 | 1.06823199 | 22683.388 | 1.060824027
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
>   8266951: Removing the changes to existing benchmark since a separate benchmark has been added to partial in-lining.

Hi @iwanowww, @neliasso, can you please share your feedback on compiler side changes.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3999

From jiefu at openjdk.java.net Mon May 24 06:18:14 2021
From: jiefu at openjdk.java.net (Jie Fu)
Date: Mon, 24 May 2021 06:18:14 GMT
Subject: RFR: 8266950: Remove vestigial support for non-strict floating-point execution
In-Reply-To: <2567ba8c-c37c-89ce-54de-ef4844ca488b@oracle.com>
References: <2567ba8c-c37c-89ce-54de-ef4844ca488b@oracle.com>
Message-ID:

On Mon, 24 May 2021 00:39:37 GMT, David Holmes wrote:

> > Right I just tracked that through before seeing this email. The fmul
> > uses the non-tmp version but later all lir_mul are assumed the same.
>
> compiler/c1/Test6855215.java passed on x86_32 with the latest version.
>
> More testing on x86_32 is in progress.
> Will let you know once finished.

All the tests passed on our Linux/x86_{64, 32}, no regression.
Thanks.
-------------

PR: https://git.openjdk.java.net/jdk/pull/3991

From vlivanov at openjdk.java.net Mon May 24 08:24:14 2021
From: vlivanov at openjdk.java.net (Vladimir Ivanov)
Date: Mon, 24 May 2021 08:24:14 GMT
Subject: RFR: 8266950: Remove vestigial support for non-strict floating-point execution [v5]
In-Reply-To:
References:
Message-ID: <5JNAmB403uJNxNaTvG3N2ZAoBWRgGFGL08ynYKJ3F9Y=.d7d1e78a-6450-4283-a64a-2faee9ff4255@github.com>

On Sun, 23 May 2021 23:14:08 GMT, David Holmes wrote:

>> As part of JEP 306, the vestiges of HotSpot support for non-strict floating-point execution can be removed. All methods implicitly have strictfp semantics so the explicit checks for is_strict() can be replaced by true and the code reformulated accordingly.
>>
>> There are still some names that include "strict" that could potentially be renamed to remove it, but the fact we have to have strict fp semantics is still important on some platforms, so the names help reinforce that IMO.
>>
>> Testing: tiers 1-3
>>
>> Thanks,
>> David
>
> David Holmes has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision:
>
>  - Merge branch 'master' into jep-306
>  - The code for strict handling only applies to doubles.
>  - Add missing space
>  - lir_div_strictfp and lir_mul_strictfp
>  - Removed divDPR_reg_round as it has a false predicate and so is now unused
>  - Revert classFileParser changes as they will be handled by JDK-8266530
>  - 8266530: HotSpot changes for JEP 306
>    All methods are now implicitly strictfp so all code generation etc
>    uses the strict form.
>    There are still some names that include "strict" that could potentially
>    be renamed to remove it, but the fact we have to have strict fp semantics
>    is still important on some platforms, so the names help reinforce that IMO.

Marked as reviewed by vlivanov (Reviewer).

-------------

PR: https://git.openjdk.java.net/jdk/pull/3991

From neliasso at openjdk.java.net Mon May 24 09:12:13 2021
From: neliasso at openjdk.java.net (Nils Eliasson)
Date: Mon, 24 May 2021 09:12:13 GMT
Subject: RFR: 8266951: Partial in-lining for vectorized mismatch operation using AVX512 masked instructions [v4]
In-Reply-To: <6phAafS9kz8v8Nwo4XGynTzt8K1KaNv7jhcnMZ59mew=.de2cbb11-261a-4974-bf21-adf47f6e8482@github.com>
References: <0YtRuwnVZ-Ejs-22d0JDJeFzXiZ17XNuBT1o5Ma4ZkI=.9dd9e952-d452-4175-8ff5-8f41e990a555@github.com>
 <6phAafS9kz8v8Nwo4XGynTzt8K1KaNv7jhcnMZ59mew=.de2cbb11-261a-4974-bf21-adf47f6e8482@github.com>
Message-ID:

On Tue, 18 May 2021 05:21:06 GMT, Jatin Bhateja wrote:

>> ArraySupport.vectorizedMismatch is a leaf level comparison routine which gets called by various public Java APIs (Arrays.equals, Arrays.mismatch). Hotspot C2 compiler intrinsifies vectorizedMismatch routine and emits a call to a stub routine which uses vector instruction to compare the inputs.
>>
>> For small compare operation whose size fits in one vector register i.e. < 32 bytes or <= 64 bytes, this patch employ partial in-lining technique to emit the fast path code at the call site which does vector comparison under the influence of a predicate register/mask computed as a function of comparison length.
>>
>> If the length of comparison is greater than the vector register size then the slow path comprising of stub call is emitted.
>>
>> This prevents the call overhead associated with stub call which is significant compared to actual comparison operation for small sized comparisons.
>>
>> Partial in-lining works under the influence of a run time flag -XX:UsePartialInlineSize=32/64 (default 32 bytes).
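
The fast-path/slow-path split described in the quoted text can be pictured with a scalar Java emulation. This is only a conceptual sketch, not the C2 implementation: the class and method names (`PartialInlineSketch`, `maskedMismatch`, `stubMismatch`) are hypothetical, the 32-byte threshold is hard-coded rather than read from the flag, the exact threshold comparison is an assumption, and the return convention is simplified to "index of first mismatch, or -1" instead of the real `vectorizedMismatch` contract.

```java
// Conceptual emulation of partial in-lining; all names are hypothetical.
class PartialInlineSketch {
    // Assumed threshold, mirroring the 32-byte default mentioned in the text.
    static final int PARTIAL_INLINE_SIZE = 32;

    // Fast path: emulates a single masked vector compare over up to 64 byte lanes.
    // Bit i of the mask is set when lane i takes part in the comparison.
    // Assumes 0 <= length <= 64 and both arrays hold at least 'length' bytes.
    static int maskedMismatch(byte[] a, byte[] b, int length) {
        long mask = (length == 64) ? -1L : (1L << length) - 1;
        for (int lane = 0; lane < 64; lane++) {
            boolean active = ((mask >>> lane) & 1L) != 0;
            // Inactive lanes are never loaded, just like a masked vector load.
            if (active && a[lane] != b[lane]) {
                return lane;
            }
        }
        return -1;
    }

    // Slow path: stands in for the out-of-line stub call.
    static int stubMismatch(byte[] a, byte[] b, int length) {
        for (int i = 0; i < length; i++) {
            if (a[i] != b[i]) {
                return i;
            }
        }
        return -1;
    }

    // Small comparisons stay "inline"; larger ones pay for the call.
    static int mismatch(byte[] a, byte[] b, int length) {
        return (length <= PARTIAL_INLINE_SIZE)
                ? maskedMismatch(a, b, length)
                : stubMismatch(a, b, length);
    }

    public static void main(String[] args) {
        byte[] x = {1, 2, 3, 4, 5};
        byte[] y = {1, 2, 3, 9, 5};
        System.out.println(mismatch(x, y, 5)); // prints 3
        System.out.println(mismatch(x, x, 5)); // prints -1
    }
}
```

The point of the mask is that one compare handles any length up to the vector width without a scalar tail loop, which is why the call overhead dominates for small inputs.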
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
>   8266951: Removing the changes to existing benchmark since a separate benchmark has been added to partial in-lining.

I'll have a look later this week.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3999

From aph at openjdk.java.net Mon May 24 11:12:05 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Mon, 24 May 2021 11:12:05 GMT
Subject: RFR: 8264973: AArch64: Optimize vector max/min/add reduction of two integers with NEON pairwise instructions [v2]
In-Reply-To:
References: <1ZfSsFKnXwnkqtxIGeyZyb6L9yFghYh-sTZUWTY3A5U=.1a518678-3fd9-4dcb-be04-20c0060bfe91@github.com>
Message-ID:

On Mon, 17 May 2021 10:50:02 GMT, Dong Bo wrote:

> _Mailing list message from [Andrew Haley](mailto:aph at redhat.com) on [hotspot-compiler-dev](mailto:hotspot-compiler-dev at mail.openjdk.java.net):_
>
> On 5/10/21 6:55 AM, Dong Bo wrote:
>
> > PING? Any comments/suggestions are appreciated.
> > Although this has been reviewed by Ningsheng, we still need help from reviewers here.
>
> I'm testing this now.

I'm back on this, and I can't see how to run the tests. Paul Sandoz says the perf tests in the `panama-vector` are under a maven project, but which maven project is that?

You say "When I tested this, the incompatible DecodeBench.java was deleted first since it is all about ShortVector and ByteVector rather than Int64Vector." but I don't know what that means.

Please provide step-by-step instructions that allow anyone to reproduce your results.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3683

From yyang at openjdk.java.net Mon May 24 11:15:18 2021
From: yyang at openjdk.java.net (Yi Yang)
Date: Mon, 24 May 2021 11:15:18 GMT
Subject: RFR: 8267239: C1: RangeCheckElimination for % operator if divisor is IntConstant [v3]
In-Reply-To: <40GCUvFwUxi08a4inydWtopQ9thvWsCOsFz7_-v0QzM=.7255306b-b11d-4fc4-b839-37410bea37a9@github.com>
References: <40GCUvFwUxi08a4inydWtopQ9thvWsCOsFz7_-v0QzM=.7255306b-b11d-4fc4-b839-37410bea37a9@github.com>
Message-ID:

On Wed, 19 May 2021 02:42:03 GMT, Yi Yang wrote:

>> For the % operator, the JLS rule implies that the result of the remainder operation can be negative only if the dividend is negative, and can be positive only if the dividend is positive. Moreover, the magnitude of the result is always less than the magnitude of the divisor (see [JLS 15.17.3](https://docs.oracle.com/javase/specs/jls/se8/html/jls-15.html#jls-15.17.3)).
>>
>> So if the divisor is a constant integer and not equal to 0, we can deduce the bounds of the remainder operation (writing x >= 0 and y > 0 for the magnitudes):
>> - x % -y ==> [0, y - 1] RCE
>> - x % y ==> [0, y - 1] RCE
>> - -x % y ==> [-y + 1, 0]
>> - -x % -y ==> [-y + 1, 0]
>>
>> Based on the above rationale, we can apply RCE to remainder operations whose divisor is a constant integer and whose dividend is >= 0, e.g.:
>>
>>
>>     for(int i=0;i<1000;i++){
>>       int top5 = arr[i%5]; // Apply RCE if arr is a loop invariant
>>       ....
>>     }
>>
>>
>> For more detailed RCE results, please check out the attachment on JBS; it was generated by ArithmeticRemRCE with the additional flags -XX:+TraceRangeCheckElimination -XX:+PrintIR.
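
The remainder bounds quoted above follow directly from Java's sign rule for `%` and can be checked by brute force. The following is a minimal sketch for illustration only; `RemainderBounds` and its methods are hypothetical helpers and are not part of the patch or of the `ArithmeticRemRCE` test.

```java
// Brute-force check of the remainder bounds; all names are hypothetical.
class RemainderBounds {
    // True if x % y lies in [0, |y| - 1] for every x in [0, limit).
    // Assumes y != 0 and y != Integer.MIN_VALUE (Math.abs overflows there).
    static boolean nonNegativeDividendInRange(int y, int limit) {
        int upper = Math.abs(y) - 1;
        for (int x = 0; x < limit; x++) {
            int r = x % y;
            if (r < 0 || r > upper) {
                return false;
            }
        }
        return true;
    }

    // Mirror case: (-x) % y lies in [-(|y| - 1), 0] for every x in [0, limit).
    static boolean negativeDividendInRange(int y, int limit) {
        int lower = -(Math.abs(y) - 1);
        for (int x = 0; x < limit; x++) {
            int r = (-x) % y;
            if (r > 0 || r < lower) {
                return false;
            }
        }
        return true;
    }

    // The loop shape from the example: i % 5 never exceeds index 4,
    // so an array with at least 5 elements is never accessed out of bounds.
    static int sumCyclic(int[] arr, int iterations) {
        int sum = 0;
        for (int i = 0; i < iterations; i++) {
            sum += arr[i % 5];
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(nonNegativeDividendInRange(5, 1000));  // prints true
        System.out.println(nonNegativeDividendInRange(-5, 1000)); // prints true
        System.out.println(negativeDividendInRange(5, 1000));     // prints true
        System.out.println(7 % -5);                               // prints 2
        System.out.println(-7 % 5);                               // prints -2
        System.out.println(sumCyclic(new int[] {1, 2, 3, 4, 5}, 10)); // prints 30
    }
}
```

Only the non-negative-dividend cases produce an interval that starts at 0, which is why those are the ones marked RCE: a possibly negative index can never be proven in bounds.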
>>
>> Testing:
>> - test/hotspot/jtreg/compiler/c1/(slowdebug)
>
> Yi Yang has updated the pull request incrementally with one additional commit since the last revision:
>
>   missing whitespace; more comment

I think it is not trivial to implement this for C2, since C2 idealizes ModINode into other nodes if the divisor is a constant integer:

https://github.com/openjdk/jdk/blob/31139108c1ca9d355bd484d692830dfbc8317477/src/hotspot/share/opto/divnode.cpp#L946-L948
https://github.com/openjdk/jdk/blob/31139108c1ca9d355bd484d692830dfbc8317477/src/hotspot/share/opto/divnode.cpp#L166-L192

-------------

PR: https://git.openjdk.java.net/jdk/pull/4083

From vlivanov at openjdk.java.net Mon May 24 11:26:08 2021
From: vlivanov at openjdk.java.net (Vladimir Ivanov)
Date: Mon, 24 May 2021 11:26:08 GMT
Subject: RFR: 8266951: Partial in-lining for vectorized mismatch operation using AVX512 masked instructions [v4]
In-Reply-To: <6phAafS9kz8v8Nwo4XGynTzt8K1KaNv7jhcnMZ59mew=.de2cbb11-261a-4974-bf21-adf47f6e8482@github.com>
References: <0YtRuwnVZ-Ejs-22d0JDJeFzXiZ17XNuBT1o5Ma4ZkI=.9dd9e952-d452-4175-8ff5-8f41e990a555@github.com>
 <6phAafS9kz8v8Nwo4XGynTzt8K1KaNv7jhcnMZ59mew=.de2cbb11-261a-4974-bf21-adf47f6e8482@github.com>
Message-ID:

On Tue, 18 May 2021 05:21:06 GMT, Jatin Bhateja wrote:

>> ArraySupport.vectorizedMismatch is a leaf level comparison routine which gets called by various public Java APIs (Arrays.equals, Arrays.mismatch). Hotspot C2 compiler intrinsifies vectorizedMismatch routine and emits a call to a stub routine which uses vector instruction to compare the inputs.
>>
>> For small compare operation whose size fits in one vector register i.e. < 32 bytes or <= 64 bytes, this patch employ partial in-lining technique to emit the fast path code at the call site which does vector comparison under the influence of a predicate register/mask computed as a function of comparison length.
>>
>> If the length of comparison is greater than the vector register size then the slow path comprising of stub call is emitted.
>>
>> This prevents the call overhead associated with stub call which is significant compared to actual comparison operation for small sized comparisons.
>>
>> Partial in-lining works under the influence of a run time flag -XX:UsePartialInlineSize=32/64 (default 32 bytes).
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
>   8266951: Removing the changes to existing benchmark since a separate benchmark has been added to partial in-lining.

src/hotspot/cpu/x86/x86.ad line 1577:

> 1575: break;
> 1576:
> 1577: case Op_VectorCmpMasked:

Additionally, it requires AVX512BW (for `knotq`), doesn't it?

src/hotspot/share/opto/c2_globals.hpp line 85:

> 83: range(0, max_jint) \
> 84: \
> 85: product(intx, UsePartialInlineSize, -1, DIAGNOSTIC, \

Also, `UsePartialInlineSize` looks like a bool flag, but it's not. I assume you renamed it because it's not specific to `arraycopy` anymore. What about `ArrayOperationPartialInlineSize`?

src/hotspot/share/opto/library_call.cpp line 5218:

> 5216: Node* objb_adr = make_unsafe_address(objb, boffset);
> 5217:
> 5218: assert(scale->bottom_type()->isa_int() &&

Why `scale` has to be a constant?
Why don't you construct the same shape irrespective of `UsePartialInlineSize` value and let GVN fold it? The fast path check condition (`cmp_res`) can be turned into a constant when partial inlining is disabled. ------------- PR: https://git.openjdk.java.net/jdk/pull/3999 From dongbo at openjdk.java.net Mon May 24 12:37:40 2021 From: dongbo at openjdk.java.net (Dong Bo) Date: Mon, 24 May 2021 12:37:40 GMT Subject: RFR: 8264973: AArch64: Optimize vector max/min/add reduction of two integers with NEON pairwise instructions [v3] In-Reply-To: References: Message-ID: > On aarch64, current implementations of vector reduce_add2I, reduce_max2I, reduce_min2I can be optimized with NEON pairwise instructions: > > > ## reduce_add2I, before > mov w10, v19.s[0] > mov w2, v19.s[1] > add w10, w0, w10 > add w10, w10, w2 > ## reduce_add2I, optimized > addp v23.2s, v24.2s, v24.2s > mov w10, v23.s[0] > add w10, w10, w2 > > ## reduce_max2I, before > dup v16.2d, v23.d[0] > sminv s16, v16.4s > mov w10, v16.s[0] > cmp w10, w0 > csel w10, w10, w0, lt > ## reduce_max2I, optimized > sminp v16.2s, v23.2s, v23.2s > mov w10, v16.s[0] > cmp w10, w0 > csel w10, w10, w0, lt > > > I don't expect this to change anything of SuperWord, vectorizing reductions of two integers is disabled by [1]. > This is useful for VectorAPI, tested benchmarks in [2], performance can improve ~51% and ~8% for `Int64Vector.ADD` and `Int64Vector.MAX` respectively. > > > Benchmark (size) Mode Cnt Score Error Units > # optimized > Int64Vector.ADDLanes 1024 thrpt 10 2492.123 ± 23.561 ops/ms > Int64Vector.ADDMaskedLanes 1024 thrpt 10 1825.882 ± 5.261 ops/ms > Int64Vector.MAXLanes 1024 thrpt 10 1921.028 ± 3.253 ops/ms > Int64Vector.MAXMaskedLanes 1024 thrpt 10 1588.575 ± 3.903 ops/ms > Int64Vector.MINLanes 1024 thrpt 10 1923.913 ± 2.117 ops/ms > Int64Vector.MINMaskedLanes 1024 thrpt 10 1596.875 ± 2.163 ops/ms > # default > Int64Vector.ADDLanes 1024 thrpt 10 1644.223 ±
1.885 ops/ms > Int64Vector.ADDMaskedLanes 1024 thrpt 10 1491.502 ± 26.436 ops/ms > Int64Vector.MAXLanes 1024 thrpt 10 1784.066 ± 3.816 ops/ms > Int64Vector.MAXMaskedLanes 1024 thrpt 10 1494.750 ± 3.451 ops/ms > Int64Vector.MINLanes 1024 thrpt 10 1785.266 ± 8.893 ops/ms > Int64Vector.MINMaskedLanes 1024 thrpt 10 1499.233 ± 3.498 ops/ms > > > Verified correctness with tests `test/jdk/jdk/incubator/vector/`. Also tested linux-aarch64-server-fastdebug tier1-3. > > [1] https://github.com/openjdk/jdk/blob/3bf4c904fbbd87d4db18db22c1be384616483eed/src/hotspot/share/opto/superword.cpp#L2004 > [2] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/jdk/jdk/incubator/vector/benchmark/src/main/java/benchmark/jdk/incubator/vector/Int64Vector.java Dong Bo has updated the pull request incrementally with one additional commit since the last revision: trivial fix the format comments in add2I ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/3683/files - new: https://git.openjdk.java.net/jdk/pull/3683/files/838ccc9c..9d9ee015 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3683&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3683&range=01-02 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/3683.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3683/head:pull/3683 PR: https://git.openjdk.java.net/jdk/pull/3683 From dongbo at openjdk.java.net Mon May 24 12:47:10 2021 From: dongbo at openjdk.java.net (Dong Bo) Date: Mon, 24 May 2021 12:47:10 GMT Subject: RFR: 8264973: AArch64: Optimize vector max/min/add reduction of two integers with NEON pairwise instructions [v2] In-Reply-To: References: <1ZfSsFKnXwnkqtxIGeyZyb6L9yFghYh-sTZUWTY3A5U=.1a518678-3fd9-4dcb-be04-20c0060bfe91@github.com> Message-ID: <6JP2aGyKfEzQhRpLIaoGxTL_xD_V_65x6A_RB7fTLU4=.006ddd83-43e4-49e1-90ad-3da07daa3960@github.com> On Mon, 24 May 2021 11:09:44 GMT, Andrew Haley wrote: > > _Mailing list
message from [Andrew Haley](mailto:aph at redhat.com) on [hotspot-compiler-dev](mailto:hotspot-compiler-dev at mail.openjdk.java.net):_ > > On 5/10/21 6:55 AM, Dong Bo wrote: > > > PING? Any comments/suggestions are appreciated. > > > Although this has been reviewed by Ningsheng, we still need help from reviewers here. > > > > > > I'm testing this now. > > I'm back on this, and I can't see how to run the tests. Paul Sandoz says the perf tests in the `panama-vector` are under a maven project, but which maven project is that? > > You say "When I tested this, the incompatible DecodeBench.java was deleted first since it is all about ShortVector and ByteVector rather than Int64Vector." but I don't know what that means. > > Please provide step-by-step instructions that allow anyone to reproduce your results. Hi, here are the instructions I used to run the benchmarks: 1. Get the benchmark project in `https://github.com/openjdk/panama-vector/tree/vectorIntrinsics/test/jdk/jdk/incubator/vector/benchmark` (I do this via git): ## git clone https://github.com/openjdk/panama-vector.git ## cd panama-vector ## git checkout -b vectorIntrinsics remotes/origin/vectorIntrinsics 2. Delete incompatible `DecodeBench.java` and compile the project with mainline JDK: ## ## cd test/jdk/jdk/incubator/vector/benchmark ## rm src/main/java/benchmark/utf8/DecodeBench.java ## mvn install 3. 
Run tests: `## /bin/java -jar target/vector-benchmarks.jar benchmark.jdk.incubator.vector.Int64Vector.["M"|"ADD"]+[AXIN]*["Masked"]*Lanes -wi 10 -w 1000ms -f 1 -i 10 -r 1000ms` ------------- PR: https://git.openjdk.java.net/jdk/pull/3683 From aph at openjdk.java.net Mon May 24 15:38:23 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 24 May 2021 15:38:23 GMT Subject: RFR: 8264973: AArch64: Optimize vector max/min/add reduction of two integers with NEON pairwise instructions [v3] In-Reply-To: References: Message-ID: On Mon, 24 May 2021 12:37:40 GMT, Dong Bo wrote: >> On aarch64, current implementations of vector reduce_add2I, reduce_max2I, reduce_min2I can be optimized with NEON pairwise instructions: >> >> >> ## reduce_add2I, before >> mov w10, v19.s[0] >> mov w2, v19.s[1] >> add w10, w0, w10 >> add w10, w10, w2 >> ## reduce_add2I, optimized >> addp v23.2s, v24.2s, v24.2s >> mov w10, v23.s[0] >> add w10, w10, w2 >> >> ## reduce_max2I, before >> dup v16.2d, v23.d[0] >> sminv s16, v16.4s >> mov w10, v16.s[0] >> cmp w10, w0 >> csel w10, w10, w0, lt >> ## reduce_max2I, optimized >> sminp v16.2s, v23.2s, v23.2s >> mov w10, v16.s[0] >> cmp w10, w0 >> csel w10, w10, w0, lt >> >> >> I don't expect this to change anything of SuperWord, vectorizing reductions of two integers is disabled by [1]. >> This is useful for VectorAPI, tested benchmarks in [2], performance can improve ~51% and ~8% for `Int64Vector.ADD` and `Int64Vector.MAX` respectively. >> >> >> Benchmark (size) Mode Cnt Score Error Units >> # optimized >> Int64Vector.ADDLanes 1024 thrpt 10 2492.123 ± 23.561 ops/ms >> Int64Vector.ADDMaskedLanes 1024 thrpt 10 1825.882 ± 5.261 ops/ms >> Int64Vector.MAXLanes 1024 thrpt 10 1921.028 ± 3.253 ops/ms >> Int64Vector.MAXMaskedLanes 1024 thrpt 10 1588.575 ± 3.903 ops/ms >> Int64Vector.MINLanes 1024 thrpt 10 1923.913 ± 2.117 ops/ms >> Int64Vector.MINMaskedLanes 1024 thrpt 10 1596.875 ± 2.163 ops/ms >> # default >> Int64Vector.ADDLanes 1024 thrpt 10 1644.223 ±
1.885 ops/ms >> Int64Vector.ADDMaskedLanes 1024 thrpt 10 1491.502 ± 26.436 ops/ms >> Int64Vector.MAXLanes 1024 thrpt 10 1784.066 ± 3.816 ops/ms >> Int64Vector.MAXMaskedLanes 1024 thrpt 10 1494.750 ± 3.451 ops/ms >> Int64Vector.MINLanes 1024 thrpt 10 1785.266 ± 8.893 ops/ms >> Int64Vector.MINMaskedLanes 1024 thrpt 10 1499.233 ± 3.498 ops/ms >> >> >> Verified correctness with tests `test/jdk/jdk/incubator/vector/`. Also tested linux-aarch64-server-fastdebug tier1-3. >> >> [1] https://github.com/openjdk/jdk/blob/3bf4c904fbbd87d4db18db22c1be384616483eed/src/hotspot/share/opto/superword.cpp#L2004 >> [2] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/jdk/jdk/incubator/vector/benchmark/src/main/java/benchmark/jdk/incubator/vector/Int64Vector.java > > Dong Bo has updated the pull request incrementally with one additional commit since the last revision: > > trivial fix the format comments in add2I Thanks. Changes look good. I managed to reproduce your results. Benchmark (size) Mode Cnt Score Error Units Int64Vector.ADDLanes 1024 thrpt 3 4958.747 ± 54.225 ops/ms Int64Vector.ADDMaskedLanes 1024 thrpt 3 4769.759 ± 12.736 ops/ms Int64Vector.MAXLanes 1024 thrpt 3 2957.985 ± 88.671 ops/ms Int64Vector.MAXMaskedLanes 1024 thrpt 3 2921.381 ± 45.408 ops/ms Int64Vector.MINLanes 1024 thrpt 3 2965.392 ± 25.236 ops/ms Int64Vector.MINMaskedLanes 1024 thrpt 3 2923.870 ± 53.270 ops/ms Benchmark (size) Mode Cnt Score Error Units Int64Vector.ADDLanes 1024 thrpt 3 3560.100 ± 79.753 ops/ms Int64Vector.ADDMaskedLanes 1024 thrpt 3 3585.672 ± 57.203 ops/ms Int64Vector.MAXLanes 1024 thrpt 3 2951.659 ± 9.577 ops/ms Int64Vector.MAXMaskedLanes 1024 thrpt 3 2876.957 ± 37.005 ops/ms Int64Vector.MINLanes 1024 thrpt 3 2953.476 ± 3.446 ops/ms Int64Vector.MINMaskedLanes 1024 thrpt 3 2878.942 ± 50.281 ops/ms ------------- Marked as reviewed by aph (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/3683 From kvn at openjdk.java.net Mon May 24 21:57:46 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 24 May 2021 21:57:46 GMT Subject: RFR: 8265129: Add intrinsic support for JVM.getClassId [v11] In-Reply-To: References: Message-ID: <_YmDeW7NebTKQ0ZukAaJ_PsiSQdIkU0XogrKZm6Kqxc=.6bf6edea-99a6-43aa-ab5c-4e4b653e1b62@github.com> On Fri, 21 May 2021 14:23:00 GMT, Denghui Dong wrote: >> 8265129: Add intrinsic support for JVM.getClassId > > Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: > > update Passed tier1-3 and jdk/jfr in my testing. ------------- PR: https://git.openjdk.java.net/jdk/pull/3470 From kvn at openjdk.java.net Mon May 24 22:34:14 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 24 May 2021 22:34:14 GMT Subject: RFR: 8265129: Add intrinsic support for JVM.getClassId [v11] In-Reply-To: References: Message-ID: On Sat, 22 May 2021 09:09:45 GMT, Denghui Dong wrote: >> Good. >> What testing you did? > >> Good. >> What testing you did? > > fastdebug on Linux x86_64 > jtreg hotspot/jtreg/compiler jdk/jdk/jfr/ @D-D-H please update change to latest sources. ------------- PR: https://git.openjdk.java.net/jdk/pull/3470 From ddong at openjdk.java.net Tue May 25 01:42:48 2021 From: ddong at openjdk.java.net (Denghui Dong) Date: Tue, 25 May 2021 01:42:48 GMT Subject: RFR: 8265129: Add intrinsic support for JVM.getClassId [v11] In-Reply-To: References: Message-ID: <58hgZjiYeL56pHBMbxr3ou7AHV71ZR20qorYRcDpIMU=.74e3ebd7-52bb-4879-99f8-d1fb652408d9@github.com> On Sat, 22 May 2021 09:09:45 GMT, Denghui Dong wrote: >> Good. >> What testing you did? > >> Good. >> What testing you did? > > fastdebug on Linux x86_64 > jtreg hotspot/jtreg/compiler jdk/jdk/jfr/ > @D-D-H please update change to latest sources. @vnkozlov Updated, Thank you. 
------------- PR: https://git.openjdk.java.net/jdk/pull/3470 From ddong at openjdk.java.net Tue May 25 01:42:47 2021 From: ddong at openjdk.java.net (Denghui Dong) Date: Tue, 25 May 2021 01:42:47 GMT Subject: RFR: 8265129: Add intrinsic support for JVM.getClassId [v12] In-Reply-To: References: Message-ID: > 8265129: Add intrinsic support for JVM.getClassId Denghui Dong has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 16 additional commits since the last revision: - Merge remote-tracking branch 'github/master' into get_class_id - update - update - update copyright - fix test - update - fix crash problem - remove c1 part - swap the positions of two operands in cmp operation since the det register will be modified in 32 bit - use new_pointer_register - ... and 6 more: https://git.openjdk.java.net/jdk/compare/7ac79344...82a07681 ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/3470/files - new: https://git.openjdk.java.net/jdk/pull/3470/files/f99186b9..82a07681 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3470&range=11 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3470&range=10-11 Stats: 636890 lines in 7523 files changed: 82336 ins; 532965 del; 21589 mod Patch: https://git.openjdk.java.net/jdk/pull/3470.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3470/head:pull/3470 PR: https://git.openjdk.java.net/jdk/pull/3470 From yyang at openjdk.java.net Tue May 25 02:12:03 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Tue, 25 May 2021 02:12:03 GMT Subject: RFR: 8265518: C1: Intrinsic support for Preconditions.checkIndex [v6] In-Reply-To: References: Message-ID: On Fri, 30 Apr 2021 19:20:54 GMT, Igor Veresov wrote: >> Yi Yang has updated the pull request incrementally with one additional commit since the last revision: >> >> better check1-4 > > Looks like now the test 
fails in the pre-submit tests? Hi @veresov, may I ask your help to review this patch? Thanks a lot. ------------- PR: https://git.openjdk.java.net/jdk/pull/3615 From dongbo at openjdk.java.net Tue May 25 02:20:07 2021 From: dongbo at openjdk.java.net (Dong Bo) Date: Tue, 25 May 2021 02:20:07 GMT Subject: Integrated: 8264973: AArch64: Optimize vector max/min/add reduction of two integers with NEON pairwise instructions In-Reply-To: References: Message-ID: On Mon, 26 Apr 2021 05:50:20 GMT, Dong Bo wrote: > On aarch64, current implementations of vector reduce_add2I, reduce_max2I, reduce_min2I can be optimized with NEON pairwise instructions: > > > ## reduce_add2I, before > mov w10, v19.s[0] > mov w2, v19.s[1] > add w10, w0, w10 > add w10, w10, w2 > ## reduce_add2I, optimized > addp v23.2s, v24.2s, v24.2s > mov w10, v23.s[0] > add w10, w10, w2 > > ## reduce_max2I, before > dup v16.2d, v23.d[0] > sminv s16, v16.4s > mov w10, v16.s[0] > cmp w10, w0 > csel w10, w10, w0, lt > ## reduce_max2I, optimized > sminp v16.2s, v23.2s, v23.2s > mov w10, v16.s[0] > cmp w10, w0 > csel w10, w10, w0, lt > > > I don't expect this to change anything of SuperWord, vectorizing reductions of two integers is disabled by [1]. > This is useful for VectorAPI, tested benchmarks in [2], performance can improve ~51% and ~8% for `Int64Vector.ADD` and `Int64Vector.MAX` respectively. > > > Benchmark (size) Mode Cnt Score Error Units > # optimized > Int64Vector.ADDLanes 1024 thrpt 10 2492.123 ± 23.561 ops/ms > Int64Vector.ADDMaskedLanes 1024 thrpt 10 1825.882 ± 5.261 ops/ms > Int64Vector.MAXLanes 1024 thrpt 10 1921.028 ± 3.253 ops/ms > Int64Vector.MAXMaskedLanes 1024 thrpt 10 1588.575 ± 3.903 ops/ms > Int64Vector.MINLanes 1024 thrpt 10 1923.913 ± 2.117 ops/ms > Int64Vector.MINMaskedLanes 1024 thrpt 10 1596.875 ± 2.163 ops/ms > # default > Int64Vector.ADDLanes 1024 thrpt 10 1644.223 ± 1.885 ops/ms > Int64Vector.ADDMaskedLanes 1024 thrpt 10 1491.502 ±
26.436 ops/ms > Int64Vector.MAXLanes 1024 thrpt 10 1784.066 ± 3.816 ops/ms > Int64Vector.MAXMaskedLanes 1024 thrpt 10 1494.750 ± 3.451 ops/ms > Int64Vector.MINLanes 1024 thrpt 10 1785.266 ± 8.893 ops/ms > Int64Vector.MINMaskedLanes 1024 thrpt 10 1499.233 ± 3.498 ops/ms > > > Verified correctness with tests `test/jdk/jdk/incubator/vector/`. Also tested linux-aarch64-server-fastdebug tier1-3. > > [1] https://github.com/openjdk/jdk/blob/3bf4c904fbbd87d4db18db22c1be384616483eed/src/hotspot/share/opto/superword.cpp#L2004 > [2] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/jdk/jdk/incubator/vector/benchmark/src/main/java/benchmark/jdk/incubator/vector/Int64Vector.java This pull request has now been integrated. Changeset: 123cdd1f Author: Dong Bo Committer: Fei Yang URL: https://git.openjdk.java.net/jdk/commit/123cdd1fbd4fa02177c06afb67a09aee21d0a482 Stats: 294 lines in 5 files changed: 23 ins; 12 del; 259 mod 8264973: AArch64: Optimize vector max/min/add reduction of two integers with NEON pairwise instructions Reviewed-by: njian, aph ------------- PR: https://git.openjdk.java.net/jdk/pull/3683 From iveresov at openjdk.java.net Tue May 25 02:28:01 2021 From: iveresov at openjdk.java.net (Igor Veresov) Date: Tue, 25 May 2021 02:28:01 GMT Subject: RFR: 8265518: C1: Intrinsic support for Preconditions.checkIndex [v9] In-Reply-To: References: Message-ID: On Sat, 8 May 2021 03:00:14 GMT, Yi Yang wrote: >> The JDK codebase has re-created many variants of checkIndex (`grep -I -r 'checkIndex' jdk/`). A notable variant is java.nio.Buffer.checkIndex, which is annotated with @IntrinsicCandidate and only has a corresponding C1 intrinsic version.
>> >> In fact, there is a utility method `jdk.internal.util.Preconditions.checkIndex` (wrapped by java.lang.Objects.checkIndex) that behaves the same as these variants of checkIndex. We can replace the re-created variants with Objects.checkIndex, which would significantly reduce duplicated code and improve performance, because Preconditions.checkIndex is @IntrinsicCandidate and has a corresponding intrinsic in HotSpot. >> >> The problem is that HotSpot currently implements only the C2 version of Preconditions.checkIndex. To reuse it widely in JDK code, I think we can first implement its C1 counterpart. There are also a few things we can do later: >> >> 1. Replace all variants of checkIndex by Objects.checkIndex in the whole JDK codebase. >> 2. Remove Buffer.checkIndex and obsolete/deprecate InlineNIOCheckIndex flag >> >> Testing: cds, compiler and jdk > > Yi Yang has updated the pull request incrementally with one additional commit since the last revision: > > x86_32 fails Marked as reviewed by iveresov (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/3615 From yyang at openjdk.java.net Tue May 25 02:48:06 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Tue, 25 May 2021 02:48:06 GMT Subject: RFR: 8265518: C1: Intrinsic support for Preconditions.checkIndex [v6] In-Reply-To: References: Message-ID: On Fri, 30 Apr 2021 19:20:54 GMT, Igor Veresov wrote: >> Yi Yang has updated the pull request incrementally with one additional commit since the last revision: >> >> better check1-4 > > Looks like now the test fails in the pre-submit tests? Thank you @veresov! I'm glad to have more comments from hotspot-compiler group. Later, I'd like to integrate it if there are no more comments/objections. Thanks!
Yang ------------- PR: https://git.openjdk.java.net/jdk/pull/3615 From yyang at openjdk.java.net Tue May 25 03:25:04 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Tue, 25 May 2021 03:25:04 GMT Subject: RFR: 8267376: Deduce the final bound of ModXNode Message-ID: <0qR8Ju3ahlklYs-8x4bBtwHS9uFoNHcLnwILNMML9Ig=.c8c4ade9-59b8-411d-80e9-0d55d2726551@github.com> If the divisor is a constant and not equal to 0, it's possible to deduce the final bound of ModXNode using the following rules: x % -y ==> [0, y - 1] x % y ==> [0, y - 1] -x % y ==> [-y + 1, 0] -x % -y ==> [-y + 1, 0] FYI: The original purpose of this patch is to eliminate the array access range check (e.g. `arr[val%5]`), which was discussed in https://github.com/openjdk/jdk/pull/4083#issuecomment-846971247. Because ModXNode is transformed to other nodes during IGVN, a RangeCheckNode is still generated when accessing the array element. Regardless of eliminating the array access range check, it is still reasonable to deduce the bound of the % operation if the divisor is a known constant. ------------- Commit messages: - trailing whitespaces - ModXNode::Value Changes: https://git.openjdk.java.net/jdk/pull/4179/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4179&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8267376 Stats: 88 lines in 1 file changed: 50 ins; 30 del; 8 mod Patch: https://git.openjdk.java.net/jdk/pull/4179.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4179/head:pull/4179 PR: https://git.openjdk.java.net/jdk/pull/4179 From ddong at openjdk.java.net Tue May 25 04:14:08 2021 From: ddong at openjdk.java.net (Denghui Dong) Date: Tue, 25 May 2021 04:14:08 GMT Subject: Integrated: 8265129: Add intrinsic support for JVM.getClassId In-Reply-To: References: Message-ID: On Tue, 13 Apr 2021 16:52:44 GMT, Denghui Dong wrote: > 8265129: Add intrinsic support for JVM.getClassId This pull request has now been integrated.
Changeset: 2e8812df Author: Denghui Dong Committer: Vladimir Kozlov URL: https://git.openjdk.java.net/jdk/commit/2e8812df142430d1a6b0a4df0259d2656a1548c9 Stats: 212 lines in 20 files changed: 116 ins; 56 del; 40 mod 8265129: Add intrinsic support for JVM.getClassId Reviewed-by: kvn, mgronlun ------------- PR: https://git.openjdk.java.net/jdk/pull/3470 From ngasson at openjdk.java.net Tue May 25 08:20:16 2021 From: ngasson at openjdk.java.net (Nick Gasson) Date: Tue, 25 May 2021 08:20:16 GMT Subject: RFR: 8266950: Remove vestigial support for non-strict floating-point execution [v5] In-Reply-To: References: Message-ID: <9iau55DjSdv3awUd0ZdB8lIwjdznhnCl-q5JZrcplG8=.ebdff3ef-fc1e-4204-8418-d2d320ab4cd1@github.com> On Sun, 23 May 2021 23:14:08 GMT, David Holmes wrote: >> As part of JEP 306, the vestiges of HotSpot support for non-strict floating-point execution can be removed. All methods implicitly have strictfp semantics so the explicit checks for is_strict() can be replaced by true and the code reformulated accordingly. >> >> There are still some names that include "strict" that could potentially be renamed to remove it, but the fact we have to have strict fp semantics is still important on some platforms, so the names help reinforce that IMO. >> >> Testing: tiers 1-3 >> >> Thanks, >> David > > David Holmes has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - Merge branch 'master' into jep-306 > - The code for strict handling only applies to doubles. 
> - Add missing space > - lir_div_strictfp and lir_mul_strictfp > - Removed divDPR_reg_round as it has a false predicate and so is now unused > - Revert classFileParser changes as they will be handled by JDK-8266530 > - 8266530: HotSpot changes for JEP 306 > All methods are now implicitly strictfp so all code generation etc > uses the strict form. > There are still some names that include "strict" that could potentially > be renamed to remove it, but the fact we have to have strict fp semantics > is still important on some platforms, so the names help reinforce that IMO. src/hotspot/cpu/aarch64/c1_LIRGenerator_aarch64.cpp line 427: > 425: LIR_Opr tmp = LIR_OprFact::illegalOpr; > 426: if (x->op() == Bytecodes::_dmul || x->op() == Bytecodes::_ddiv) { > 427: tmp = new_register(T_DOUBLE); This variable `tmp` doesn't seem to be used. I guess this code originally came from x86 where it's passed through to `arithmetic_op_fpu()`. ------------- PR: https://git.openjdk.java.net/jdk/pull/3991 From david.holmes at oracle.com Tue May 25 09:35:29 2021 From: david.holmes at oracle.com (David Holmes) Date: Tue, 25 May 2021 19:35:29 +1000 Subject: RFR: 8266950: Remove vestigial support for non-strict floating-point execution [v5] In-Reply-To: <9iau55DjSdv3awUd0ZdB8lIwjdznhnCl-q5JZrcplG8=.ebdff3ef-fc1e-4204-8418-d2d320ab4cd1@github.com> References: <9iau55DjSdv3awUd0ZdB8lIwjdznhnCl-q5JZrcplG8=.ebdff3ef-fc1e-4204-8418-d2d320ab4cd1@github.com> Message-ID: <9fd04f3c-b667-69ea-80ac-bd5f885dad4e@oracle.com> On 25/05/2021 6:20 pm, Nick Gasson wrote: > On Sun, 23 May 2021 23:14:08 GMT, David Holmes wrote: > >>> As part of JEP 306, the vestiges of HotSpot support for non-strict floating-point execution can be removed. All methods implicitly have strictfp semantics so the explicit checks for is_strict() can be replaced by true and the code reformulated accordingly.
>>> >>> There are still some names that include "strict" that could potentially be renamed to remove it, but the fact we have to have strict fp semantics is still important on some platforms, so the names help reinforce that IMO. > > src/hotspot/cpu/aarch64/c1_LIRGenerator_aarch64.cpp line 427: > >> 425: LIR_Opr tmp = LIR_OprFact::illegalOpr; >> 426: if (x->op() == Bytecodes::_dmul || x->op() == Bytecodes::_ddiv) { >> 427: tmp = new_register(T_DOUBLE); > > This variable `tmp` doesn't seem to be used. I guess this code originally came from x86 where it's passed through to `arithmetic_op_fpu()`. Well spotted Nick. I've removed this dead code and am re-testing. Thanks, David > ------------- > > PR: https://git.openjdk.java.net/jdk/pull/3991 > From dholmes at openjdk.java.net Tue May 25 09:37:28 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Tue, 25 May 2021 09:37:28 GMT Subject: RFR: 8266950: Remove vestigial support for non-strict floating-point execution [v6] In-Reply-To: References: Message-ID: > As part of JEP 306, the vestiges of HotSpot support for non-strict floating-point execution can be removed. All methods implicitly have strictfp semantics so the explicit checks for is_strict() can be replaced by true and the code reformulated accordingly. > > There are still some names that include "strict" that could potentially be renamed to remove it, but the fact we have to have strict fp semantics is still important on some platforms, so the names help reinforce that IMO. 
> > Testing: tiers 1-3 > > Thanks, > David David Holmes has updated the pull request incrementally with one additional commit since the last revision: Remove dead code on aarch64 ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/3991/files - new: https://git.openjdk.java.net/jdk/pull/3991/files/4dcab9b7..cc526aa4 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3991&range=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3991&range=04-05 Stats: 4 lines in 1 file changed: 0 ins; 4 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/3991.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3991/head:pull/3991 PR: https://git.openjdk.java.net/jdk/pull/3991 From sviswanathan at openjdk.java.net Tue May 25 20:19:04 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Tue, 25 May 2021 20:19:04 GMT Subject: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v14] In-Reply-To: References: Message-ID: > This PR contains Short Vector Math Library support related changes for [JEP-414 Vector API (Second Incubator)](https://openjdk.java.net/jeps/414), in preparation for when targeted. > > Intel Short Vector Math Library (SVML) based intrinsics in native x86 assembly provide optimized implementation for Vector API transcendental and trigonometric methods. > These methods are built into a separate library instead of being part of libjvm.so or jvm.dll. > > The following changes are made: > The source for these methods is placed in the jdk.incubator.vector module under src/jdk.incubator.vector/linux/native/libsvml and src/jdk.incubator.vector/windows/native/libsvml. > The assembly source files are named as "*.S" and include files are named as "*.S.inc". > The corresponding build script is placed at make/modules/jdk.incubator.vector/Lib.gmk. > Changes are made to build system to support dependency tracking for assembly files with includes.
> The built native libraries (libsvml.so/svml.dll) are placed in bin directory of JDK on Windows and lib directory of JDK on Linux. > The C2 JIT uses the dll_load and dll_lookup to get the addresses of optimized methods from this library. > > Build system changes and module library build scripts are contributed by Magnus (magnus.ihse.bursie at oracle.com). > > Looking forward to your review and feedback. > > Performance: > Micro benchmark Base Optimized Unit Gain(Optimized/Base) > Double128Vector.ACOS 45.91 87.34 ops/ms 1.90 > Double128Vector.ASIN 45.06 92.36 ops/ms 2.05 > Double128Vector.ATAN 19.92 118.36 ops/ms 5.94 > Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79 > Double128Vector.CBRT 45.77 208.36 ops/ms 4.55 > Double128Vector.COS 49.94 245.89 ops/ms 4.92 > Double128Vector.COSH 26.91 126.00 ops/ms 4.68 > Double128Vector.EXP 71.64 379.65 ops/ms 5.30 > Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18 > Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44 > Double128Vector.LOG 61.95 279.84 ops/ms 4.52 > Double128Vector.LOG10 59.34 239.05 ops/ms 4.03 > Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79 > Double128Vector.SIN 49.36 240.79 ops/ms 4.88 > Double128Vector.SINH 26.59 103.75 ops/ms 3.90 > Double128Vector.TAN 41.05 152.39 ops/ms 3.71 > Double128Vector.TANH 45.29 169.53 ops/ms 3.74 > Double256Vector.ACOS 54.21 106.39 ops/ms 1.96 > Double256Vector.ASIN 53.60 107.99 ops/ms 2.01 > Double256Vector.ATAN 21.53 189.11 ops/ms 8.78 > Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44 > Double256Vector.CBRT 56.45 397.13 ops/ms 7.04 > Double256Vector.COS 58.26 389.77 ops/ms 6.69 > Double256Vector.COSH 29.44 151.11 ops/ms 5.13 > Double256Vector.EXP 86.67 564.68 ops/ms 6.52 > Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80 > Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62 > Double256Vector.LOG 71.52 394.90 ops/ms 5.52 > Double256Vector.LOG10 65.43 362.32 ops/ms 5.54 > Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05 > Double256Vector.SIN 57.06 380.98 ops/ms 6.68 > Double256Vector.SINH 
29.40 117.37 ops/ms 3.99 > Double256Vector.TAN 44.90 279.90 ops/ms 6.23 > Double256Vector.TANH 54.08 274.71 ops/ms 5.08 > Double512Vector.ACOS 55.65 687.54 ops/ms 12.35 > Double512Vector.ASIN 57.31 777.72 ops/ms 13.57 > Double512Vector.ATAN 21.42 729.21 ops/ms 34.04 > Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32 > Double512Vector.CBRT 56.78 834.38 ops/ms 14.69 > Double512Vector.COS 59.88 837.04 ops/ms 13.98 > Double512Vector.COSH 30.34 172.76 ops/ms 5.70 > Double512Vector.EXP 99.66 1608.12 ops/ms 16.14 > Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34 > Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34 > Double512Vector.LOG 74.84 996.00 ops/ms 13.31 > Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72 > Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34 > Double512Vector.POW 37.42 384.13 ops/ms 10.26 > Double512Vector.SIN 59.74 728.45 ops/ms 12.19 > Double512Vector.SINH 29.47 143.38 ops/ms 4.87 > Double512Vector.TAN 46.20 587.21 ops/ms 12.71 > Double512Vector.TANH 57.36 495.42 ops/ms 8.64 > Double64Vector.ACOS 24.04 73.67 ops/ms 3.06 > Double64Vector.ASIN 23.78 75.11 ops/ms 3.16 > Double64Vector.ATAN 14.14 62.81 ops/ms 4.44 > Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28 > Double64Vector.CBRT 16.47 107.50 ops/ms 6.53 > Double64Vector.COS 23.42 152.01 ops/ms 6.49 > Double64Vector.COSH 17.34 113.34 ops/ms 6.54 > Double64Vector.EXP 27.08 203.53 ops/ms 7.52 > Double64Vector.EXPM1 18.77 96.73 ops/ms 5.15 > Double64Vector.HYPOT 18.54 103.62 ops/ms 5.59 > Double64Vector.LOG 26.75 142.63 ops/ms 5.33 > Double64Vector.LOG10 25.85 139.71 ops/ms 5.40 > Double64Vector.LOG1P 13.26 97.94 ops/ms 7.38 > Double64Vector.SIN 23.28 146.91 ops/ms 6.31 > Double64Vector.SINH 17.62 88.59 ops/ms 5.03 > Double64Vector.TAN 21.00 86.43 ops/ms 4.12 > Double64Vector.TANH 23.75 111.35 ops/ms 4.69 > Float128Vector.ACOS 57.52 110.65 ops/ms 1.92 > Float128Vector.ASIN 57.15 117.95 ops/ms 2.06 > Float128Vector.ATAN 22.52 318.74 ops/ms 14.15 > Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42 > 
Float128Vector.CBRT 29.72 443.74 ops/ms 14.93
> Float128Vector.COS 42.82 803.02 ops/ms 18.75
> Float128Vector.COSH 31.44 118.34 ops/ms 3.76
> Float128Vector.EXP 72.43 855.33 ops/ms 11.81
> Float128Vector.EXPM1 37.82 127.85 ops/ms 3.38
> Float128Vector.HYPOT 53.20 591.68 ops/ms 11.12
> Float128Vector.LOG 52.95 877.94 ops/ms 16.58
> Float128Vector.LOG10 49.26 603.72 ops/ms 12.26
> Float128Vector.LOG1P 20.89 430.59 ops/ms 20.61
> Float128Vector.SIN 43.38 745.31 ops/ms 17.18
> Float128Vector.SINH 31.11 112.91 ops/ms 3.63
> Float128Vector.TAN 37.25 332.13 ops/ms 8.92
> Float128Vector.TANH 57.63 453.77 ops/ms 7.87
> Float256Vector.ACOS 65.23 123.73 ops/ms 1.90
> Float256Vector.ASIN 63.41 132.86 ops/ms 2.10
> Float256Vector.ATAN 23.51 649.02 ops/ms 27.61
> Float256Vector.ATAN2 18.19 455.95 ops/ms 25.07
> Float256Vector.CBRT 45.99 594.81 ops/ms 12.93
> Float256Vector.COS 43.75 926.69 ops/ms 21.18
> Float256Vector.COSH 33.52 130.46 ops/ms 3.89
> Float256Vector.EXP 75.70 1366.72 ops/ms 18.05
> Float256Vector.EXPM1 39.00 149.72 ops/ms 3.84
> Float256Vector.HYPOT 52.91 1023.18 ops/ms 19.34
> Float256Vector.LOG 53.31 1545.77 ops/ms 29.00
> Float256Vector.LOG10 50.31 863.80 ops/ms 17.17
> Float256Vector.LOG1P 21.51 616.59 ops/ms 28.66
> Float256Vector.SIN 44.07 911.04 ops/ms 20.67
> Float256Vector.SINH 33.16 122.50 ops/ms 3.69
> Float256Vector.TAN 37.85 497.75 ops/ms 13.15
> Float256Vector.TANH 64.27 537.20 ops/ms 8.36
> Float512Vector.ACOS 67.33 1718.00 ops/ms 25.52
> Float512Vector.ASIN 66.12 1780.85 ops/ms 26.93
> Float512Vector.ATAN 22.63 1780.31 ops/ms 78.69
> Float512Vector.ATAN2 17.52 1113.93 ops/ms 63.57
> Float512Vector.CBRT 54.78 2087.58 ops/ms 38.11
> Float512Vector.COS 40.92 1567.93 ops/ms 38.32
> Float512Vector.COSH 33.42 138.36 ops/ms 4.14
> Float512Vector.EXP 70.51 3835.97 ops/ms 54.41
> Float512Vector.EXPM1 38.06 279.80 ops/ms 7.35
> Float512Vector.HYPOT 50.99 3287.55 ops/ms 64.47
> Float512Vector.LOG 49.61 3156.99 ops/ms 63.64
> Float512Vector.LOG10 46.94 2489.16 ops/ms 53.02
> Float512Vector.LOG1P 20.66 1689.86 ops/ms 81.81
> Float512Vector.POW 32.73 1015.85 ops/ms 31.04
> Float512Vector.SIN 41.17 1587.71 ops/ms 38.56
> Float512Vector.SINH 33.05 129.39 ops/ms 3.91
> Float512Vector.TAN 35.60 1336.11 ops/ms 37.53
> Float512Vector.TANH 65.77 2295.28 ops/ms 34.90
> Float64Vector.ACOS 48.41 89.34 ops/ms 1.85
> Float64Vector.ASIN 47.30 95.72 ops/ms 2.02
> Float64Vector.ATAN 20.62 49.45 ops/ms 2.40
> Float64Vector.ATAN2 15.95 112.35 ops/ms 7.04
> Float64Vector.CBRT 24.03 134.57 ops/ms 5.60
> Float64Vector.COS 44.28 394.33 ops/ms 8.91
> Float64Vector.COSH 28.35 95.27 ops/ms 3.36
> Float64Vector.EXP 65.80 486.37 ops/ms 7.39
> Float64Vector.EXPM1 34.61 85.99 ops/ms 2.48
> Float64Vector.HYPOT 50.40 147.82 ops/ms 2.93
> Float64Vector.LOG 51.93 163.25 ops/ms 3.14
> Float64Vector.LOG10 49.53 147.98 ops/ms 2.99
> Float64Vector.LOG1P 19.20 206.81 ops/ms 10.77
> Float64Vector.SIN 44.41 382.09 ops/ms 8.60
> Float64Vector.SINH 28.20 90.68 ops/ms 3.22
> Float64Vector.TAN 36.29 160.89 ops/ms 4.43
> Float64Vector.TANH 47.65 214.04 ops/ms 4.49

Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision:

  Javadoc changes

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3638/files
  - new: https://git.openjdk.java.net/jdk/pull/3638/files/4d59af0a..6cd50248

Webrevs:
  - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=13
  - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=12-13

  Stats: 58 lines in 1 file changed: 38 ins; 0 del; 20 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3638.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3638/head:pull/3638

PR: https://git.openjdk.java.net/jdk/pull/3638

From sviswanathan at openjdk.java.net Tue May 25 22:02:39 2021
From: sviswanathan at openjdk.java.net (Sandhya Viswanathan)
Date: Tue, 25 May 2021 22:02:39 GMT
Subject: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v15]
In-Reply-To:
References:
Message-ID:

> This PR contains Short Vector Math Library support related changes for [JEP-414 Vector API (Second Incubator)](https://openjdk.java.net/jeps/414), in preparation for when targeted.
>
> Intel Short Vector Math Library (SVML) based intrinsics in native x86 assembly provide optimized implementation for Vector API transcendental and trigonometric methods.
> These methods are built into a separate library instead of being part of libjvm.so or jvm.dll.
>
> The following changes are made:
> The source for these methods is placed in the jdk.incubator.vector module under src/jdk.incubator.vector/linux/native/libsvml and src/jdk.incubator.vector/windows/native/libsvml.
> The assembly source files are named as "*.S" and include files are named as "*.S.inc".
> The corresponding build script is placed at make/modules/jdk.incubator.vector/Lib.gmk.
> Changes are made to build system to support dependency tracking for assembly files with includes.
> The built native libraries (libsvml.so/svml.dll) are placed in bin directory of JDK on Windows and lib directory of JDK on Linux.
> The C2 JIT uses the dll_load and dll_lookup to get the addresses of optimized methods from this library.
>
> Build system changes and module library build scripts are contributed by Magnus (magnus.ihse.bursie at oracle.com).
>
> Looking forward to your review and feedback.
>
> Performance:
> Micro benchmark Base Optimized Unit Gain(Optimized/Base)
> Double128Vector.ACOS 45.91 87.34 ops/ms 1.90
> Double128Vector.ASIN 45.06 92.36 ops/ms 2.05
> Double128Vector.ATAN 19.92 118.36 ops/ms 5.94
> Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79
> Double128Vector.CBRT 45.77 208.36 ops/ms 4.55
> Double128Vector.COS 49.94 245.89 ops/ms 4.92
> Double128Vector.COSH 26.91 126.00 ops/ms 4.68
> Double128Vector.EXP 71.64 379.65 ops/ms 5.30
> Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18
> Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44
> Double128Vector.LOG 61.95 279.84 ops/ms 4.52
> Double128Vector.LOG10 59.34 239.05 ops/ms 4.03
> Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79
> Double128Vector.SIN 49.36 240.79 ops/ms 4.88
> Double128Vector.SINH 26.59 103.75 ops/ms 3.90
> Double128Vector.TAN 41.05 152.39 ops/ms 3.71
> Double128Vector.TANH 45.29 169.53 ops/ms 3.74
> Double256Vector.ACOS 54.21 106.39 ops/ms 1.96
> Double256Vector.ASIN 53.60 107.99 ops/ms 2.01
> Double256Vector.ATAN 21.53 189.11 ops/ms 8.78
> Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44
> Double256Vector.CBRT 56.45 397.13 ops/ms 7.04
> Double256Vector.COS 58.26 389.77 ops/ms 6.69
> Double256Vector.COSH 29.44 151.11 ops/ms 5.13
> Double256Vector.EXP 86.67 564.68 ops/ms 6.52
> Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80
> Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62
> Double256Vector.LOG 71.52 394.90 ops/ms 5.52
> Double256Vector.LOG10 65.43 362.32 ops/ms 5.54
> Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05
> Double256Vector.SIN 57.06 380.98 ops/ms 6.68
> Double256Vector.SINH 29.40 117.37 ops/ms 3.99
> Double256Vector.TAN 44.90 279.90 ops/ms 6.23
> Double256Vector.TANH 54.08 274.71 ops/ms 5.08
> Double512Vector.ACOS 55.65 687.54 ops/ms 12.35
> Double512Vector.ASIN 57.31 777.72 ops/ms 13.57
> Double512Vector.ATAN 21.42 729.21 ops/ms 34.04
> Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32
> Double512Vector.CBRT 56.78 834.38 ops/ms 14.69
> Double512Vector.COS 59.88 837.04 ops/ms 13.98
> Double512Vector.COSH 30.34 172.76 ops/ms 5.70
> Double512Vector.EXP 99.66 1608.12 ops/ms 16.14
> Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34
> Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34
> Double512Vector.LOG 74.84 996.00 ops/ms 13.31
> Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72
> Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34
> Double512Vector.POW 37.42 384.13 ops/ms 10.26
> Double512Vector.SIN 59.74 728.45 ops/ms 12.19
> Double512Vector.SINH 29.47 143.38 ops/ms 4.87
> Double512Vector.TAN 46.20 587.21 ops/ms 12.71
> Double512Vector.TANH 57.36 495.42 ops/ms 8.64
> Double64Vector.ACOS 24.04 73.67 ops/ms 3.06
> Double64Vector.ASIN 23.78 75.11 ops/ms 3.16
> Double64Vector.ATAN 14.14 62.81 ops/ms 4.44
> Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28
> Double64Vector.CBRT 16.47 107.50 ops/ms 6.53
> Double64Vector.COS 23.42 152.01 ops/ms 6.49
> Double64Vector.COSH 17.34 113.34 ops/ms 6.54
> Double64Vector.EXP 27.08 203.53 ops/ms 7.52
> Double64Vector.EXPM1 18.77 96.73 ops/ms 5.15
> Double64Vector.HYPOT 18.54 103.62 ops/ms 5.59
> Double64Vector.LOG 26.75 142.63 ops/ms 5.33
> Double64Vector.LOG10 25.85 139.71 ops/ms 5.40
> Double64Vector.LOG1P 13.26 97.94 ops/ms 7.38
> Double64Vector.SIN 23.28 146.91 ops/ms 6.31
> Double64Vector.SINH 17.62 88.59 ops/ms 5.03
> Double64Vector.TAN 21.00 86.43 ops/ms 4.12
> Double64Vector.TANH 23.75 111.35 ops/ms 4.69
> Float128Vector.ACOS 57.52 110.65 ops/ms 1.92
> Float128Vector.ASIN 57.15 117.95 ops/ms 2.06
> Float128Vector.ATAN 22.52 318.74 ops/ms 14.15
> Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42
> Float128Vector.CBRT 29.72 443.74 ops/ms 14.93
> Float128Vector.COS 42.82 803.02 ops/ms 18.75
> Float128Vector.COSH 31.44 118.34 ops/ms 3.76
> Float128Vector.EXP 72.43 855.33 ops/ms 11.81
> Float128Vector.EXPM1 37.82 127.85 ops/ms 3.38
> Float128Vector.HYPOT 53.20 591.68 ops/ms 11.12
> Float128Vector.LOG 52.95 877.94 ops/ms 16.58
> Float128Vector.LOG10 49.26 603.72 ops/ms 12.26
> Float128Vector.LOG1P 20.89 430.59 ops/ms 20.61
> Float128Vector.SIN 43.38 745.31 ops/ms 17.18
> Float128Vector.SINH 31.11 112.91 ops/ms 3.63
> Float128Vector.TAN 37.25 332.13 ops/ms 8.92
> Float128Vector.TANH 57.63 453.77 ops/ms 7.87
> Float256Vector.ACOS 65.23 123.73 ops/ms 1.90
> Float256Vector.ASIN 63.41 132.86 ops/ms 2.10
> Float256Vector.ATAN 23.51 649.02 ops/ms 27.61
> Float256Vector.ATAN2 18.19 455.95 ops/ms 25.07
> Float256Vector.CBRT 45.99 594.81 ops/ms 12.93
> Float256Vector.COS 43.75 926.69 ops/ms 21.18
> Float256Vector.COSH 33.52 130.46 ops/ms 3.89
> Float256Vector.EXP 75.70 1366.72 ops/ms 18.05
> Float256Vector.EXPM1 39.00 149.72 ops/ms 3.84
> Float256Vector.HYPOT 52.91 1023.18 ops/ms 19.34
> Float256Vector.LOG 53.31 1545.77 ops/ms 29.00
> Float256Vector.LOG10 50.31 863.80 ops/ms 17.17
> Float256Vector.LOG1P 21.51 616.59 ops/ms 28.66
> Float256Vector.SIN 44.07 911.04 ops/ms 20.67
> Float256Vector.SINH 33.16 122.50 ops/ms 3.69
> Float256Vector.TAN 37.85 497.75 ops/ms 13.15
> Float256Vector.TANH 64.27 537.20 ops/ms 8.36
> Float512Vector.ACOS 67.33 1718.00 ops/ms 25.52
> Float512Vector.ASIN 66.12 1780.85 ops/ms 26.93
> Float512Vector.ATAN 22.63 1780.31 ops/ms 78.69
> Float512Vector.ATAN2 17.52 1113.93 ops/ms 63.57
> Float512Vector.CBRT 54.78 2087.58 ops/ms 38.11
> Float512Vector.COS 40.92 1567.93 ops/ms 38.32
> Float512Vector.COSH 33.42 138.36 ops/ms 4.14
> Float512Vector.EXP 70.51 3835.97 ops/ms 54.41
> Float512Vector.EXPM1 38.06 279.80 ops/ms 7.35
> Float512Vector.HYPOT 50.99 3287.55 ops/ms 64.47
> Float512Vector.LOG 49.61 3156.99 ops/ms 63.64
> Float512Vector.LOG10 46.94 2489.16 ops/ms 53.02
> Float512Vector.LOG1P 20.66 1689.86 ops/ms 81.81
> Float512Vector.POW 32.73 1015.85 ops/ms 31.04
> Float512Vector.SIN 41.17 1587.71 ops/ms 38.56
> Float512Vector.SINH 33.05 129.39 ops/ms 3.91
> Float512Vector.TAN 35.60 1336.11 ops/ms 37.53
> Float512Vector.TANH 65.77 2295.28 ops/ms 34.90
> Float64Vector.ACOS 48.41 89.34 ops/ms 1.85
> Float64Vector.ASIN 47.30 95.72 ops/ms 2.02
> Float64Vector.ATAN 20.62 49.45 ops/ms 2.40
> Float64Vector.ATAN2 15.95 112.35 ops/ms 7.04
> Float64Vector.CBRT 24.03 134.57 ops/ms 5.60
> Float64Vector.COS 44.28 394.33 ops/ms 8.91
> Float64Vector.COSH 28.35 95.27 ops/ms 3.36
> Float64Vector.EXP 65.80 486.37 ops/ms 7.39
> Float64Vector.EXPM1 34.61 85.99 ops/ms 2.48
> Float64Vector.HYPOT 50.40 147.82 ops/ms 2.93
> Float64Vector.LOG 51.93 163.25 ops/ms 3.14
> Float64Vector.LOG10 49.53 147.98 ops/ms 2.99
> Float64Vector.LOG1P 19.20 206.81 ops/ms 10.77
> Float64Vector.SIN 44.41 382.09 ops/ms 8.60
> Float64Vector.SINH 28.20 90.68 ops/ms 3.22
> Float64Vector.TAN 36.29 160.89 ops/ms 4.43
> Float64Vector.TANH 47.65 214.04 ops/ms 4.49

Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision:

  correct javadoc

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3638/files
  - new: https://git.openjdk.java.net/jdk/pull/3638/files/6cd50248..e5208a18

Webrevs:
  - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=14
  - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=13-14

  Stats: 3 lines in 1 file changed: 0 ins; 2 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3638.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3638/head:pull/3638

PR: https://git.openjdk.java.net/jdk/pull/3638

From yyang at openjdk.java.net Wed May 26 01:28:33 2021
From: yyang at openjdk.java.net (Yi Yang)
Date: Wed, 26 May 2021 01:28:33 GMT
Subject: RFR: 8267687: ModXNode::Ideal optimization is better than Parse::do_irem
Message-ID: <7VdGuyOXAOwlvDy2Tq5IvZzurmn8F7kL08ifoo5wBtI=.66821483-a0e1-45b1-bccf-121834bba26b@github.com>

Hi all,

Can I have a review of this change? I noticed there are two almost identical optimizations for the % operation. For x % y, both Parse::do_irem and ModXNode::Ideal optimize the special case where the divisor y is a constant `2^n` value.
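
For readers unfamiliar with this special case: when the divisor is a constant power of two, a signed remainder can be computed with shifts and masks instead of a division. The sketch below shows the general textbook identity in Java. It is not a transcription of either HotSpot code path, and the names `ModPow2` and `remPow2` are made up for illustration.

```java
// Illustrative only: signed remainder by a constant power of two
// without a division. Not taken from HotSpot sources.
final class ModPow2 {
    // Returns x % (1 << n) for 1 <= n <= 30, matching Java's semantics
    // (the result takes the sign of the dividend x).
    static int remPow2(int x, int n) {
        int mask = (1 << n) - 1;
        // bias is (2^n - 1) when x is negative and 0 otherwise.
        int bias = (x >> 31) >>> (32 - n);
        // Adding the bias before masking and subtracting it afterwards
        // makes the result round toward zero for negative x.
        return ((x + bias) & mask) - bias;
    }
}
```

For example, `ModPow2.remPow2(-9, 3)` gives the same result as `-9 % 8`. The range restriction on `n` matters: Java masks shift counts to five bits, so `n = 0` would not produce the intended bias.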
ModXNode::Ideal opt: https://github.com/openjdk/jdk/blob/cc687fd43ade6be8760c559f3ffa909c5937727c/src/hotspot/share/opto/divnode.cpp#L112-L160
Parse::do_irem opt: https://github.com/openjdk/jdk/blob/cc687fd43ade6be8760c559f3ffa909c5937727c/src/hotspot/share/opto/parse2.cpp#L1171-L1196

It turns out that the ModXNode::Ideal optimization is better than Parse::do_irem in a simple microbenchmark (please check the JBS attachment for the detailed benchmark results):

ModXNode::Ideal opt:
----------------
Benchmark                         Mode  Cnt     Score     Error  Units
ModPowerOf2.testNegativePowerOf2  avgt   25  8746.608 ± 139.777  ns/op
ModPowerOf2.testPositivePowerOf2  avgt   25  8735.545 ±  86.145  ns/op

Parse::do_irem opt:
----------------
Benchmark                         Mode  Cnt     Score     Error  Units
ModPowerOf2.testNegativePowerOf2  avgt   25  8693.797 ±   7.844  ns/op
ModPowerOf2.testPositivePowerOf2  avgt   25  6618.652 ±   1.739  ns/op

Diff for ideal graph:
----------------
![ideal_graph](https://user-images.githubusercontent.com/5010047/119525589-34585f80-bdb1-11eb-9d7e-e3962cd7f789.jpg)

Thanks!
Yang

-------------

Commit messages:
 - tailing whitespace
 - use ModXNode::Ideal; remove Parse::do_irem opt

Changes: https://git.openjdk.java.net/jdk/pull/4188/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4188&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8267687
  Stats: 160 lines in 3 files changed: 101 ins; 58 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/4188.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/4188/head:pull/4188

PR: https://git.openjdk.java.net/jdk/pull/4188

From dlong at openjdk.java.net Wed May 26 05:42:20 2021
From: dlong at openjdk.java.net (Dean Long)
Date: Wed, 26 May 2021 05:42:20 GMT
Subject: RFR: 8266746: C1: Replace UnsafeGetRaw with UnsafeGetObject when setting up OSR entry block [v3]
In-Reply-To: <8bbfrQRWAy3o4XRgMjcHZ6Cp7x-3vWmdq1M3x3T-mvE=.6a933a21-2fd4-4677-b162-a39e7b64084d@github.com>
References: <8bbfrQRWAy3o4XRgMjcHZ6Cp7x-3vWmdq1M3x3T-mvE=.6a933a21-2fd4-4677-b162-a39e7b64084d@github.com>
Message-ID:

On Thu, 20 May 2021 02:44:10 GMT, Yi Yang wrote:

>> After JDK-8150921, most Unsafe{Get,Put}Raw intrinsic methods can be replaced by Unsafe{Get,Put}Object.
>>
>> There is only one place where C1 refers to UnsafeGetRaw, in GraphBuilder::setup_osr_entry_block():
>>
>> https://github.com/openjdk/jdk/blob/74fecc070a6462e6a2d061525b53a63de15339f9/src/hotspot/share/c1/c1_GraphBuilder.cpp#L3143-L3157
>>
>> We can replace UnsafeGetRaw with UnsafeGetObject when setting up OSR entry block. After that, Unsafe{Get,Put}Raw can be completely removed because no one refers to them.
>>
>> (This patch actually does two things:
>> 1. `Replace UnsafeGetRaw with UnsafeGetObject when setting up OSR entry block` This is the only place where C1 refers to UnsafeGetRaw
>> 2. `Cleanup unused Unsafe{Get,Put}Raw code`
>> They are related, so I put them together, but I still want to hear your suggestions. I will separate them into two patches if you think that is more reasonable.)
>>
>> Thanks!
>> Yang
>
> Yi Yang has updated the pull request incrementally with one additional commit since the last revision:
>
> many nit

It looks like when JDK-8150921 was done, some micro-benchmarks were run, and @shipilev noticed a regression in C1 with constant null handling:
http://cr.openjdk.java.net/~shade/8150921/notes.txt
Looking over the code review:
https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2016-May/022767.html
I don't see any references to benchmark testing that would check for any regressions in the InlineUnsafeOps + OptimizeUnsafes optimization that 8150921 mostly bypassed. @vidmik
Now this PR proposes to remove this mostly dead code. I'm OK with that, since blindly reviving it risks hitting bitrot, but for completeness, perhaps we should file 2 followup RFEs:
1) investigate constant null C1 regression
2) measure if the dead InlineUnsafeOps + OptimizeUnsafes optimization is worth reviving in a later release (with sufficient bake time)

-------------

Changes requested by dlong (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/3917

From jcm at openjdk.java.net Wed May 26 06:48:16 2021
From: jcm at openjdk.java.net (Jamsheed Mohammed C M)
Date: Wed, 26 May 2021 06:48:16 GMT
Subject: Withdrawn: 8265132: C2 compilation fails with assert "missing precedence edge"
In-Reply-To:
References:
Message-ID:

On Tue, 25 May 2021 16:04:05 GMT, Jamsheed Mohammed C M wrote:

> Issue is similar to https://bugs.openjdk.java.net/browse/JDK-8261730
> but happens at next https://github.com/jamsheedcm/jdk/blob/master/src/hotspot/share/opto/gcm.cpp#L830
>
> Request for review

This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.java.net/jdk/pull/4189

From jcm at openjdk.java.net Wed May 26 06:57:24 2021
From: jcm at openjdk.java.net (Jamsheed Mohammed C M)
Date: Wed, 26 May 2021 06:57:24 GMT
Subject: RFR: 8265132 : C2 compilation fails with assert "missing precedence edge"
Message-ID:

Issue is similar to https://bugs.openjdk.java.net/browse/JDK-8261730
but happens at next https://github.com/jamsheedcm/jdk/blob/master/src/hotspot/share/opto/gcm.cpp#L830

Request for review

-------------

Commit messages:
 - fix for C2 compilation fails with assert missing precedence edge
 - Merge branch 'openjdk:master' into master
 - Merge pull request #5 from openjdk/master
 - Merge branch 'master' of https://github.com/jamsheedcm/jdk
 - Merge pull request #4 from openjdk/master
 - Merge branch 'master' of https://github.com/jamsheedcm/jdk
 - Merge pull request #3 from openjdk/master
 - Merge branch 'master' of https://github.com/jamsheedcm/jdk
 - Merge pull request #2 from openjdk/master
 - Merge branch 'master' of https://github.com/jamsheedcm/jdk
 - ... and 1 more: https://git.openjdk.java.net/jdk/compare/9eaa4afc...5a857f86

Changes: https://git.openjdk.java.net/jdk/pull/4200/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4200&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8265132
  Stats: 59 lines in 3 files changed: 38 ins; 16 del; 5 mod
  Patch: https://git.openjdk.java.net/jdk/pull/4200.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/4200/head:pull/4200

PR: https://git.openjdk.java.net/jdk/pull/4200

From roland at openjdk.java.net Wed May 26 09:25:24 2021
From: roland at openjdk.java.net (Roland Westrelin)
Date: Wed, 26 May 2021 09:25:24 GMT
Subject: RFR: 8252372: Check if cloning is required to move loads out of loops in PhaseIdealLoop::split_if_with_blocks_post() [v3]
In-Reply-To:
References:
Message-ID:

On Fri, 21 May 2021 23:28:05 GMT, Vladimir Kozlov wrote:

>> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision:
>>
>> - Tobias' review
>> - Merge branch 'master' into JDK-8252372
>> - CastVV
>> - Merge branch 'master' into JDK-8252372
>> - extra comments
>> - fix
>
> Marked as reviewed by kvn (Reviewer).

@vnkozlov thanks for the review

-------------

PR: https://git.openjdk.java.net/jdk/pull/3689

From roland at openjdk.java.net Wed May 26 09:25:24 2021
From: roland at openjdk.java.net (Roland Westrelin)
Date: Wed, 26 May 2021 09:25:24 GMT
Subject: Integrated: 8252372: Check if cloning is required to move loads out of loops in PhaseIdealLoop::split_if_with_blocks_post()
In-Reply-To:
References:
Message-ID:

On Mon, 26 Apr 2021 09:46:18 GMT, Roland Westrelin wrote:

> Sinking data nodes out of a loop when all uses are out of a loop has
> several issues that this attempts to fix.
>
> 1- Only non control uses are considered which makes little sense (why
> not sink if the data node is an argument to a call or a returned
> value?)
>
> 2- Sinking of Loads is broken because of the handling of
> anti-dependence: the get_late_ctrl(n, n_ctrl) call returns a control
> in the loop because it takes all uses into account.
>
> 3- For data nodes for which a control edge can't be set, commoning of
> clones back in the loop is prevented with:
> _igvn._worklist.yank(x);
> which gives no guarantee
>
> This patch tries to address all issues:
>
> 1- it looks at all uses, not only non control uses
>
> 2- anti-dependences are computed for each use independently
>
> 3- Cast nodes are used to pin clones out of loop
>
>
> 2- requires refactoring of the PhaseIdealLoop::get_late_ctrl()
> logic. While working on this, I noticed a bug in anti-dependence
> analysis: when the use is a cfg node, the code sometimes looks at uses
> of the memory state of the cfg. The logic uses the use of the cfg
> which is a projection of adr_type identical to the cfg. It should
> instead look at the use of the memory projection.
>
> The existing logic for sinking loads calls clear_dom_lca_tags() for
> every load which seems like quite a waste. I added a
> _dom_lca_tags_round variable that's or'ed with the tag_node's _idx. By
> incrementing _dom_lca_tags_round, new tags that don't conflict with
> existing ones are produced and there's no need for
> clear_dom_lca_tags().
>
> For anti-dependence analysis to return a correct result, early control
> of the load is needed. The only way to get it at this stage, AFAICT,
> is to compute it by following the load's input until a pinned node is
> reached.
>
> The existing logic pins cloned nodes next to their use. The logic I
> propose pins them right out of the loop. This could possibly avoid
> some redundant clones. It also makes some special handling for corner
> cases with loop strip mining useless.
>
> For 3-, I added extra Cast nodes for float types. If a chain of data
> nodes are sunk, the new logic tries to keep a single Cast for the
> entire chain rather than one Cast per node.

This pull request has now been integrated.

Changeset: 9d305b9c
Author: Roland Westrelin
URL: https://git.openjdk.java.net/jdk/commit/9d305b9c0625d73c752724569dbb7f6c8e80931c
Stats: 612 lines in 14 files changed: 402 ins; 76 del; 134 mod

8252372: Check if cloning is required to move loads out of loops in PhaseIdealLoop::split_if_with_blocks_post()

Reviewed-by: thartmann, kvn

-------------

PR: https://git.openjdk.java.net/jdk/pull/3689

From mli at openjdk.java.net Wed May 26 09:39:17 2021
From: mli at openjdk.java.net (Hamlin Li)
Date: Wed, 26 May 2021 09:39:17 GMT
Subject: RFR: 8267130: Memory Overflow in Disassembler::load_library
In-Reply-To: <5d59O0bG7Vu16Ub4lzq2ISDo8MJLXfv35SZ8iLzHzs4=.bc2c4cda-54ee-4610-b849-71cf42a94003@github.com>
References: <5d59O0bG7Vu16Ub4lzq2ISDo8MJLXfv35SZ8iLzHzs4=.bc2c4cda-54ee-4610-b849-71cf42a94003@github.com>
Message-ID:

On Fri, 14 May 2021 02:17:29 GMT, Wang Huang wrote:

> * reproduce:
> put your libjvm.so in a long enough path, such like

looks good.

-------------

Marked as reviewed by mli (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/4020

From whuang at openjdk.java.net Wed May 26 10:25:18 2021
From: whuang at openjdk.java.net (Wang Huang)
Date: Wed, 26 May 2021 10:25:18 GMT
Subject: Integrated: 8267130: Memory Overflow in Disassembler::load_library
In-Reply-To: <5d59O0bG7Vu16Ub4lzq2ISDo8MJLXfv35SZ8iLzHzs4=.bc2c4cda-54ee-4610-b849-71cf42a94003@github.com>
References: <5d59O0bG7Vu16Ub4lzq2ISDo8MJLXfv35SZ8iLzHzs4=.bc2c4cda-54ee-4610-b849-71cf42a94003@github.com>
Message-ID: <5kkjD3YNQFwLzhusBiWEUkgZp_PihD7eJCacSaw7cZY=.37b78bd5-6844-4be4-8254-e83d15cab551@github.com>

On Fri, 14 May 2021 02:17:29 GMT, Wang Huang wrote:

> * reproduce:
> put your libjvm.so in a long enough path, such like

This pull request has now been integrated.

Changeset: 083416d3
Author: Wang Huang
Committer: Hamlin Li
URL: https://git.openjdk.java.net/jdk/commit/083416d36c0d7fd17dd0db546129411450dfcccf
Stats: 29 lines in 1 file changed: 17 ins; 5 del; 7 mod

8267130: Memory Overflow in Disassembler::load_library

Co-authored-by: Wang Huang
Co-authored-by: Miao Zhuojun
Reviewed-by: neliasso, mli

-------------

PR: https://git.openjdk.java.net/jdk/pull/4020

From yyang at openjdk.java.net Wed May 26 10:57:15 2021
From: yyang at openjdk.java.net (Yi Yang)
Date: Wed, 26 May 2021 10:57:15 GMT
Subject: RFR: 8266746: C1: Replace UnsafeGetRaw with UnsafeGetObject when setting up OSR entry block [v3]
In-Reply-To:
References: <8bbfrQRWAy3o4XRgMjcHZ6Cp7x-3vWmdq1M3x3T-mvE=.6a933a21-2fd4-4677-b162-a39e7b64084d@github.com>
Message-ID:

On Wed, 26 May 2021 05:39:26 GMT, Dean Long wrote:

> It looks like when JDK-8150921 was done, some micro-benchmarks were run, and @shipilev noticed a regression in C1 with constant null handling:
> http://cr.openjdk.java.net/~shade/8150921/notes.txt
> Looking over the code review:
> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2016-May/022767.html
> I don't see any references to benchmark testing that would check for any regressions in the InlineUnsafeOps + OptimizeUnsafes optimization that 8150921 mostly bypassed. @vidmik
> Now this PR proposes to remove this mostly dead code. I'm OK with that, since blindly reviving it risks hitting bitrot, but for completeness, perhaps we should file 2 followup RFEs:
>
> 1. investigate constant null C1 regression
> 2. measure if the dead InlineUnsafeOps + OptimizeUnsafes optimization is worth reviving in a later release (with sufficient bake time)

They are indeed necessary. Thanks for sharing more information about JDK-8150921. I've filed JDK-8267783 and JDK-8267782 and assigned them to myself. I will investigate and check whether they are worth doing later.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3917

From yongzhou at openjdk.java.net Wed May 26 13:08:24 2021
From: yongzhou at openjdk.java.net (Yong Zhou)
Date: Wed, 26 May 2021 13:08:24 GMT
Subject: RFR: 8267686: C2: PrintIdealGraphFile supports parameterization
Message-ID:

jcstress [1] starts multiple JVMs, so when analyzing C2 problems with it, the file specified by PrintIdealGraphFile will be opened repeatedly if parsing %p%t is not supported.

Example:

java -XX:-BackgroundCompilation -XX:-TieredCompilation -XX:CICompilerCount=1 -XX:PrintIdealGraphLevel=3 -XX:PrintIdealGraphFile=ideal-%p-%t.xml -XX:CompileCommand="print *jcstress::lambda$sanityCheck_Footprints$2" -jar jcstress.jar -c 64 -f 1 -iters 1 -t org.openjdk.jcstress.tests.locks.stamped.StampedLockPairwiseTests

Implemented by referring to `DumpLoadedClassList`.

[1] https://mail.openjdk.java.net/pipermail/jdk8u-dev/2021-January/013278.html

-------------

Commit messages:
 - 8267686: C2: PrintIdealGraphFile supports parameterization

Changes: https://git.openjdk.java.net/jdk/pull/4201/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4201&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8267686
  Stats: 6 lines in 1 file changed: 4 ins; 0 del; 2 mod
  Patch: https://git.openjdk.java.net/jdk/pull/4201.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/4201/head:pull/4201

PR: https://git.openjdk.java.net/jdk/pull/4201

From adfarley at redhat.com Wed May 26 13:59:13 2021
From: adfarley at redhat.com (Adam Farley)
Date: Wed, 26 May 2021 14:59:13 +0100
Subject: RFR: JDK-8267773: StringIndexOutOfBoundsException(Integer.MIN_VALUE) loop returns wrong value within 20k
Message-ID:

Hi All,

Could someone with JIT knowledge please take a look at this?

TLDR: Call StringIndexOutOfBoundsException(Integer.MIN_VALUE) enough times and eventually the toString return value will start ending in -2 instead of -2147483648.
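
For illustration only (this is not the test case attached to the bug), a loop of roughly the following shape is the kind of code that would exercise the behaviour described in the TLDR. The class and method names are hypothetical, and the check relies only on the reported symptom that the text should end in -2147483648.

```java
// Hypothetical sketch of a reproducer; not the attachment from JDK-8267773.
final class SioobeMinValueCheck {
    // Builds the exception repeatedly and counts how many times its
    // toString() text does not end with the full Integer.MIN_VALUE digits.
    static int countBadMessages(int iterations) {
        int bad = 0;
        for (int i = 0; i < iterations; i++) {
            String text = new StringIndexOutOfBoundsException(Integer.MIN_VALUE).toString();
            if (!text.endsWith("-2147483648")) {
                bad++;
            }
        }
        return bad;
    }
}
```

Calling `SioobeMinValueCheck.countBadMessages(20_000)` and printing the result gives a quick indication of whether an affected JIT-compiled code path has been reached; a non-zero count would match the symptom in the report.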

A minimal test case is attached to the bug: https://bugs.openjdk.java.net/browse/JDK-8267773

Best Regards

Adam Farley
Software Developer
Red Hat

From ddong at openjdk.java.net Wed May 26 14:35:28 2021
From: ddong at openjdk.java.net (Denghui Dong)
Date: Wed, 26 May 2021 14:35:28 GMT
Subject: RFR: 8267800: Remove the '_dirty' set in BCEscapeAnalyzer
Message-ID:

Hi,

Could I have a review of this change?

The content of `_dirty` in `BCEscapeAnalyzer` is only updated when processing `_aaload`. It does not affect the results of the analysis because its content is never used, so I think we should remove this set.

Thanks,
Denghui

-------------

Commit messages:
 - 8267800: Remove the '_dirty' set in BCEscapeAnalyzer

Changes: https://git.openjdk.java.net/jdk/pull/4208/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4208&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8267800
  Stats: 9 lines in 2 files changed: 0 ins; 9 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/4208.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/4208/head:pull/4208

PR: https://git.openjdk.java.net/jdk/pull/4208

From jbhateja at openjdk.java.net Wed May 26 15:01:42 2021
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Wed, 26 May 2021 15:01:42 GMT
Subject: RFR: 8266951: Partial in-lining for vectorized mismatch operation using AVX512 masked instructions [v5]
In-Reply-To: <0YtRuwnVZ-Ejs-22d0JDJeFzXiZ17XNuBT1o5Ma4ZkI=.9dd9e952-d452-4175-8ff5-8f41e990a555@github.com>
References: <0YtRuwnVZ-Ejs-22d0JDJeFzXiZ17XNuBT1o5Ma4ZkI=.9dd9e952-d452-4175-8ff5-8f41e990a555@github.com>
Message-ID:

> ArraySupport.vectorizedMismatch is a leaf level comparison routine which gets called by various public Java APIs (Arrays.equals, Arrays.mismatch). The HotSpot C2 compiler intrinsifies the vectorizedMismatch routine and emits a call to a stub routine which uses vector instructions to compare the inputs.
>
> For small compare operations whose size fits in one vector register, i.e. < 32 bytes or <= 64 bytes, this patch employs a partial in-lining technique to emit the fast path code at the call site, which does the vector comparison under the influence of a predicate register/mask computed as a function of the comparison length.
>
> If the comparison length is greater than the vector register size, then the slow path comprising the stub call is emitted.
>
> This avoids the call overhead associated with the stub call, which is significant compared to the actual comparison operation for small comparisons.
>
> Partial in-lining works under the influence of a run time flag -XX:UsePartialInlineSize=32/64 (default 32 bytes).
>
> Following are the performance numbers for an existing JMH benchmark (test/micro/org/openjdk/bench/java/util//ArrayMismatch.java):
>
> Machine : Cascade Lake server (Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz)
>
> BENCHMARK | SIZE | Baseline (ops/ms) | PI32 (ops/ms) | Gain | PI64 (ops/ms) | Gain
> -- | -- | -- | -- | -- | -- | --
> ArraysMismatchPartialInlining.testByteMatch | 3 | 209915.663 | 209126.291 | 0.996239576 | 209073.888 | 0.995989937
> ArraysMismatchPartialInlining.testByteMatch | 4 | 157757.866 | 157763.787 | 1.000037532 | 157766.023 | 1.000051706
> ArraysMismatchPartialInlining.testByteMatch | 5 | 181182.854 | 180450.433 | 0.995957559 | 180465.978 | 0.996043356
> ArraysMismatchPartialInlining.testByteMatch | 6 | 146279.651 | 146276.69 | 0.999979758 | 146274.73 | 0.999966359
> ArraysMismatchPartialInlining.testByteMatch | 7 | 139099.287 | 137887.433 | 0.991287849 | 139159.131 | 1.000430225
> ArraysMismatchPartialInlining.testByteMatch | 15 | 127720.176 | 175732.078 | 1.375914781 | 169252.948 | 1.325185678
> ArraysMismatchPartialInlining.testByteMatch | 31 | 116472.861 | 176768.126 | 1.517676517 | 169773.326 | 1.457621325
> ArraysMismatchPartialInlining.testByteMatch | 63 | 104636.064 | 91564.893 | 0.875079676 | 160845.908 | 1.537193792
> ArraysMismatchPartialInlining.testByteMatch | 95 | 101099.48 | 89657.806 | 0.886827568 | 87334.192 | 0.863844127
> ArraysMismatchPartialInlining.testByteMatch | 800 | 45022.411 | 47905.179 | 1.064029623 | 47969.355 | 1.065455046
> ArraysMismatchPartialInlining.testCharMatch | 3 | 219405.496 | 219710.643 | 1.00139079 | 219242.048 | 0.999255041
> ArraysMismatchPartialInlining.testCharMatch | 4 | 170629.006 | 193121.02 | 1.131818233 | 182593.776 | 1.070121548
> ArraysMismatchPartialInlining.testCharMatch | 5 | 155518.733 | 169650.324 | 1.090867452 | 159963.097 | 1.028577676
> ArraysMismatchPartialInlining.testCharMatch | 6 | 154395.07 | 175616.979 | 1.137451986 | 147860.366 | 0.957675436
> ArraysMismatchPartialInlining.testCharMatch | 7 | 147630.171 | 168639.547 | 1.142310856 | 112467.214 | 0.761817271
> ArraysMismatchPartialInlining.testCharMatch | 15 | 130251.837 | 171755.645 | 1.318642784 | 159656.911 | 1.225755542
> ArraysMismatchPartialInlining.testCharMatch | 31 | 115510.532 | 106310.328 | 0.920351817 | 159957.379 | 1.384786099
> ArraysMismatchPartialInlining.testCharMatch | 63 | 96443.648 | 92545.364 | 0.959579671 | 92850.782 | 0.962746473
> ArraysMismatchPartialInlining.testCharMatch | 95 | 90001.485 | 81753.152 | 0.908353368 | 83890.742 | 0.932103976
> ArraysMismatchPartialInlining.testCharMatch | 800 | 22929.764 | 20699.791 | 0.902747669 | 22017.534 | 0.960216337
> ArraysMismatchPartialInlining.testDoubleMatch | 3 | 137422.911 | 134792.332 | 0.980857784 | 137047.846 | 0.997270724
> ArraysMismatchPartialInlining.testDoubleMatch | 4 | 140124.192 | 128321.199 | 0.915767628 | 128573.012 | 0.917564699
> ArraysMismatchPartialInlining.testDoubleMatch | 5 | 132385.81 | 132099.177 | 0.997834866 | 132337.729 | 0.999636812
> ArraysMismatchPartialInlining.testDoubleMatch | 6 | 122472.829 | 122301.343 | 0.998599804 | 122235.558 | 0.998062664
> ArraysMismatchPartialInlining.testDoubleMatch | 7 | 123867.736 | 123042.597 | 0.993338548 | 123060.617 | 0.993484026
> ArraysMismatchPartialInlining.testDoubleMatch | 15 | 102561.684 | 102697.933 | 1.001328459 | 100258.701 | 0.977545386
> ArraysMismatchPartialInlining.testDoubleMatch | 31 | 87019.261 | 87292.743 | 1.003142775 | 85003.323 | 0.976833428
> ArraysMismatchPartialInlining.testDoubleMatch | 63 | 62251.609 | 57261.214 | 0.919835084 | 62732.816 | 1.007730033
> ArraysMismatchPartialInlining.testDoubleMatch | 95 | 50885.381 | 48282.534 | 0.948848826 | 48533.009 | 0.953771163
> ArraysMismatchPartialInlining.testDoubleMatch | 800 | 7160.957 | 8209.345 | 1.146403337 | 7158.649 | 0.999677697
> ArraysMismatchPartialInlining.testFloatMatch | 3 | 144215.295 | 141572.656 | 0.981675737 | 117351.089 | 0.81372152
> ArraysMismatchPartialInlining.testFloatMatch | 4 | 149935.526 | 140116.547 | 0.934511992 | 138351.846 | 0.922742259
> ArraysMismatchPartialInlining.testFloatMatch | 5 | 134682.06 | 133892.853 | 0.994140222 | 139040.985 | 1.032364555
> ArraysMismatchPartialInlining.testFloatMatch | 6 | 139176.866 | 139452.984 | 1.001983936 | 158309.784 | 1.13747197
> ArraysMismatchPartialInlining.testFloatMatch | 7 | 127274.07 | 126137.824 | 0.991072447 | 146418.871 | 1.150421849
> ArraysMismatchPartialInlining.testFloatMatch | 15 | 115897.616 | 101808.969 | 0.878438854 | 108451.212 | 0.935750154
> ArraysMismatchPartialInlining.testFloatMatch | 31 | 96568.619 | 101492.986 | 1.05099345 | 88662.187 | 0.918126281
> ArraysMismatchPartialInlining.testFloatMatch | 63 | 75565.484 | 85526.546 | 1.131820263 | 74575.198 | 0.986894996
> ArraysMismatchPartialInlining.testFloatMatch | 95 | 69535.621 | 71823.072 | 1.032896104 | 64910.105 | 0.933479907
> ArraysMismatchPartialInlining.testFloatMatch | 800 | 13959.085 | 12768.069 | 0.914678075 | 12698.311 | 0.909680756
> ArraysMismatchPartialInlining.testIntMatch | 3 | 151925.753 | 152001.543 | 1.000498862 | 150351.321 | 0.989636833
> ArraysMismatchPartialInlining.testIntMatch | 4 | 151411.152 | 161021.852 | 1.063474188 | 152115.869 | 1.004654327
> ArraysMismatchPartialInlining.testIntMatch | 5 | 142305.114 | 134841.275 | 0.947550451 | 122718.584 | 0.862362431
> ArraysMismatchPartialInlining.testIntMatch | 6 | 144870.73 | 144186.562 | 0.99527739 | 166569.418 | 1.149779655
> ArraysMismatchPartialInlining.testIntMatch | 7 | 135132.736 | 131937.154 | 0.976352273 | 150670.855 | 1.114984122
> ArraysMismatchPartialInlining.testIntMatch | 15 | 118831.765 | 119947.806 | 1.009391773 | 161039.149 | 1.35518604
> ArraysMismatchPartialInlining.testIntMatch | 31 | 97247.157 | 95123.241 | 0.978159608 | 92586.255 | 0.952071586
> ArraysMismatchPartialInlining.testIntMatch | 63 | 78537.993 | 72904.05 | 0.928264744 | 72075.128 | 0.917710337
> ArraysMismatchPartialInlining.testIntMatch | 95 | 69356.234 | 69021.893 | 0.995179366 | 67435.202 | 0.972301956
> ArraysMismatchPartialInlining.testIntMatch | 800 | 14410.374 | 12715.733 | 0.882401317 | 12527.15 | 0.869314703
> ArraysMismatchPartialInlining.testLongMatch | 3 | 145434.777 | 147236.142 | 1.012386068 | 144269.34 | 0.991986532
> ArraysMismatchPartialInlining.testLongMatch | 4 | 149850.908 | 117182.939 | 0.781996857 | 116983.308 | 0.780664659
> ArraysMismatchPartialInlining.testLongMatch | 5 | 140694.62 | 141039.138 | 1.002448693 | 140721.407 | 1.000190391
> ArraysMismatchPartialInlining.testLongMatch | 6 | 136901.515 | 136215.609 | 0.994989785 | 136216.591 | 0.994996958
> ArraysMismatchPartialInlining.testLongMatch | 7 | 132233.847 | 131289.142 | 0.9928558 | 131315.326 | 0.993053813
> ArraysMismatchPartialInlining.testLongMatch | 15 | 108677.77 | 105050.548 | 0.966624067 | 108574.143 | 0.999046475
> ArraysMismatchPartialInlining.testLongMatch | 31 | 79476.103 | 79391.426 | 0.99893456 | 79519.006 | 1.000539823
> ArraysMismatchPartialInlining.testLongMatch | 63 | 58949.181 | 59102.766 | 1.00260538 | 59095.306 | 1.00247883
> ArraysMismatchPartialInlining.testLongMatch | 95 | 49438.419 | 49422.93 | 0.999686701 | 49390.033 | 0.999021287
> ArraysMismatchPartialInlining.testLongMatch | 800 | 7195.783 | 7201.554 | 1.000801998 | 7186.757 | 0.998745654
> ArraysMismatchPartialInlining.testShortMatch | 3 | 219642.309 | 219414.684 | 0.998963656 | 219760.127 | 1.000536408
> ArraysMismatchPartialInlining.testShortMatch | 4 | 169235.371 | 193907.437 | 1.145785517 | 170667.561 | 1.008462711
> ArraysMismatchPartialInlining.testShortMatch | 5 | 155537.852 | 147014.758 | 0.945202445 | 116770.798 | 0.750754858
> ArraysMismatchPartialInlining.testShortMatch | 6 | 155059.272 | 173756.546 | 1.120581464 | 152323.759 | 0.982358275
> ArraysMismatchPartialInlining.testShortMatch | 7 | 147370.359 | 154934.348 | 1.051326393 | 138398.19 | 0.939118225
> ArraysMismatchPartialInlining.testShortMatch | 15 | 130353.196 | 171653.208 | 1.316831603 | 160047.047 | 1.227795343
> ArraysMismatchPartialInlining.testShortMatch | 31 | 118458.443 | 106239.301 | 0.896848703 | 159726.936 | 1.348379499
> ArraysMismatchPartialInlining.testShortMatch | 63 | 97519.691 | 91591.145 | 0.939206678 | 91847.817 | 0.94183868
> ArraysMismatchPartialInlining.testShortMatch | 95 | 90818.111 | 77626.093 | 0.854742431 | 77653.086 | 0.855039652
> ArraysMismatchPartialInlining.testShortMatch | 800 | 21382.8 | 22841.791 | 1.06823199 | 22683.388 | 1.060824027

Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:

  8266951: Review comments resolution.
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/3999/files - new: https://git.openjdk.java.net/jdk/pull/3999/files/946e997a..4d7964b1 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3999&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3999&range=03-04 Stats: 102 lines in 7 files changed: 11 ins; 14 del; 77 mod Patch: https://git.openjdk.java.net/jdk/pull/3999.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3999/head:pull/3999 PR: https://git.openjdk.java.net/jdk/pull/3999 From jbhateja at openjdk.java.net Wed May 26 15:07:15 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Wed, 26 May 2021 15:07:15 GMT Subject: RFR: 8266951: Partial in-lining for vectorized mismatch operation using AVX512 masked instructions [v4] In-Reply-To: References: <0YtRuwnVZ-Ejs-22d0JDJeFzXiZ17XNuBT1o5Ma4ZkI=.9dd9e952-d452-4175-8ff5-8f41e990a555@github.com> <6phAafS9kz8v8Nwo4XGynTzt8K1KaNv7jhcnMZ59mew=.de2cbb11-261a-4974-bf21-adf47f6e8482@github.com> Message-ID: On Mon, 24 May 2021 09:26:27 GMT, Vladimir Ivanov wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> 8266951: Removing the changes to existing benchmark since a separate benchmark has been added to partial in-lining. > > src/hotspot/share/opto/library_call.cpp line 5218: > >> 5216: Node* objb_adr = make_unsafe_address(objb, boffset); >> 5217: >> 5218: assert(scale->bottom_type()->isa_int() && > > Why `scale` has to be a constant? Should it be a runtime check instead (which fails intrinsification)? All the existing calls to vectorizedMismatch routines from Arrays.mismatch and BufferMismatch.mismatch passes a static final scale argument. Element type inferencing based on scale is deterministic. Added a runtime check as suggested. 
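For readers following the scale discussion, here is a minimal sketch of the public API side, assuming only what the thread states: `Arrays.mismatch` is one of the callers of the intrinsified routine, and the scale it passes is a compile-time constant tied to the element type. The class `MismatchDemo` and the helper `log2Scale` are hypothetical names used for illustration; they are not part of the patch or of the JDK.

```java
import java.util.Arrays;

class MismatchDemo {
    // Hypothetical helper: log2 of the element size in bytes. The thread
    // describes the scale argument as a static final value per element type,
    // so byte/short/int/long would map to 0/1/2/3 under this assumption.
    static int log2Scale(int elementSizeInBytes) {
        return Integer.numberOfTrailingZeros(elementSizeInBytes);
    }

    // Public API: returns the index of the first differing element,
    // or -1 when the two arrays have equal contents.
    static int firstMismatch(byte[] a, byte[] b) {
        return Arrays.mismatch(a, b);
    }

    public static void main(String[] args) {
        byte[] a = {1, 2, 3, 4};
        byte[] b = {1, 2, 9, 4};
        System.out.println(firstMismatch(a, b));          // index of first difference
        System.out.println(firstMismatch(a, a.clone()));  // equal contents
        System.out.println(log2Scale(Long.BYTES));        // scale for long elements
    }
}
```

Because the scale is fixed per call site, the element type can be derived from it without looking at the arrays themselves, which is the property the review comment relies on.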
------------- PR: https://git.openjdk.java.net/jdk/pull/3999 From jbhateja at openjdk.java.net Wed May 26 15:12:19 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Wed, 26 May 2021 15:12:19 GMT Subject: RFR: 8266951: Partial in-lining for vectorized mismatch operation using AVX512 masked instructions [v4] In-Reply-To: References: <0YtRuwnVZ-Ejs-22d0JDJeFzXiZ17XNuBT1o5Ma4ZkI=.9dd9e952-d452-4175-8ff5-8f41e990a555@github.com> <6phAafS9kz8v8Nwo4XGynTzt8K1KaNv7jhcnMZ59mew=.de2cbb11-261a-4974-bf21-adf47f6e8482@github.com> Message-ID: On Mon, 24 May 2021 11:07:16 GMT, Vladimir Ivanov wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> 8266951: Removing the changes to existing benchmark since a separate benchmark has been added to partial in-lining. > > src/hotspot/share/opto/library_call.cpp line 5225: > >> 5223: BasicType prim_types[] = {T_BYTE, T_SHORT, T_INT, T_LONG}; >> 5224: BasicType vec_basictype = prim_types[scale_val]; >> 5225: const Type* vec_type = Type::get_const_basic_type(vec_basictype); > > It's not a vector type, but the element type. > > Also, should `VectorMaskGenNode` just accept element basic type instead? VectorMaskGenNode is of LONG type, explicit lane element type associated with VectorMaskGenNode is used to convert Store/LoadVectorMasked to Store/LoadVector for all true masks. Existing node definition expects a Type instead of a BasicType. I can change the name to vec_lane_type if that's acceptable. ------------- PR: https://git.openjdk.java.net/jdk/pull/3999 From vlivanov at openjdk.java.net Wed May 26 17:41:22 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Wed, 26 May 2021 17:41:22 GMT Subject: RFR: 8267805: Add UseVtableBasedCHA to the list of JVM flags known to jtreg Message-ID: A trivial change to enable `vm.opt.final.UseVtableBasedCHA` usage in `@requires`. 
It was omitted in #3727 and `compiler/cha/StrengthReduceInterfaceCall` hasn't been executed since then. Testing: - [x] hs-tier1 - hs-tier4 ------------- Commit messages: - Make UseVtableBasedCHA known to jtreg Changes: https://git.openjdk.java.net/jdk/pull/4210/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4210&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8267805 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/4210.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4210/head:pull/4210 PR: https://git.openjdk.java.net/jdk/pull/4210 From shade at openjdk.java.net Wed May 26 17:47:15 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 26 May 2021 17:47:15 GMT Subject: RFR: 8267805: Add UseVtableBasedCHA to the list of JVM flags known to jtreg In-Reply-To: References: Message-ID: On Wed, 26 May 2021 17:24:55 GMT, Vladimir Ivanov wrote: > A trivial change to enable `vm.opt.final.UseVtableBasedCHA` usage in `@requires`. > > It was omitted in #3727 and `compiler/cha/StrengthReduceInterfaceCall` hasn't been executed since then. > > Testing: > - [x] hs-tier1 - hs-tier4 Fine and trivial. ------------- Marked as reviewed by shade (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/4210 From jcm at openjdk.java.net Wed May 26 18:01:25 2021 From: jcm at openjdk.java.net (Jamsheed Mohammed C M) Date: Wed, 26 May 2021 18:01:25 GMT Subject: RFR: 8265132 : C2 compilation fails with assert "missing precedence edge" In-Reply-To: References: Message-ID: <_aj8j9xILyrCY-cITqvjdsi1su-sz4V6aphD9hZHVlw=.a47c3e4a-838c-461d-a276-b33d574dd7b3@github.com> On Wed, 26 May 2021 06:50:54 GMT, Jamsheed Mohammed C M wrote: > Issue is similar to https://bugs.openjdk.java.net/browse/JDK-8261730 > but happens at next https://github.com/jamsheedcm/jdk/blob/master/src/hotspot/share/opto/gcm.cpp#L830 > > Request for review Withdrawing the pull request, as simply silencing the assert won't do as a fix. 
It requires other parts of the code to be changed, as the `raise_LCA_mark` / `raise_LCA_visited` algorithm maintains state between calls. ------------- PR: https://git.openjdk.java.net/jdk/pull/4200 From jcm at openjdk.java.net Wed May 26 18:01:26 2021 From: jcm at openjdk.java.net (Jamsheed Mohammed C M) Date: Wed, 26 May 2021 18:01:26 GMT Subject: Withdrawn: 8265132 : C2 compilation fails with assert "missing precedence edge" In-Reply-To: References: Message-ID: On Wed, 26 May 2021 06:50:54 GMT, Jamsheed Mohammed C M wrote: > Issue is similar to https://bugs.openjdk.java.net/browse/JDK-8261730 > but happens at next https://github.com/jamsheedcm/jdk/blob/master/src/hotspot/share/opto/gcm.cpp#L830 > > Request for review This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.java.net/jdk/pull/4200 From vlivanov at openjdk.java.net Wed May 26 18:04:25 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Wed, 26 May 2021 18:04:25 GMT Subject: RFR: 8267806: C1: Relax inlining checks for not yet initialized classes Message-ID: The checks which guide inlining decisions in C1 are too strong: declaring holder class is required to be fully initialized while JVMS only mandates an initialization barrier on resolved class in `invokestatic` case. The fix relaxes the checks to rule out only not yet linked classes unless it is an `invokestatic` call site. 
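As background for the `invokestatic` case, the following is a minimal Java-level sketch of the language rule that a static method call triggers initialization of the class declaring it. It only shows the observable ordering; it says nothing about how C1 implements the barrier. The names `InitBarrierDemo`, `Holder`, and `twice` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class InitBarrierDemo {
    // Records the order in which things happen.
    static final List<String> events = new ArrayList<>();

    static class Holder {
        // Runs once, the first time Holder is initialized.
        static { events.add("Holder.<clinit>"); }

        static int twice(int x) {
            events.add("Holder.twice");
            return 2 * x;
        }
    }

    static List<String> run() {
        events.add("before call");
        // Compiled to invokestatic: Holder must be initialized before the
        // method body executes, so its static initializer runs here if it
        // has not run already.
        int result = Holder.twice(21);
        events.add("result=" + result);
        return new ArrayList<>(events);
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

On the first call to `run()`, the static initializer entry appears between "before call" and "Holder.twice", which is the ordering an initialization barrier at the call site has to preserve.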
Testing: - [x] hs-tier1 - hs-tier9 ------------- Commit messages: - Relax inlining checks in C1 Changes: https://git.openjdk.java.net/jdk/pull/4211/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4211&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8267806 Stats: 13 lines in 1 file changed: 6 ins; 3 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/4211.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4211/head:pull/4211 PR: https://git.openjdk.java.net/jdk/pull/4211 From vlivanov at openjdk.java.net Wed May 26 18:10:33 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Wed, 26 May 2021 18:10:33 GMT Subject: RFR: 8267807: C2: Downcast receiver to target holder during inlining Message-ID: Virtual method calls involve an implicit subtype check against callee holder. But if receiver type is too broad, it has to be narrowed before parsing the callee method. Otherwise, it may cause problems during parsing and currently it simply blocks inlining. Proposed fix implements the narrowing step and re-enables inlining. 
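A rough source-level analogy of the narrowing step, offered as a sketch only: in Java source the compiler already forces the receiver's static type to declare the method, so the explicit cast below stands in for the type narrowing that the description says C2 has to perform on its own when its knowledge of the receiver is broader than the callee holder. `ReceiverNarrowingDemo`, `Shape`, `Triangle`, and `sidesOf` are hypothetical names and do not correspond to anything in the patch.

```java
class ReceiverNarrowingDemo {
    // Shape plays the role of the callee holder.
    static abstract class Shape {
        abstract int sides();
    }

    static final class Triangle extends Shape {
        @Override
        int sides() { return 3; }
    }

    // The receiver arrives with a type (Object) that is broader than the
    // holder of the method being called, so it is narrowed first.
    static int sidesOf(Object receiver) {
        // Checked downcast: throws ClassCastException if receiver is not a Shape.
        Shape narrowed = (Shape) receiver;
        // After narrowing, the call is well-typed against the holder.
        return narrowed.sides();
    }

    public static void main(String[] args) {
        System.out.println(sidesOf(new Triangle()));
    }
}
```

The point of the analogy is that the callee body is only ever analyzed with a receiver known to be a subtype of its holder.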
Testing: - [x] hs-tier1 - hs-tier9 ------------- Commit messages: - Validate receiver type against target method Changes: https://git.openjdk.java.net/jdk/pull/4212/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4212&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8267807 Stats: 86 lines in 2 files changed: 43 ins; 18 del; 25 mod Patch: https://git.openjdk.java.net/jdk/pull/4212.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4212/head:pull/4212 PR: https://git.openjdk.java.net/jdk/pull/4212 From jcm at openjdk.java.net Wed May 26 18:35:41 2021 From: jcm at openjdk.java.net (Jamsheed Mohammed C M) Date: Wed, 26 May 2021 18:35:41 GMT Subject: RFR: 8265132 : C2 compilation fails with assert "missing precedence edge" [v2] In-Reply-To: References: Message-ID: > Issue is similar to https://bugs.openjdk.java.net/browse/JDK-8261730 > but happens at next https://github.com/jamsheedcm/jdk/blob/master/src/hotspot/share/opto/gcm.cpp#L830 > > Request for review Jamsheed Mohammed C M has updated the pull request incrementally with one additional commit since the last revision: fixing issue related to raising LCA ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4200/files - new: https://git.openjdk.java.net/jdk/pull/4200/files/5a857f86..7e254c9b Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4200&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4200&range=00-01 Stats: 4 lines in 1 file changed: 0 ins; 2 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/4200.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4200/head:pull/4200 PR: https://git.openjdk.java.net/jdk/pull/4200 From jcm at openjdk.java.net Wed May 26 18:46:59 2021 From: jcm at openjdk.java.net (Jamsheed Mohammed C M) Date: Wed, 26 May 2021 18:46:59 GMT Subject: RFR: 8265132 : C2 compilation fails with assert "missing precedence edge" [v3] In-Reply-To: References: Message-ID: > Issue is similar to 
https://bugs.openjdk.java.net/browse/JDK-8261730 > but happens at next https://github.com/jamsheedcm/jdk/blob/master/src/hotspot/share/opto/gcm.cpp#L830 > > Request for review Jamsheed Mohammed C M has updated the pull request incrementally with one additional commit since the last revision: fixing issue related to raising LCA ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4200/files - new: https://git.openjdk.java.net/jdk/pull/4200/files/7e254c9b..1a390c7d Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4200&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4200&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/4200.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4200/head:pull/4200 PR: https://git.openjdk.java.net/jdk/pull/4200 From jcm at openjdk.java.net Wed May 26 18:49:45 2021 From: jcm at openjdk.java.net (Jamsheed Mohammed C M) Date: Wed, 26 May 2021 18:49:45 GMT Subject: RFR: 8265132 : C2 compilation fails with assert "missing precedence edge" [v2] In-Reply-To: References: Message-ID: <8yP0Cpowce18IIjje0rYc6zHmthenyir7DLxH0_vwdU=.453d9f1d-24bb-440c-857a-998e03054dc8@github.com> On Wed, 26 May 2021 18:35:41 GMT, Jamsheed Mohammed C M wrote: >> Issue is similar to https://bugs.openjdk.java.net/browse/JDK-8261730 >> but happens at next https://github.com/jamsheedcm/jdk/blob/master/src/hotspot/share/opto/gcm.cpp#L830 >> >> Request for review > > Jamsheed Mohammed C M has updated the pull request incrementally with one additional commit since the last revision: > > fixing issue related to raising LCA 7e254c9 1a390c7 I have removed the ASSERT toggle and made the check in product mode to skip unrelated load-stores anti-dependencies insertion and related code. 
------------- PR: https://git.openjdk.java.net/jdk/pull/4200 From dlong at openjdk.java.net Wed May 26 18:51:32 2021 From: dlong at openjdk.java.net (Dean Long) Date: Wed, 26 May 2021 18:51:32 GMT Subject: RFR: 8266746: C1: Replace UnsafeGetRaw with UnsafeGetObject when setting up OSR entry block [v3] In-Reply-To: <8bbfrQRWAy3o4XRgMjcHZ6Cp7x-3vWmdq1M3x3T-mvE=.6a933a21-2fd4-4677-b162-a39e7b64084d@github.com> References: <8bbfrQRWAy3o4XRgMjcHZ6Cp7x-3vWmdq1M3x3T-mvE=.6a933a21-2fd4-4677-b162-a39e7b64084d@github.com> Message-ID: On Thu, 20 May 2021 02:44:10 GMT, Yi Yang wrote: >> After JDK-8150921, most Unsafe{Get,Put}Raw intrinsic methods can be replaced by Unsafe{Get,Put}Object. >> >> There is the only one occurrence where c1 refers UnsafeGetRaw among GraphBuilder::setup_osr_entry_block() >> >> https://github.com/openjdk/jdk/blob/74fecc070a6462e6a2d061525b53a63de15339f9/src/hotspot/share/c1/c1_GraphBuilder.cpp#L3143-L3157 >> >> We can replace UnsafeGetRaw with UnsafeGetObject when setting up OSR entry block. After that, Unsafe{Get,Put}Raw can be completely removed because no one refers to them. >> >> (This patch actually does two things: >> 1. `Replace UnsafeGetRaw with UnsafeGetObject when setting up OSR entry block` This is the only occurrence where c1 refers UnsafeGetRaw >> 2. `Cleanup unused Unsafe{Get,Put}Raw code` >> They are related so I put it together, but I still want to hear your suggestions, I will separate them into two patches if you think it is more reasonable) >> >> Thanks! >> Yang > > Yi Yang has updated the pull request incrementally with one additional commit since the last revision: > > many nit Marked as reviewed by dlong (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/3917 From dlong at openjdk.java.net Wed May 26 20:20:04 2021 From: dlong at openjdk.java.net (Dean Long) Date: Wed, 26 May 2021 20:20:04 GMT Subject: RFR: 8267800: Remove the '_dirty' set in BCEscapeAnalyzer In-Reply-To: References: Message-ID: On Wed, 26 May 2021 14:29:35 GMT, Denghui Dong wrote: > Hi, > > Could I have a review of this change? > > The content of `_dirty` in `BCEscapeAnalyzer` is only updated when processing `_aaload`. And it will not affect the results of the analysis because its content is never used. > IIUC, I think we should remove this set. > > Thanks, > Denghui I did a little source code archeology, and _dirty was used when bcEscapeAnalyzer.cpp was added by https://bugs.openjdk.java.net/browse/JDK-6339956. But the changes made by https://bugs.openjdk.java.net/browse/JDK-6488063 caused _dirty to be unused. ------------- PR: https://git.openjdk.java.net/jdk/pull/4208 From jcm at openjdk.java.net Wed May 26 21:00:06 2021 From: jcm at openjdk.java.net (Jamsheed Mohammed C M) Date: Wed, 26 May 2021 21:00:06 GMT Subject: RFR: 8265132 : C2 compilation fails with assert "missing precedence edge" [v3] In-Reply-To: References: Message-ID: On Wed, 26 May 2021 18:46:59 GMT, Jamsheed Mohammed C M wrote: >> Issue is similar to https://bugs.openjdk.java.net/browse/JDK-8261730 >> but happens at next https://github.com/jamsheedcm/jdk/blob/master/src/hotspot/share/opto/gcm.cpp#L830 >> >> Request for review > > Jamsheed Mohammed C M has updated the pull request incrementally with one additional commit since the last revision: > > fixing issue related to raising LCA On further thought, resetting `raise_LCA_mark` and `raise_LCA_visited` at the end of `insert_anti_dependence` seems better than having costly checks in product mode. I will give it a try. 
------------- PR: https://git.openjdk.java.net/jdk/pull/4200 From vlivanov at openjdk.java.net Wed May 26 21:11:23 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Wed, 26 May 2021 21:11:23 GMT Subject: RFR: 8065760: CHA: Improve abstract method support Message-ID: Enable CHA to look for unique concrete methods under abstract root methods. Only vtable-based implementation is affected. Old implementation is left as is. The unit test requires #4211 and #4212 to pass. Testing: - [x] hs-tier1 - hs-tier9 ------------- Commit messages: - Test case - Enable abstract root method support Changes: https://git.openjdk.java.net/jdk/pull/4213/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4213&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8065760 Stats: 901 lines in 6 files changed: 543 ins; 307 del; 51 mod Patch: https://git.openjdk.java.net/jdk/pull/4213.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4213/head:pull/4213 PR: https://git.openjdk.java.net/jdk/pull/4213 From kvn at openjdk.java.net Wed May 26 21:53:04 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 26 May 2021 21:53:04 GMT Subject: RFR: 8267800: Remove the '_dirty' set in BCEscapeAnalyzer In-Reply-To: References: Message-ID: <69LZNqqSneaavajctrOCtARw9CGbhQ4XP41VtwMuebk=.6bf1c112-c31d-45e7-815f-f66708812304@github.com> On Wed, 26 May 2021 14:29:35 GMT, Denghui Dong wrote: > Hi, > > Could I have a review of this change? > > The content of `_dirty` in `BCEscapeAnalyzer` is only updated when processing `_aaload`. And it will not affect the results of the analysis because its content is never used. > IIUC, I think we should remove this set. > > Thanks, > Denghui It was used to set `_return_local`. But it was changed long ago. ------------- Marked as reviewed by kvn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/4208 From vlivanov at openjdk.java.net Wed May 26 22:57:09 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Wed, 26 May 2021 22:57:09 GMT Subject: RFR: 8267805: Add UseVtableBasedCHA to the list of JVM flags known to jtreg In-Reply-To: References: Message-ID: On Wed, 26 May 2021 17:24:55 GMT, Vladimir Ivanov wrote: > A trivial change to enable `vm.opt.final.UseVtableBasedCHA` usage in `@requires`. > > It was omitted in #3727 and `compiler/cha/StrengthReduceInterfaceCall` hasn't been executed since then. > > Testing: > - [x] hs-tier1 - hs-tier4 Thanks for the review, Aleksey. ------------- PR: https://git.openjdk.java.net/jdk/pull/4210 From vlivanov at openjdk.java.net Wed May 26 22:57:10 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Wed, 26 May 2021 22:57:10 GMT Subject: Integrated: 8267805: Add UseVtableBasedCHA to the list of JVM flags known to jtreg In-Reply-To: References: Message-ID: <6AosXG7fWAK4cgi_Ow5PhhoDfloFU_89RQP9HN4_gxY=.87459569-e0dc-4ba6-8960-fdf2cb412fe8@github.com> On Wed, 26 May 2021 17:24:55 GMT, Vladimir Ivanov wrote: > A trivial change to enable `vm.opt.final.UseVtableBasedCHA` usage in `@requires`. > > It was omitted in #3727 and `compiler/cha/StrengthReduceInterfaceCall` hasn't been executed since then. > > Testing: > - [x] hs-tier1 - hs-tier4 This pull request has now been integrated. 
Changeset: 1899f022 Author: Vladimir Ivanov URL: https://git.openjdk.java.net/jdk/commit/1899f022b1cb66ecc0615ff5939b5492e2805a1c Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8267805: Add UseVtableBasedCHA to the list of JVM flags known to jtreg Reviewed-by: shade ------------- PR: https://git.openjdk.java.net/jdk/pull/4210 From vlivanov at openjdk.java.net Wed May 26 22:57:23 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Wed, 26 May 2021 22:57:23 GMT Subject: RFR: 8036580: CHA: improve default method support Message-ID: <8NZ96i5NlsxPJAemU9K8gwU2QniFxUCyZZzHjtsUZ3g=.2c6b05a7-b9ef-478c-81af-7a9388991530@github.com> Enable CHA to look for unique concrete methods under default interface methods. Only vtable-based implementation is affected. Old implementation is left as is. The unit test requires #4211/#4212 to pass and relies on test code refactorings from #4213. Testing: - [x] hs-tier1 - hs-tier9 ------------- Depends on: https://git.openjdk.java.net/jdk/pull/4213 Commit messages: - Test case - Enable default root method support Changes: https://git.openjdk.java.net/jdk/pull/4214/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4214&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8036580 Stats: 238 lines in 6 files changed: 216 ins; 9 del; 13 mod Patch: https://git.openjdk.java.net/jdk/pull/4214.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4214/head:pull/4214 PR: https://git.openjdk.java.net/jdk/pull/4214 From vlivanov at openjdk.java.net Wed May 26 23:02:09 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Wed, 26 May 2021 23:02:09 GMT Subject: RFR: 8266951: Partial in-lining for vectorized mismatch operation using AVX512 masked instructions [v4] In-Reply-To: References: <0YtRuwnVZ-Ejs-22d0JDJeFzXiZ17XNuBT1o5Ma4ZkI=.9dd9e952-d452-4175-8ff5-8f41e990a555@github.com> <6phAafS9kz8v8Nwo4XGynTzt8K1KaNv7jhcnMZ59mew=.de2cbb11-261a-4974-bf21-adf47f6e8482@github.com> Message-ID: On Wed, 26 May 2021 
15:08:54 GMT, Jatin Bhateja wrote: > Existing node definition expects a Type instead of a BasicType. Yes, what I suggested is to reconsider that. Unless I'm missing something important, a BasicType can be cached instead. ------------- PR: https://git.openjdk.java.net/jdk/pull/3999 From yongzhou at openjdk.java.net Thu May 27 01:56:19 2021 From: yongzhou at openjdk.java.net (Yong Zhou) Date: Thu, 27 May 2021 01:56:19 GMT Subject: RFR: 8267686: C2: PrintIdealGraphFile supports parameterization [v2] In-Reply-To: References: Message-ID: > jcstress[1] starts multiple JVMs, so when analyzing C2 problems with it, the file specified by PrintIdealGraphFile will be opened repeatedly if parsing %p%t is not supported. > > Example > > java -XX:-BackgroundCompilation -XX:-TieredCompilation -XX:CICompilerCount=1 -XX:PrintIdealGraphLevel=3 -XX:PrintIdealGraphFile=ideal-%p-%t.xml -XX:CompileCommand="print *jcstress::lambda$sanityCheck_Footprints$2" -jar jcstress.jar -c 64 -f 1 -iters 1 -t org.openjdk.jcstress.tests.locks.stamped.StampedLockPairwiseTests > > > Implemented by referring to `DumpLoadedClassList` > > [1] https://mail.openjdk.java.net/pipermail/jdk8u-dev/2021-January/013278.html Yong Zhou has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. 
The pull request contains one new commit since the last revision: 8267686: C2: PrintIdealGraphFile supports parameterization ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4201/files - new: https://git.openjdk.java.net/jdk/pull/4201/files/d1208609..7fa5dbce Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4201&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4201&range=00-01 Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/4201.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4201/head:pull/4201 PR: https://git.openjdk.java.net/jdk/pull/4201 From yongzhou at openjdk.java.net Thu May 27 06:36:29 2021 From: yongzhou at openjdk.java.net (Yong Zhou) Date: Thu, 27 May 2021 06:36:29 GMT Subject: RFR: 8267686: C2: PrintIdealGraphFile supports parameterization [v3] In-Reply-To: References: Message-ID: > jcstress[1] starts multiple JVMs, so when analyzing C2 problems with it, the file specified by PrintIdealGraphFile will be opened repeatedly if parsing %p%t is not supported. > > Example > > java -XX:-BackgroundCompilation -XX:-TieredCompilation -XX:CICompilerCount=1 -XX:PrintIdealGraphLevel=3 -XX:PrintIdealGraphFile=ideal-%p-%t.xml -XX:CompileCommand="print *jcstress::lambda$sanityCheck_Footprints$2" -jar jcstress.jar -c 64 -f 1 -iters 1 -t org.openjdk.jcstress.tests.locks.stamped.StampedLockPairwiseTests > > > Implemented by referring to `DumpLoadedClassList` > > [1] https://mail.openjdk.java.net/pipermail/jdk8u-dev/2021-January/013278.html Yong Zhou has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. 
The pull request contains one new commit since the last revision: 8267686: C2: PrintIdealGraphFile supports parameterization ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4201/files - new: https://git.openjdk.java.net/jdk/pull/4201/files/7fa5dbce..7fa37048 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4201&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4201&range=01-02 Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/4201.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4201/head:pull/4201 PR: https://git.openjdk.java.net/jdk/pull/4201 From thartmann at openjdk.java.net Thu May 27 07:00:03 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 27 May 2021 07:00:03 GMT Subject: RFR: 8267800: Remove the '_dirty' set in BCEscapeAnalyzer In-Reply-To: References: Message-ID: On Wed, 26 May 2021 14:29:35 GMT, Denghui Dong wrote: > Hi, > > Could I have a review of this change? > > The content of `_dirty` in `BCEscapeAnalyzer` is only updated when processing `_aaload`. And it will not affect the results of the analysis because its content is never used. > IIUC, I think we should remove this set. > > Thanks, > Denghui Looks good. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/4208 From roland at openjdk.java.net Thu May 27 08:23:06 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Thu, 27 May 2021 08:23:06 GMT Subject: RFR: 8267806: C1: Relax inlining checks for not yet initialized classes In-Reply-To: References: Message-ID: On Wed, 26 May 2021 17:41:33 GMT, Vladimir Ivanov wrote: > The checks which guide inlining decisions in C1 are too strong: declaring holder class is required to be fully initialized while JVMS only mandates an initialization barrier on resolved class in `invokestatic` case. 
> > The fix relaxes the checks to rule out only not yet linked classes unless it is an `invokestatic` call site. > > Testing: > - [x] hs-tier1 - hs-tier9 Looks good to me. ------------- Marked as reviewed by roland (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/4211 From roland at openjdk.java.net Thu May 27 08:27:06 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Thu, 27 May 2021 08:27:06 GMT Subject: RFR: 8267807: C2: Downcast receiver to target holder during inlining In-Reply-To: References: Message-ID: On Wed, 26 May 2021 17:58:52 GMT, Vladimir Ivanov wrote: > Virtual method calls involve an implicit subtype check against callee holder. > But if receiver type is too broad, it has to be narrowed before parsing the callee method. > Otherwise, it may cause problems during parsing and currently it simply blocks inlining. > > Proposed fix implements the narrowing step and re-enables inlining. > > Testing: > - [x] hs-tier1 - hs-tier9 src/hotspot/share/opto/doCall.cpp line 1138: > 1136: bool is_interface_holder = cha_monomorphic_target->holder()->is_interface(); > 1137: if (has_receiver && !is_interface_holder) { > 1138: if (!cha_monomorphic_target->holder()->is_subtype_of(receiver_type->klass())) { Given we can't trust interface types, shouldn't we test for receiver_type not an interface here? ------------- PR: https://git.openjdk.java.net/jdk/pull/4212 From ddong at openjdk.java.net Thu May 27 08:30:06 2021 From: ddong at openjdk.java.net (Denghui Dong) Date: Thu, 27 May 2021 08:30:06 GMT Subject: RFR: 8267800: Remove the '_dirty' set in BCEscapeAnalyzer In-Reply-To: References: Message-ID: On Thu, 27 May 2021 06:56:53 GMT, Tobias Hartmann wrote: >> Hi, >> >> Could I have a review of this change? >> >> The content of `_dirty` in `BCEscapeAnalyzer` is only updated when processing `_aaload`. And it will not affect the results of the analysis because its content is never used. >> IIUC, I think we should remove this set. 
>> >> Thanks, >> Denghui > > Looks good. Hi @TobiHartmann , It seems that your `/sponsor` didn't take effect, could you sponsor it again? Best, Denghui ------------- PR: https://git.openjdk.java.net/jdk/pull/4208 From bulasevich at openjdk.java.net Thu May 27 08:31:29 2021 From: bulasevich at openjdk.java.net (Boris Ulasevich) Date: Thu, 27 May 2021 08:31:29 GMT Subject: RFR: 8267042: bug in monitor locking/unlocking on ARM32 C1 due to uninitialized BasicObjectLock::_displaced_header Message-ID: Hi, Could we have a review of this change? The bug was discovered in ARM32 C1 monitor locking/unlocking: the displaced header is not initialized to the non-zero value which results in deadlock. Thanks, Boris ------------- Commit messages: - test cleanup - adding reproducer to c1 jtreg tests - Fixing locking on ARM32 C1: initialize the displaced_header in C1_MacroAssembler::lock_object and native wrapper locking code Changes: https://git.openjdk.java.net/jdk/pull/4218/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4218&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8267042 Stats: 176 lines in 3 files changed: 172 ins; 0 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/4218.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4218/head:pull/4218 PR: https://git.openjdk.java.net/jdk/pull/4218 From ddong at openjdk.java.net Thu May 27 08:40:11 2021 From: ddong at openjdk.java.net (Denghui Dong) Date: Thu, 27 May 2021 08:40:11 GMT Subject: Integrated: 8267800: Remove the '_dirty' set in BCEscapeAnalyzer In-Reply-To: References: Message-ID: On Wed, 26 May 2021 14:29:35 GMT, Denghui Dong wrote: > Hi, > > Could I have a review of this change? > > The content of `_dirty` in `BCEscapeAnalyzer` is only updated when processing `_aaload`. And it will not affect the results of the analysis because its content is never used. > IIUC, I think we should remove this set. > > Thanks, > Denghui This pull request has now been integrated. 
Changeset: 7278f56b Author: Denghui Dong Committer: Tobias Hartmann URL: https://git.openjdk.java.net/jdk/commit/7278f56bb6345d7b023516d0f44de71cd74ff264 Stats: 9 lines in 2 files changed: 0 ins; 9 del; 0 mod 8267800: Remove the '_dirty' set in BCEscapeAnalyzer Reviewed-by: kvn, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/4208 From jbhateja at openjdk.java.net Thu May 27 08:43:33 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Thu, 27 May 2021 08:43:33 GMT Subject: RFR: 8266951: Partial in-lining for vectorized mismatch operation using AVX512 masked instructions [v6] In-Reply-To: <0YtRuwnVZ-Ejs-22d0JDJeFzXiZ17XNuBT1o5Ma4ZkI=.9dd9e952-d452-4175-8ff5-8f41e990a555@github.com> References: <0YtRuwnVZ-Ejs-22d0JDJeFzXiZ17XNuBT1o5Ma4ZkI=.9dd9e952-d452-4175-8ff5-8f41e990a555@github.com> Message-ID: > ArraySupport.vectorizedMismatch is a leaf level comparison routine which gets called by various public Java APIs (Arrays.equals, Arrays.mismatch). Hotspot C2 compiler intrinsifies vectorizedMismatch routine and emits a call to a stub routine which uses vector instruction to compare the inputs. > > For small compare operation whose size fits in one vector register i.e. < 32 bytes or <= 64 bytes, this patch employ partial in-lining technique to emit the fast path code at the call site which does vector comparison under the influence of a predicate register/mask computed as a function of comparison length. > > If the length of comparison is greater than the vector register size then the slow path comprising of stub call is emitted. > > This prevents the call overhead associated with stub call which is significant compared to actual comparison operation for small sized comparisons. > > Partial in-lining works under the influence of a run time flag -XX:UsePartialInlineSize=32/64 (default 32 bytes). 
> > Following are performance number for an existing JMH benchmark (test/micro/org/openjdk/bench/java/util//ArrayMismatch.java) :- > > Machine : Cascade Lake server (Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz) > > > > > > > > > > > > > > > > BENCHMARK | SIZE | Baseline (ops/ms) | PI32 (ops/ms) | Gain | PI64 (ops/ms) | Gain > -- | -- | -- | -- | -- | -- | -- > ArraysMismatchPartialInlining.testByteMatch | 3 | 209915.663 | 209126.291 | 0.996239576 | 209073.888 | 0.995989937 > ArraysMismatchPartialInlining.testByteMatch | 4 | 157757.866 | 157763.787 | 1.000037532 | 157766.023 | 1.000051706 > ArraysMismatchPartialInlining.testByteMatch | 5 | 181182.854 | 180450.433 | 0.995957559 | 180465.978 | 0.996043356 > ArraysMismatchPartialInlining.testByteMatch | 6 | 146279.651 | 146276.69 | 0.999979758 | 146274.73 | 0.999966359 > ArraysMismatchPartialInlining.testByteMatch | 7 | 139099.287 | 137887.433 | 0.991287849 | 139159.131 | 1.000430225 > ArraysMismatchPartialInlining.testByteMatch | 15 | 127720.176 | 175732.078 | 1.375914781 | 169252.948 | 1.325185678 > ArraysMismatchPartialInlining.testByteMatch | 31 | 116472.861 | 176768.126 | 1.517676517 | 169773.326 | 1.457621325 > ArraysMismatchPartialInlining.testByteMatch | 63 | 104636.064 | 91564.893 | 0.875079676 | 160845.908 | 1.537193792 > ArraysMismatchPartialInlining.testByteMatch | 95 | 101099.48 | 89657.806 | 0.886827568 | 87334.192 | 0.863844127 > ArraysMismatchPartialInlining.testByteMatch | 800 | 45022.411 | 47905.179 | 1.064029623 | 47969.355 | 1.065455046 > ArraysMismatchPartialInlining.testCharMatch | 3 | 219405.496 | 219710.643 | 1.00139079 | 219242.048 | 0.999255041 > ArraysMismatchPartialInlining.testCharMatch | 4 | 170629.006 | 193121.02 | 1.131818233 | 182593.776 | 1.070121548 > ArraysMismatchPartialInlining.testCharMatch | 5 | 155518.733 | 169650.324 | 1.090867452 | 159963.097 | 1.028577676 > ArraysMismatchPartialInlining.testCharMatch | 6 | 154395.07 | 175616.979 | 1.137451986 | 147860.366 | 0.957675436 > 
ArraysMismatchPartialInlining.testCharMatch | 7 | 147630.171 | 168639.547 | 1.142310856 | 112467.214 | 0.761817271 > ArraysMismatchPartialInlining.testCharMatch | 15 | 130251.837 | 171755.645 | 1.318642784 | 159656.911 | 1.225755542 > ArraysMismatchPartialInlining.testCharMatch | 31 | 115510.532 | 106310.328 | 0.920351817 | 159957.379 | 1.384786099 > ArraysMismatchPartialInlining.testCharMatch | 63 | 96443.648 | 92545.364 | 0.959579671 | 92850.782 | 0.962746473 > ArraysMismatchPartialInlining.testCharMatch | 95 | 90001.485 | 81753.152 | 0.908353368 | 83890.742 | 0.932103976 > ArraysMismatchPartialInlining.testCharMatch | 800 | 22929.764 | 20699.791 | 0.902747669 | 22017.534 | 0.960216337 > ArraysMismatchPartialInlining.testDoubleMatch | 3 | 137422.911 | 134792.332 | 0.980857784 | 137047.846 | 0.997270724 > ArraysMismatchPartialInlining.testDoubleMatch | 4 | 140124.192 | 128321.199 | 0.915767628 | 128573.012 | 0.917564699 > ArraysMismatchPartialInlining.testDoubleMatch | 5 | 132385.81 | 132099.177 | 0.997834866 | 132337.729 | 0.999636812 > ArraysMismatchPartialInlining.testDoubleMatch | 6 | 122472.829 | 122301.343 | 0.998599804 | 122235.558 | 0.998062664 > ArraysMismatchPartialInlining.testDoubleMatch | 7 | 123867.736 | 123042.597 | 0.993338548 | 123060.617 | 0.993484026 > ArraysMismatchPartialInlining.testDoubleMatch | 15 | 102561.684 | 102697.933 | 1.001328459 | 100258.701 | 0.977545386 > ArraysMismatchPartialInlining.testDoubleMatch | 31 | 87019.261 | 87292.743 | 1.003142775 | 85003.323 | 0.976833428 > ArraysMismatchPartialInlining.testDoubleMatch | 63 | 62251.609 | 57261.214 | 0.919835084 | 62732.816 | 1.007730033 > ArraysMismatchPartialInlining.testDoubleMatch | 95 | 50885.381 | 48282.534 | 0.948848826 | 48533.009 | 0.953771163 > ArraysMismatchPartialInlining.testDoubleMatch | 800 | 7160.957 | 8209.345 | 1.146403337 | 7158.649 | 0.999677697 > ArraysMismatchPartialInlining.testFloatMatch | 3 | 144215.295 | 141572.656 | 0.981675737 | 117351.089 | 0.81372152 > 
ArraysMismatchPartialInlining.testFloatMatch | 4 | 149935.526 | 140116.547 | 0.934511992 | 138351.846 | 0.922742259 > ArraysMismatchPartialInlining.testFloatMatch | 5 | 134682.06 | 133892.853 | 0.994140222 | 139040.985 | 1.032364555 > ArraysMismatchPartialInlining.testFloatMatch | 6 | 139176.866 | 139452.984 | 1.001983936 | 158309.784 | 1.13747197 > ArraysMismatchPartialInlining.testFloatMatch | 7 | 127274.07 | 126137.824 | 0.991072447 | 146418.871 | 1.150421849 > ArraysMismatchPartialInlining.testFloatMatch | 15 | 115897.616 | 101808.969 | 0.878438854 | 108451.212 | 0.935750154 > ArraysMismatchPartialInlining.testFloatMatch | 31 | 96568.619 | 101492.986 | 1.05099345 | 88662.187 | 0.918126281 > ArraysMismatchPartialInlining.testFloatMatch | 63 | 75565.484 | 85526.546 | 1.131820263 | 74575.198 | 0.986894996 > ArraysMismatchPartialInlining.testFloatMatch | 95 | 69535.621 | 71823.072 | 1.032896104 | 64910.105 | 0.933479907 > ArraysMismatchPartialInlining.testFloatMatch | 800 | 13959.085 | 12768.069 | 0.914678075 | 12698.311 | 0.909680756 > ArraysMismatchPartialInlining.testIntMatch | 3 | 151925.753 | 152001.543 | 1.000498862 | 150351.321 | 0.989636833 > ArraysMismatchPartialInlining.testIntMatch | 4 | 151411.152 | 161021.852 | 1.063474188 | 152115.869 | 1.004654327 > ArraysMismatchPartialInlining.testIntMatch | 5 | 142305.114 | 134841.275 | 0.947550451 | 122718.584 | 0.862362431 > ArraysMismatchPartialInlining.testIntMatch | 6 | 144870.73 | 144186.562 | 0.99527739 | 166569.418 | 1.149779655 > ArraysMismatchPartialInlining.testIntMatch | 7 | 135132.736 | 131937.154 | 0.976352273 | 150670.855 | 1.114984122 > ArraysMismatchPartialInlining.testIntMatch | 15 | 118831.765 | 119947.806 | 1.009391773 | 161039.149 | 1.35518604 > ArraysMismatchPartialInlining.testIntMatch | 31 | 97247.157 | 95123.241 | 0.978159608 | 92586.255 | 0.952071586 > ArraysMismatchPartialInlining.testIntMatch | 63 | 78537.993 | 72904.05 | 0.928264744 | 72075.128 | 0.917710337 > 
ArraysMismatchPartialInlining.testIntMatch | 95 | 69356.234 | 69021.893 | 0.995179366 | 67435.202 | 0.972301956 > ArraysMismatchPartialInlining.testIntMatch | 800 | 14410.374 | 12715.733 | 0.882401317 | 12527.15 | 0.869314703 > ArraysMismatchPartialInlining.testLongMatch | 3 | 145434.777 | 147236.142 | 1.012386068 | 144269.34 | 0.991986532 > ArraysMismatchPartialInlining.testLongMatch | 4 | 149850.908 | 117182.939 | 0.781996857 | 116983.308 | 0.780664659 > ArraysMismatchPartialInlining.testLongMatch | 5 | 140694.62 | 141039.138 | 1.002448693 | 140721.407 | 1.000190391 > ArraysMismatchPartialInlining.testLongMatch | 6 | 136901.515 | 136215.609 | 0.994989785 | 136216.591 | 0.994996958 > ArraysMismatchPartialInlining.testLongMatch | 7 | 132233.847 | 131289.142 | 0.9928558 | 131315.326 | 0.993053813 > ArraysMismatchPartialInlining.testLongMatch | 15 | 108677.77 | 105050.548 | 0.966624067 | 108574.143 | 0.999046475 > ArraysMismatchPartialInlining.testLongMatch | 31 | 79476.103 | 79391.426 | 0.99893456 | 79519.006 | 1.000539823 > ArraysMismatchPartialInlining.testLongMatch | 63 | 58949.181 | 59102.766 | 1.00260538 | 59095.306 | 1.00247883 > ArraysMismatchPartialInlining.testLongMatch | 95 | 49438.419 | 49422.93 | 0.999686701 | 49390.033 | 0.999021287 > ArraysMismatchPartialInlining.testLongMatch | 800 | 7195.783 | 7201.554 | 1.000801998 | 7186.757 | 0.998745654 > ArraysMismatchPartialInlining.testShortMatch | 3 | 219642.309 | 219414.684 | 0.998963656 | 219760.127 | 1.000536408 > ArraysMismatchPartialInlining.testShortMatch | 4 | 169235.371 | 193907.437 | 1.145785517 | 170667.561 | 1.008462711 > ArraysMismatchPartialInlining.testShortMatch | 5 | 155537.852 | 147014.758 | 0.945202445 | 116770.798 | 0.750754858 > ArraysMismatchPartialInlining.testShortMatch | 6 | 155059.272 | 173756.546 | 1.120581464 | 152323.759 | 0.982358275 > ArraysMismatchPartialInlining.testShortMatch | 7 | 147370.359 | 154934.348 | 1.051326393 | 138398.19 | 0.939118225 > 
ArraysMismatchPartialInlining.testShortMatch | 15 | 130353.196 | 171653.208 | 1.316831603 | 160047.047 | 1.227795343 > ArraysMismatchPartialInlining.testShortMatch | 31 | 118458.443 | 106239.301 | 0.896848703 | 159726.936 | 1.348379499 > ArraysMismatchPartialInlining.testShortMatch | 63 | 97519.691 | 91591.145 | 0.939206678 | 91847.817 | 0.94183868 > ArraysMismatchPartialInlining.testShortMatch | 95 | 90818.111 | 77626.093 | 0.854742431 | 77653.086 | 0.855039652 > ArraysMismatchPartialInlining.testShortMatch | 800 | 21382.8 | 22841.791 | 1.06823199 | 22683.388 | 1.060824027 Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: 8266951: Review comments resolution. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/3999/files - new: https://git.openjdk.java.net/jdk/pull/3999/files/4d7964b1..76ef9902 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3999&range=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3999&range=04-05 Stats: 16 lines in 4 files changed: 0 ins; 1 del; 15 mod Patch: https://git.openjdk.java.net/jdk/pull/3999.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3999/head:pull/3999 PR: https://git.openjdk.java.net/jdk/pull/3999 From vlivanov at openjdk.java.net Thu May 27 10:43:09 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Thu, 27 May 2021 10:43:09 GMT Subject: RFR: 8267807: C2: Downcast receiver to target holder during inlining In-Reply-To: References: Message-ID: On Thu, 27 May 2021 08:24:18 GMT, Roland Westrelin wrote: >> Virtual method calls involve an implicit subtype check against callee holder. >> But if receiver type is too broad, it has to be narrowed before parsing the callee method. >> Otherwise, it may cause problems during parsing and currently it simply blocks inlining. >> >> Proposed fix implements the narrowing step and re-enables inlining. 
>> >> Testing: >> - [x] hs-tier1 - hs-tier9 > > src/hotspot/share/opto/doCall.cpp line 1138: > >> 1136: bool is_interface_holder = cha_monomorphic_target->holder()->is_interface(); >> 1137: if (has_receiver && !is_interface_holder) { >> 1138: if (!cha_monomorphic_target->holder()->is_subtype_of(receiver_type->klass())) { > > Given we can't trust interface types, shouldn't we test for receiver_type not an interface here? Though it doesn't happen in practice now, we can trust interface types on the receiver object because virtual/interface method invocation implies proper type check of the receiver. ------------- PR: https://git.openjdk.java.net/jdk/pull/4212 From jptatton at amazon.com Thu May 27 12:08:32 2021 From: jptatton at amazon.com (Tatton, Jason) Date: Thu, 27 May 2021 12:08:32 +0000 Subject: RFC, C2 Partial Escape Analysis Message-ID: <6aa40c2fc14643c782382fe95308be3d@EX13D46EUB003.ant.amazon.com> RFC, C2 Partial Escape Analysis Hello everyone! We would like to invite discussion on a proposal for Partial Escape Analysis (PEA) to be added to the JVM C2 hotspot compiler. Specifically, we are looking to introduce a new compilation phase in order to enable optimizations such as scalar replacement in situations currently not encompassed by the existing control flow insensitive Escape Analysis (EA) phase of C2. PEA is a control flow sensitive variant of EA. We believe that by adding this compilation phase we will be able to improve compiled code performance and reduce heap size requirements by up to 10% and 8% respectively. At AWS, we have been prototyping a partial implementation of PEA with some success so far. We are at the stage now where we feel wider community involvement would be beneficial to the initiative. Background: What is Escape Analysis Escape analysis (EA) enables the HotSpot JIT C2 compiler to detect cases where objects do not "escape" the method/thread which created them. 
Whilst EA itself is not an optimization, the results of the analysis are used in order to enable a number of optimization techniques (as an alternative to conventional object allocation on the heap) including:

* Stack allocation. Instead of allocating an object on the heap, it is instead allocated on the stack. In this way it is not considered eligible for garbage collection. This reduces the amount of work which must be done by garbage collectors, improving performance. C2 does not currently implement this optimization, though some other JVM implementations do, such as IBM J9.
* Scalar replacement. This goes a step further than stack allocation: here the object allocation can be eliminated altogether and all object field interaction is replaced by local variables. Eliminating object allocation, field referencing and reducing load on the garbage collector all improve performance and reduce memory footprint.
* Lock Elision. If a Java object monitor is used but the object does not escape, then the lock can never be contended. In this case synchronization on said lock can be removed.

Existing Limitations

The current EA implementation is flow insensitive. Let's examine what this means in practice...

Here is an example of an object which does not escape the bounds of the method/thread creating it and is therefore considered non escaping and eligible for optimization. The current implementation of EA in C2 detects this case:

    public class Pair{
        public int a;
        public int b;
        public Pair(int a, int b){
            this.a = a;
            this.b = b;
        }
    }

    public static int foo(int a, int b){
        Pair mypair = new Pair(a, b);
        return mypair.a;
    }

and will enable scalar replacement such that the above code in foo can be transformed into the following form, which saves us from an instance of object allocation of Pair on every invocation of foo.
    public class Pair{
        public int a;
        public int b;
        public Pair(int a, int b){
            this.a = a;
            this.b = b;
        }
    }

    public static int foo(int a, int b){
        // Pair mypair = new Pair(a, b);
        return a; // don't bother with object allocation of Pair
    }

However, EA is control flow insensitive and will not enable scalar replacement of fields of an object where there is branching code with a path having the possibility of the object escaping the lifetime of the thread/method which created it, regardless of how unlikely that branch is to execute at runtime... For example:

    public static int foo(){
        Pair mypair = new Pair(12, 13);
        if(something()){ // maybe this branch is hit only 0.01% of the time
            return usesAndPersistsObject(mypair); // escapes here
        }else{
            return mypair.a; // no scalar replacement here
        }
    }

Above we see that in one branch the Pair instance object mypair escapes, therefore the mypair instance object is considered 'globally' escaping and must always be allocated on the heap. This means that scalar replacement cannot be used even when the escaping branch is only entered into very infrequently. PEA aims to largely solve this problem.

Partial Escape Analysis

Partial Escape Analysis (PEA) is a control flow sensitive variant of EA. Like EA it acts upon the ideal graph IR. With PEA we propose introducing the concept of a deferred object allocation (DOA). We propose considering only objects which have been identified as escaping in the EA phase of compilation. With PEA we propose to defer object allocation of those objects to the latest point in program/branch execution before which that object has been identified as escaping. Scalar replacement and lock elision can then be applied up to that point of deferred object allocation. Sometimes the latest point for a particular object may resolve to be at the point of initial object allocation, in which case PEA will offer no benefit.

High level algorithm

We are interested in only a subset of ideal graph nodes.
Specifically nodes related to escaping objects:

* Allocation(A): Allocation and Initialization nodes.
* Branch(B): We are interested in branching nodes; IfNode, IfTrue, IfFalse nodes and phi nodes (e.g. to capture the case of: staticfield = condition?obj1:obj2 as well as loops, jumps related to break/continue etc).
* Escape(E): Nodes concerning: object assigned to static field, object used as parameter in method invocation.
* Usage(U): Nodes concerning: store to object field, loading from object field.
* Return(R): Return/ReThrow nodes; these behave similarly to Escape nodes.

We propose that PEA operate via the following steps:

1. Graph reduction. This stage visits every node in the ideal graph and, for the aforementioned nodes of interest, produces a reduced graph structure (example below) consisting of a reduced control flow sensitive set of nodes. The non-branching nodes of this reduced graph form are structured into basic blocks with IfTrue/IfFalse nodes used to denote the branches. The subset of nodes is used in steps 2 and 3 below.
2. Deferred Object Allocation Algorithm. This stage figures out where opportunities for DOA exist for each allocated object that can escape. By considering branches and object usage, it determines the very last point to which an object's allocation can be deferred (i.e. where the object is materialized on the heap in a branch) while maintaining program semantics. There may be more than one branch path resulting in a materialization of an allocated object. There may also be instances where the object is required to be in a materialized state before a branch node has been encountered; in this case no further optimization can be enabled by PEA and subsequent processing can be skipped.
3. Graph transformation. After applying DOA in step 2 we know the latest points at which an object may be materialized on the heap.
With this knowledge all usage nodes prior to these points can be scalar replaced and all usage/escaping nodes after a materialization of a node need to be repointed to the nodes resulting from the materializations. Some repointing of the internal state of deferred objects on the heap may be necessary at the point of materialization so as to maintain program semantics. We must also ensure consistency in the case of deoptimization.

Throughout all these mini phases, in the case of potential failure, the algorithm will skip subsequent processing.

Example

Let us now examine a case where PEA works well:

     0: public static boolean RAND = true; // set externally
     1: public static boolean RAND2 = true; // set externally
     2:
     3: public static int infrequentBranch(int f1, int f2) {
     4:     Pair intPair = new Pair(f1, f2);
     5:
     6:     if (RAND) { // external call sets this to true 10% of the time
     7:         intPair.a += 1; // example 'Use' of intPair
     8:         Global.lastThing = intPair; // escapes here
     9:         return 1;
    10:     }
    11:
    12:     // scalar replaceable usage...
    13:     intPair.a += 12;
    14:
    15:     if (RAND2) { // external call sets this to true 10% of the time
    16:         intPair.a += 10; // example 'Use' of intPair
    17:         Global.lastThing = intPair; // escapes here
    18:         return 2;
    19:     }
    20:
    21:     // if we get to this point there should have been
    22:     // no object allocation (with PEA enabled)
    23:     return 3;
    24: }

Above is an example of code containing two branching paths in which the intPair object escapes. If execution of the above code gets to the end of the method, with PEA enabled, intPair will not be allocated on the heap.
Let us now look at the reduced graph of the above method:

     4: A:Initialize[40]
     6: B:If[110]
        -> IfTrue: (bb: 1)
             7: U:StoreI[121] of: Initialize[40]
             8: E:StoreN[126] of: Initialize[40]
             9: R:Return[130]
        -> IfFalse: (bb: 2)
             X
    13: U:StoreI[147] of: Initialize[40]
    15: B:If[158]
        -> IfTrue: (bb: 3)
            16: U:StoreI[170] of: Initialize[40]
            17: E:StoreN[173] of: Initialize[40]
            18: R:Return[180]
        -> IfFalse: (bb: 4)
             X
    23: R:Return[194]

The reduced graph is composed of a root basic block (bb) (with id of 0) and four child bb's (1,2,3,4) corresponding to the code in the two if statements. Each if statement generates an IfTrue and an IfFalse node, each having a basic block associated. For readability, line number information is provided above (our prototype, discussed shortly, also includes bci information).

The graph transformation mini phase is interested in the opportunities for deferred allocation identified by the DOA algorithm. In this example, that corresponds to the Initialize[40] node seen at the root bb. Observe that there are two DOA opportunities, one in each of the IfTrue branches, at lines 8 and 17. Since both of these occur within branches which result in early termination of the method, the U:StoreI[147] node (corresponding to: intPair.a += 12;) may be scalar replaced. The nodes U:StoreI[121] and U:StoreI[170] may optionally be scalar replaced, but there is little benefit because soon after they are referenced the object which they reference will be materialized onto the heap.
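To make the intended effect on infrequentBranch concrete, here is a minimal hand-written Java sketch of roughly how the method would behave once allocation of intPair is deferred to the two escape points. This is only a source-level illustration: the actual transformation would be performed on the ideal graph, and the class name PeaSketch, the method name infrequentBranchAfterPea and the minimal Pair and Global classes are assumptions made for this example.

```java
// Illustrative sketch only; not output of any compiler.
class PeaSketch {
    static class Pair {
        int a;
        int b;
        Pair(int a, int b) { this.a = a; this.b = b; }
    }

    static class Global {
        static Pair lastThing;
    }

    static boolean RAND;  // set externally
    static boolean RAND2; // set externally

    // The method as written in the example.
    static int infrequentBranch(int f1, int f2) {
        Pair intPair = new Pair(f1, f2);
        if (RAND) {
            intPair.a += 1;
            Global.lastThing = intPair; // escapes here
            return 1;
        }
        intPair.a += 12;
        if (RAND2) {
            intPair.a += 10;
            Global.lastThing = intPair; // escapes here
            return 2;
        }
        return 3;
    }

    // Hand-written equivalent with the allocation deferred to each escape point.
    static int infrequentBranchAfterPea(int f1, int f2) {
        int a = f1; // scalar-replaced field intPair.a
        int b = f2; // scalar-replaced field intPair.b
        if (RAND) {
            a += 1;
            Global.lastThing = new Pair(a, b); // materialize just before the escape
            return 1;
        }
        a += 12; // operates on a local, no heap object exists yet
        if (RAND2) {
            a += 10;
            Global.lastThing = new Pair(a, b); // materialize just before the escape
            return 2;
        }
        return 3; // no allocation happened on this path
    }

    public static void main(String[] args) {
        boolean[][] settings = { {true, false}, {false, true}, {false, false} };
        for (boolean[] s : settings) {
            RAND = s[0];
            RAND2 = s[1];
            int before = infrequentBranch(1, 2);
            int after = infrequentBranchAfterPea(1, 2);
            System.out.println("RAND=" + RAND + " RAND2=" + RAND2
                    + " original=" + before + " deferred=" + after);
        }
    }
}
```

On the path where neither flag is set, the second version never executes new Pair, which is the case the proposal aims to optimize; on the two escaping paths the object is created with the field values accumulated so far.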
The DOA algorithm of PEA is able to handle more complex cases. For example, say we have the following code:

     0: public static boolean RAND = true; // set externally
     1: public static boolean RAND2 = true; // set externally
     2:
     3: public static int reallyInfrequentBranch(int f1, int f2) {
     4:     Pair intPair = new Pair(f1, f2);
     5:
     6:     if (RAND) { // external call sets this to true 1% of the time
     7:         if (RAND2) { // external call sets this to true 1% of the time
     8:             Global.lastThing = intPair; // escapes here
     9:             unkownCall(); // non inlined call - maybe call triggering separate thread interacting with the state of Global.lastThing
    10:             intPair.a += 12; // operates on materialized object
    11:         }
    12:     }
    13:
    16:     return intPair.a;
    17: }

At first glance it would appear that the above code would benefit from PEA since the escape of intPair occurs within not just one but two infrequently taken branches. However, PEA cannot be used here: once the intPair object escapes at line 8, the unkownCall() in this example is not inlined and may perform an operation on the state of intPair (maybe in a separate thread). As such, to maintain program semantics the object needs to be in a materialized state at the point of initial declaration (i.e. created on the heap as normal).

As far as the DOA algorithm is concerned for the above code, when it is processing the bb associated with the If statement on line 7 it will first mark the intPair object as escaping and materialized within the subsequent context of the branch, thus rendering the usage on line 10 not scalar replaceable.
When DOA has finished processing this bb and moved on to the root bb it will come to the return statement on line 16. Since intPair is marked as being in an escaped and previously materialized state within a previous nested branch bb, this materialization will be "promoted" up to the root bb, which also just happens to be the location of the initial allocation, thus eliminating this as an opportunity for PEA.

Progress at AWS

So far in our internal prototype, in the interests of failing fast, we have implemented up to step 2 of the outlined algorithm and are currently evaluating performance of a typical workload in which deferred object allocation can be applied, in an attempt to estimate the typical improvement in performance. Our initial prototype is implemented as a separate compilation phase which takes place after conventional, flow insensitive EA.

Flags

In support of this compilation phase we propose adding three JVM flags:

Flag | Description | Default | Notes
-- | -- | -- | --
DoPartialEscapeAnalysis | perform partial escape analysis | true | global switch (set to false in preview build)
PartialEALogLevel | PEA log level | 0 - nothing | [0-5] more detail per log level, including the option to output the reduced graph representation above in a human readable format. This aids development and production debugging
PartialEAOnly | Restrict PEA to only this method | '' - empty string | Equivalent to setting DoPartialEscapeAnalysis to true

Challenges

* Generally speaking, operating upon the ideal graph representation is challenging as it is a paradigm unique to the JVM and documentation is sparse. Tools such as the ideal graph visualizer are however excellent for improving engineer productivity when interacting with the ideal graph.
* When speaking in terms of this initiative, the most challenging aspect we have faced so far is in building reduced graph representations of complex phi node interactions, e.g. code such as: (RAND?holder1:holder2).held = RAND2?intPair1:intPair2;
We expect the majority of bugs in step 2 of our outlined algorithm to reside in this space.
* Another challenging area is that of "deadly embraces" where two objects have fields which point to one another and at least one of those objects is marked as escaping, thus rendering the other escaping as well. For instance: object1.friend = object2; object2.friend = object1; We have not determined how to solve this problem yet but some mechanism to detect these cyclic relationships is required as we believe that these relationships are common in many of the java.util.* classes.

Impact

We have seen that PEA can have a positive impact in JVMs. The implementation in the GraalVM compiler is reported as improving performance and memory allocated in standardized benchmarks by 10% and 8% respectively. For some benchmarks, the improvement in performance and memory allocated can be up to 58.5% and 33% respectively.

Risks

* Increased C2 compilation time. Care must be taken to ensure an optimal implementation of this compilation phase so as to not impact JVM startup and code generation times.
* The additional graph transformation adds complexity to the ideal graph and can ultimately result in larger and more complex generated machine code.

Alternatives

As an alternative to PEA, effort could be invested in making the current EA implementation of C2 control flow sensitive. However, we estimate that this would in practice look very much like a separate compilation phase anyway. As such we recommend, at least initially, implementing PEA as a separate phase, and considering merging if it is proven to be successful.

Relation to other projects

* As with many other compiler optimizations, operating upon a larger portion of a program can improve efficiency of the optimization. As such we envisage that improvements in inlining will have a direct benefit on opportunities to introduce deferred object allocation.
* If stack allocation is to be introduced to the JVM then PEA should be able to productively interact with this optimization. * We do not envisage any negative impact upon project Loom. Further work We believe that the scheme presented here is relatively conservative and should have the aforementioned positive impact. One extension to this work would be the introduction of a fast/slow path to be used in locations where scalar replacement would be possible on fields of objects which are very infrequently escaped (such as in the reallyInfrequentBranch example above). This comes with two challenges however: 1). being additional size/complexity of code generated and 2). the slow/fast path logic itself will have a performance impact so this needs to be used in moderation - perhaps applying this approach for cases where the escaping probability is low (<10%) and monitoring this using ongoing branch entry profiling information would be a reasonable tradeoff. Comments welcome! -Jason and Xin From roland at openjdk.java.net Thu May 27 15:49:07 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Thu, 27 May 2021 15:49:07 GMT Subject: RFR: 8267807: C2: Downcast receiver to target holder during inlining In-Reply-To: References: Message-ID: On Wed, 26 May 2021 17:58:52 GMT, Vladimir Ivanov wrote: > Virtual method calls involve an implicit subtype check against callee holder. > But if receiver type is too broad, it has to be narrowed before parsing the callee method. > Otherwise, it may cause problems during parsing and currently it simply blocks inlining. > > Proposed fix implements the narrowing step and re-enables inlining. > > Testing: > - [x] hs-tier1 - hs-tier9 Looks good to me ------------- Marked as reviewed by roland (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/4212 From jcm at openjdk.java.net Thu May 27 21:05:33 2021 From: jcm at openjdk.java.net (Jamsheed Mohammed C M) Date: Thu, 27 May 2021 21:05:33 GMT Subject: RFR: 8265132 : C2 compilation fails with assert "missing precedence edge" [v4] In-Reply-To: References: Message-ID: <-fR7HyVi3oDR3P-pYQGTVfzR6BnGb9YnJiqOi5vkEZc=.817580a6-c78b-4c99-ba66-575ada3a19d6@github.com> > Issue is similar to https://bugs.openjdk.java.net/browse/JDK-8261730 > but happens at next https://github.com/jamsheedcm/jdk/blob/master/src/hotspot/share/opto/gcm.cpp#L830 > > Request for review Jamsheed Mohammed C M has updated the pull request incrementally with one additional commit since the last revision: improving the fix for raise_LCA_above_marks ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4200/files - new: https://git.openjdk.java.net/jdk/pull/4200/files/1a390c7d..080d2270 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4200&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4200&range=02-03 Stats: 29 lines in 2 files changed: 25 ins; 0 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/4200.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4200/head:pull/4200 PR: https://git.openjdk.java.net/jdk/pull/4200 From jcm at openjdk.java.net Thu May 27 21:09:06 2021 From: jcm at openjdk.java.net (Jamsheed Mohammed C M) Date: Thu, 27 May 2021 21:09:06 GMT Subject: RFR: 8265132 : C2 compilation fails with assert "missing precedence edge" [v3] In-Reply-To: References: Message-ID: On Wed, 26 May 2021 18:46:59 GMT, Jamsheed Mohammed C M wrote: >> Issue is similar to https://bugs.openjdk.java.net/browse/JDK-8261730 >> but happens at next https://github.com/jamsheedcm/jdk/blob/master/src/hotspot/share/opto/gcm.cpp#L830 >> >> Request for review > > Jamsheed Mohammed C M has updated the pull request incrementally with one additional commit since the last revision: > > fixing issue related to raising 
LCA I have improved the fix by clearing the raise_LCA_visted list at the end of `insert_ant_dependences`. Full changes here: https://github.com/openjdk/jdk/pull/4200/files ------------- PR: https://git.openjdk.java.net/jdk/pull/4200 From ksakata at openjdk.java.net Fri May 28 05:36:05 2021 From: ksakata at openjdk.java.net (Koichi Sakata) Date: Fri, 28 May 2021 05:36:05 GMT Subject: RFR: 8260360: IGV: Short name of combined nodes is hidden by background color In-Reply-To: References: Message-ID: <3s1GcSlPO0qaspiSfzdv0DZvcR77YCzwcf90X1Kuv2E=.f5b0c1f5-6d6d-4891-a64f-86549a503633@github.com> On Tue, 18 May 2021 07:33:01 GMT, Koichi Sakata wrote: > This pull request makes the short name of combined nodes readable. > > At present those nodes are painted over with black because their OutputSlot color is null. So this pull request sets the original color on the OutputSlot. > > I tested the following scenario manually: > > - Open a graph, then enable "Simplify graph" (as described in the bug report). The result is the following images. There are two graphs. One is a black and white only graph, the other is a colored graph. > > Screenshot 2021-05-18 16 25 03 > Screenshot 2021-05-18 16 25 33 Thank you for reviewing, Nils! ------------- PR: https://git.openjdk.java.net/jdk/pull/4082 From hshi at openjdk.java.net Fri May 28 06:25:25 2021 From: hshi at openjdk.java.net (Hui Shi) Date: Fri, 28 May 2021 06:25:25 GMT Subject: RFR: 8267904: C2 crash when compile negative Arrays.copyOf length after loop Message-ID: C2 crashes when Arrays.copyOf has a negative length after a loop. This happens in both release and debug builds. Test and hs_err are in JBS. Crash reason is: - CastIINode is created in GraphKit::new_array (in AllocateArrayNode::make_ideal_length), casting the array length to the range [0, maxint-2]. This is safe if allocation succeeds and CastIINode's input control is InitializeNode's proj control.
- In LibraryCallKit::inline_arraycopy, the control of the nodes that use the InitializeNode's proj control is replaced with the AllocateArrayNode's input control (in LibraryCallKit::arraycopy_move_allocation_here). This is necessary to move the allocation after the array copy checks. C->gvn_replace_by(init->proj_out(TypeFunc::Control), alloc->in(0)); - The CastIINode's control is also adjusted to the AllocateArrayNode's input control, which is an illegal state in a later IGVN phase: it casts a negative value to [0, maxint-2]. - This causes the control and nodes after the loop to become top and be removed. The preceding loop then has no fall-through edge, and C2 crashes. The fix is: - At the entry of LibraryCallKit::inline_arraycopy, if a tightly coupled AllocateArrayNode is found, replace its CastIINode with the original array length. - In LibraryCallKit::arraycopy_move_allocation_here, recreate the CastIINode if necessary. - At the entry of LibraryCallKit::inline_arraycopy, avoid invoking AllocateArrayNode::make_ideal_length when getting the length of the tightly coupled AllocateArrayNode. This avoids creating an incorrect CastIINode again. Before the fix: node 250 is the CastII, which should be after the InitializeNode. ![image](https://user-images.githubusercontent.com/70356247/119938428-f7fa4e80-bfbe-11eb-925e-c239620c73f3.png) After the fix: all array copy checks are performed on the original array length, node 203. ![image](https://user-images.githubusercontent.com/70356247/119938532-2415cf80-bfbf-11eb-98c6-76e6b19b691f.png) A new test, test/hotspot/jtreg/compiler/c2/TestNegativeArrayCopyAfterLoop.java, is added and passes. Tests were run on Linux x64 with no regressions: - Tier1/2/3/hotspot_all_no_apps on release and fastdebug builds.
- Tier1/2/3 with option "-XX:-TieredCompilation -Xbatch" on fastdebug build ------------- Commit messages: - 8267904: C2 crash when compile negative Arrays.copyOf length after loop Changes: https://git.openjdk.java.net/jdk/pull/4238/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4238&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8267904 Stats: 123 lines in 5 files changed: 116 ins; 0 del; 7 mod Patch: https://git.openjdk.java.net/jdk/pull/4238.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4238/head:pull/4238 PR: https://git.openjdk.java.net/jdk/pull/4238 From ksakata at openjdk.java.net Fri May 28 06:36:03 2021 From: ksakata at openjdk.java.net (Koichi Sakata) Date: Fri, 28 May 2021 06:36:03 GMT Subject: RFR: 8263385: IGV: Graph is not opened in the window that has focus. In-Reply-To: References: Message-ID: On Tue, 18 May 2021 07:16:30 GMT, Richard Reingruber wrote: >> This pull request makes IGV open a graph in the window that has focus. >> >> At the moment IGV opens a graph in the first window found that already contains the graph. So in this pull request I give preference to the active EditorTopComponent. >> >> I tested the following scenarios manually: >> >> 1. Open a graph, open clone, then open another graph (as described in the bug report). It replaces the clone graph with the last opened graph. >> 2. Open a graph, open clone, swap tabs by dragging the clone graph, then open another graph. It replaces the clone graph with the last opened graph. >> 3. Open a graph, open clone, change the focus from the clone graph to the first graph, then open another graph. It replaces the first graph with the last opened graph. >> 4. Open a graph, open clone, open the same graph xml file from the toolbar, open a graph in the second folder, then open a graph in the first folder. It replaces the leftmost graph, which was opened first, with the last opened graph. > > Hello Koichi, > > thanks for taking care of this issue.
> > I've built and tested this pull request and found that it works in most cases. > > Here's what did not work: > > 1. Open Graph -> new Tab T1 is created > 2. Open Clone -> new Tab T2 is created > 3. Use the mouse to drag T2 down in the lower part of the window until the red frame indicates that the window will be split horizontally -> The window will be split horizontally. T2 is the lower window and has focus. > 4. Open another Graph in the outline > 5. IGV shows that graph in T1 even though T2 had focus. This is unexpected. > > Despite that I think your change is good. > > Unfortunately I can only test but not review the change itself as I am not familiar with IGV source code. > > Thanks, Richard. @reinrich That doesn't seem doable. I've analyzed the code. When graph windows are split, each window has its own mode, model and sub model, which are classes in OpenIDE and NetBeans. So after a horizontal split there are two mode objects. To show the selected graph, IGV would need the last active editor mode. The `org.netbeans.core.windows.model.ModesSubModel` class has a property named `lastActiveEditorMode`, which holds the most recently active mode that is an editor mode. But `lastActiveEditorMode` can be accessed only from NetBeans internal classes such as `org.netbeans.core.windows.Central` or `org.netbeans.core.windows.model.Model`. Therefore, I don't think the IGV classes can get the last active editor mode and show a graph in that mode's window. Since this pull request can't satisfy that need, I intend to close this PR.
https://github.com/apache/netbeans/blob/39496d3400eada7398a8428f05c90589aa1f3b74/platform/core.windows/src/org/netbeans/core/windows/model/ModesSubModel.java#L303 ------------- PR: https://git.openjdk.java.net/jdk/pull/4078 From whuang at openjdk.java.net Fri May 28 06:59:28 2021 From: whuang at openjdk.java.net (Wang Huang) Date: Fri, 28 May 2021 06:59:28 GMT Subject: RFR: 8267375: Aarch64: JVM crashes with option -XX:PrintIdealGraphLevel=3 on SVE backend Message-ID: <5erSH9joFXYHr6MIIjB4Qly6ElVqmcXO-ZX3MLdChME=.e1e4b3be-f366-4ebe-b0a9-a1893a6a432e@github.com> Reason: operand pRegGov() %{ constraint(ALLOC_IN_RC(gov_pr)); match(RegVectMask); op_cost(0); format %{ %} interface(REG_INTER); %} if `pRegGov` is used as a `TEMP`, like : instruct insertB_small(vReg dst, vReg src, iRegIorL2I val, immI idx, pRegGov pTmp, rFlagsReg cr) %{ predicate(UseSVE > 0 && n->as_Vector()->length() <= 32 && n->bottom_type()->is_vect()->element_basic_type() == T_BYTE); match(Set dst (VectorInsert (Binary src val) idx)); effect(TEMP_DEF dst, TEMP pTmp, KILL cr); // here It will have the type `Type::VectorMask` in `MachTempNode` which is generated from `Expand`. However, we miss `Type::VectorMask` in `Type::category()`. We can fix this bug simply by adding `case Type::VectorMask` in `Type::category()`. Although now we can only reproduce this bug on AArch64, it should be added for all platforms with predicate support. 
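The failure mode described above, a category switch that has no case for one type kind, can be sketched in Java. This is only an analogy under stated assumptions, not HotSpot code: `Type::category()` is C++, and the `Kind` and `Category` enums, the kind names, and both methods below are hypothetical stand-ins. Grouping the mask kind with the other vector kinds is also an assumption about what the real fix does.

```java
public class CategorySwitchSketch {
    // Hypothetical stand-ins for a few type kinds and node categories.
    enum Kind { INT, VECTOR_X, VECTOR_MASK }
    enum Category { DATA, VECTOR }

    // Before: VECTOR_MASK has no case, so it reaches the default branch,
    // which plays the role of the "should not reach here" path.
    static Category categoryBefore(Kind k) {
        switch (k) {
            case INT:
                return Category.DATA;
            case VECTOR_X:
                return Category.VECTOR;
            default:
                throw new IllegalStateException("unhandled kind: " + k);
        }
    }

    // After: the mask kind is listed next to the other vector kinds.
    static Category categoryAfter(Kind k) {
        switch (k) {
            case INT:
                return Category.DATA;
            case VECTOR_X:
            case VECTOR_MASK:
                return Category.VECTOR;
            default:
                throw new IllegalStateException("unhandled kind: " + k);
        }
    }

    public static void main(String[] args) {
        System.out.println(categoryAfter(Kind.VECTOR_MASK));
        try {
            categoryBefore(Kind.VECTOR_MASK);
        } catch (IllegalStateException e) {
            System.out.println("before the fix: " + e.getMessage());
        }
    }
}
```

The point of the sketch is only that a new kind introduced for one backend has to be added to every exhaustive switch over kinds, including ones used solely by diagnostic output.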
------------- Commit messages: - Aarch64: JVM crashes with option -XX:PrintIdealGraphLevel=3 on SVE backend Changes: https://git.openjdk.java.net/jdk/pull/4239/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4239&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8267375 Stats: 65 lines in 2 files changed: 65 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/4239.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4239/head:pull/4239 PR: https://git.openjdk.java.net/jdk/pull/4239 From whuang at openjdk.java.net Fri May 28 07:32:26 2021 From: whuang at openjdk.java.net (Wang Huang) Date: Fri, 28 May 2021 07:32:26 GMT Subject: RFR: 8267375: Aarch64: JVM crashes with option -XX:PrintIdealGraphLevel=3 on SVE backend [v2] In-Reply-To: <5erSH9joFXYHr6MIIjB4Qly6ElVqmcXO-ZX3MLdChME=.e1e4b3be-f366-4ebe-b0a9-a1893a6a432e@github.com> References: <5erSH9joFXYHr6MIIjB4Qly6ElVqmcXO-ZX3MLdChME=.e1e4b3be-f366-4ebe-b0a9-a1893a6a432e@github.com> Message-ID: > Reason: > > > operand pRegGov() > %{ > constraint(ALLOC_IN_RC(gov_pr)); > match(RegVectMask); > op_cost(0); > format %{ %} > interface(REG_INTER); > %} > > if `pRegGov` is used as a `TEMP`, like : > > > instruct insertB_small(vReg dst, vReg src, iRegIorL2I val, immI idx, pRegGov pTmp, rFlagsReg cr) > %{ > predicate(UseSVE > 0 && n->as_Vector()->length() <= 32 && > n->bottom_type()->is_vect()->element_basic_type() == T_BYTE); > match(Set dst (VectorInsert (Binary src val) idx)); > effect(TEMP_DEF dst, TEMP pTmp, KILL cr); // here > > > It will have the type `Type::VectorMask` in `MachTempNode` which is generated from `Expand`. However, we miss `Type::VectorMask` in `Type::category()`. > > We can fix this bug simply by adding `case Type::VectorMask` in `Type::category()`. > > Although now we can only reproduce this bug on AArch64, it should be added for all platforms with predicate support. Wang Huang has refreshed the contents of this pull request, and previous commits have been removed. 
The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: 8267375: Aarch64: JVM crashes with option -XX:PrintIdealGraphLevel=3 on SVE backend ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4239/files - new: https://git.openjdk.java.net/jdk/pull/4239/files/49269057..70694584 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4239&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4239&range=00-01 Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/4239.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4239/head:pull/4239 PR: https://git.openjdk.java.net/jdk/pull/4239 From aph at openjdk.java.net Fri May 28 09:44:06 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Fri, 28 May 2021 09:44:06 GMT Subject: RFR: 8266951: Partial in-lining for vectorized mismatch operation using AVX512 masked instructions [v6] In-Reply-To: References: <0YtRuwnVZ-Ejs-22d0JDJeFzXiZ17XNuBT1o5Ma4ZkI=.9dd9e952-d452-4175-8ff5-8f41e990a555@github.com> Message-ID: On Thu, 27 May 2021 08:43:33 GMT, Jatin Bhateja wrote: >> ArraySupport.vectorizedMismatch is a leaf level comparison routine which gets called by various public Java APIs (Arrays.equals, Arrays.mismatch). Hotspot C2 compiler intrinsifies vectorizedMismatch routine and emits a call to a stub routine which uses vector instruction to compare the inputs. >> >> For small compare operation whose size fits in one vector register i.e. < 32 bytes or <= 64 bytes, this patch employ partial in-lining technique to emit the fast path code at the call site which does vector comparison under the influence of a predicate register/mask computed as a function of comparison length. >> >> If the length of comparison is greater than the vector register size then the slow path comprising of stub call is emitted. 
>> >> This prevents the call overhead associated with stub call which is significant compared to actual comparison operation for small sized comparisons. >> >> Partial in-lining works under the influence of a run time flag -XX:UsePartialInlineSize=32/64 (default 32 bytes). >> >> Following are performance number for an existing JMH benchmark (test/micro/org/openjdk/bench/java/util//ArrayMismatch.java) :- >> >> Machine : Cascade Lake server (Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz) >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> BENCHMARK | SIZE | Baseline (ops/ms) | PI32 (ops/ms) | Gain | PI64 (ops/ms) | Gain >> -- | -- | -- | -- | -- | -- | -- >> ArraysMismatchPartialInlining.testByteMatch | 3 | 209915.663 | 209126.291 | 0.996239576 | 209073.888 | 0.995989937 >> ArraysMismatchPartialInlining.testByteMatch | 4 | 157757.866 | 157763.787 | 1.000037532 | 157766.023 | 1.000051706 >> ArraysMismatchPartialInlining.testByteMatch | 5 | 181182.854 | 180450.433 | 0.995957559 | 180465.978 | 0.996043356 >> ArraysMismatchPartialInlining.testByteMatch | 6 | 146279.651 | 146276.69 | 0.999979758 | 146274.73 | 0.999966359 >> ArraysMismatchPartialInlining.testByteMatch | 7 | 139099.287 | 137887.433 | 0.991287849 | 139159.131 | 1.000430225 >> ArraysMismatchPartialInlining.testByteMatch | 15 | 127720.176 | 175732.078 | 1.375914781 | 169252.948 | 1.325185678 >> ArraysMismatchPartialInlining.testByteMatch | 31 | 116472.861 | 176768.126 | 1.517676517 | 169773.326 | 1.457621325 >> ArraysMismatchPartialInlining.testByteMatch | 63 | 104636.064 | 91564.893 | 0.875079676 | 160845.908 | 1.537193792 >> ArraysMismatchPartialInlining.testByteMatch | 95 | 101099.48 | 89657.806 | 0.886827568 | 87334.192 | 0.863844127 >> ArraysMismatchPartialInlining.testByteMatch | 800 | 45022.411 | 47905.179 | 1.064029623 | 47969.355 | 1.065455046 >> ArraysMismatchPartialInlining.testCharMatch | 3 | 219405.496 | 219710.643 | 1.00139079 | 219242.048 | 0.999255041 >> ArraysMismatchPartialInlining.testCharMatch | 
4 | 170629.006 | 193121.02 | 1.131818233 | 182593.776 | 1.070121548 >> ArraysMismatchPartialInlining.testCharMatch | 5 | 155518.733 | 169650.324 | 1.090867452 | 159963.097 | 1.028577676 >> ArraysMismatchPartialInlining.testCharMatch | 6 | 154395.07 | 175616.979 | 1.137451986 | 147860.366 | 0.957675436 >> ArraysMismatchPartialInlining.testCharMatch | 7 | 147630.171 | 168639.547 | 1.142310856 | 112467.214 | 0.761817271 >> ArraysMismatchPartialInlining.testCharMatch | 15 | 130251.837 | 171755.645 | 1.318642784 | 159656.911 | 1.225755542 >> ArraysMismatchPartialInlining.testCharMatch | 31 | 115510.532 | 106310.328 | 0.920351817 | 159957.379 | 1.384786099 >> ArraysMismatchPartialInlining.testCharMatch | 63 | 96443.648 | 92545.364 | 0.959579671 | 92850.782 | 0.962746473 >> ArraysMismatchPartialInlining.testCharMatch | 95 | 90001.485 | 81753.152 | 0.908353368 | 83890.742 | 0.932103976 >> ArraysMismatchPartialInlining.testCharMatch | 800 | 22929.764 | 20699.791 | 0.902747669 | 22017.534 | 0.960216337 >> ArraysMismatchPartialInlining.testDoubleMatch | 3 | 137422.911 | 134792.332 | 0.980857784 | 137047.846 | 0.997270724 >> ArraysMismatchPartialInlining.testDoubleMatch | 4 | 140124.192 | 128321.199 | 0.915767628 | 128573.012 | 0.917564699 >> ArraysMismatchPartialInlining.testDoubleMatch | 5 | 132385.81 | 132099.177 | 0.997834866 | 132337.729 | 0.999636812 >> ArraysMismatchPartialInlining.testDoubleMatch | 6 | 122472.829 | 122301.343 | 0.998599804 | 122235.558 | 0.998062664 >> ArraysMismatchPartialInlining.testDoubleMatch | 7 | 123867.736 | 123042.597 | 0.993338548 | 123060.617 | 0.993484026 >> ArraysMismatchPartialInlining.testDoubleMatch | 15 | 102561.684 | 102697.933 | 1.001328459 | 100258.701 | 0.977545386 >> ArraysMismatchPartialInlining.testDoubleMatch | 31 | 87019.261 | 87292.743 | 1.003142775 | 85003.323 | 0.976833428 >> ArraysMismatchPartialInlining.testDoubleMatch | 63 | 62251.609 | 57261.214 | 0.919835084 | 62732.816 | 1.007730033 >> 
ArraysMismatchPartialInlining.testDoubleMatch | 95 | 50885.381 | 48282.534 | 0.948848826 | 48533.009 | 0.953771163 >> ArraysMismatchPartialInlining.testDoubleMatch | 800 | 7160.957 | 8209.345 | 1.146403337 | 7158.649 | 0.999677697 >> ArraysMismatchPartialInlining.testFloatMatch | 3 | 144215.295 | 141572.656 | 0.981675737 | 117351.089 | 0.81372152 >> ArraysMismatchPartialInlining.testFloatMatch | 4 | 149935.526 | 140116.547 | 0.934511992 | 138351.846 | 0.922742259 >> ArraysMismatchPartialInlining.testFloatMatch | 5 | 134682.06 | 133892.853 | 0.994140222 | 139040.985 | 1.032364555 >> ArraysMismatchPartialInlining.testFloatMatch | 6 | 139176.866 | 139452.984 | 1.001983936 | 158309.784 | 1.13747197 >> ArraysMismatchPartialInlining.testFloatMatch | 7 | 127274.07 | 126137.824 | 0.991072447 | 146418.871 | 1.150421849 >> ArraysMismatchPartialInlining.testFloatMatch | 15 | 115897.616 | 101808.969 | 0.878438854 | 108451.212 | 0.935750154 >> ArraysMismatchPartialInlining.testFloatMatch | 31 | 96568.619 | 101492.986 | 1.05099345 | 88662.187 | 0.918126281 >> ArraysMismatchPartialInlining.testFloatMatch | 63 | 75565.484 | 85526.546 | 1.131820263 | 74575.198 | 0.986894996 >> ArraysMismatchPartialInlining.testFloatMatch | 95 | 69535.621 | 71823.072 | 1.032896104 | 64910.105 | 0.933479907 >> ArraysMismatchPartialInlining.testFloatMatch | 800 | 13959.085 | 12768.069 | 0.914678075 | 12698.311 | 0.909680756 >> ArraysMismatchPartialInlining.testIntMatch | 3 | 151925.753 | 152001.543 | 1.000498862 | 150351.321 | 0.989636833 >> ArraysMismatchPartialInlining.testIntMatch | 4 | 151411.152 | 161021.852 | 1.063474188 | 152115.869 | 1.004654327 >> ArraysMismatchPartialInlining.testIntMatch | 5 | 142305.114 | 134841.275 | 0.947550451 | 122718.584 | 0.862362431 >> ArraysMismatchPartialInlining.testIntMatch | 6 | 144870.73 | 144186.562 | 0.99527739 | 166569.418 | 1.149779655 >> ArraysMismatchPartialInlining.testIntMatch | 7 | 135132.736 | 131937.154 | 0.976352273 | 150670.855 | 1.114984122 >> 
ArraysMismatchPartialInlining.testIntMatch | 15 | 118831.765 | 119947.806 | 1.009391773 | 161039.149 | 1.35518604 >> ArraysMismatchPartialInlining.testIntMatch | 31 | 97247.157 | 95123.241 | 0.978159608 | 92586.255 | 0.952071586 >> ArraysMismatchPartialInlining.testIntMatch | 63 | 78537.993 | 72904.05 | 0.928264744 | 72075.128 | 0.917710337 >> ArraysMismatchPartialInlining.testIntMatch | 95 | 69356.234 | 69021.893 | 0.995179366 | 67435.202 | 0.972301956 >> ArraysMismatchPartialInlining.testIntMatch | 800 | 14410.374 | 12715.733 | 0.882401317 | 12527.15 | 0.869314703 >> ArraysMismatchPartialInlining.testLongMatch | 3 | 145434.777 | 147236.142 | 1.012386068 | 144269.34 | 0.991986532 >> ArraysMismatchPartialInlining.testLongMatch | 4 | 149850.908 | 117182.939 | 0.781996857 | 116983.308 | 0.780664659 >> ArraysMismatchPartialInlining.testLongMatch | 5 | 140694.62 | 141039.138 | 1.002448693 | 140721.407 | 1.000190391 >> ArraysMismatchPartialInlining.testLongMatch | 6 | 136901.515 | 136215.609 | 0.994989785 | 136216.591 | 0.994996958 >> ArraysMismatchPartialInlining.testLongMatch | 7 | 132233.847 | 131289.142 | 0.9928558 | 131315.326 | 0.993053813 >> ArraysMismatchPartialInlining.testLongMatch | 15 | 108677.77 | 105050.548 | 0.966624067 | 108574.143 | 0.999046475 >> ArraysMismatchPartialInlining.testLongMatch | 31 | 79476.103 | 79391.426 | 0.99893456 | 79519.006 | 1.000539823 >> ArraysMismatchPartialInlining.testLongMatch | 63 | 58949.181 | 59102.766 | 1.00260538 | 59095.306 | 1.00247883 >> ArraysMismatchPartialInlining.testLongMatch | 95 | 49438.419 | 49422.93 | 0.999686701 | 49390.033 | 0.999021287 >> ArraysMismatchPartialInlining.testLongMatch | 800 | 7195.783 | 7201.554 | 1.000801998 | 7186.757 | 0.998745654 >> ArraysMismatchPartialInlining.testShortMatch | 3 | 219642.309 | 219414.684 | 0.998963656 | 219760.127 | 1.000536408 >> ArraysMismatchPartialInlining.testShortMatch | 4 | 169235.371 | 193907.437 | 1.145785517 | 170667.561 | 1.008462711 >> 
ArraysMismatchPartialInlining.testShortMatch | 5 | 155537.852 | 147014.758 | 0.945202445 | 116770.798 | 0.750754858 >> ArraysMismatchPartialInlining.testShortMatch | 6 | 155059.272 | 173756.546 | 1.120581464 | 152323.759 | 0.982358275 >> ArraysMismatchPartialInlining.testShortMatch | 7 | 147370.359 | 154934.348 | 1.051326393 | 138398.19 | 0.939118225 >> ArraysMismatchPartialInlining.testShortMatch | 15 | 130353.196 | 171653.208 | 1.316831603 | 160047.047 | 1.227795343 >> ArraysMismatchPartialInlining.testShortMatch | 31 | 118458.443 | 106239.301 | 0.896848703 | 159726.936 | 1.348379499 >> ArraysMismatchPartialInlining.testShortMatch | 63 | 97519.691 | 91591.145 | 0.939206678 | 91847.817 | 0.94183868 >> ArraysMismatchPartialInlining.testShortMatch | 95 | 90818.111 | 77626.093 | 0.854742431 | 77653.086 | 0.855039652 >> ArraysMismatchPartialInlining.testShortMatch | 800 | 21382.8 | 22841.791 | 1.06823199 | 22683.388 | 1.060824027 > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > 8266951: Review comments resolution. src/hotspot/cpu/x86/x86.ad line 8074: > 8072: %} > 8073: ins_pipe( pipe_slow ); > 8074: %} The semantics of `VectorCmpMasked` seem to be defined nowhere. It doesn't much matter where you put the comment, perhaps on the definition of the node itself, but a reader shouldn't have to study the code to figure out what `VectorCmpMasked` is supposed to do. ------------- PR: https://git.openjdk.java.net/jdk/pull/3999 From neliasso at openjdk.java.net Fri May 28 10:02:06 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Fri, 28 May 2021 10:02:06 GMT Subject: RFR: 8267726: ZGC: array_copy_requires_gc_barriers too strict In-Reply-To: References: Message-ID: <0uD1qfMdOxceleS5uPQQNMFzwTQijCJ9p7EJnvg7bxw=.31e2eded-a7f6-40b3-b6e3-5f8f8755cc2c@github.com> On Thu, 27 May 2021 19:59:59 GMT, Nils Eliasson wrote: > I found some cases where an arraycopy clone is eliminated with G1 but not with ZGC. 
This is probably something that wasn't updated fully after the transition to late gc barrier insertion. > > During parse and optimization phases array_copy_requires_gc_barriers should return false for clones of oop-arrays. Clones of oop-arrays should be treated the same way as clones of primitive-arrays. During the optimization phase, only clones of instances should return true, because they can't be reduced to a raw bulk copy. Clones of instances must either be deconstructed into field copies or be handled in a special call. > > During expansion array_copy_requires_gc_barriers must return true, because we must use a copy with barriers. > > To fix this I had to add an extra field to array_copy_requires_gc_barriers to be able to handle instance clones separately. I will follow up with a cleanup. The intersection of arraycopy-kinds and array_copy_requires_gc_barriers-method is the source of much unnecessary complexity. > > Please review, > Best regards, > Nils Eliasson Moved to hotspot-compiler ------------- PR: https://git.openjdk.java.net/jdk/pull/4230 From thartmann at openjdk.java.net Fri May 28 12:58:05 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Fri, 28 May 2021 12:58:05 GMT Subject: RFR: 8267806: C1: Relax inlining checks for not yet initialized classes In-Reply-To: References: Message-ID: On Wed, 26 May 2021 17:41:33 GMT, Vladimir Ivanov wrote: > The checks which guide inlining decisions in C1 are too strong: the declaring holder class is required to be fully initialized, while the JVMS only mandates an initialization barrier on the resolved class in the `invokestatic` case. > > The fix relaxes the checks to rule out only not yet linked classes unless it is an `invokestatic` call site. > > Testing: > - [x] hs-tier1 - hs-tier9 Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/4211 From yyang at openjdk.java.net Fri May 28 13:35:15 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Fri, 28 May 2021 13:35:15 GMT Subject: RFR: 8267928: Loop predicate gets inexact loop limit before PhaseIdealLoop::rc_predicate Message-ID: Loop predicate gets inexact loop limit(LoopLimitNode) from exact_limit and does unnecessary overflow checking when generating lower bound test(rc_predicate). The reason is rather straightforward: exact_limit fails to see a HasExactTripCount flag since it would be set after performing loop predicate(iteration_split). ------------- Commit messages: - compute_trip_count Changes: https://git.openjdk.java.net/jdk/pull/4247/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4247&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8267928 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/4247.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4247/head:pull/4247 PR: https://git.openjdk.java.net/jdk/pull/4247 From aph at openjdk.java.net Fri May 28 13:54:08 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Fri, 28 May 2021 13:54:08 GMT Subject: RFR: 8267375: Aarch64: JVM crashes with option -XX:PrintIdealGraphLevel=3 on SVE backend [v2] In-Reply-To: References: <5erSH9joFXYHr6MIIjB4Qly6ElVqmcXO-ZX3MLdChME=.e1e4b3be-f366-4ebe-b0a9-a1893a6a432e@github.com> Message-ID: On Fri, 28 May 2021 07:32:26 GMT, Wang Huang wrote: >> Reason: >> >> >> operand pRegGov() >> %{ >> constraint(ALLOC_IN_RC(gov_pr)); >> match(RegVectMask); >> op_cost(0); >> format %{ %} >> interface(REG_INTER); >> %} >> >> if `pRegGov` is used as a `TEMP`, like : >> >> >> instruct insertB_small(vReg dst, vReg src, iRegIorL2I val, immI idx, pRegGov pTmp, rFlagsReg cr) >> %{ >> predicate(UseSVE > 0 && n->as_Vector()->length() <= 32 && >> n->bottom_type()->is_vect()->element_basic_type() == T_BYTE); >> match(Set dst (VectorInsert (Binary src val) idx)); 
>> effect(TEMP_DEF dst, TEMP pTmp, KILL cr); // here >> >> >> It will have the type `Type::VectorMask` in `MachTempNode` which is generated from `Expand`. However, we miss `Type::VectorMask` in `Type::category()`. >> >> We can fix this bug simply by adding `case Type::VectorMask` in `Type::category()`. >> >> Although now we can only reproduce this bug on AArch64, it should be added for all platforms with predicate support. > > Wang Huang has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > 8267375: Aarch64: JVM crashes with option -XX:PrintIdealGraphLevel=3 on SVE backend Marked as reviewed by aph (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/4239 From neliasso at openjdk.java.net Fri May 28 21:08:42 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Fri, 28 May 2021 21:08:42 GMT Subject: RFR: 8267773: PhaseStringOpts::int_stringSize doesn't handle min_jint correctly Message-ID: Hi, PhaseStringOpts::int_stringSize doesn't handle min_jint correctly: min_jint can't be negated to a positive number. I fixed this by adding a special case for it. I added a test contributed by Adam, who reported the bug, and will add Adam as a contributor.
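The min_jint pitfall can be illustrated in plain Java. The following is a minimal sketch, not the C2 code in PhaseStringOpts: `stringSize` is a hypothetical helper that counts the characters in the decimal form of an `int`, and the class name is made up for illustration. In two's-complement arithmetic `-Integer.MIN_VALUE` overflows back to `Integer.MIN_VALUE`, so a negate-then-count-digits approach needs a special case for that one value.

```java
public class MinIntStringSize {
    // Number of characters in the decimal representation of x,
    // including a leading '-' for negative values.
    static int stringSize(int x) {
        if (x == Integer.MIN_VALUE) {
            // -Integer.MIN_VALUE overflows and stays negative, so the
            // negate-then-count path below cannot handle this value.
            return 11; // length of "-2147483648"
        }
        int size = 0;
        if (x < 0) {
            size = 1; // room for the sign
            x = -x;   // safe: x is not MIN_VALUE here
        }
        // Count the digits of the now non-negative value.
        do {
            size++;
            x /= 10;
        } while (x != 0);
        return size;
    }

    public static void main(String[] args) {
        int[] samples = {0, 7, -7, 10, -10, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int v : samples) {
            System.out.println(v + " -> " + stringSize(v));
        }
    }
}
```

Comparing the result against `Integer.toString(v).length()` for boundary values is a cheap way to check a helper like this.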
Please review, Best regards, Nils Eliasson ------------- Commit messages: - Fix overflow in strconcat Changes: https://git.openjdk.java.net/jdk/pull/4255/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4255&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8267773 Stats: 75 lines in 2 files changed: 75 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/4255.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4255/head:pull/4255 PR: https://git.openjdk.java.net/jdk/pull/4255 From dongbo at openjdk.java.net Sat May 29 02:44:28 2021 From: dongbo at openjdk.java.net (Dong Bo) Date: Sat, 29 May 2021 02:44:28 GMT Subject: RFR: 8267616: AArch64: Fix AES assertion messages in stubGenerator_aarch64.cpp Message-ID: Trivial fix for the messages of `assert(UseAES, "need AES instructions and misaligned SSE support")` in stubGenerator_aarch64.cpp. The SSE instruction set is an extension to the x86 architecture. On aarch64, the AES cryptographic extensions are used for these intrinsics.
------------- Commit messages: - 8267616: AArch64: Fix AES assertion messages in stubGenerator_aarch64.cpp Changes: https://git.openjdk.java.net/jdk/pull/4259/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4259&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8267616 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/4259.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4259/head:pull/4259 PR: https://git.openjdk.java.net/jdk/pull/4259 From whuang at openjdk.java.net Sat May 29 03:28:23 2021 From: whuang at openjdk.java.net (Wang Huang) Date: Sat, 29 May 2021 03:28:23 GMT Subject: RFR: 8265907: JVM crashes when matching VectorMaskCmp Node [v3] In-Reply-To: <_otBn8mWHgqdIjnZ6yiBu_T6F5076C_aKL7yMHe-sHo=.d94a0510-0e91-4286-814c-5b568d7ce923@github.com> References: <_otBn8mWHgqdIjnZ6yiBu_T6F5076C_aKL7yMHe-sHo=.d94a0510-0e91-4286-814c-5b568d7ce923@github.com> Message-ID: On Mon, 26 Apr 2021 07:47:05 GMT, Wang Huang wrote: >> * fix the issue JDK-8265907 >> * all architectures might be affected by this bug. I fixed x86 and aarch64. > > Wang Huang has updated the pull request incrementally with one additional commit since the last revision: > > fix x86's bug @iwanowww @mgkwill Could you do me a favor and review this issue? Any suggestions are welcome. ------------- PR: https://git.openjdk.java.net/jdk/pull/3670 From mli at openjdk.java.net Sat May 29 06:08:31 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Sat, 29 May 2021 06:08:31 GMT Subject: RFR: JDK-8267930: Refine code for loading hsdis library Message-ID: The code for loading the hsdis library is redundant; this change simplifies it.
------------- Commit messages: - JDK-8267930: Refine code for loading hsdis library Changes: https://git.openjdk.java.net/jdk/pull/4260/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4260&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8267930 Stats: 42 lines in 2 files changed: 12 ins; 25 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/4260.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4260/head:pull/4260 PR: https://git.openjdk.java.net/jdk/pull/4260 From whuang at openjdk.java.net Sat May 29 06:42:18 2021 From: whuang at openjdk.java.net (Wang Huang) Date: Sat, 29 May 2021 06:42:18 GMT Subject: RFR: JDK-8267930: Refine code for loading hsdis library In-Reply-To: References: Message-ID: On Sat, 29 May 2021 05:59:45 GMT, Hamlin Li wrote: > code for loading hsdis library is redundant, this is to simplify it. Looks good to me ------------- Marked as reviewed by whuang (Author). PR: https://git.openjdk.java.net/jdk/pull/4260 From aph at openjdk.java.net Sat May 29 08:53:25 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Sat, 29 May 2021 08:53:25 GMT Subject: RFR: 8267616: AArch64: Fix AES assertion messages in stubGenerator_aarch64.cpp In-Reply-To: References: Message-ID: On Sat, 29 May 2021 02:37:26 GMT, Dong Bo wrote: > Trivial fix for the messages of `assert(UseAES, "need AES instructions and misaligned SSE support")` in stubGenerator_aarch64.cpp. > The SSE instruction set is an extension to the x86 architecture. On aarch64, AES cryptographic extension are used for these intrinsics. Marked as reviewed by aph (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/4259 From hshi at openjdk.java.net Sun May 30 00:50:41 2021 From: hshi at openjdk.java.net (Hui Shi) Date: Sun, 30 May 2021 00:50:41 GMT Subject: RFR: 8267904: C2 crash when compile negative Arrays.copyOf length after loop [v2] In-Reply-To: References: Message-ID: > C2 crash when Arrays.copyOf has a negative length after a loop. 
This happens in both release and debug builds. The test and hs_err are in JBS. > > The crash reason is: > - A CastIINode is created in GraphKit::new_array (in AllocateArrayNode::make_ideal_length), casting the array length to the range [0, maxint-2]. This is safe when the allocation succeeds and the CastIINode's input control is the InitializeNode's proj control. > - In LibraryCallKit::inline_arraycopy, the control of the nodes that use the InitializeNode's proj control is replaced with the AllocateArrayNode's input control (in LibraryCallKit::arraycopy_move_allocation_here). This is necessary to move the allocation after the array copy checks. But this also includes the CastIINode. > > C->gvn_replace_by(init->proj_out(TypeFunc::Control), alloc->in(0)); > > - The CastIINode's control is also adjusted to the AllocateArrayNode's input control, which is an illegal state in a later IGVN phase: it casts a negative value to [0, maxint-2]. > - This causes the control and nodes after the loop to become top and be removed. The preceding loop then has no fall-through edge, and C2 crashes. > > The fix is: > - At the entry of LibraryCallKit::inline_arraycopy, if a tightly coupled AllocateArrayNode is found, replace its CastIINode with the original array length. > - In LibraryCallKit::arraycopy_move_allocation_here, recreate the CastIINode if necessary. > - At the entry of LibraryCallKit::inline_arraycopy, avoid invoking AllocateArrayNode::make_ideal_length when getting the length of the tightly coupled AllocateArrayNode. This avoids creating an incorrect CastIINode again. > > Before the fix: node 250 is the CastII, which should be after the InitializeNode. > ![image](https://user-images.githubusercontent.com/70356247/119938428-f7fa4e80-bfbe-11eb-925e-c239620c73f3.png) > > After the fix: all array copy checks are performed on the original array length, node 203. > ![image](https://user-images.githubusercontent.com/70356247/119938532-2415cf80-bfbf-11eb-98c6-76e6b19b691f.png) > > A new test, test/hotspot/jtreg/compiler/c2/TestNegativeArrayCopyAfterLoop.java, is added and passes.
> Tests performed on Linux X64 with no regression:
> - Tier1/2/3/hotspot_all_no_apps on release and fastdebug build.
> - Tier1/2/3 with option "-XX:-TieredCompilation -Xbatch" on fastdebug build

Hui Shi has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision:

  8267904: C2 crash when compile negative Arrays.copyOf length after loop

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/4238/files
  - new: https://git.openjdk.java.net/jdk/pull/4238/files/4ee111d2..05d2ac2f

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4238&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4238&range=00-01

Stats: 143 lines in 4 files changed: 34 ins; 103 del; 6 mod
Patch: https://git.openjdk.java.net/jdk/pull/4238.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/4238/head:pull/4238

PR: https://git.openjdk.java.net/jdk/pull/4238

From jbhateja at openjdk.java.net Sun May 30 18:37:09 2021
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Sun, 30 May 2021 18:37:09 GMT
Subject: RFR: 8266951: Partial in-lining for vectorized mismatch operation using AVX512 masked instructions [v7]
In-Reply-To: <0YtRuwnVZ-Ejs-22d0JDJeFzXiZ17XNuBT1o5Ma4ZkI=.9dd9e952-d452-4175-8ff5-8f41e990a555@github.com>
References: <0YtRuwnVZ-Ejs-22d0JDJeFzXiZ17XNuBT1o5Ma4ZkI=.9dd9e952-d452-4175-8ff5-8f41e990a555@github.com>
Message-ID: <1aRY-G0e2xZFjYt5F7Ode_3B2EuF8vK6QSP6Yd_LtHk=.8930ee0d-755e-4da5-b65e-a7d86449b6de@github.com>

> ArraySupport.vectorizedMismatch is a leaf-level comparison routine which gets called by various public Java APIs (Arrays.equals, Arrays.mismatch). The HotSpot C2 compiler intrinsifies the vectorizedMismatch routine and emits a call to a stub routine which uses vector instructions to compare the inputs.
>
> For small compare operations whose size fits in one vector register, i.e. < 32 bytes or <= 64 bytes, this patch employs a partial in-lining technique to emit the fast path code at the call site, which does the vector comparison under the influence of a predicate register/mask computed as a function of the comparison length.
>
> If the length of the comparison is greater than the vector register size, then the slow path consisting of a stub call is emitted.
>
> This avoids the call overhead associated with the stub call, which is significant compared to the actual comparison operation for small comparisons.
>
> Partial in-lining works under the influence of a run time flag -XX:UsePartialInlineSize=32/64 (default 32 bytes).
>
> Following are performance numbers for an existing JMH benchmark (test/micro/org/openjdk/bench/java/util//ArrayMismatch.java):
>
> Machine : Cascade Lake server (Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz)
>
> BENCHMARK | SIZE | Baseline (ops/ms) | PI32 (ops/ms) | Gain | PI64 (ops/ms) | Gain
> -- | -- | -- | -- | -- | -- | --
> ArraysMismatchPartialInlining.testByteMatch | 3 | 209915.663 | 209126.291 | 0.996239576 | 209073.888 | 0.995989937
> ArraysMismatchPartialInlining.testByteMatch | 4 | 157757.866 | 157763.787 | 1.000037532 | 157766.023 | 1.000051706
> ArraysMismatchPartialInlining.testByteMatch | 5 | 181182.854 | 180450.433 | 0.995957559 | 180465.978 | 0.996043356
> ArraysMismatchPartialInlining.testByteMatch | 6 | 146279.651 | 146276.69 | 0.999979758 | 146274.73 | 0.999966359
> ArraysMismatchPartialInlining.testByteMatch | 7 | 139099.287 | 137887.433 | 0.991287849 | 139159.131 | 1.000430225
> ArraysMismatchPartialInlining.testByteMatch | 15 | 127720.176 | 175732.078 | 1.375914781 | 169252.948 | 1.325185678
> ArraysMismatchPartialInlining.testByteMatch | 31 | 116472.861 | 176768.126 | 1.517676517 | 169773.326 | 1.457621325
> ArraysMismatchPartialInlining.testByteMatch | 63 | 104636.064 | 91564.893 | 0.875079676 | 160845.908 | 1.537193792
> ArraysMismatchPartialInlining.testByteMatch | 95 | 101099.48 | 89657.806 | 0.886827568 | 87334.192 | 0.863844127
> ArraysMismatchPartialInlining.testByteMatch | 800 | 45022.411 | 47905.179 | 1.064029623 | 47969.355 | 1.065455046
> ArraysMismatchPartialInlining.testCharMatch | 3 | 219405.496 | 219710.643 | 1.00139079 | 219242.048 | 0.999255041
> ArraysMismatchPartialInlining.testCharMatch | 4 | 170629.006 | 193121.02 | 1.131818233 | 182593.776 | 1.070121548
> ArraysMismatchPartialInlining.testCharMatch | 5 | 155518.733 | 169650.324 | 1.090867452 | 159963.097 | 1.028577676
> ArraysMismatchPartialInlining.testCharMatch | 6 | 154395.07 | 175616.979 | 1.137451986 | 147860.366 | 0.957675436
> ArraysMismatchPartialInlining.testCharMatch | 7 | 147630.171 | 168639.547 | 1.142310856 | 112467.214 | 0.761817271
> ArraysMismatchPartialInlining.testCharMatch | 15 | 130251.837 | 171755.645 | 1.318642784 | 159656.911 | 1.225755542
> ArraysMismatchPartialInlining.testCharMatch | 31 | 115510.532 | 106310.328 | 0.920351817 | 159957.379 | 1.384786099
> ArraysMismatchPartialInlining.testCharMatch | 63 | 96443.648 | 92545.364 | 0.959579671 | 92850.782 | 0.962746473
> ArraysMismatchPartialInlining.testCharMatch | 95 | 90001.485 | 81753.152 | 0.908353368 | 83890.742 | 0.932103976
> ArraysMismatchPartialInlining.testCharMatch | 800 | 22929.764 | 20699.791 | 0.902747669 | 22017.534 | 0.960216337
> ArraysMismatchPartialInlining.testDoubleMatch | 3 | 137422.911 | 134792.332 | 0.980857784 | 137047.846 | 0.997270724
> ArraysMismatchPartialInlining.testDoubleMatch | 4 | 140124.192 | 128321.199 | 0.915767628 | 128573.012 | 0.917564699
> ArraysMismatchPartialInlining.testDoubleMatch | 5 | 132385.81 | 132099.177 | 0.997834866 | 132337.729 | 0.999636812
> ArraysMismatchPartialInlining.testDoubleMatch | 6 | 122472.829 | 122301.343 | 0.998599804 | 122235.558 | 0.998062664
> ArraysMismatchPartialInlining.testDoubleMatch | 7 | 123867.736 | 123042.597 | 0.993338548 | 123060.617 | 0.993484026
> ArraysMismatchPartialInlining.testDoubleMatch | 15 | 102561.684 | 102697.933 | 1.001328459 | 100258.701 | 0.977545386
> ArraysMismatchPartialInlining.testDoubleMatch | 31 | 87019.261 | 87292.743 | 1.003142775 | 85003.323 | 0.976833428
> ArraysMismatchPartialInlining.testDoubleMatch | 63 | 62251.609 | 57261.214 | 0.919835084 | 62732.816 | 1.007730033
> ArraysMismatchPartialInlining.testDoubleMatch | 95 | 50885.381 | 48282.534 | 0.948848826 | 48533.009 | 0.953771163
> ArraysMismatchPartialInlining.testDoubleMatch | 800 | 7160.957 | 8209.345 | 1.146403337 | 7158.649 | 0.999677697
> ArraysMismatchPartialInlining.testFloatMatch | 3 | 144215.295 | 141572.656 | 0.981675737 | 117351.089 | 0.81372152
> ArraysMismatchPartialInlining.testFloatMatch | 4 | 149935.526 | 140116.547 | 0.934511992 | 138351.846 | 0.922742259
> ArraysMismatchPartialInlining.testFloatMatch | 5 | 134682.06 | 133892.853 | 0.994140222 | 139040.985 | 1.032364555
> ArraysMismatchPartialInlining.testFloatMatch | 6 | 139176.866 | 139452.984 | 1.001983936 | 158309.784 | 1.13747197
> ArraysMismatchPartialInlining.testFloatMatch | 7 | 127274.07 | 126137.824 | 0.991072447 | 146418.871 | 1.150421849
> ArraysMismatchPartialInlining.testFloatMatch | 15 | 115897.616 | 101808.969 | 0.878438854 | 108451.212 | 0.935750154
> ArraysMismatchPartialInlining.testFloatMatch | 31 | 96568.619 | 101492.986 | 1.05099345 | 88662.187 | 0.918126281
> ArraysMismatchPartialInlining.testFloatMatch | 63 | 75565.484 | 85526.546 | 1.131820263 | 74575.198 | 0.986894996
> ArraysMismatchPartialInlining.testFloatMatch | 95 | 69535.621 | 71823.072 | 1.032896104 | 64910.105 | 0.933479907
> ArraysMismatchPartialInlining.testFloatMatch | 800 | 13959.085 | 12768.069 | 0.914678075 | 12698.311 | 0.909680756
> ArraysMismatchPartialInlining.testIntMatch | 3 | 151925.753 | 152001.543 | 1.000498862 | 150351.321 | 0.989636833
> ArraysMismatchPartialInlining.testIntMatch | 4 | 151411.152 | 161021.852 | 1.063474188 | 152115.869 | 1.004654327
> ArraysMismatchPartialInlining.testIntMatch | 5 | 142305.114 | 134841.275 | 0.947550451 | 122718.584 | 0.862362431
> ArraysMismatchPartialInlining.testIntMatch | 6 | 144870.73 | 144186.562 | 0.99527739 | 166569.418 | 1.149779655
> ArraysMismatchPartialInlining.testIntMatch | 7 | 135132.736 | 131937.154 | 0.976352273 | 150670.855 | 1.114984122
> ArraysMismatchPartialInlining.testIntMatch | 15 | 118831.765 | 119947.806 | 1.009391773 | 161039.149 | 1.35518604
> ArraysMismatchPartialInlining.testIntMatch | 31 | 97247.157 | 95123.241 | 0.978159608 | 92586.255 | 0.952071586
> ArraysMismatchPartialInlining.testIntMatch | 63 | 78537.993 | 72904.05 | 0.928264744 | 72075.128 | 0.917710337
> ArraysMismatchPartialInlining.testIntMatch | 95 | 69356.234 | 69021.893 | 0.995179366 | 67435.202 | 0.972301956
> ArraysMismatchPartialInlining.testIntMatch | 800 | 14410.374 | 12715.733 | 0.882401317 | 12527.15 | 0.869314703
> ArraysMismatchPartialInlining.testLongMatch | 3 | 145434.777 | 147236.142 | 1.012386068 | 144269.34 | 0.991986532
> ArraysMismatchPartialInlining.testLongMatch | 4 | 149850.908 | 117182.939 | 0.781996857 | 116983.308 | 0.780664659
> ArraysMismatchPartialInlining.testLongMatch | 5 | 140694.62 | 141039.138 | 1.002448693 | 140721.407 | 1.000190391
> ArraysMismatchPartialInlining.testLongMatch | 6 | 136901.515 | 136215.609 | 0.994989785 | 136216.591 | 0.994996958
> ArraysMismatchPartialInlining.testLongMatch | 7 | 132233.847 | 131289.142 | 0.9928558 | 131315.326 | 0.993053813
> ArraysMismatchPartialInlining.testLongMatch | 15 | 108677.77 | 105050.548 | 0.966624067 | 108574.143 | 0.999046475
> ArraysMismatchPartialInlining.testLongMatch | 31 | 79476.103 | 79391.426 | 0.99893456 | 79519.006 | 1.000539823
> ArraysMismatchPartialInlining.testLongMatch | 63 | 58949.181 | 59102.766 | 1.00260538 | 59095.306 | 1.00247883
> ArraysMismatchPartialInlining.testLongMatch | 95 | 49438.419 | 49422.93 | 0.999686701 | 49390.033 | 0.999021287
> ArraysMismatchPartialInlining.testLongMatch | 800 | 7195.783 | 7201.554 | 1.000801998 | 7186.757 | 0.998745654
> ArraysMismatchPartialInlining.testShortMatch | 3 | 219642.309 | 219414.684 | 0.998963656 | 219760.127 | 1.000536408
> ArraysMismatchPartialInlining.testShortMatch | 4 | 169235.371 | 193907.437 | 1.145785517 | 170667.561 | 1.008462711
> ArraysMismatchPartialInlining.testShortMatch | 5 | 155537.852 | 147014.758 | 0.945202445 | 116770.798 | 0.750754858
> ArraysMismatchPartialInlining.testShortMatch | 6 | 155059.272 | 173756.546 | 1.120581464 | 152323.759 | 0.982358275
> ArraysMismatchPartialInlining.testShortMatch | 7 | 147370.359 | 154934.348 | 1.051326393 | 138398.19 | 0.939118225
> ArraysMismatchPartialInlining.testShortMatch | 15 | 130353.196 | 171653.208 | 1.316831603 | 160047.047 | 1.227795343
> ArraysMismatchPartialInlining.testShortMatch | 31 | 118458.443 | 106239.301 | 0.896848703 | 159726.936 | 1.348379499
> ArraysMismatchPartialInlining.testShortMatch | 63 | 97519.691 | 91591.145 | 0.939206678 | 91847.817 | 0.94183868
> ArraysMismatchPartialInlining.testShortMatch | 95 | 90818.111 | 77626.093 | 0.854742431 | 77653.086 | 0.855039652
> ArraysMismatchPartialInlining.testShortMatch | 800 | 21382.8 | 22841.791 | 1.06823199 | 22683.388 | 1.060824027

Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:

  8266951: Review comments resolution.
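
For readers less familiar with the public entry points named in the description, the following is a minimal sketch of the Java-level behaviour of `Arrays.mismatch` and `Arrays.equals` on a short byte array. The class and method names (`MismatchSketch`, `firstDifference`) are hypothetical and not part of the patch, and the sketch makes no claim about which code path C2 selects for these calls.

```java
import java.util.Arrays;

// Hypothetical illustration; not taken from the pull request.
public class MismatchSketch {
    // Returns the index of the first differing element, or -1 if the arrays match.
    static int firstDifference(byte[] a, byte[] b) {
        return Arrays.mismatch(a, b);
    }

    public static void main(String[] args) {
        byte[] a = {1, 2, 3, 4};
        byte[] b = {1, 2, 9, 4};
        System.out.println(firstDifference(a, b));         // prints 2
        System.out.println(firstDifference(a, a.clone())); // prints -1
        System.out.println(Arrays.equals(a, b));           // prints false
    }
}
```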
-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3999/files
  - new: https://git.openjdk.java.net/jdk/pull/3999/files/76ef9902..bddc6051

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3999&range=06
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3999&range=05-06

Stats: 7 lines in 1 file changed: 7 ins; 0 del; 0 mod
Patch: https://git.openjdk.java.net/jdk/pull/3999.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/3999/head:pull/3999

PR: https://git.openjdk.java.net/jdk/pull/3999

From jbhateja at openjdk.java.net Sun May 30 18:37:13 2021
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Sun, 30 May 2021 18:37:13 GMT
Subject: RFR: 8266951: Partial in-lining for vectorized mismatch operation using AVX512 masked instructions [v6]
In-Reply-To:
References: <0YtRuwnVZ-Ejs-22d0JDJeFzXiZ17XNuBT1o5Ma4ZkI=.9dd9e952-d452-4175-8ff5-8f41e990a555@github.com>
Message-ID:

On Fri, 28 May 2021 09:41:06 GMT, Andrew Haley wrote:

>> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   8266951: Review comments resolution.
>
> src/hotspot/cpu/x86/x86.ad line 8074:
>
>> 8072: %}
>> 8073: ins_pipe( pipe_slow );
>> 8074: %}
>
> The semantics of `VectorCmpMasked` seem to be defined nowhere. It doesn't much matter where you put the comment, perhaps on the definition of the node itself, but a reader shouldn't have to study the code to figure out what `VectorCmpMasked` is supposed to do.

Hi @theRealAph, the name VectorCmpMaskedNode is along the lines of the other masked operations, LoadVectorMaskedNode/StoreVectorMaskedNode. The nomenclature implicitly describes the semantics, i.e. MaskedNode. Appropriate comments have been appended.
-------------

PR: https://git.openjdk.java.net/jdk/pull/3999

From dholmes at openjdk.java.net Mon May 31 01:29:03 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Mon, 31 May 2021 01:29:03 GMT
Subject: RFR: 8266950: Remove vestigial support for non-strict floating-point execution [v7]
In-Reply-To:
References:
Message-ID:

> As part of JEP 306, the vestiges of HotSpot support for non-strict floating-point execution can be removed. All methods implicitly have strictfp semantics so the explicit checks for is_strict() can be replaced by true and the code reformulated accordingly.
>
> There are still some names that include "strict" that could potentially be renamed to remove it, but the fact we have to have strict fp semantics is still important on some platforms, so the names help reinforce that IMO.
>
> Testing: tiers 1-3
>
> Thanks,
> David

David Holmes has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains ten additional commits since the last revision:

 - Merge branch 'master' into jep-306
 - Remove dead code on aarch64
 - Merge branch 'master' into jep-306
 - The code for strict handling only applies to doubles.
 - Add missing space
 - lir_div_strictfp and lir_mul_strictfp
 - Removed divDPR_reg_round as it has a false predicate and so is now unused
 - Revert classFileParser changes as they will be handled by JDK-8266530
 - 8266530: HotSpot changes for JEP 306

   All methods are now implicitly strictfp so all code generation etc uses the strict form.

   There are still some names that include "strict" that could potentially be renamed to remove it, but the fact we have to have strict fp semantics is still important on some platforms, so the names help reinforce that IMO.
-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3991/files
  - new: https://git.openjdk.java.net/jdk/pull/3991/files/cc526aa4..3ce17c29

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3991&range=06
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3991&range=05-06

Stats: 38324 lines in 1589 files changed: 7901 ins; 27074 del; 3349 mod
Patch: https://git.openjdk.java.net/jdk/pull/3991.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/3991/head:pull/3991

PR: https://git.openjdk.java.net/jdk/pull/3991

From njian at openjdk.java.net Mon May 31 02:00:25 2021
From: njian at openjdk.java.net (Ningsheng Jian)
Date: Mon, 31 May 2021 02:00:25 GMT
Subject: RFR: 8265907: JVM crashes when matching VectorMaskCmp Node [v3]
In-Reply-To:
References:
Message-ID:

On Mon, 26 Apr 2021 05:32:14 GMT, Ningsheng Jian wrote:

> Hmm, currently min_vector_size for byte type is lowered to 4 bytes to support vector api shuffle, but we don't have a correct per-Opcode size check. E.g. no length check for reduce_add8B, do you also see any issue for that?

Since SLP does not support subword reduction, I think current match rules are fine. AArch64 part looks good to me.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3670

From mgkwill at openjdk.java.net Mon May 31 02:05:19 2021
From: mgkwill at openjdk.java.net (Marcus G K Williams)
Date: Mon, 31 May 2021 02:05:19 GMT
Subject: RFR: 8265907: JVM crashes when matching VectorMaskCmp Node [v3]
In-Reply-To: <_otBn8mWHgqdIjnZ6yiBu_T6F5076C_aKL7yMHe-sHo=.d94a0510-0e91-4286-814c-5b568d7ce923@github.com>
References: <_otBn8mWHgqdIjnZ6yiBu_T6F5076C_aKL7yMHe-sHo=.d94a0510-0e91-4286-814c-5b568d7ce923@github.com>
Message-ID:

On Mon, 26 Apr 2021 07:47:05 GMT, Wang Huang wrote:

>> * fix the issue JDK-8265907
>> * all archs might be affected by this bug. I fixed x86 and aarch64.
>
> Wang Huang has updated the pull request incrementally with one additional commit since the last revision:
>
>   fix x86's bug

@jatin-bhateja and @sviswa7 could you review this for x86?

-------------

PR: https://git.openjdk.java.net/jdk/pull/3670

From jbhateja at openjdk.java.net Mon May 31 07:04:22 2021
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Mon, 31 May 2021 07:04:22 GMT
Subject: RFR: 8265907: JVM crashes when matching VectorMaskCmp Node [v3]
In-Reply-To:
References: <_otBn8mWHgqdIjnZ6yiBu_T6F5076C_aKL7yMHe-sHo=.d94a0510-0e91-4286-814c-5b568d7ce923@github.com>
Message-ID:

On Sat, 29 May 2021 03:25:18 GMT, Wang Huang wrote:

>> Wang Huang has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   fix x86's bug
>
> @iwanowww @mgkwill Could you do me a favor to review this issue? Any suggestion is welcome.

Hi @Wanghuang-Huawei,

By adding the suggested check we may suppress iotaShuffle intrinsification for the Long256, Int128 and Short64 species. Since shuffles are currently stored in a byte array, this case seeped through the constraint to prevent vectorization imposed by match rule supported vector (i.e. vlen < 4 for byte vectors).

Please let me know if it's OK to integrate the following changes into your patch.

diff --git a/src/hotspot/cpu/x86/x86.ad b/src/hotspot/cpu/x86/x86.ad
index 5f6d5bc6fdb..c9f64d154e3 100644
--- a/src/hotspot/cpu/x86/x86.ad
+++ b/src/hotspot/cpu/x86/x86.ad
@@ -6866,6 +6866,24 @@ instruct evcmpFD(vec dst, vec src1, vec src2, immI8 cond, rRegP scratch, kReg kt
   ins_pipe( pipe_slow );
 %}
 
+instruct vcmp4B(legVec dst, legVec src1, legVec src2, immI8 cond, legVec tmp1, legVec tmp2, rRegP scratch) %{
+  predicate(vector_length_in_bytes(n->in(1)->in(1)) == 4 &&
+            vector_element_basic_type(n->in(1)->in(1)) == T_BYTE); // src1
+  match(Set dst (VectorMaskCmp (Binary src1 src2) cond));
+  effect(TEMP_DEF dst, TEMP scratch, TEMP tmp1, TEMP tmp2);
+  format %{ "vector_compare $dst,$src1,$src2,$cond\t! 4 byte comparison" %}
+  ins_encode %{
+    int vlen_enc = vector_length_encoding(this, $src1);
+    Assembler::ComparisonPredicate cmp = booltest_pred_to_comparison_pred($cond$$constant);
+    Assembler::Width ww = widthForType(vector_element_basic_type(this, $src1));
+    __ punpcklbw($tmp1$$XMMRegister, $src1$$XMMRegister);
+    __ punpcklbw($tmp2$$XMMRegister, $src2$$XMMRegister);
+    __ vpcmpCCW($dst$$XMMRegister, $tmp1$$XMMRegister, $tmp2$$XMMRegister, cmp, ww, vlen_enc, $scratch$$Register);
+    __ packsswb($dst$$XMMRegister, $dst$$XMMRegister);
+  %}
+  ins_pipe( pipe_slow );
+%}
+
 instruct vcmp(legVec dst, legVec src1, legVec src2, immI8 cond, rRegP scratch) %{
   predicate(vector_length_in_bytes(n->in(1)->in(1)) >= 8 && // src1
             vector_length_in_bytes(n->in(1)->in(1)) <= 32 && // src1

-------------

PR: https://git.openjdk.java.net/jdk/pull/3670

From roland at openjdk.java.net Mon May 31 07:04:20 2021
From: roland at openjdk.java.net (Roland Westrelin)
Date: Mon, 31 May 2021 07:04:20 GMT
Subject: RFR: 8267773: PhaseStringOpts::int_stringSize doesn't handle min_jint correctly
In-Reply-To:
References:
Message-ID:

On Fri, 28 May 2021 20:56:54 GMT, Nils Eliasson wrote:

> Hi,
>
> PhaseStringOpts::int_stringSize doesn't handle min_jint correctly - it can't be negated to a positive number. I fix this by adding a special case for it.
>
> Added a test that was contributed by Adam who reported the bug. Will add as a contributor.
>
> Please review,
> Best regards,
> Nils Eliasson

Looks ok to me.

-------------

Marked as reviewed by roland (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/4255

From neliasso at openjdk.java.net Mon May 31 07:12:30 2021
From: neliasso at openjdk.java.net (Nils Eliasson)
Date: Mon, 31 May 2021 07:12:30 GMT
Subject: RFR: JDK-8267930: Refine code for loading hsdis library
In-Reply-To:
References:
Message-ID:

On Sat, 29 May 2021 05:59:45 GMT, Hamlin Li wrote:

> code for loading hsdis library is redundant, this is to simplify it.

In general a very nice clean up!

One suggestion - JVM_MAXPATHLEN and buf don't have a relationship in Disassembler::dll_load. Add the buf length as an argument and use that inside instead.

-------------

Changes requested by neliasso (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/4260

From neliasso at openjdk.java.net Mon May 31 07:14:24 2021
From: neliasso at openjdk.java.net (Nils Eliasson)
Date: Mon, 31 May 2021 07:14:24 GMT
Subject: RFR: 8267616: AArch64: Fix AES assertion messages in stubGenerator_aarch64.cpp
In-Reply-To:
References:
Message-ID:

On Sat, 29 May 2021 02:37:26 GMT, Dong Bo wrote:

> Trivial fix for the messages of `assert(UseAES, "need AES instructions and misaligned SSE support")` in stubGenerator_aarch64.cpp.
> The SSE instruction set is an extension to the x86 architecture. On aarch64, the AES cryptographic extensions are used for these intrinsics.

Looks good.

-------------

Marked as reviewed by neliasso (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/4259

From roland at openjdk.java.net Mon May 31 07:37:20 2021
From: roland at openjdk.java.net (Roland Westrelin)
Date: Mon, 31 May 2021 07:37:20 GMT
Subject: RFR: 8267904: C2 crash when compile negative Arrays.copyOf length after loop [v2]
In-Reply-To:
References:
Message-ID:

On Sun, 30 May 2021 00:50:41 GMT, Hui Shi wrote:

>> C2 crash when Arrays.copyOf has a negative length after a loop. This happens in release and debug builds. Test and hs_err are in JBS.
>>
>> Crash reason is:
>> - CastIINode is created in GraphKit::new_array (in AllocateArrayNode::make_ideal_length), casting the array length to the range [0, maxint-2]. This is safe when the allocation succeeds and CastIINode's input control is InitializeNode's proj control.
>> - In LibraryCallKit::inline_arraycopy, the control of the nodes that use InitializeNode's proj control is replaced with AllocateArrayNode's input control (in LibraryCallKit::arraycopy_move_allocation_here). This is necessary to move the allocation after the array copy checks, but it also affects CastIINode.
>>
>>     C->gvn_replace_by(init->proj_out(TypeFunc::Control), alloc->in(0));
>>
>> - CastIINode's control is also adjusted to AllocateArrayNode's input control, which is an illegal state in a later IGVN phase: it casts a negative value to [0, maxint-2].
>> - This causes the control and nodes after the loop to become top and be removed. The previous loop has no fall-through edge, and C2 crashes.
>>
>> Fix is:
>> - In LibraryCallKit::arraycopy_move_allocation_here
>>   - Before replacing init->proj_out(TypeFunc::Control), find and replace CastIINode nodes with the original array length.
>>   - After moving the allocation node, create the CastIINode again if necessary.
>>
>> Before fix: node 250 is CastII, which should be after InitializeNode.
>> ![image](https://user-images.githubusercontent.com/70356247/119938428-f7fa4e80-bfbe-11eb-925e-c239620c73f3.png)
>>
>> After fix: all array copy checks are performed on the original array length, node 203.
>> ![image](https://user-images.githubusercontent.com/70356247/119938532-2415cf80-bfbf-11eb-98c6-76e6b19b691f.png)
>>
>> New test test/hotspot/jtreg/compiler/c2/TestNegativeArrayCopyAfterLoop.java is added and passes.
>> Tests performed on Linux X64 with no regression:
>> - Tier1/2/3/hotspot_all_no_apps on release and fastdebug build.
>> - Tier1/2/3 with option "-XX:-TieredCompilation -Xbatch" on fastdebug build
>
> Hui Shi has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR.

Overall fix looks ok to me.

src/hotspot/share/opto/library_call.cpp line 4474:

> 4472: assert(prev->type()->is_int()->_lo == cur->type()->is_int()->_lo, "not same");
> 4473: assert(prev->type()->is_int()->_hi == cur->type()->is_int()->_hi, "not same");
> 4474: }

Is this really necessary? Have you seen cases with multiple identical CastII nodes? Or is it to be extra cautious?
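
As background for the thread above, the Java-level contract that compiled code has to preserve is that `Arrays.copyOf` throws `NegativeArraySizeException` when the requested length is negative, whether or not a loop precedes the call. The following is only a minimal sketch of that code shape; it is not the TestNegativeArrayCopyAfterLoop.java test from the pull request, and the names `NegativeCopyOfSketch` and `throwsOnCopy` are hypothetical.

```java
import java.util.Arrays;

// Hypothetical illustration; not the jtreg test added by the pull request.
public class NegativeCopyOfSketch {
    // Runs a loop over src, then copies it with the given length.
    // Returns true if the copy threw NegativeArraySizeException.
    static boolean throwsOnCopy(int[] src, int newLength) {
        int sum = 0;
        for (int i = 0; i < src.length; i++) {
            sum += src[i];
        }
        try {
            int[] copy = Arrays.copyOf(src, newLength);
            // Use both values so that neither the loop nor the copy is trivially dead.
            return copy.length + sum == Integer.MIN_VALUE;
        } catch (NegativeArraySizeException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3};
        System.out.println(throwsOnCopy(data, -1)); // prints true
        System.out.println(throwsOnCopy(data, 2));  // prints false
    }
}
```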
-------------

PR: https://git.openjdk.java.net/jdk/pull/4238

From shade at openjdk.java.net Mon May 31 07:50:18 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Mon, 31 May 2021 07:50:18 GMT
Subject: RFR: 8267773: PhaseStringOpts::int_stringSize doesn't handle min_jint correctly
In-Reply-To:
References:
Message-ID: <4rECmdM0KzlcpBCPI9KKbBlMwRn3t1aKmjB0eubQ8tc=.f596b569-844f-436e-abd4-439a2e91a6a9@github.com>

On Fri, 28 May 2021 20:56:54 GMT, Nils Eliasson wrote:

> Hi,
>
> PhaseStringOpts::int_stringSize doesn't handle min_jint correctly - it can't be negated to a positive number. I fix this by adding a special case for it.
>
> Added a test that was contributed by Adam who reported the bug. Will add as a contributor.
>
> Please review,
> Best regards,
> Nils Eliasson

Huh, I found a similar issue with fuzzer tests this weekend. I minimized this to a string concat bug as well, here is the test that I have: https://cr.openjdk.java.net/~shade/scratch/8267773.patch -- feel free to merge it here.

test/hotspot/jtreg/compiler/stringopts/TestFetchStaticField.java line 37:

> 35: import jdk.test.lib.Asserts;
> 36:
> 37: public class TestFetchStaticField {

Why `TestFetchStaticField`? Is this really specific to static fields?

test/hotspot/jtreg/compiler/stringopts/TestFetchStaticField.java line 48:

> 46:
> 47: public static void main(String[] argv)
> 48: {

Inconsistent bracing style, should be on the same line?
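
The min_jint problem quoted above comes from two's-complement arithmetic: in Java, `-Integer.MIN_VALUE` overflows back to `Integer.MIN_VALUE`, so a size computation that first negates a negative input never sees a positive number. The sketch below illustrates the idea of a special case in plain Java. It is an assumption-laden illustration, not the HotSpot code in PhaseStringOpts::int_stringSize, and `MinIntStringSizeSketch` and `stringSize` are hypothetical names.

```java
// Hypothetical illustration; not the C2 implementation discussed in the thread.
public class MinIntStringSizeSketch {
    // Number of characters in the decimal form of x, including a leading '-'.
    static int stringSize(int x) {
        if (x == Integer.MIN_VALUE) {
            // -Integer.MIN_VALUE overflows to Integer.MIN_VALUE, so handle it directly:
            // "-2147483648" has one sign character plus ten digits.
            return 11;
        }
        int sign = 0;
        if (x < 0) {
            sign = 1;
            x = -x; // safe here because x is not Integer.MIN_VALUE
        }
        int digits = 1;
        while (x >= 10) {
            x /= 10;
            digits++;
        }
        return digits + sign;
    }

    public static void main(String[] args) {
        System.out.println(-Integer.MIN_VALUE == Integer.MIN_VALUE); // prints true
        System.out.println(stringSize(Integer.MIN_VALUE));           // prints 11
        System.out.println(stringSize(-5));                          // prints 2
    }
}
```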
------------- PR: https://git.openjdk.java.net/jdk/pull/4255 From vlivanov at openjdk.java.net Mon May 31 08:11:24 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Mon, 31 May 2021 08:11:24 GMT Subject: RFR: 8266951: Partial in-lining for vectorized mismatch operation using AVX512 masked instructions [v7] In-Reply-To: <1aRY-G0e2xZFjYt5F7Ode_3B2EuF8vK6QSP6Yd_LtHk=.8930ee0d-755e-4da5-b65e-a7d86449b6de@github.com> References: <0YtRuwnVZ-Ejs-22d0JDJeFzXiZ17XNuBT1o5Ma4ZkI=.9dd9e952-d452-4175-8ff5-8f41e990a555@github.com> <1aRY-G0e2xZFjYt5F7Ode_3B2EuF8vK6QSP6Yd_LtHk=.8930ee0d-755e-4da5-b65e-a7d86449b6de@github.com> Message-ID: <3XO9IjWTHC1_b1ru4esCTWosD4lDrhH8Drc2mmPTr6Q=.5ab3a289-870e-4ecb-a632-f8f05410e475@github.com> On Sun, 30 May 2021 18:37:09 GMT, Jatin Bhateja wrote: >> ArraySupport.vectorizedMismatch is a leaf level comparison routine which gets called by various public Java APIs (Arrays.equals, Arrays.mismatch). Hotspot C2 compiler intrinsifies vectorizedMismatch routine and emits a call to a stub routine which uses vector instruction to compare the inputs. >> >> For small compare operation whose size fits in one vector register i.e. < 32 bytes or <= 64 bytes, this patch employ partial in-lining technique to emit the fast path code at the call site which does vector comparison under the influence of a predicate register/mask computed as a function of comparison length. >> >> If the length of comparison is greater than the vector register size then the slow path comprising of stub call is emitted. >> >> This prevents the call overhead associated with stub call which is significant compared to actual comparison operation for small sized comparisons. >> >> Partial in-lining works under the influence of a run time flag -XX:UsePartialInlineSize=32/64 (default 32 bytes). 
>> >> Following are performance number for an existing JMH benchmark (test/micro/org/openjdk/bench/java/util//ArrayMismatch.java) :- >> >> Machine : Cascade Lake server (Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz) >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> BENCHMARK | SIZE | Baseline (ops/ms) | PI32 (ops/ms) | Gain | PI64 (ops/ms) | Gain >> -- | -- | -- | -- | -- | -- | -- >> ArraysMismatchPartialInlining.testByteMatch | 3 | 209915.663 | 209126.291 | 0.996239576 | 209073.888 | 0.995989937 >> ArraysMismatchPartialInlining.testByteMatch | 4 | 157757.866 | 157763.787 | 1.000037532 | 157766.023 | 1.000051706 >> ArraysMismatchPartialInlining.testByteMatch | 5 | 181182.854 | 180450.433 | 0.995957559 | 180465.978 | 0.996043356 >> ArraysMismatchPartialInlining.testByteMatch | 6 | 146279.651 | 146276.69 | 0.999979758 | 146274.73 | 0.999966359 >> ArraysMismatchPartialInlining.testByteMatch | 7 | 139099.287 | 137887.433 | 0.991287849 | 139159.131 | 1.000430225 >> ArraysMismatchPartialInlining.testByteMatch | 15 | 127720.176 | 175732.078 | 1.375914781 | 169252.948 | 1.325185678 >> ArraysMismatchPartialInlining.testByteMatch | 31 | 116472.861 | 176768.126 | 1.517676517 | 169773.326 | 1.457621325 >> ArraysMismatchPartialInlining.testByteMatch | 63 | 104636.064 | 91564.893 | 0.875079676 | 160845.908 | 1.537193792 >> ArraysMismatchPartialInlining.testByteMatch | 95 | 101099.48 | 89657.806 | 0.886827568 | 87334.192 | 0.863844127 >> ArraysMismatchPartialInlining.testByteMatch | 800 | 45022.411 | 47905.179 | 1.064029623 | 47969.355 | 1.065455046 >> ArraysMismatchPartialInlining.testCharMatch | 3 | 219405.496 | 219710.643 | 1.00139079 | 219242.048 | 0.999255041 >> ArraysMismatchPartialInlining.testCharMatch | 4 | 170629.006 | 193121.02 | 1.131818233 | 182593.776 | 1.070121548 >> ArraysMismatchPartialInlining.testCharMatch | 5 | 155518.733 | 169650.324 | 1.090867452 | 159963.097 | 1.028577676 >> ArraysMismatchPartialInlining.testCharMatch | 6 | 154395.07 | 175616.979 | 
1.137451986 | 147860.366 | 0.957675436 >> ArraysMismatchPartialInlining.testCharMatch | 7 | 147630.171 | 168639.547 | 1.142310856 | 112467.214 | 0.761817271 >> ArraysMismatchPartialInlining.testCharMatch | 15 | 130251.837 | 171755.645 | 1.318642784 | 159656.911 | 1.225755542 >> ArraysMismatchPartialInlining.testCharMatch | 31 | 115510.532 | 106310.328 | 0.920351817 | 159957.379 | 1.384786099 >> ArraysMismatchPartialInlining.testCharMatch | 63 | 96443.648 | 92545.364 | 0.959579671 | 92850.782 | 0.962746473 >> ArraysMismatchPartialInlining.testCharMatch | 95 | 90001.485 | 81753.152 | 0.908353368 | 83890.742 | 0.932103976 >> ArraysMismatchPartialInlining.testCharMatch | 800 | 22929.764 | 20699.791 | 0.902747669 | 22017.534 | 0.960216337 >> ArraysMismatchPartialInlining.testDoubleMatch | 3 | 137422.911 | 134792.332 | 0.980857784 | 137047.846 | 0.997270724 >> ArraysMismatchPartialInlining.testDoubleMatch | 4 | 140124.192 | 128321.199 | 0.915767628 | 128573.012 | 0.917564699 >> ArraysMismatchPartialInlining.testDoubleMatch | 5 | 132385.81 | 132099.177 | 0.997834866 | 132337.729 | 0.999636812 >> ArraysMismatchPartialInlining.testDoubleMatch | 6 | 122472.829 | 122301.343 | 0.998599804 | 122235.558 | 0.998062664 >> ArraysMismatchPartialInlining.testDoubleMatch | 7 | 123867.736 | 123042.597 | 0.993338548 | 123060.617 | 0.993484026 >> ArraysMismatchPartialInlining.testDoubleMatch | 15 | 102561.684 | 102697.933 | 1.001328459 | 100258.701 | 0.977545386 >> ArraysMismatchPartialInlining.testDoubleMatch | 31 | 87019.261 | 87292.743 | 1.003142775 | 85003.323 | 0.976833428 >> ArraysMismatchPartialInlining.testDoubleMatch | 63 | 62251.609 | 57261.214 | 0.919835084 | 62732.816 | 1.007730033 >> ArraysMismatchPartialInlining.testDoubleMatch | 95 | 50885.381 | 48282.534 | 0.948848826 | 48533.009 | 0.953771163 >> ArraysMismatchPartialInlining.testDoubleMatch | 800 | 7160.957 | 8209.345 | 1.146403337 | 7158.649 | 0.999677697 >> ArraysMismatchPartialInlining.testFloatMatch | 3 | 144215.295 
| 141572.656 | 0.981675737 | 117351.089 | 0.81372152 >> ArraysMismatchPartialInlining.testFloatMatch | 4 | 149935.526 | 140116.547 | 0.934511992 | 138351.846 | 0.922742259 >> ArraysMismatchPartialInlining.testFloatMatch | 5 | 134682.06 | 133892.853 | 0.994140222 | 139040.985 | 1.032364555 >> ArraysMismatchPartialInlining.testFloatMatch | 6 | 139176.866 | 139452.984 | 1.001983936 | 158309.784 | 1.13747197 >> ArraysMismatchPartialInlining.testFloatMatch | 7 | 127274.07 | 126137.824 | 0.991072447 | 146418.871 | 1.150421849 >> ArraysMismatchPartialInlining.testFloatMatch | 15 | 115897.616 | 101808.969 | 0.878438854 | 108451.212 | 0.935750154 >> ArraysMismatchPartialInlining.testFloatMatch | 31 | 96568.619 | 101492.986 | 1.05099345 | 88662.187 | 0.918126281 >> ArraysMismatchPartialInlining.testFloatMatch | 63 | 75565.484 | 85526.546 | 1.131820263 | 74575.198 | 0.986894996 >> ArraysMismatchPartialInlining.testFloatMatch | 95 | 69535.621 | 71823.072 | 1.032896104 | 64910.105 | 0.933479907 >> ArraysMismatchPartialInlining.testFloatMatch | 800 | 13959.085 | 12768.069 | 0.914678075 | 12698.311 | 0.909680756 >> ArraysMismatchPartialInlining.testIntMatch | 3 | 151925.753 | 152001.543 | 1.000498862 | 150351.321 | 0.989636833 >> ArraysMismatchPartialInlining.testIntMatch | 4 | 151411.152 | 161021.852 | 1.063474188 | 152115.869 | 1.004654327 >> ArraysMismatchPartialInlining.testIntMatch | 5 | 142305.114 | 134841.275 | 0.947550451 | 122718.584 | 0.862362431 >> ArraysMismatchPartialInlining.testIntMatch | 6 | 144870.73 | 144186.562 | 0.99527739 | 166569.418 | 1.149779655 >> ArraysMismatchPartialInlining.testIntMatch | 7 | 135132.736 | 131937.154 | 0.976352273 | 150670.855 | 1.114984122 >> ArraysMismatchPartialInlining.testIntMatch | 15 | 118831.765 | 119947.806 | 1.009391773 | 161039.149 | 1.35518604 >> ArraysMismatchPartialInlining.testIntMatch | 31 | 97247.157 | 95123.241 | 0.978159608 | 92586.255 | 0.952071586 >> ArraysMismatchPartialInlining.testIntMatch | 63 | 78537.993 | 
72904.05 | 0.928264744 | 72075.128 | 0.917710337 >> ArraysMismatchPartialInlining.testIntMatch | 95 | 69356.234 | 69021.893 | 0.995179366 | 67435.202 | 0.972301956 >> ArraysMismatchPartialInlining.testIntMatch | 800 | 14410.374 | 12715.733 | 0.882401317 | 12527.15 | 0.869314703 >> ArraysMismatchPartialInlining.testLongMatch | 3 | 145434.777 | 147236.142 | 1.012386068 | 144269.34 | 0.991986532 >> ArraysMismatchPartialInlining.testLongMatch | 4 | 149850.908 | 117182.939 | 0.781996857 | 116983.308 | 0.780664659 >> ArraysMismatchPartialInlining.testLongMatch | 5 | 140694.62 | 141039.138 | 1.002448693 | 140721.407 | 1.000190391 >> ArraysMismatchPartialInlining.testLongMatch | 6 | 136901.515 | 136215.609 | 0.994989785 | 136216.591 | 0.994996958 >> ArraysMismatchPartialInlining.testLongMatch | 7 | 132233.847 | 131289.142 | 0.9928558 | 131315.326 | 0.993053813 >> ArraysMismatchPartialInlining.testLongMatch | 15 | 108677.77 | 105050.548 | 0.966624067 | 108574.143 | 0.999046475 >> ArraysMismatchPartialInlining.testLongMatch | 31 | 79476.103 | 79391.426 | 0.99893456 | 79519.006 | 1.000539823 >> ArraysMismatchPartialInlining.testLongMatch | 63 | 58949.181 | 59102.766 | 1.00260538 | 59095.306 | 1.00247883 >> ArraysMismatchPartialInlining.testLongMatch | 95 | 49438.419 | 49422.93 | 0.999686701 | 49390.033 | 0.999021287 >> ArraysMismatchPartialInlining.testLongMatch | 800 | 7195.783 | 7201.554 | 1.000801998 | 7186.757 | 0.998745654 >> ArraysMismatchPartialInlining.testShortMatch | 3 | 219642.309 | 219414.684 | 0.998963656 | 219760.127 | 1.000536408 >> ArraysMismatchPartialInlining.testShortMatch | 4 | 169235.371 | 193907.437 | 1.145785517 | 170667.561 | 1.008462711 >> ArraysMismatchPartialInlining.testShortMatch | 5 | 155537.852 | 147014.758 | 0.945202445 | 116770.798 | 0.750754858 >> ArraysMismatchPartialInlining.testShortMatch | 6 | 155059.272 | 173756.546 | 1.120581464 | 152323.759 | 0.982358275 >> ArraysMismatchPartialInlining.testShortMatch | 7 | 147370.359 | 154934.348 | 
1.051326393 | 138398.19 | 0.939118225
>> ArraysMismatchPartialInlining.testShortMatch | 15 | 130353.196 | 171653.208 | 1.316831603 | 160047.047 | 1.227795343
>> ArraysMismatchPartialInlining.testShortMatch | 31 | 118458.443 | 106239.301 | 0.896848703 | 159726.936 | 1.348379499
>> ArraysMismatchPartialInlining.testShortMatch | 63 | 97519.691 | 91591.145 | 0.939206678 | 91847.817 | 0.94183868
>> ArraysMismatchPartialInlining.testShortMatch | 95 | 90818.111 | 77626.093 | 0.854742431 | 77653.086 | 0.855039652
>> ArraysMismatchPartialInlining.testShortMatch | 800 | 21382.8 | 22841.791 | 1.06823199 | 22683.388 | 1.060824027
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
>   8266951: Review comments resolution.

src/hotspot/share/opto/library_call.cpp line 5223:

> 5221:   Node* init_mem = map()->memory();
> 5222:
> 5223:   assert(scale->bottom_type()->isa_int(), "scale must be integer");

Strictly speaking, `scale` can be TOP, and then it won't pass the assert. Though it's an intrinsic which is used only from trusted code, it still makes sense to validate the inputs. From a maintenance perspective, keeping JVM code robust enough to avoid crashes on invalid inputs is beneficial.

src/hotspot/share/opto/library_call.cpp line 5227:

> 5225:   int scale_val = scale->bottom_type()->is_int()->get_con();
> 5226:   BasicType prim_types[] = {T_BYTE, T_SHORT, T_INT, T_LONG};
> 5227:   BasicType elem_bt = prim_types[scale_val];

It would be an OOB access if `scale_val` is outside the expected range (`[0..3]`).

src/hotspot/share/opto/library_call.cpp line 5251:

> 5249:   Node* cmp_res = _gvn.transform(new BoolNode(length_cmp, BoolTest::le));
> 5250:
> 5251:   fast_path = generate_guard(cmp_res, NULL, PROB_MAX);

It looks confusing because `LibraryCallKit::generate_guard()` advertises the opposite.

    // In all cases, GraphKit::control() is updated to the fast path.
    // The returned value represents the control for the slow path.
Is there a bug there (fast and slow path code swapped)?

src/hotspot/share/opto/library_call.cpp line 5254:

> 5252:
> 5253:   const TypeVect* vt = TypeVect::make(elem_bt, vec_len);
> 5254:   Node* mask_gen = _gvn.transform(new VectorMaskGenNode(ConvI2L(length), TypeVect::VECTMASK, elem_bt));

Should `ConvI2X` be used here?

src/hotspot/share/opto/library_call.cpp line 5274:

> 5272:   }
> 5273:
> 5274:   if (!stopped()) {

Small suggestion. I find the following variant easier to read:

    if (stopped()) { // slow path is dead
      set_control(fast_path);
      set_result(fastcomp_result);
      clear_upper_avx();
      return true;
    }
    // ... proceed with expanding slow path ...

src/hotspot/share/opto/library_call.cpp line 5304:

> 5302:     set_result(fastcomp_result);
> 5303:   }
> 5304:   clear_upper_avx();

There was no `clear_upper_avx()` before. Was it overlooked before or is it needed only for new code (when partial inlining takes place)?

-------------

PR: https://git.openjdk.java.net/jdk/pull/3999

From vlivanov at openjdk.java.net Mon May 31 08:11:25 2021
From: vlivanov at openjdk.java.net (Vladimir Ivanov)
Date: Mon, 31 May 2021 08:11:25 GMT
Subject: RFR: 8266951: Partial in-lining for vectorized mismatch operation using AVX512 masked instructions [v7]
In-Reply-To: <3XO9IjWTHC1_b1ru4esCTWosD4lDrhH8Drc2mmPTr6Q=.5ab3a289-870e-4ecb-a632-f8f05410e475@github.com>
References: <0YtRuwnVZ-Ejs-22d0JDJeFzXiZ17XNuBT1o5Ma4ZkI=.9dd9e952-d452-4175-8ff5-8f41e990a555@github.com>
 <1aRY-G0e2xZFjYt5F7Ode_3B2EuF8vK6QSP6Yd_LtHk=.8930ee0d-755e-4da5-b65e-a7d86449b6de@github.com>
 <3XO9IjWTHC1_b1ru4esCTWosD4lDrhH8Drc2mmPTr6Q=.5ab3a289-870e-4ecb-a632-f8f05410e475@github.com>
Message-ID:

On Mon, 31 May 2021 07:50:10 GMT, Vladimir Ivanov wrote:

>> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   8266951: Review comments resolution.
> > src/hotspot/share/opto/library_call.cpp line 5251: > >> 5249: Node* cmp_res = _gvn.transform(new BoolNode(length_cmp, BoolTest::le)); >> 5250: >> 5251: fast_path = generate_guard(cmp_res, NULL, PROB_MAX); > > It looks confusing because `LibraryCallKit::generate_guard()` advertises the opposite. > > // In all cases, GraphKit::control() is updated to the fast path. > // The returned value represents the control for the slow path. > > Is there a bug there (fast and slow path code swapped)? Also, what happens when the control implicitly set (fast path) is dead? I'd expect a `stopped()` check, but don't see any. ------------- PR: https://git.openjdk.java.net/jdk/pull/3999 From vlivanov at openjdk.java.net Mon May 31 08:48:25 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Mon, 31 May 2021 08:48:25 GMT Subject: RFR: 8267806: C1: Relax inlining checks for not yet initialized classes In-Reply-To: References: Message-ID: <0vlSF3HTARZ3o8Mdy3RmDnOfvvV78JZFQOEoYhKBdxk=.725f1dc7-e62e-4c3b-8829-bd4aa1846c6c@github.com> On Wed, 26 May 2021 17:41:33 GMT, Vladimir Ivanov wrote: > The checks which guide inlining decisions in C1 are too strong: declaring holder class is required to be fully initialized while JVMS only mandates an initialization barrier on resolved class in `invokestatic` case. > > The fix relaxes the checks to rule out only not yet linked classes unless it is an `invokestatic` call site. > > Testing: > - [x] hs-tier1 - hs-tier9 Thanks for the reviews, Roland and Tobias. 
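To make the distinction in the description concrete, here is a minimal Java sketch. It is not part of the patch, and the names `InitInProgress`, `Holder`, `twice` and `demo` are made up for illustration. It only shows the language-level situation: an instance method of a class is executed while that class is still running its static initializer, so the holder is linked but not yet fully initialized. It says nothing about what C1 actually inlines.

```java
// Illustration only: InitInProgress and Holder are hypothetical names.
final class InitInProgress {

    static final class Holder {
        static final int VALUE;

        static {
            // Holder's initializer is still running here, so the class is linked
            // but not yet fully initialized. The initializing thread may still
            // create an instance and call an instance method on it.
            VALUE = new Holder().twice(21);
        }

        int twice(int x) {
            return 2 * x;
        }
    }

    // The first read of Holder.VALUE triggers Holder's static initializer.
    static int demo() {
        return Holder.VALUE;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The call to `twice()` is a non-static call into a holder that is not yet fully initialized, which is the kind of call site the description says should not be ruled out merely because initialization has not completed.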
------------- PR: https://git.openjdk.java.net/jdk/pull/4211 From vlivanov at openjdk.java.net Mon May 31 08:48:26 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Mon, 31 May 2021 08:48:26 GMT Subject: Integrated: 8267806: C1: Relax inlining checks for not yet initialized classes In-Reply-To: References: Message-ID: On Wed, 26 May 2021 17:41:33 GMT, Vladimir Ivanov wrote: > The checks which guide inlining decisions in C1 are too strong: declaring holder class is required to be fully initialized while JVMS only mandates an initialization barrier on resolved class in `invokestatic` case. > > The fix relaxes the checks to rule out only not yet linked classes unless it is an `invokestatic` call site. > > Testing: > - [x] hs-tier1 - hs-tier9 This pull request has now been integrated. Changeset: 35916ed5 Author: Vladimir Ivanov URL: https://git.openjdk.java.net/jdk/commit/35916ed57f425ea674de1e9d5023e7cf199a6740 Stats: 13 lines in 1 file changed: 6 ins; 3 del; 4 mod 8267806: C1: Relax inlining checks for not yet initialized classes Reviewed-by: roland, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/4211 From neliasso at openjdk.java.net Mon May 31 08:51:22 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Mon, 31 May 2021 08:51:22 GMT Subject: RFR: 8266951: Partial in-lining for vectorized mismatch operation using AVX512 masked instructions [v7] In-Reply-To: <1aRY-G0e2xZFjYt5F7Ode_3B2EuF8vK6QSP6Yd_LtHk=.8930ee0d-755e-4da5-b65e-a7d86449b6de@github.com> References: <0YtRuwnVZ-Ejs-22d0JDJeFzXiZ17XNuBT1o5Ma4ZkI=.9dd9e952-d452-4175-8ff5-8f41e990a555@github.com> <1aRY-G0e2xZFjYt5F7Ode_3B2EuF8vK6QSP6Yd_LtHk=.8930ee0d-755e-4da5-b65e-a7d86449b6de@github.com> Message-ID: On Sun, 30 May 2021 18:37:09 GMT, Jatin Bhateja wrote: >> ArraySupport.vectorizedMismatch is a leaf level comparison routine which gets called by various public Java APIs (Arrays.equals, Arrays.mismatch). 
Hotspot C2 compiler intrinsifies the vectorizedMismatch routine and emits a call to a stub routine which uses vector instructions to compare the inputs.
>>
>> For small compare operations whose size fits in one vector register, i.e. < 32 bytes or <= 64 bytes, this patch employs a partial in-lining technique to emit the fast path code at the call site, which does the vector comparison under the influence of a predicate register/mask computed as a function of the comparison length.
>>
>> If the length of the comparison is greater than the vector register size, then the slow path comprising the stub call is emitted.
>>
>> This prevents the call overhead associated with the stub call, which is significant compared to the actual comparison operation for small sized comparisons.
>>
>> Partial in-lining works under the influence of a run time flag -XX:UsePartialInlineSize=32/64 (default 32 bytes).
>>
>> Following are performance numbers for an existing JMH benchmark (test/micro/org/openjdk/bench/java/util//ArrayMismatch.java) :-
>>
>> Machine : Cascade Lake server (Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz)
>>
>> BENCHMARK | SIZE | Baseline (ops/ms) | PI32 (ops/ms) | Gain | PI64 (ops/ms) | Gain
>> -- | -- | -- | -- | -- | -- | --
>> [remaining table rows trimmed here; they repeat the benchmark table quoted in the previous message]
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
>   8266951: Review comments resolution.

src/hotspot/share/opto/library_call.cpp line 5237:

> 5235:     if (ArrayOperationPartailInlineSize > 32) {
> 5236:       enable_pi = is_subword_type(elem_bt) || elem_bt == T_INT;
> 5237:     } else if (ArrayOperationPartailInlineSize) {

Should this be a comparison against something?
"if (ArrayOperationPartailInlineSize) {"

-------------

PR: https://git.openjdk.java.net/jdk/pull/3999

From vlivanov at openjdk.java.net Mon May 31 09:02:30 2021
From: vlivanov at openjdk.java.net (Vladimir Ivanov)
Date: Mon, 31 May 2021 09:02:30 GMT
Subject: RFR: 8267947: CI: Preserve consistency between has_subklass() and is_subclass_of()
Message-ID: <6gzElsxxVRzeXRUgwl64iCJGWsfTGiRHFkuVA6BuIH4=.785237d2-2326-4cf0-8abb-2fb136fa836a@github.com>

CI caches the result of the `Klass::subklass() != NULL` query, but concurrent class loading can invalidate the cached value. Though the recorded dependency won't let the nmethod be installed, the inconsistency can manifest as type paradoxes until compilation is finished.

The fix caches only the `true` value (since it can't change unless class unloading takes place) and queries the VM otherwise.

Testing:
- [x] hs-tier1 - hs-tier6

-------------

Commit messages:
 - 8267947: CI: Preserve consistency between has_subklass() and is_subclass_of()

Changes: https://git.openjdk.java.net/jdk/pull/4269/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4269&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8267947
  Stats: 41 lines in 2 files changed: 27 ins; 7 del; 7 mod
  Patch: https://git.openjdk.java.net/jdk/pull/4269.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/4269/head:pull/4269

PR: https://git.openjdk.java.net/jdk/pull/4269

From neliasso at openjdk.java.net Mon May 31 09:26:20 2021
From: neliasso at openjdk.java.net (Nils Eliasson)
Date: Mon, 31 May 2021 09:26:20 GMT
Subject: RFR: 8065760: CHA: Improve abstract method support
In-Reply-To:
References:
Message-ID:

On Wed, 26 May 2021 20:59:03 GMT, Vladimir Ivanov wrote:

> Enable CHA to look for unique concrete methods under abstract root methods.
> Only vtable-based implementation is affected. Old implementation is left as is.
>
> The unit test requires #4211 and #4212 to pass.
>
> Testing:
> - [x] hs-tier1 - hs-tier9

The diff was a bit messy in GitHub; once past that, the change is very reasonable. Approved.

-------------

Marked as reviewed by neliasso (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/4213

From whuang at openjdk.java.net Mon May 31 09:39:16 2021
From: whuang at openjdk.java.net (Wang Huang)
Date: Mon, 31 May 2021 09:39:16 GMT
Subject: RFR: 8265907: JVM crashes when matching VectorMaskCmp Node [v4]
In-Reply-To:
References:
Message-ID:

> * fix the issue JDK-8265907
> * all archs might be affected by this bug. I fixed x86 and aarch64.

Wang Huang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:

 - Merge branch 'master' into JDK-8265907
 - fix x86's bug
 - 8265907: JVM crashes when matching VectorMaskCmp Node

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3670/files
  - new: https://git.openjdk.java.net/jdk/pull/3670/files/817c8beb..ea390ec1

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3670&range=03
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3670&range=02-03

  Stats: 591682 lines in 6867 files changed: 51926 ins; 521178 del; 18578 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3670.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3670/head:pull/3670

PR: https://git.openjdk.java.net/jdk/pull/3670

From neliasso at openjdk.java.net Mon May 31 09:40:25 2021
From: neliasso at openjdk.java.net (Nils Eliasson)
Date: Mon, 31 May 2021 09:40:25 GMT
Subject: RFR: 8267375: Aarch64: JVM crashes with option -XX:PrintIdealGraphLevel=3 on SVE backend [v2]
In-Reply-To:
References: <5erSH9joFXYHr6MIIjB4Qly6ElVqmcXO-ZX3MLdChME=.e1e4b3be-f366-4ebe-b0a9-a1893a6a432e@github.com>
Message-ID:

On Fri, 28 May 2021 07:32:26 GMT, Wang Huang wrote:

>> Reason:
>>
>>
>> operand pRegGov()
>> %{
>>   constraint(ALLOC_IN_RC(gov_pr));
>>   match(RegVectMask);
>>   op_cost(0);
>>   format %{ %}
>>   interface(REG_INTER);
>> %}
>>
>> if `pRegGov` is used as a `TEMP`, like :
>>
>>
>> instruct insertB_small(vReg dst, vReg src, iRegIorL2I val, immI idx, pRegGov pTmp, rFlagsReg cr)
>> %{
>>   predicate(UseSVE > 0 && n->as_Vector()->length() <= 32 &&
>>             n->bottom_type()->is_vect()->element_basic_type() == T_BYTE);
>>   match(Set dst (VectorInsert (Binary src val) idx));
>>   effect(TEMP_DEF dst, TEMP pTmp, KILL cr); // here
>>
>>
>> It will have the type `Type::VectorMask` in `MachTempNode` which is generated from `Expand`. However, we miss `Type::VectorMask` in `Type::category()`.
>>
>> We can fix this bug simply by adding `case Type::VectorMask` in `Type::category()`.
>>
>> Although now we can only reproduce this bug on AArch64, it should be added for all platforms with predicate support.
>
> Wang Huang has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR.

When changing hotspot code - please don't integrate until you have two reviews, and at least 24h have passed since the PR was published.

Change looks good. Approved.

-------------

Marked as reviewed by neliasso (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/4239

From whuang at openjdk.java.net Mon May 31 09:44:25 2021
From: whuang at openjdk.java.net (Wang Huang)
Date: Mon, 31 May 2021 09:44:25 GMT
Subject: RFR: 8267375: Aarch64: JVM crashes with option -XX:PrintIdealGraphLevel=3 on SVE backend [v2]
In-Reply-To:
References: <5erSH9joFXYHr6MIIjB4Qly6ElVqmcXO-ZX3MLdChME=.e1e4b3be-f366-4ebe-b0a9-a1893a6a432e@github.com>
Message-ID:

On Mon, 31 May 2021 09:37:13 GMT, Nils Eliasson wrote:

> When changing hotspot code - please don't integrate until you have two reviews, and at least 24h have passed since the PR was published.
>
> Change looks good. Approved.

Thank you for your suggestion.
I will wait for two reviewers next time.

-------------

PR: https://git.openjdk.java.net/jdk/pull/4239

From hshi at openjdk.java.net Mon May 31 09:44:28 2021
From: hshi at openjdk.java.net (Hui Shi)
Date: Mon, 31 May 2021 09:44:28 GMT
Subject: RFR: 8267904: C2 crash when compile negative Arrays.copyOf length after loop [v2]
In-Reply-To:
References:
Message-ID: <291IJcd-qb_PGTLrb08hHH0N0Yr8mKHpIsBsgbgBIEI=.37b162c7-dedd-4062-a8d3-bafdb16c79d6@github.com>

On Mon, 31 May 2021 07:33:36 GMT, Roland Westrelin wrote:

>> Hui Shi has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR.
>
> src/hotspot/share/opto/library_call.cpp line 4474:
>
>> 4472:     assert(prev->type()->is_int()->_lo == cur->type()->is_int()->_lo, "not same");
>> 4473:     assert(prev->type()->is_int()->_hi == cur->type()->is_int()->_hi, "not same");
>> 4474:   }
>
> Is this really necessary? Have you seen cases with multiple identical CastII nodes? Or is it to be extra cautious?

Thanks for your comment!

Multiple identical CastII nodes do happen, and the check is added to ensure that they really are identical CastIINodes. The cases are:
1. A CastIINode is created in GraphKit::new_array.
2. A CastIINode might be created before inline_arraycopy if GraphKit::load_array_length is invoked. If the new array's length is loaded multiple times before System.arraycopy, there could be multiple CastIINodes.
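The two cases above can be sketched at the Java level as follows. This is only an illustration of the source shape being described; the names `TailCopy` and `tail` are invented, and the comments about where C2 creates a CastII simply repeat the message above rather than anything the sketch verifies.

```java
// Illustration only: TailCopy and tail() are hypothetical names.
final class TailCopy {

    // Copies path[begin..] into a freshly allocated array.
    static byte[] tail(byte[] path, int begin) {
        int len = path.length - begin;
        // Allocation: per the message above, GraphKit::new_array casts the length here.
        byte[] result = new byte[len];
        // result.length is read again: per the message, load_array_length may cast it once more.
        System.arraycopy(path, begin, result, 0, result.length);
        return result;
    }

    public static void main(String[] args) {
        byte[] out = tail(new byte[] {1, 2, 3, 4, 5}, 2);
        System.out.println(java.util.Arrays.toString(out));
    }
}
```

If `begin` exceeds `path.length`, `len` is negative and `new byte[len]` throws `NegativeArraySizeException` before the copy is reached.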
------------- PR: https://git.openjdk.java.net/jdk/pull/4238 From whuang at openjdk.java.net Mon May 31 09:44:27 2021 From: whuang at openjdk.java.net (Wang Huang) Date: Mon, 31 May 2021 09:44:27 GMT Subject: Integrated: 8267375: Aarch64: JVM crashes with option -XX:PrintIdealGraphLevel=3 on SVE backend In-Reply-To: <5erSH9joFXYHr6MIIjB4Qly6ElVqmcXO-ZX3MLdChME=.e1e4b3be-f366-4ebe-b0a9-a1893a6a432e@github.com> References: <5erSH9joFXYHr6MIIjB4Qly6ElVqmcXO-ZX3MLdChME=.e1e4b3be-f366-4ebe-b0a9-a1893a6a432e@github.com> Message-ID: On Fri, 28 May 2021 06:47:19 GMT, Wang Huang wrote: > Reason: > > > operand pRegGov() > %{ > constraint(ALLOC_IN_RC(gov_pr)); > match(RegVectMask); > op_cost(0); > format %{ %} > interface(REG_INTER); > %} > > if `pRegGov` is used as a `TEMP`, like : > > > instruct insertB_small(vReg dst, vReg src, iRegIorL2I val, immI idx, pRegGov pTmp, rFlagsReg cr) > %{ > predicate(UseSVE > 0 && n->as_Vector()->length() <= 32 && > n->bottom_type()->is_vect()->element_basic_type() == T_BYTE); > match(Set dst (VectorInsert (Binary src val) idx)); > effect(TEMP_DEF dst, TEMP pTmp, KILL cr); // here > > > It will have the type `Type::VectorMask` in `MachTempNode` which is generated from `Expand`. However, we miss `Type::VectorMask` in `Type::category()`. > > We can fix this bug simply by adding `case Type::VectorMask` in `Type::category()`. > > Although now we can only reproduce this bug on AArch64, it should be added for all platforms with predicate support. This pull request has now been integrated. 
Changeset: 7ab6b401
Author:    Wang Huang
Committer: Nils Eliasson
URL:       https://git.openjdk.java.net/jdk/commit/7ab6b4012026d4786a4c3937b559da9d3142a228
Stats:     65 lines in 2 files changed: 65 ins; 0 del; 0 mod

8267375: Aarch64: JVM crashes with option -XX:PrintIdealGraphLevel=3 on SVE backend

Co-authored-by: Wang Huang
Co-authored-by: Ai Jiaming
Reviewed-by: aph, neliasso

-------------

PR: https://git.openjdk.java.net/jdk/pull/4239

From xgong at openjdk.java.net Mon May 31 10:32:35 2021
From: xgong at openjdk.java.net (Xiaohong Gong)
Date: Mon, 31 May 2021 10:32:35 GMT
Subject: RFR: 8267969: Add vectorized implementation for VectorMask.eq()
Message-ID: <-oNx_BVgkjNGZHqtQhQ2hfnyibAd-NIj14iA3OUMBNk=.7be0c875-fab0-47b2-8af6-603eaf168941@github.com>

Currently `VectorMask.eq()` is not vectorized:

    public VectorMask eq(VectorMask m) {
        // FIXME: Generate good code here.
        return bOp(m, (i, a, b) -> a == b);
    }

This can be implemented by calling `xor(m.not())` directly.

The performance improved by about 1.4x ~ 1.9x for the following benchmark with different basic types:

    @Benchmark
    public Object eq() {
        boolean[] ma = fm.apply(size);
        boolean[] mb = fmb.apply(size);
        boolean[] mt = fmt.apply(size);
        VectorMask m = VectorMask.fromArray(SPECIES, mt, 0);

        for (int ic = 0; ic < INVOC_COUNT; ic++) {
            for (int i = 0; i < ma.length; i += SPECIES.length()) {
                var av = SPECIES.loadMask(ma, i);
                var bv = SPECIES.loadMask(mb, i);

                // accumulate results, so JIT can't eliminate relevant computations
                m = m.and(av.eq(bv));
            }
        }
        return m;
    }

-------------

Commit messages:
 - 8267969: Add vectorized implementation for VectorMask.eq()

Changes: https://git.openjdk.java.net/jdk/pull/4272/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4272&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8267969
  Stats: 254 lines in 32 files changed: 248 ins; 6 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/4272.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/4272/head:pull/4272

PR:
https://git.openjdk.java.net/jdk/pull/4272

From vlivanov at openjdk.java.net Mon May 31 10:39:36 2021
From: vlivanov at openjdk.java.net (Vladimir Ivanov)
Date: Mon, 31 May 2021 10:39:36 GMT
Subject: RFR: 8267979: C2: Fix verification code in SubTypeCheckNode::Ideal()
Message-ID:

Make verification code in `SubTypeCheckNode::Ideal()` robust in the presence of concurrent class loading.
It repeatedly calls `Compile::static_subtype_check()` (directly and indirectly from `SubTypeCheckNode::sub()` through `Value()`) which can switch from `Compile::SSC_easy_test` to `Compile::SSC_full_test` as a result of concurrent class loading. As a result, it breaks the invariant being asserted.

The fix tries to catch such a case by checking that the `Value()` result is stable across the `Compile::static_subtype_check()` call.

The patch is split into 2 commits: extensive refactoring and the actual fix on top of it.

Testing:
- [x] hs-tier1 - hs-tier6

-------------

Commit messages:
 - Fix
 - Cleanups

Changes: https://git.openjdk.java.net/jdk/pull/4271/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4271&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8267979
  Stats: 138 lines in 2 files changed: 54 ins; 46 del; 38 mod
  Patch: https://git.openjdk.java.net/jdk/pull/4271.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/4271/head:pull/4271

PR: https://git.openjdk.java.net/jdk/pull/4271

From hshi at openjdk.java.net Mon May 31 11:00:19 2021
From: hshi at openjdk.java.net (Hui Shi)
Date: Mon, 31 May 2021 11:00:19 GMT
Subject: RFR: 8267904: C2 crash when compile negative Arrays.copyOf length after loop [v2]
In-Reply-To: <291IJcd-qb_PGTLrb08hHH0N0Yr8mKHpIsBsgbgBIEI=.37b162c7-dedd-4062-a8d3-bafdb16c79d6@github.com>
References: <291IJcd-qb_PGTLrb08hHH0N0Yr8mKHpIsBsgbgBIEI=.37b162c7-dedd-4062-a8d3-bafdb16c79d6@github.com>
Message-ID:

On Mon, 31 May 2021 09:41:02 GMT, Hui Shi wrote:

>> src/hotspot/share/opto/library_call.cpp line 4474:
>>
>>> 4472:     assert(prev->type()->is_int()->_lo == cur->type()->is_int()->_lo, "not same");
>>> 4473:     assert(prev->type()->is_int()->_hi == cur->type()->is_int()->_hi, "not same");
>>> 4474:   }
>>
>> Is this really necessary? Have you seen cases with multiple identical CastII nodes? Or is it to be extra cautious?
>
> Thanks for your comment!
>
> Multiple identical CastII nodes do happen, and the check is added to ensure that they really are identical CastIINodes. The cases are:
> 1. A CastIINode is created in GraphKit::new_array.
> 2. A CastIINode might be created before inline_arraycopy if GraphKit::load_array_length is invoked. If the new array's length is loaded multiple times before System.arraycopy, there could be multiple CastIINodes.

simple example

    byte[] result = new byte[len];  // first CastII in GraphKit::new_array
    System.arraycopy(path, begin, result, 0, result.length); // second CastII when processing "result.length"

-------------

PR: https://git.openjdk.java.net/jdk/pull/4238

From mli at openjdk.java.net Mon May 31 11:18:50 2021
From: mli at openjdk.java.net (Hamlin Li)
Date: Mon, 31 May 2021 11:18:50 GMT
Subject: RFR: JDK-8267930: Refine code for loading hsdis library [v2]
In-Reply-To:
References:
Message-ID:

> code for loading hsdis library is redundant, this is to simplify it.
Hamlin Li has updated the pull request incrementally with two additional commits since the last revision: - refine code - refine code ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4260/files - new: https://git.openjdk.java.net/jdk/pull/4260/files/fa902ead..59dbbaae Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4260&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4260&range=00-01 Stats: 7 lines in 2 files changed: 0 ins; 0 del; 7 mod Patch: https://git.openjdk.java.net/jdk/pull/4260.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4260/head:pull/4260 PR: https://git.openjdk.java.net/jdk/pull/4260 From mli at openjdk.java.net Mon May 31 12:40:50 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Mon, 31 May 2021 12:40:50 GMT Subject: RFR: JDK-8267930: Refine code for loading hsdis library [v3] In-Reply-To: References: Message-ID: > code for loading hsdis library is redundant, this is to simplify it. Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: fix typo ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4260/files - new: https://git.openjdk.java.net/jdk/pull/4260/files/59dbbaae..84ff400a Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4260&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4260&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/4260.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4260/head:pull/4260 PR: https://git.openjdk.java.net/jdk/pull/4260 From ysuenaga at openjdk.java.net Mon May 31 13:11:28 2021 From: ysuenaga at openjdk.java.net (Yasumasa Suenaga) Date: Mon, 31 May 2021 13:11:28 GMT Subject: RFR: JDK-8267930: Refine code for loading hsdis library [v3] In-Reply-To: References: Message-ID: <9kZ4mBewG7VnWb-_TTI1iTm6_or-yMU4VFDvvFZzwBg=.ed584f43-178e-4f7c-a4da-e2de6041fe7d@github.com> On Mon, 31 May 2021 
12:40:50 GMT, Hamlin Li wrote:

>> code for loading hsdis library is redundant, this is to simplify it.
>
> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
>
>   fix typo

src/hotspot/share/compiler/disassembler.cpp line 761:

> 759:       if (offset + strlen(hsdis_library_name) + strlen(os::dll_file_extension()) < buflen) {
> 760:         strcpy(&buf[offset], hsdis_library_name);
> 761:         strcat(&buf[offset], os::dll_file_extension());

I think we can simplify this code further, as follows:

    int sz = buflen - offset;
    int n_written = snprintf(&buf[offset], sz, "%s%s", hsdis_library_name, os::dll_file_extension());
    if (n_written < sz) {

`snprintf()` returns _`size`_ or more if the buffer is too small.
https://man7.org/linux/man-pages/man3/printf.3.html

-------------

PR: https://git.openjdk.java.net/jdk/pull/4260

From vladimir.x.ivanov at oracle.com Mon May 31 13:35:16 2021
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Mon, 31 May 2021 16:35:16 +0300
Subject: RFC, C2 Partial Escape Analysis
In-Reply-To: <6aa40c2fc14643c782382fe95308be3d@EX13D46EUB003.ant.amazon.com>
References: <6aa40c2fc14643c782382fe95308be3d@EX13D46EUB003.ant.amazon.com>
Message-ID:

Hi Jason,

Thanks a lot for sharing your thoughts and plans about Partial Escape Analysis in C2. I enjoyed reading the document. Though it's clearly not finished yet, it already covers a lot of ground. It sparked a discussion among Compiler group members at Oracle and I'll try to summarise the main takeaways.

=====
1. First of all, the need to move allocations around poses the main implementation challenge for a PEA implementation in C2. At the moment, there's no reliable way to recompute JVM state when it is missing. So, the only feasible way is to capture JVM state at all important points in the graph during parsing (e.g., in the form of SafePointNodes, easily discoverable) and keep the information until EA is over.
That would enable EA to move allocations away from their original place (or even split a single allocation into multiple ones).

=====
2. Move vs split decisions. Though it may be desirable (to keep generated code size small) to reduce the number of allocation sites in the graph, there's no such requirement. So, a single allocation is allowed to be split into multiple ones, but the actual placement should abide by the invariants imposed by the program order and JVMS. Since it is possible to insert an arbitrary number of allocations, choosing the actual placement becomes an optimization (and not a correctness) problem.

Consider:

   MyPair p = new MyPair(1, 2);
   if (cond1) {
     ...
     if (cond2) {
       ... = p; // escapes
     }
     ...
     if (cond3) {
       ... = p; // escapes
     }
     ...
   }

It can be either turned into

   if (cond1) {
     MyPair p = new MyPair(1, 2);
     if (cond2) {
       ... = p; // escapes
     }
     ...
     if (cond3) {
       ... = p; // escapes
     }
     ...
   }

or

   if (cond1) {
     ...
     if (cond2) {
       ... = new MyPair(1, 2); // escapes
     }
     ...
     if (cond3) {
       ... = new MyPair(1, 2); // escapes
     }
     ...
   }

=====
3. As you mention, EA is not only about tracking escaping events. Even for non-escaping allocations, identity-sensitive operations (e.g., pointer comparison) and memory aliasing (e.g., indexed accesses) pose some challenges and have to be tracked. (Possibly, requiring a JVM state for each one as well.)

=====
4. Escape sites separate the graph into 2 parts: before and after the instance escapes. In order to preserve identity invariants (and avoid identity paradoxes), PEA can't just put an allocation at every escape site. It should respect the order of escape events and ensure that the very same object is observed when multiple escape events happen. The dynamic invariant can be formulated as: there should never be more than 1 allocation at runtime per 1 eliminated allocation. Considering that non-escaping operations can force materialization on their own, this poses additional constraints.
   MyPair p = new MyPair(1, 2);
   if (cond1) {
     a.f = p; // escapes here
   }
   if (cond2) {
     b.f = p; // escapes here
   }
   ...
   if (a.f == b.f) { // true when (cond1 && cond2)

A possible simplifying assumption would be to put a single allocation at the LCA block for all escape sites, but it doesn't cover simple cases like:

   MyPair p = new ...
   if (cond) {
     ...
     if (cond_i) { // unlikely
       ... = p; // escape site
     }
   } else {
     ...
     if (cond_j) { // unlikely
       ... = p; // escape site
     }
   }

=====
5. The document doesn't mention CFG merges at all. But handling those properly (instead of bailing out of the analysis) seems to be very important to make the optimization practical.

   MyPair p = new MyPair(1, 2);
   ...
   if (cond1) {
     ... = p; // escapes
   }
   // merge point
   // no escape points beyond it

Even though there's only a single escape point, it is illegal to assume that the object didn't escape beyond the merge point. The invariant is that at every CFG merge point either ALL or NONE of the predecessor blocks have the object materialized.

Some possible allocation placements are:

   MyPair p = NULL;
   ... // may benefit from scalarization
   p = new MyPair(1, 2);
   if (cond1) {
     ... = p; // escapes
   }
   // merge point

   MyPair p = NULL;
   ... // may benefit from scalarization of p
   if (cond1) {
     ... // may benefit from scalarization of p
     p = new MyPair(1, 2);
     ... = p; // escapes
   } else {
     ... // may benefit from scalarization of p
     p = new MyPair(1, 2);
   }
   // merge point: no p materialization allowed beyond this point
   // p = phi(p_cond1, p_cond2)

Also, it poses some challenges to optimizing cases with loops when the loop body contains an escape site. But it's not clear how much can be done anyway without reshaping the loop and extracting the escape site out of it (e.g., with loop predication or loop versioning).

=====
Overall, the general consensus was that it doesn't look feasible to introduce a new optimization pass specifically for PEA. Extending the existing EA implementation with PEA capabilities looks the most promising way forward.
It can be implemented as an iterative process where:
   a) EA constructs the connection graph;
   b) for globally escaping objects: based on escape site info, PEA chooses alternative allocation placement;
   c) based on PEA results, globally escaping allocations are moved/split;
   d) EA is run again on the modified graph;

(And the process is repeated until it stabilizes or runs out of time.)

Best regards,
Vladimir Ivanov

On 27.05.2021 15:08, Tatton, Jason wrote:
> RFC, C2 Partial Escape Analysis
> Hello everyone!
>
> We would like to invite discussion on a proposal for Partial Escape Analysis (PEA) to be added to the JVM C2 hotspot compiler.
>
> Specifically, we are looking to introduce a new compilation phase in order to enable optimizations such as scalar replacement in situations currently not encompassed by the existing control flow insensitive Escape Analysis (EA) phase of C2.
>
> PEA is a control flow sensitive variant of EA. We believe that by adding this compilation phase we will be able to improve compiled code performance and reduce heap size requirements by up to 10% and 8% respectively.
>
> At AWS, we have been prototyping a partial implementation of PEA with some success so far. We are at the stage now where we feel wider community involvement would be beneficial to the initiative.
> Background: What is Escape Analysis
> Escape analysis (EA) enables the HotSpot JIT C2 compiler to detect cases where objects do not "escape" the method/thread which created them. Whilst EA itself is not an optimization, the results of the analysis are used in order to enable a number of optimization techniques (as an alternative to conventional object allocation on the heap) including:
>
> * Stack allocation. Instead of allocating an object on the heap, it is instead allocated on the stack. In this way it is not considered eligible for garbage collection. This reduces the amount of work which must be done by garbage collectors, improving performance.
C2 does not currently implement this optimization, though some other JVM implementations, such as IBM J9, do.
> * Scalar replacement. This goes a step further than stack allocation: here the object allocation can be eliminated altogether and all object field interaction is replaced by local variables. Eliminating object allocation, field referencing and reducing load on the garbage collector all improve performance and reduce memory footprint.
> * Lock Elision. If a Java object monitor is used but the object does not escape, then the lock can never be contended. In this case synchronization on said lock can be removed.
>
> Existing Limitations
> The current EA implementation is flow insensitive. Let's examine what this means in practice...
>
> Here is an example of an object which does not escape the bounds of the method/thread creating it and is therefore considered non escaping and eligible for optimization. The current implementation of EA in C2 detects this case
>
> public class Pair{
>     public int a;
>     public int b;
>     public Pair(int a, int b){
>         this.a = a;
>         this.b = b;
>     }
> }
>
> public static int foo(int a, int b){
>     Pair mypair = new Pair(a, b);
>     return mypair.a;
> }
> and will enable scalar replacement such that the above code in foo can be transformed into the following form, which saves us from an instance of object allocation of Pair on every invocation of foo.
>
> public class Pair{
>     public int a;
>     public int b;
>     public Pair(int a, int b){
>         this.a = a;
>         this.b = b;
>     }
> }
>
> public static int foo(int a, int b){
>     // Pair mypair = new Pair(a, b);
>     return a; // don't bother with object allocation of Pair
> }
> However, EA is control flow insensitive and will not enable scalar replacement of fields of an object where there is branching code with a path having the possibility of the object escaping the lifetime of the thread/method which created it, regardless of how unlikely that branch is to execute at runtime...
For example:
>
> public static int foo(){
>     Pair mypair = new Pair(12, 13);
>     if(something()){
>         // maybe this branch is hit only 0.01% of the time
>         return usesAndPersistsObject(mypair); // escapes here
>     }else{
>         return mypair.a; // no scalar replacement here
>     }
> }
> Above we see that in one branch the Pair instance object mypair escapes, therefore the mypair instance object is considered 'globally' escaping and must always be allocated on the heap. This means that scalar replacement cannot be used even when the escaping branch is only entered into very infrequently.
>
> PEA aims to largely solve this problem.
> Partial Escape Analysis
> Partial Escape Analysis (PEA) is a control flow sensitive variant of EA. Like EA it acts upon the ideal graph IR. With PEA we propose introducing the concept of a deferred object allocation (DOA). We propose considering only objects which have been identified as escaping in the EA phase of compilation. With PEA we propose to defer object allocation of those objects to the latest point in program/branch execution before which that object has been identified as escaping. Scalar replacement and lock elision can then be applied up to that point of deferred object allocation. Sometimes the latest point for a particular object may resolve to be at the point of initial object allocation, in which case PEA will offer no benefit.
> High level algorithm
> We are interested in only a subset of ideal graph nodes. Specifically nodes related to escaping objects:
>
> * Allocation(A): Allocation and Initialization nodes.
> * Branch(B): We are interested in branching nodes; IfNode, IfTrue, IfFalse nodes and phi nodes (e.g. to capture the case of: staticfield = condition?obj1:obj2 as well as loops, jumps related to break/continue etc.).
> * Escape(E): Nodes concerning: object assigned to static field, object used as parameter in method invocation
> * Usage(U): Nodes concerning: store to object field, loading from object field.
> * Return(R): Return/ReThrow nodes; these behave similarly to Escape nodes.
> We propose PEA to operate via the following steps:
>
> 1. Graph reduction. This stage visits every node in the ideal graph and for the aforementioned nodes of interest, produces a reduced graph structure (example below) consisting of a reduced control flow sensitive set of nodes. The non-branching nodes of this reduced graph form are structured into basic blocks with IfTrue/IfFalse nodes used to denote the branches. The subset of nodes is used in step 2 and 3 below.
> 2. Deferred Object Allocation Algorithm. This stage figures out where opportunities for DOA exist for each allocated object that can escape. It determines the very last point at which a deferred object can be allocated, aka materialized on the heap, in a branch so as to maintain program semantics by considering branches and object usage. There may be more than one branch path resulting in a materialization of an allocated object. There may also be instances where the object is required to be in a materialized state before a branch node has been encountered - in this case no further optimization can be enabled by PEA and subsequent processing can be skipped.
> 3. Graph transformation. After applying DOA in step 2 we know the latest points at which an object may be materialized on the heap. With this knowledge all usage nodes prior to these points can be scalar replaced and all usage/escaping nodes after a materialization of a node need to be repointed to the nodes resulting from the materializations. Some repointing of the internal state of deferred objects on the heap may be necessary at the point of materialization so as to maintain program semantics. We must also ensure consistency in the case of deoptimization.
> Throughout all these mini phases, in the case of potential failure, the algorithm will skip subsequent processing.
> Example
> Let us now examine a case where PEA works well:
>
> 0: public static boolean RAND = true; // set externally
> 1: public static boolean RAND2 = true; // set externally
> 2:
> 3: public static int infrequentBranch(int f1, int f2) {
> 4:     Pair intPair = new Pair(f1, f2);
> 5:
> 6:     if (RAND) { // external call sets this to true 10% of the time
> 7:         intPair.a += 1; // example 'Use' of intPair
> 8:         Global.lastThing = intPair; // escapes here
> 9:         return 1;
> 10:     }
> 11:
> 12:     // scalar replaceable usage...
> 13:     intPair.a += 12;
> 14:
> 15:     if (RAND2) { // external call sets this to true 10% of the time
> 16:         intPair.a += 10; // example 'Use' of intPair
> 17:         Global.lastThing = intPair; // escapes here
> 18:         return 2;
> 19:     }
> 20:
> 21:     // if we get to this point there should have been
> 22:     // no object allocation (with PEA enabled)
> 23:     return 3;
> 24: }
> Above is an example of code containing two branching paths in which the intPair object escapes. If execution of the above code gets to the end of the method, with PEA enabled, intPair will not be allocated on the heap.
>
> Let us now look at the reduced graph of the above method:
>
> 4: A:Initialize[40]
> 6: B:If[110]
>   -> IfTrue: (bb: 1)
>     7: U:StoreI[121] of: Initialize[40]
>     8: E:StoreN[126] of: Initialize[40]
>     9: R:Return[130]
>   -> IfFalse: (bb: 2)
>     X
> 13: U:StoreI[147] of: Initialize[40]
> 15: B:If[158]
>   -> IfTrue: (bb: 3)
>     16: U:StoreI[170] of: Initialize[40]
>     17: E:StoreN[173] of: Initialize[40]
>     18: R:Return[180]
>   -> IfFalse: (bb: 4)
>     X
> 23: R:Return[194]
> The reduced graph is composed of a root basic block (bb) (with id of 0) and four child bb's (1,2,3,4) corresponding to the code in the two if statements. Each if statement generates an IfTrue and IfFalse node each having a basic block associated. For readability, line number information is provided above (our prototype, discussed shortly, also includes bci information).
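A source-level way to picture the effect described above is the hand-written Java sketch below. It is only an illustration of the intended behaviour, not output of C2 or of the prototype: the class name `DeferredAllocationSketch`, the shape of `Global`, and the non-final `RAND`/`RAND2` fields are assumptions made so the example is self-contained. The fields of `intPair` live in locals until an escape point, where the object is materialized with the current scalar values.

```java
// Hand-written illustration only: what infrequentBranch could behave like if
// the allocation of intPair were deferred to its escape points.
public class DeferredAllocationSketch {
    static final class Pair {
        int a;
        int b;
        Pair(int a, int b) { this.a = a; this.b = b; }
    }

    // Stand-in for the Global class referenced in the example (assumed shape).
    static final class Global {
        static Pair lastThing;
    }

    static boolean RAND;
    static boolean RAND2;

    static int infrequentBranch(int f1, int f2) {
        // Scalar-replaced fields of the not-yet-allocated Pair.
        int a = f1;
        int b = f2;

        if (RAND) {
            a += 1;
            Global.lastThing = new Pair(a, b); // materialized at the escape point
            return 1;
        }

        a += 12; // operates on the scalar; no heap object exists on this path yet

        if (RAND2) {
            a += 10;
            Global.lastThing = new Pair(a, b); // materialized at the escape point
            return 2;
        }

        return 3; // fall-through path performs no allocation at all
    }

    public static void main(String[] args) {
        RAND = false;
        RAND2 = true;
        int result = infrequentBranch(1, 2);
        System.out.println(result + " " + Global.lastThing.a);
    }
}
```

Each path allocates at most one `Pair`, which matches the constraint that an eliminated allocation must never turn into more than one allocation at runtime.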
>
> The graph transformation mini phase is interested in the opportunities for deferred allocation identified by the DOA algorithm. In this example, that corresponds to the Initialize[40] node seen at the root bb. Observe that there are two DOA opportunities, one in each of the IfTrue branches, at lines 8 and 17. Since both of these occur within branches which result in early termination of the method, the U:StoreI[147] node (corresponding to: intPair.a += 12;) may be scalar replaced. The nodes U:StoreI[121] and U:StoreI[170] may optionally be scalar replaced, but there is little benefit because soon after they are referenced the object which they reference will be materialized onto the heap.
>
> The DOA algorithm of PEA is able to handle more complex cases, for example say we have the following code:
>
> 0: public static boolean RAND = true; // set externally
> 1: public static boolean RAND2 = true; // set externally
> 2:
> 3: public static int reallyInfrequentBranch(int f1, int f2) {
> 4:     Pair intPair = new Pair(f1, f2);
> 5:
> 6:     if (RAND) { // external call sets this to true 1% of the time
> 7:         if (RAND2) { // external call sets this to true 1% of the time
> 8:             Global.lastThing = intPair; // escapes here
> 9:             unkownCall(); // non inlined call - maybe call triggering separate thread interacting with the state of Global.lastThing
> 10:             intPair.a += 12; // operates on materialized object
> 11:         }
> 12:     }
> 13:
> 16:     return intPair.a;
> 17: }
> At first glance it would appear that the above code would benefit from PEA since the escape of intPair occurs within not just one but two infrequently branching nodes.
However, PEA cannot be used here because once the intPair object escapes at line 8, in this example unkownCall() is not inlined and may perform an operation on the state of intPair (maybe in a separate thread). As such, to maintain program semantics, the object needs to be in a materialized state at the point of initial declaration (i.e. created on the heap as normal).
>
> As far as the DOA algorithm is concerned for the above code, when it is processing the bb associated with the If statement on line 7 it will first mark the intPair object as escaping and materialized within the subsequent context of the branch, thus rendering the use on line 10 not scalar replaceable. When DOA has finished processing this bb and moved on to the root bb it will come to the return statement on line 16. Since intPair is marked as being in an escaped and previously materialized state within a previous nested branch bb, this materialization will be "promoted" up to the root bb, which also just happens to be the location of the initial allocation, thus eliminating this as an opportunity for PEA.
> Progress at AWS
> So far in our internal prototype, in the interests of failing fast, we have implemented up to step 2 of the outlined algorithm and are currently evaluating performance of a typical workload in which deferred object allocation can be applied in an attempt to estimate the typical improvement in performance. Our initial prototype is implemented as a separate compilation phase which takes place after conventional, flow insensitive EA.
> Flags
> In support of this compilation phase we propose adding three JVM flags:
>
> Flag | Description | Default | Notes
> -- | -- | -- | --
> DoPartialEscapeAnalysis | perform partial escape analysis | true | global switch (set to false in preview build)
> PartialEALogLevel | PEA log level | 0 - nothing | [0-5] more detail per log level - including the option to output the reduced graph representation above in a human readable format. This aids development and production debugging
> PartialEAOnly | Restrict PEA to only this method | '' - empty string | Equivalent to setting DoPartialEscapeAnalysis to true
>
> Challenges
>
> * Generally speaking, operating upon the ideal graph representation is challenging as it is a paradigm unique to the JVM and documentation is sparse. Tools such as the ideal graph visualizer are however excellent for improving engineer productivity when interacting with the ideal graph.
> * When speaking in terms of this initiative, the most challenging aspect we have faced so far is in building reduced graph representations of complex phi node interactions, e.g. code such as: (RAND?holder1:holder2).held = RAND2?intPair1:intPair2; . We expect the majority of bugs in step 2 of our outlined algorithm to reside in this space.
> * Another challenging area is that of "deadly embraces" where two objects have fields which point to one another and at least one of those objects is marked as escaping, thus rendering the other escaping as well. For instance: object1.friend = object2; object2.friend = object1; We have not determined how to solve this problem yet but some mechanism to detect these cyclic relationships is required as we believe that these relationships are common in many of the java.util.* classes.
>
> Impact
> We have seen that PEA can have a positive impact in JVMs. The implementation in the GraalVM compiler is reported as improving performance and memory allocated in standardized benchmarks by 10% and 8% respectively. For some benchmarks, the improvement in performance and memory allocated can be up to 58.5% and 33% respectively.
> Risks
>
> * Increased C2 compilation time. Care must be applied to ensure an optimal implementation of this compilation phase so as to not impact JVM startup and code generation times.
> * The additional graph transformation adds complexity to the ideal graph and can ultimately result in larger and more complex generated machine code.
>
> Alternatives
> As an alternative to PEA, effort could be invested in making the current EA implementation of C2 control flow sensitive. However, we estimate that this would in practice look very much like a separate compilation phase anyway. As such we recommend, at least initially, implementing PEA as a separate phase, and considering merging if it is proven to be successful.
> Relation to other projects
>
> * As with many other compiler optimizations, operating upon a larger portion of a program can improve efficiency of the optimization. As such we envisage that improvements in inlining will have a direct benefit on opportunities to introduce deferred object allocation.
> * If stack allocation is to be introduced to the JVM then PEA should be able to productively interact with this optimization.
> * We do not envisage any negative impact upon project Loom.
>
> Further work
> We believe that the scheme presented here is relatively conservative and should have the aforementioned positive impact. One extension to this work would be the introduction of a fast/slow path to be used in locations where scalar replacement would be possible on fields of objects which are very infrequently escaped (such as in the reallyInfrequentBranch example above). This comes with two challenges however: 1) the additional size/complexity of the code generated and 2) the slow/fast path logic itself will have a performance impact, so this needs to be used in moderation - perhaps applying this approach for cases where the escaping probability is low (<10%) and monitoring this using ongoing branch entry profiling information would be a reasonable tradeoff.
>
> Comments welcome!
> -Jason and Xin > From neliasso at openjdk.java.net Mon May 31 14:00:21 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Mon, 31 May 2021 14:00:21 GMT Subject: RFR: 8267979: C2: Fix verification code in SubTypeCheckNode::Ideal() In-Reply-To: References: Message-ID: <8z2cuDW4jIdMBB91ZifVg4-m0y_OZTGmlHPVKeo9G84=.b356c733-fcc2-43fa-8939-4032ff8b1681@github.com> On Mon, 31 May 2021 10:11:13 GMT, Vladimir Ivanov wrote: > Make verification code in `SubTypeCheckNode::Ideal()` robust in presense of concurrent class loading. > > It repeatedly calls `Compile::static_subtype_check()` (directly and indirectly from `SubTypeCheckNode::sub()` through `Value()`) which can switch from `Compile::SSC_easy_test` to `Compile::SSC_full_test` as a result of concurrent class loading. As a result, it breaks the invariant being asserted. > > The fix tries to catch such case by checking that `Value()` result is stable across ``Compile::static_subtype_check()` call. > > The patch is split into 2 commits: extensive refactoring and the actual fix on top of it. > > Testing: > - [x] hs-tier1 - hs-tier6 Looks good. ------------- Marked as reviewed by neliasso (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/4271 From neliasso at openjdk.java.net Mon May 31 14:20:23 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Mon, 31 May 2021 14:20:23 GMT Subject: RFR: 8267687: ModXNode::Ideal optimization is better than Parse::do_irem In-Reply-To: <7VdGuyOXAOwlvDy2Tq5IvZzurmn8F7kL08ifoo5wBtI=.66821483-a0e1-45b1-bccf-121834bba26b@github.com> References: <7VdGuyOXAOwlvDy2Tq5IvZzurmn8F7kL08ifoo5wBtI=.66821483-a0e1-45b1-bccf-121834bba26b@github.com> Message-ID: On Tue, 25 May 2021 15:34:44 GMT, Yi Yang wrote: > Hi all, > > Can I have a review of this change? I noticed there are two almost the same optimizations for % operation. For x%y, both Parse::do_irem and ModXNode::ideal are optimized for a special case that divisor y is `2^n` constant value. 
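For readers unfamiliar with the special case being discussed, the arithmetic identity behind a power-of-two remainder can be sketched in plain Java as below. This is an illustration of the identity only and is not taken from divnode.cpp or parse2.cpp; the class and method names are hypothetical. Java's `%` takes the sign of the dividend, which is why a bare `x & (2^n - 1)` is only correct for non-negative `x`.

```java
// Illustration of the identity behind "x % 2^n" without a division.
// Hypothetical helper; not the code used by either HotSpot optimization.
public class ModPowerOfTwoSketch {
    // Computes x % (1 << n) for 0 <= n <= 30.
    static int modPow2(int x, int n) {
        int mask = (1 << n) - 1;
        // The result carries the sign of x, so for negative x the magnitude
        // is masked and the result negated. For Integer.MIN_VALUE, -x wraps
        // back to MIN_VALUE, whose low bits are all zero, so the result is 0.
        return x >= 0 ? (x & mask) : -((-x) & mask);
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 7, 8, 9, -1, -7, -8, -9,
                         Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int x : samples) {
            for (int n = 0; n <= 30; n++) {
                if (modPow2(x, n) != x % (1 << n)) {
                    throw new AssertionError(x + " % 2^" + n);
                }
            }
        }
        System.out.println("all samples agree with %");
    }
}
```

Because `x % -d` equals `x % d` in Java, the same identity also covers a negative power-of-two divisor.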
>
> ModXNode::Ideal opt:
> https://github.com/openjdk/jdk/blob/cc687fd43ade6be8760c559f3ffa909c5937727c/src/hotspot/share/opto/divnode.cpp#L112-L160
>
> Parse::do_irem opt:
> https://github.com/openjdk/jdk/blob/cc687fd43ade6be8760c559f3ffa909c5937727c/src/hotspot/share/opto/parse2.cpp#L1171-L1196
>
> It turns out that ModXNode::Ideal optimization is better than Parse::do_irem in a simple microbenchmark (please check out the JBS attachment for the detailed benchmark result):
>
> ModXNode::Ideal opt:
> ----------------
> Benchmark                         Mode  Cnt     Score     Error  Units
> ModPowerOf2.testNegativePowerOf2  avgt   25  8746.608 ± 139.777  ns/op
> ModPowerOf2.testPositivePowerOf2  avgt   25  8735.545 ±  86.145  ns/op
>
> Parse::do_irem opt:
> ----------------
> Benchmark                         Mode  Cnt     Score     Error  Units
> ModPowerOf2.testNegativePowerOf2  avgt   25  8693.797 ±   7.844  ns/op
> ModPowerOf2.testPositivePowerOf2  avgt   25  6618.652 ±   1.739  ns/op
>
> Diff for ideal graph:
> ----------------
> ![ideal_graph](https://user-images.githubusercontent.com/5010047/119525589-34585f80-bdb1-11eb-9d7e-e3962cd7f789.jpg)
>
> Thanks!
> Yang

In general - I like the change! Very good that the optimization is moved out of the parse stage and into the node.

On the choice of which optimization to keep - could you please post the generated assembly for both versions? Also - what hardware did you benchmark on? Do you see the same result on different CPUs?
------------- PR: https://git.openjdk.java.net/jdk/pull/4188 From jbhateja at openjdk.java.net Mon May 31 15:03:21 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Mon, 31 May 2021 15:03:21 GMT Subject: RFR: 8265907: JVM crashes when matching VectorMaskCmp Node [v3] In-Reply-To: References: <_otBn8mWHgqdIjnZ6yiBu_T6F5076C_aKL7yMHe-sHo=.d94a0510-0e91-4286-814c-5b568d7ce923@github.com> Message-ID: <8D-GhvKWLC3FcWr1og0sI0I8jh98ihuVUmBllihNSRA=.c486e3a3-9482-4ea4-9837-fa5be0cca30e@github.com> On Sat, 29 May 2021 03:25:18 GMT, Wang Huang wrote: >> Wang Huang has updated the pull request incrementally with one additional commit since the last revision: >> >> fix x86's bug > > @iwanowww @mgkwill Could you do me a favor to review this issue? Any suggestion is welcome. Hi @Wanghuang-Huawei , I have updated the proposed fix. ------------- PR: https://git.openjdk.java.net/jdk/pull/3670 From neliasso at openjdk.java.net Mon May 31 15:06:56 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Mon, 31 May 2021 15:06:56 GMT Subject: RFR: 8267773: PhaseStringOpts::int_stringSize doesn't handle min_jint correctly [v2] In-Reply-To: References: Message-ID: > Hi, > > PhaseStringOpts::int_stringSize doesn't handle min_jint correctly - it can't be negated to a positive number. I fix this adding a special case for this. > > Added a test that was contributed by Adam who reported the bug. Will add as a contributor. 
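The min_jint problem described above comes from two's-complement overflow: negating the smallest int yields the same negative value again, so any size computation that first makes its argument positive needs a dedicated case. The Java sketch below illustrates this at the language level; `stringSize` here is a hypothetical helper written for this note, not the C2 code in PhaseStringOpts and not a JDK API.

```java
// Illustration of why the smallest int needs a special case when computing
// the length of its decimal representation. Hypothetical helper only.
public class MinIntStringSizeSketch {
    static int stringSize(int x) {
        if (x == Integer.MIN_VALUE) {
            // -x overflows back to MIN_VALUE, so handle it explicitly:
            // "-2147483648" has 11 characters.
            return 11;
        }
        int size = 1;
        if (x < 0) {
            size++;   // account for the '-' sign
            x = -x;   // safe: x is not MIN_VALUE here
        }
        while (x >= 10) {
            x /= 10;
            size++;
        }
        return size;
    }

    public static void main(String[] args) {
        // Negation does not produce a positive number for MIN_VALUE.
        System.out.println(-Integer.MIN_VALUE == Integer.MIN_VALUE);
        int[] samples = {0, 9, 10, -1, -10, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int x : samples) {
            if (stringSize(x) != Integer.toString(x).length()) {
                throw new AssertionError("size mismatch for " + x);
            }
        }
    }
}
```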
> > Please review, > Best regards, > Nils Eliasson Nils Eliasson has updated the pull request incrementally with two additional commits since the last revision: - Remove test - Add and fix test ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4255/files - new: https://git.openjdk.java.net/jdk/pull/4255/files/da17bb36..43125189 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4255&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4255&range=00-01 Stats: 130 lines in 2 files changed: 60 ins; 70 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/4255.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4255/head:pull/4255 PR: https://git.openjdk.java.net/jdk/pull/4255 From neliasso at openjdk.java.net Mon May 31 15:06:57 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Mon, 31 May 2021 15:06:57 GMT Subject: RFR: 8267773: PhaseStringOpts::int_stringSize doesn't handle min_jint correctly [v2] In-Reply-To: References: Message-ID: On Mon, 31 May 2021 07:01:45 GMT, Roland Westrelin wrote: > Looks ok to me. Thank you Roland! 
-------------

PR: https://git.openjdk.java.net/jdk/pull/4255

From neliasso at openjdk.java.net  Mon May 31 15:07:02 2021
From: neliasso at openjdk.java.net (Nils Eliasson)
Date: Mon, 31 May 2021 15:07:02 GMT
Subject: RFR: 8267773: PhaseStringOpts::int_stringSize doesn't handle min_jint correctly [v2]
In-Reply-To: <4rECmdM0KzlcpBCPI9KKbBlMwRn3t1aKmjB0eubQ8tc=.f596b569-844f-436e-abd4-439a2e91a6a9@github.com>
References: <4rECmdM0KzlcpBCPI9KKbBlMwRn3t1aKmjB0eubQ8tc=.f596b569-844f-436e-abd4-439a2e91a6a9@github.com>
Message-ID:

On Mon, 31 May 2021 07:44:18 GMT, Aleksey Shipilev wrote:

>> Nils Eliasson has updated the pull request incrementally with two additional commits since the last revision:
>>
>>  - Remove test
>>  - Add and fix test
>
> test/hotspot/jtreg/compiler/stringopts/TestFetchStaticField.java line 37:
>
>> 35: import jdk.test.lib.Asserts;
>> 36:
>> 37: public class TestFetchStaticField {
>
> Why `TestFetchStaticField`? Is this really specific to static fields?

No - it's a leftover from when I was on the completely wrong track :P

I removed this test and added yours instead.

-------------

PR: https://git.openjdk.java.net/jdk/pull/4255

From jbhateja at openjdk.java.net  Mon May 31 15:18:50 2021
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Mon, 31 May 2021 15:18:50 GMT
Subject: RFR: 8266951: Partial in-lining for vectorized mismatch operation using AVX512 masked instructions [v8]
In-Reply-To: <0YtRuwnVZ-Ejs-22d0JDJeFzXiZ17XNuBT1o5Ma4ZkI=.9dd9e952-d452-4175-8ff5-8f41e990a555@github.com>
References: <0YtRuwnVZ-Ejs-22d0JDJeFzXiZ17XNuBT1o5Ma4ZkI=.9dd9e952-d452-4175-8ff5-8f41e990a555@github.com>
Message-ID: <4-59jNEix65uu8cBY266-g7ckv_8bvsJX1FCX3r9pmw=.1d82653a-6adb-40d6-b5d5-2a1fe3d0d738@github.com>

> ArraySupport.vectorizedMismatch is a leaf level comparison routine which gets called by various public Java APIs (Arrays.equals, Arrays.mismatch). Hotspot C2 compiler intrinsifies the vectorizedMismatch routine and emits a call to a stub routine which uses vector instructions to compare the inputs.
>
> For small compare operations whose size fits in one vector register, i.e. < 32 bytes or <= 64 bytes, this patch employs a partial in-lining technique to emit the fast path code at the call site, which does the vector comparison under the influence of a predicate register/mask computed as a function of the comparison length.
>
> If the length of the comparison is greater than the vector register size then the slow path comprising a stub call is emitted.
>
> This prevents the call overhead associated with the stub call, which is significant compared to the actual comparison operation for small sized comparisons.
>
> Partial in-lining works under the influence of a run time flag -XX:UsePartialInlineSize=32/64 (default 32 bytes).
>
> Following are performance numbers for an existing JMH benchmark (test/micro/org/openjdk/bench/java/util/ArrayMismatch.java) :-
>
> Machine : Cascade Lake server (Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz)
>
> BENCHMARK | SIZE | Baseline (ops/ms) | PI32 (ops/ms) | PI32 gain | PI64 (ops/ms) | PI64 gain
> -- | -- | -- | -- | -- | -- | --
> ArraysMismatchPartialInlining.testByteMatch | 3 | 209915.663 | 209126.291 | 0.996239576 | 209073.888 | 0.995989937
> ArraysMismatchPartialInlining.testByteMatch | 4 | 157757.866 | 157763.787 | 1.000037532 | 157766.023 | 1.000051706
> ArraysMismatchPartialInlining.testByteMatch | 5 | 181182.854 | 180450.433 | 0.995957559 | 180465.978 | 0.996043356
> ArraysMismatchPartialInlining.testByteMatch | 6 | 146279.651 | 146276.69 | 0.999979758 | 146274.73 | 0.999966359
> ArraysMismatchPartialInlining.testByteMatch | 7 | 139099.287 | 137887.433 | 0.991287849 | 139159.131 | 1.000430225
> ArraysMismatchPartialInlining.testByteMatch | 15 | 127720.176 | 175732.078 | 1.375914781 | 169252.948 | 1.325185678
> ArraysMismatchPartialInlining.testByteMatch | 31 | 116472.861 | 176768.126 | 1.517676517 | 169773.326 | 1.457621325
> ArraysMismatchPartialInlining.testByteMatch | 63 | 104636.064 | 91564.893 | 0.875079676 | 160845.908 | 1.537193792
> ArraysMismatchPartialInlining.testByteMatch | 95 | 101099.48 | 89657.806 | 0.886827568 | 87334.192 | 0.863844127
> ArraysMismatchPartialInlining.testByteMatch | 800 | 45022.411 | 47905.179 | 1.064029623 | 47969.355 | 1.065455046
> ArraysMismatchPartialInlining.testCharMatch | 3 | 219405.496 | 219710.643 | 1.00139079 | 219242.048 | 0.999255041
> ArraysMismatchPartialInlining.testCharMatch | 4 | 170629.006 | 193121.02 | 1.131818233 | 182593.776 | 1.070121548
> ArraysMismatchPartialInlining.testCharMatch | 5 | 155518.733 | 169650.324 | 1.090867452 | 159963.097 | 1.028577676
> ArraysMismatchPartialInlining.testCharMatch | 6 | 154395.07 | 175616.979 | 1.137451986 | 147860.366 | 0.957675436
> ArraysMismatchPartialInlining.testCharMatch | 7 | 147630.171 | 168639.547 | 1.142310856 | 112467.214 | 0.761817271
> ArraysMismatchPartialInlining.testCharMatch | 15 | 130251.837 | 171755.645 | 1.318642784 | 159656.911 | 1.225755542
> ArraysMismatchPartialInlining.testCharMatch | 31 | 115510.532 | 106310.328 | 0.920351817 | 159957.379 | 1.384786099
> ArraysMismatchPartialInlining.testCharMatch | 63 | 96443.648 | 92545.364 | 0.959579671 | 92850.782 | 0.962746473
> ArraysMismatchPartialInlining.testCharMatch | 95 | 90001.485 | 81753.152 | 0.908353368 | 83890.742 | 0.932103976
> ArraysMismatchPartialInlining.testCharMatch | 800 | 22929.764 | 20699.791 | 0.902747669 | 22017.534 | 0.960216337
> ArraysMismatchPartialInlining.testDoubleMatch | 3 | 137422.911 | 134792.332 | 0.980857784 | 137047.846 | 0.997270724
> ArraysMismatchPartialInlining.testDoubleMatch | 4 | 140124.192 | 128321.199 | 0.915767628 | 128573.012 | 0.917564699
> ArraysMismatchPartialInlining.testDoubleMatch | 5 | 132385.81 | 132099.177 | 0.997834866 | 132337.729 | 0.999636812
> ArraysMismatchPartialInlining.testDoubleMatch | 6 | 122472.829 | 122301.343 | 0.998599804 | 122235.558 | 0.998062664
> ArraysMismatchPartialInlining.testDoubleMatch | 7 | 123867.736 | 123042.597 | 0.993338548 | 123060.617 | 0.993484026
> ArraysMismatchPartialInlining.testDoubleMatch | 15 | 102561.684 | 102697.933 | 1.001328459 | 100258.701 | 0.977545386
> ArraysMismatchPartialInlining.testDoubleMatch | 31 | 87019.261 | 87292.743 | 1.003142775 | 85003.323 | 0.976833428
> ArraysMismatchPartialInlining.testDoubleMatch | 63 | 62251.609 | 57261.214 | 0.919835084 | 62732.816 | 1.007730033
> ArraysMismatchPartialInlining.testDoubleMatch | 95 | 50885.381 | 48282.534 | 0.948848826 | 48533.009 | 0.953771163
> ArraysMismatchPartialInlining.testDoubleMatch | 800 | 7160.957 | 8209.345 | 1.146403337 | 7158.649 | 0.999677697
> ArraysMismatchPartialInlining.testFloatMatch | 3 | 144215.295 | 141572.656 | 0.981675737 | 117351.089 | 0.81372152
> ArraysMismatchPartialInlining.testFloatMatch | 4 | 149935.526 | 140116.547 | 0.934511992 | 138351.846 | 0.922742259
> ArraysMismatchPartialInlining.testFloatMatch | 5 | 134682.06 | 133892.853 | 0.994140222 | 139040.985 | 1.032364555
> ArraysMismatchPartialInlining.testFloatMatch | 6 | 139176.866 | 139452.984 | 1.001983936 | 158309.784 | 1.13747197
> ArraysMismatchPartialInlining.testFloatMatch | 7 | 127274.07 | 126137.824 | 0.991072447 | 146418.871 | 1.150421849
> ArraysMismatchPartialInlining.testFloatMatch | 15 | 115897.616 | 101808.969 | 0.878438854 | 108451.212 | 0.935750154
> ArraysMismatchPartialInlining.testFloatMatch | 31 | 96568.619 | 101492.986 | 1.05099345 | 88662.187 | 0.918126281
> ArraysMismatchPartialInlining.testFloatMatch | 63 | 75565.484 | 85526.546 | 1.131820263 | 74575.198 | 0.986894996
> ArraysMismatchPartialInlining.testFloatMatch | 95 | 69535.621 | 71823.072 | 1.032896104 | 64910.105 | 0.933479907
> ArraysMismatchPartialInlining.testFloatMatch | 800 | 13959.085 | 12768.069 | 0.914678075 | 12698.311 | 0.909680756
> ArraysMismatchPartialInlining.testIntMatch | 3 | 151925.753 | 152001.543 | 1.000498862 | 150351.321 | 0.989636833
> ArraysMismatchPartialInlining.testIntMatch | 4 | 151411.152 | 161021.852 | 1.063474188 | 152115.869 | 1.004654327
> ArraysMismatchPartialInlining.testIntMatch | 5 | 142305.114 | 134841.275 | 0.947550451 | 122718.584 | 0.862362431
> ArraysMismatchPartialInlining.testIntMatch | 6 | 144870.73 | 144186.562 | 0.99527739 | 166569.418 | 1.149779655
> ArraysMismatchPartialInlining.testIntMatch | 7 | 135132.736 | 131937.154 | 0.976352273 | 150670.855 | 1.114984122
> ArraysMismatchPartialInlining.testIntMatch | 15 | 118831.765 | 119947.806 | 1.009391773 | 161039.149 | 1.35518604
> ArraysMismatchPartialInlining.testIntMatch | 31 | 97247.157 | 95123.241 | 0.978159608 | 92586.255 | 0.952071586
> ArraysMismatchPartialInlining.testIntMatch | 63 | 78537.993 | 72904.05 | 0.928264744 | 72075.128 | 0.917710337
> ArraysMismatchPartialInlining.testIntMatch | 95 | 69356.234 | 69021.893 | 0.995179366 | 67435.202 | 0.972301956
> ArraysMismatchPartialInlining.testIntMatch | 800 | 14410.374 | 12715.733 | 0.882401317 | 12527.15 | 0.869314703
> ArraysMismatchPartialInlining.testLongMatch | 3 | 145434.777 | 147236.142 | 1.012386068 | 144269.34 | 0.991986532
> ArraysMismatchPartialInlining.testLongMatch | 4 | 149850.908 | 117182.939 | 0.781996857 | 116983.308 | 0.780664659
> ArraysMismatchPartialInlining.testLongMatch | 5 | 140694.62 | 141039.138 | 1.002448693 | 140721.407 | 1.000190391
> ArraysMismatchPartialInlining.testLongMatch | 6 | 136901.515 | 136215.609 | 0.994989785 | 136216.591 | 0.994996958
> ArraysMismatchPartialInlining.testLongMatch | 7 | 132233.847 | 131289.142 | 0.9928558 | 131315.326 | 0.993053813
> ArraysMismatchPartialInlining.testLongMatch | 15 | 108677.77 | 105050.548 | 0.966624067 | 108574.143 | 0.999046475
> ArraysMismatchPartialInlining.testLongMatch | 31 | 79476.103 | 79391.426 | 0.99893456 | 79519.006 | 1.000539823
> ArraysMismatchPartialInlining.testLongMatch | 63 | 58949.181 | 59102.766 | 1.00260538 | 59095.306 | 1.00247883
> ArraysMismatchPartialInlining.testLongMatch | 95 | 49438.419 | 49422.93 | 0.999686701 | 49390.033 | 0.999021287
> ArraysMismatchPartialInlining.testLongMatch | 800 | 7195.783 | 7201.554 | 1.000801998 | 7186.757 | 0.998745654
> ArraysMismatchPartialInlining.testShortMatch | 3 | 219642.309 | 219414.684 | 0.998963656 | 219760.127 | 1.000536408
> ArraysMismatchPartialInlining.testShortMatch | 4 | 169235.371 | 193907.437 | 1.145785517 | 170667.561 | 1.008462711
> ArraysMismatchPartialInlining.testShortMatch | 5 | 155537.852 | 147014.758 | 0.945202445 | 116770.798 | 0.750754858
> ArraysMismatchPartialInlining.testShortMatch | 6 | 155059.272 | 173756.546 | 1.120581464 | 152323.759 | 0.982358275
> ArraysMismatchPartialInlining.testShortMatch | 7 | 147370.359 | 154934.348 | 1.051326393 | 138398.19 | 0.939118225
> ArraysMismatchPartialInlining.testShortMatch | 15 | 130353.196 | 171653.208 | 1.316831603 | 160047.047 | 1.227795343
> ArraysMismatchPartialInlining.testShortMatch | 31 | 118458.443 | 106239.301 | 0.896848703 | 159726.936 | 1.348379499
> ArraysMismatchPartialInlining.testShortMatch | 63 | 97519.691 | 91591.145 | 0.939206678 | 91847.817 | 0.94183868
> ArraysMismatchPartialInlining.testShortMatch | 95 | 90818.111 | 77626.093 | 0.854742431 | 77653.086 | 0.855039652
> ArraysMismatchPartialInlining.testShortMatch | 800 | 21382.8 | 22841.791 | 1.06823199 | 22683.388 | 1.060824027

Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:

  8266951: Review comments resolution.
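
The fast-path/slow-path split described in this pull request can be modelled in scalar Java. This is only a sketch under stated assumptions: `PartialInlineSketch`, `PARTIAL_INLINE_SIZE`, `maskedCompare` and `stubCompare` are hypothetical names, the 32-byte threshold mirrors the default mentioned in the description, and the return convention (index of the first mismatch, or -1) follows `Arrays.mismatch` rather than the internal intrinsic. Real code would use a single masked AVX512 vector compare instead of the lane loop.

```java
// Scalar model of partial in-lining; not the C2 intrinsic itself.
final class PartialInlineSketch {

    // Hypothetical threshold in bytes, modelled on the 32-byte default above.
    static final int PARTIAL_INLINE_SIZE = 32;

    /** Index of the first differing byte in a[0..length) and b[0..length), or -1. */
    static int mismatch(byte[] a, byte[] b, int length) {
        if (length <= PARTIAL_INLINE_SIZE) {
            return maskedCompare(a, b, length); // fast path, emitted at the call site
        }
        return stubCompare(a, b, length);       // slow path, stands in for the stub call
    }

    // Models one masked vector compare: the mask is a function of the length,
    // and lanes at or beyond 'length' are disabled, so they are never read.
    static int maskedCompare(byte[] a, byte[] b, int length) {
        long mask = (1L << length) - 1; // low 'length' bits set; length <= 32 here
        long diff = 0;
        for (int lane = 0; lane < PARTIAL_INLINE_SIZE; lane++) {
            boolean active = ((mask >>> lane) & 1L) != 0;
            if (active && a[lane] != b[lane]) {
                diff |= 1L << lane;
            }
        }
        return diff == 0 ? -1 : Long.numberOfTrailingZeros(diff);
    }

    // Plain loop standing in for the out-of-line stub routine.
    static int stubCompare(byte[] a, byte[] b, int length) {
        for (int i = 0; i < length; i++) {
            if (a[i] != b[i]) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] x = {1, 2, 3, 4};
        byte[] y = {1, 2, 9, 4};
        System.out.println(mismatch(x, y, 4)); // prints "2"
        System.out.println(mismatch(x, y, 2)); // prints "-1"
    }
}
```

The point of the model is the dispatch: short comparisons avoid a call entirely, which is where the description expects the gains for small sizes to come from.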
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/3999/files - new: https://git.openjdk.java.net/jdk/pull/3999/files/bddc6051..da4890d8 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3999&range=07 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3999&range=06-07 Stats: 47 lines in 1 file changed: 14 ins; 3 del; 30 mod Patch: https://git.openjdk.java.net/jdk/pull/3999.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3999/head:pull/3999 PR: https://git.openjdk.java.net/jdk/pull/3999 From jbhateja at openjdk.java.net Mon May 31 15:18:53 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Mon, 31 May 2021 15:18:53 GMT Subject: RFR: 8266951: Partial in-lining for vectorized mismatch operation using AVX512 masked instructions [v7] In-Reply-To: References: <0YtRuwnVZ-Ejs-22d0JDJeFzXiZ17XNuBT1o5Ma4ZkI=.9dd9e952-d452-4175-8ff5-8f41e990a555@github.com> <1aRY-G0e2xZFjYt5F7Ode_3B2EuF8vK6QSP6Yd_LtHk=.8930ee0d-755e-4da5-b65e-a7d86449b6de@github.com> <3XO9IjWTHC1_b1ru4esCTWosD4lDrhH8Drc2mmPTr6Q=.5ab3a289-870e-4ecb-a632-f8f05410e475@github.com> Message-ID: On Mon, 31 May 2021 07:55:40 GMT, Vladimir Ivanov wrote: >> src/hotspot/share/opto/library_call.cpp line 5251: >> >>> 5249: Node* cmp_res = _gvn.transform(new BoolNode(length_cmp, BoolTest::le)); >>> 5250: >>> 5251: fast_path = generate_guard(cmp_res, NULL, PROB_MAX); >> >> It looks confusing because `LibraryCallKit::generate_guard()` advertises the opposite. >> >> // In all cases, GraphKit::control() is updated to the fast path. >> // The returned value represents the control for the slow path. >> >> Is there a bug there (fast and slow path code swapped)? > > Also, what happens when the control implicitly set (fast path) is dead? I'd expect a `stopped()` check, but don't see any. generate_guard() prefers a slow path if condition is evaluated to true. 
Thus I chose to pass the condition (SIZE < ArrayOperationPartialInlineSize) as is instead of negating the condition before passing it to generate_guard(). When the condition evaluates to false, the fast path will be NULL and only the slow path exists (i.e. implicit control), so all the nodes that are control dependent on the fast path will be swept out during GVN. I have added a detailed comment in the code on this.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3999

From jbhateja at openjdk.java.net  Mon May 31 15:18:59 2021
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Mon, 31 May 2021 15:18:59 GMT
Subject: RFR: 8266951: Partial in-lining for vectorized mismatch operation using AVX512 masked instructions [v7]
In-Reply-To: <3XO9IjWTHC1_b1ru4esCTWosD4lDrhH8Drc2mmPTr6Q=.5ab3a289-870e-4ecb-a632-f8f05410e475@github.com>
References: <0YtRuwnVZ-Ejs-22d0JDJeFzXiZ17XNuBT1o5Ma4ZkI=.9dd9e952-d452-4175-8ff5-8f41e990a555@github.com>
 <1aRY-G0e2xZFjYt5F7Ode_3B2EuF8vK6QSP6Yd_LtHk=.8930ee0d-755e-4da5-b65e-a7d86449b6de@github.com>
 <3XO9IjWTHC1_b1ru4esCTWosD4lDrhH8Drc2mmPTr6Q=.5ab3a289-870e-4ecb-a632-f8f05410e475@github.com>
Message-ID:

On Mon, 31 May 2021 07:44:16 GMT, Vladimir Ivanov wrote:

>> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   8266951: Review comments resolution.
>
> src/hotspot/share/opto/library_call.cpp line 5254:
>
>> 5252:
>> 5253:   const TypeVect* vt = TypeVect::make(elem_bt, vec_len);
>> 5254:   Node* mask_gen = _gvn.transform(new VectorMaskGenNode(ConvI2L(length), TypeVect::VECTMASK, elem_bt));
>
> Should `ConvI2X` be used here?

Partial in-lining is supported only for 64 bit targets, so ConvI2X will boil down to ConvI2L.

> src/hotspot/share/opto/library_call.cpp line 5304:
>
>> 5302:     set_result(fastcomp_result);
>> 5303:   }
>> 5304:   clear_upper_avx();
>
> There was no `clear_upper_avx()` before.
> Was it overlooked before or is it needed only for new code (when partial inlining takes place)? This is added for newer code dealing with vectors. ------------- PR: https://git.openjdk.java.net/jdk/pull/3999 From chagedorn at openjdk.java.net Mon May 31 15:43:41 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Mon, 31 May 2021 15:43:41 GMT Subject: RFR: 8254129: IR Test Framework to support regex-based matching on the IR in JTreg compiler tests [v10] In-Reply-To: <2iYQOJ5yeu7SvGcScLPBOWCPMLv69e1ksOL1vW3ytL8=.0c27621d-ef3d-422c-9d8c-922078ca3160@github.com> References: <2iYQOJ5yeu7SvGcScLPBOWCPMLv69e1ksOL1vW3ytL8=.0c27621d-ef3d-422c-9d8c-922078ca3160@github.com> Message-ID: > This RFE provides an IR test framework to perform regex-based checks on the C2 IR shape of test methods emitted by the VM flags `-XX:+PrintIdeal` and `-XX:+PrintOptoAssembly`. The framework can also be used for other non-IR matching (and non-compiler) tests by providing easy to use annotations for commonly used testing patterns and compiler control flags. > > The framework is based on the ideas of the currently present IR test framework in [Valhalla](https://github.com/openjdk/valhalla/blob/e9c78ce4fcfd01361c35883e0d68f9ae5a80d079/test/hotspot/jtreg/compiler/valhalla/inlinetypes/InlineTypeTest.java) (mainly implemented by @TobiHartmann) which is being used with great success. This new framework aims to replace the old one in Valhalla at some point. > > A detailed description about how this new IR test framework works and how it is used is provided in the [README.md](https://github.com/chhagedorn/jdk/blob/aa005f384a4567c6c0b5f08f7c5df57f705dc540/test/lib/jdk/test/lib/hotspot/ir_framework/README.md) file and in the [Javadocs](https://github.com/chhagedorn/jdk/blob/aa005f384a4567c6c0b5f08f7c5df57f705dc540/test/lib/jdk/test/lib/hotspot/ir_framework/doc/jdk/test/lib/hotspot/ir_framework/package-summary.html) written for the framework classes. 
> > To finish a first version of this framework for JDK 17, I decided to leave some improvement possibilities and ideas to be followed up on in additional RFEs. Some ideas are mentioned in "Future Work" in [README.md](https://github.com/chhagedorn/jdk/blob/aa005f384a4567c6c0b5f08f7c5df57f705dc540/test/lib/jdk/test/lib/hotspot/ir_framework/README.md) and were also created as subtasks of this RFE. > > Testing (also described in "Internal Framework Tests in [README.md](https://github.com/chhagedorn/jdk/blob/aa005f384a4567c6c0b5f08f7c5df57f705dc540/test/lib/jdk/test/lib/hotspot/ir_framework/README.md)): > There are various tests to verify the correctness of the test framework which can be found as JTreg tests in the [tests](https://github.com/chhagedorn/jdk/tree/aa005f384a4567c6c0b5f08f7c5df57f705dc540/test/lib/jdk/test/lib/hotspot/ir_framework/tests) folder. Additional testing was performed by converting all compiler Inline Types test of project Valhalla (done by @katyapav in [JDK-8263024](https://bugs.openjdk.java.net/browse/JDK-8263024)) that used the old framework to the new framework. This provided additional testing for the framework itself. We ran the converted tests with all the flag settings used in hs-tier1-9 and hs-precheckin-comp. For sanity checking, this was also done with a sample IR test in mainline. > > Some stats about the framework code added to [ir_framework](https://github.com/chhagedorn/jdk/tree/aa005f384a4567c6c0b5f08f7c5df57f705dc540/test/lib/jdk/test/lib/hotspot/ir_framework): > > - without the [Javadocs files](https://github.com/chhagedorn/jdk/tree/aa005f384a4567c6c0b5f08f7c5df57f705dc540/test/lib/jdk/test/lib/hotspot/ir_framework/doc) : 60 changed files, 13212 insertions, 0 deletions. 
> - without the [tests](https://github.com/chhagedorn/jdk/tree/aa005f384a4567c6c0b5f08f7c5df57f705dc540/test/lib/jdk/test/lib/hotspot/ir_framework/tests) and [examples](https://github.com/chhagedorn/jdk/tree/aa005f384a4567c6c0b5f08f7c5df57f705dc540/test/lib/jdk/test/lib/hotspot/ir_framework/examples) folder: 40 files changed, 6781 insertions > - comments: 2399 insertions (calculated with `git diff --cached !(tests|examples) | grep -c -E "(^[+-]\s*(/)?*)|(^[+-]\s*//)"`) > - which leaves 4382 lines of code inserted > > Big thanks to: > - @TobiHartmann for all his help by discussing the new framework and for providing insights from his IR test framework in Valhalla. > - @katyapav for converting the Valhalla tests to use the new framework which found some harder to catch bugs in the framework and also some actual C2 bugs. > - @iignatev for helping to simplify the framework usage with JTreg and with the framework internal VM calling structure. > - and others who provided valuable feedback. > > Thanks, > Christian Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 24 additional commits since the last revision: - Fix Compiler and CompLevel ANY and fix tests after merge - Merge branch 'master' into JDK-8254129 - Splitting classes into subpackages and updating README accordingly, fix bug with new line matching in lookbehind on Windows - Fix package names and fixing internal tests, examples and README file accordingly - Move framework to test/hotspot/jtreg/compiler/lib and tests to test/hotspot/jtreg/testlibrary_tests/compiler/lib/ir_framework - Remove TestFramework: both runWithScenarios, both runWithHelperClasses, and one runWithFlags method to make interface cleaner/simpler, update internal tests accordingly - Minor improvements, comment fixes, and test fixes - Rename TestFrameworkPrepareFlags -> FlagVM and rename TestFrameworkExecution -> TestVM - Apply review comments: Extract Test classes into own files, extract Flag and Test VM processes into own class, replace socket-based flag VM communication with file-based and clean up socket usage of test VM, fix useless processing of hotspot-pid files if no IR rules are applied - Fix XCOMP cases from old framework and turn it into new debug flag -DIgnoreCompilerControls - ... 
and 14 more: https://git.openjdk.java.net/jdk/compare/43104827...c35c658c ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/3508/files - new: https://git.openjdk.java.net/jdk/pull/3508/files/4424e01f..c35c658c Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3508&range=09 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3508&range=08-09 Stats: 678487 lines in 8333 files changed: 92327 ins; 561494 del; 24666 mod Patch: https://git.openjdk.java.net/jdk/pull/3508.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3508/head:pull/3508 PR: https://git.openjdk.java.net/jdk/pull/3508 From chagedorn at openjdk.java.net Mon May 31 15:43:42 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Mon, 31 May 2021 15:43:42 GMT Subject: RFR: 8254129: IR Test Framework to support regex-based matching on the IR in JTreg compiler tests [v9] In-Reply-To: References: <2iYQOJ5yeu7SvGcScLPBOWCPMLv69e1ksOL1vW3ytL8=.0c27621d-ef3d-422c-9d8c-922078ca3160@github.com> Message-ID: On Tue, 4 May 2021 15:53:25 GMT, Christian Hagedorn wrote: >> This RFE provides an IR test framework to perform regex-based checks on the C2 IR shape of test methods emitted by the VM flags `-XX:+PrintIdeal` and `-XX:+PrintOptoAssembly`. The framework can also be used for other non-IR matching (and non-compiler) tests by providing easy to use annotations for commonly used testing patterns and compiler control flags. >> >> The framework is based on the ideas of the currently present IR test framework in [Valhalla](https://github.com/openjdk/valhalla/blob/e9c78ce4fcfd01361c35883e0d68f9ae5a80d079/test/hotspot/jtreg/compiler/valhalla/inlinetypes/InlineTypeTest.java) (mainly implemented by @TobiHartmann) which is being used with great success. This new framework aims to replace the old one in Valhalla at some point. 
>> >> A detailed description about how this new IR test framework works and how it is used is provided in the [README.md](https://github.com/chhagedorn/jdk/blob/aa005f384a4567c6c0b5f08f7c5df57f705dc540/test/lib/jdk/test/lib/hotspot/ir_framework/README.md) file and in the [Javadocs](https://github.com/chhagedorn/jdk/blob/aa005f384a4567c6c0b5f08f7c5df57f705dc540/test/lib/jdk/test/lib/hotspot/ir_framework/doc/jdk/test/lib/hotspot/ir_framework/package-summary.html) written for the framework classes. >> >> To finish a first version of this framework for JDK 17, I decided to leave some improvement possibilities and ideas to be followed up on in additional RFEs. Some ideas are mentioned in "Future Work" in [README.md](https://github.com/chhagedorn/jdk/blob/aa005f384a4567c6c0b5f08f7c5df57f705dc540/test/lib/jdk/test/lib/hotspot/ir_framework/README.md) and were also created as subtasks of this RFE. >> >> Testing (also described in "Internal Framework Tests in [README.md](https://github.com/chhagedorn/jdk/blob/aa005f384a4567c6c0b5f08f7c5df57f705dc540/test/lib/jdk/test/lib/hotspot/ir_framework/README.md)): >> There are various tests to verify the correctness of the test framework which can be found as JTreg tests in the [tests](https://github.com/chhagedorn/jdk/tree/aa005f384a4567c6c0b5f08f7c5df57f705dc540/test/lib/jdk/test/lib/hotspot/ir_framework/tests) folder. Additional testing was performed by converting all compiler Inline Types test of project Valhalla (done by @katyapav in [JDK-8263024](https://bugs.openjdk.java.net/browse/JDK-8263024)) that used the old framework to the new framework. This provided additional testing for the framework itself. We ran the converted tests with all the flag settings used in hs-tier1-9 and hs-precheckin-comp. For sanity checking, this was also done with a sample IR test in mainline. 
>> >> Some stats about the framework code added to [ir_framework](https://github.com/chhagedorn/jdk/tree/aa005f384a4567c6c0b5f08f7c5df57f705dc540/test/lib/jdk/test/lib/hotspot/ir_framework): >> >> - without the [Javadocs files](https://github.com/chhagedorn/jdk/tree/aa005f384a4567c6c0b5f08f7c5df57f705dc540/test/lib/jdk/test/lib/hotspot/ir_framework/doc) : 60 changed files, 13212 insertions, 0 deletions. >> - without the [tests](https://github.com/chhagedorn/jdk/tree/aa005f384a4567c6c0b5f08f7c5df57f705dc540/test/lib/jdk/test/lib/hotspot/ir_framework/tests) and [examples](https://github.com/chhagedorn/jdk/tree/aa005f384a4567c6c0b5f08f7c5df57f705dc540/test/lib/jdk/test/lib/hotspot/ir_framework/examples) folder: 40 files changed, 6781 insertions >> - comments: 2399 insertions (calculated with `git diff --cached !(tests|examples) | grep -c -E "(^[+-]\s*(/)?*)|(^[+-]\s*//)"`) >> - which leaves 4382 lines of code inserted >> >> Big thanks to: >> - @TobiHartmann for all his help by discussing the new framework and for providing insights from his IR test framework in Valhalla. >> - @katyapav for converting the Valhalla tests to use the new framework which found some harder to catch bugs in the framework and also some actual C2 bugs. >> - @iignatev for helping to simplify the framework usage with JTreg and with the framework internal VM calling structure. >> - and others who provided valuable feedback. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with three additional commits since the last revision: > > - Splitting classes into subpackages and updating README accordingly, fix bug with new line matching in lookbehind on Windows > - Fix package names and fixing internal tests, examples and README file accordingly > - Move framework to test/hotspot/jtreg/compiler/lib and tests to test/hotspot/jtreg/testlibrary_tests/compiler/lib/ir_framework I'm back to work and "reopen" this PR again. @TobiHartmann Good point! 
I merged the master branch and fixed this to `-1` together with some other test failures that emerged after the update.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3508

From shade at openjdk.java.net  Mon May 31 16:13:24 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Mon, 31 May 2021 16:13:24 GMT
Subject: RFR: 8267773: PhaseStringOpts::int_stringSize doesn't handle min_jint correctly [v2]
In-Reply-To:
References:
Message-ID:

On Mon, 31 May 2021 15:06:56 GMT, Nils Eliasson wrote:

>> Hi,
>>
>> PhaseStringOpts::int_stringSize doesn't handle min_jint correctly - it can't be negated to a positive number. I fix this adding a special case for this.
>>
>> Added a test that was contributed by Adam who reported the bug. Will add as a contributor.
>>
>> Please review,
>> Best regards,
>> Nils Eliasson
>
> Nils Eliasson has updated the pull request incrementally with two additional commits since the last revision:
>
>  - Remove test
>  - Add and fix test

My OpenJDK handle is "shade".

-------------

PR: https://git.openjdk.java.net/jdk/pull/4255

From nils.eliasson at oracle.com  Mon May 31 15:06:05 2021
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Mon, 31 May 2021 17:06:05 +0200
Subject: RFR: 8267773: PhaseStringOpts::int_stringSize doesn't handle min_jint correctly [v2]
In-Reply-To:
References:
Message-ID: <6cc27bc9-d3b2-2b11-8cb4-3fa034be35eb@oracle.com>

Updated. Thanks for contributing the test!

Best regards,
Nils

On 2021-05-31 18:13, Aleksey Shipilev wrote:
> On Mon, 31 May 2021 15:06:56 GMT, Nils Eliasson wrote:
>
>>> Hi,
>>>
>>> PhaseStringOpts::int_stringSize doesn't handle min_jint correctly - it can't be negated to a positive number. I fix this adding a special case for this.
>>>
>>> Added a test that was contributed by Adam who reported the bug. Will add as a contributor.
>>>
>>> Please review,
>>> Best regards,
>>> Nils Eliasson
>> Nils Eliasson has updated the pull request incrementally with two additional commits since the last revision:
>>
>>    - Remove test
>>    - Add and fix test
> My OpenJDK handle is "shade".
>
> -------------
>
> PR: https://git.openjdk.java.net/jdk/pull/4255

From neliasso at openjdk.java.net  Mon May 31 19:17:20 2021
From: neliasso at openjdk.java.net (Nils Eliasson)
Date: Mon, 31 May 2021 19:17:20 GMT
Subject: RFR: 8267928: Loop predicate gets inexact loop limit before PhaseIdealLoop::rc_predicate
In-Reply-To:
References:
Message-ID:

On Fri, 28 May 2021 13:29:36 GMT, Yi Yang wrote:

> Loop predicate gets inexact loop limit(LoopLimitNode) from exact_limit(even if the limit is statically known) and does unnecessary overflow checking when generating lower bound test(rc_predicate). The reason is rather straightforward: exact_limit fails to see a HasExactTripCount flag since it would be set after performing loop predicate(iteration_split).

I'll give this a spin through testing.

-------------

PR: https://git.openjdk.java.net/jdk/pull/4247